arXiv:1309.0276v2 [cs.NI] 3 Sep 2013 oyih 0XAMXXXXX-/XX ...$15.00. X-XXXXX-XX-X/XX/XX ACM 20XX re Copyright lists, fee. cop to To a redistribute and/or to page. or first permission servers the on on post citation to an full republish, advantage thi the commercial and th of or notice provided part this profit fee bear for or without distributed all or granted of made is not copies use hard classroom or or personal digital make to Permission fail- handshake those Consequently, peers. other to tempts ntn rmape nodrto order in peer a or from communication the BitTorrent inating address inspects To ef- project the this tools. reduces issues, classification handshakes Similarly network failed of on fectiveness attacks. DPI net- network of and inadequacy profile researchers the to security administrators for harder work it makes detectors traffic for (DPI) inspection deep-packet labeling. makes use also to connections impossible such ab- it in handshake The information failed payload of attempts. of number sence port-scanning a resemble in which impos- results or attempts This difficult reach. peers to these sible of number significant nu a large makes a to connections of initiate ber to ap- tries P2P many a In plications communication. peer-to-peer time. is of port-scans span number large short a considerably probes a attacker in port-scans an hosts for that of fact methods the Detection for account damage. early eliminat ensuing or attack phe- reducing the for an ing This measures protective identify port-scan take port. to to a enough open administrators Detecting an network “port-scanning”. finding allows called of is goal nomenon the with dresses Goal 1.1 INTRODUCTION 1. accuracy. IDS of loss reduce any co- be without BitTorrent can 80% positives by at the false BitTorrent-related looking behavior of by port-scanning number modelling that and shown traffic is ordination fro It (attempts) connection port-scans. BitTorrent differentiat respective and their hosts detec- BitTorrent ing port-scan of of traffic analyzing accuracy by the tors improve to aims paper This ABSTRACT h as-oiie rgee yPPtaci port-scan in traffic P2P by triggered false-positives The to similar behavior exhibits which traffic of form Another ad- of number a scanning by begin attacks Many r fcaaye o ifrnitn iTrethandshake BitTorrent differentiating for analyzer Traffic peers oee,tepeec fNT n firewalls and NATs of presence the However, . irsf Corporation Microsoft emn,W,USA WA, Redmond, arnRa Khan Riaz Kamran predict alrsfo port-scans from failures t oncinat- connection its urspirspecific prior quires tews,to otherwise, y tcpe are copies at htcopies that d yNtLb NUCES Lab, SysNet saaa,Pakistan Islamabad, okfor work s fa .Syed A. Affan m- ig- m se d - - Padessin addresses IP dniyn h takr cndtcoslo for look by a detectors across attack interest Scan an of against attacker. measures further the pre-emptive identifying is take which to service ability a hosting some port attacker. while a the service by at vulnerable exploited land any them them find of some not of fail, do probes to but the order of in succeed Some machines defined of hosts. number be susceptible a find can on ports It several dis- probing port-scanning. for as is method hosts popular tractable A covering weaknesses. their exploiting fore Problem improvement detector. 1.2 for scan statistics default test- its and the upon developing and based for analyzer are used traffic is the IDS Bro ing accura The classification achieved. traffic are enhancing and false-positives between traffic. differentiating P2P help and can attempts find which scan to ratio order in port/peer port-scanners the of behavior the models project detectors. pre- port-scan are of traffic results P2P influencing as labeled from confidently vented be can which ures iia oteNMagrtm ttigr lr fasource a if alarm triggers It contacts address algorithm. IP NSM by the stealth-scanning to for similar used mal those detects as It such packet-oriented. packets is first formed The detection. scan Padesswti pcfidtime-window. specified a destinatio within distinct addresses fifteen than IP more contacted which cious [ (NSM) Monitor Security onthv xesv nweg ftentoktopology uses network it the Therefore, of configuration. knowledge system extensive and have not do naami triggered. is alarm an al o l te rffi tkestako l connections all threshol dis- particular of of a track number crosses the keeps addresses When destination it tinct establishment. traffic their other attempt of connection pop- all regardless the Bro For if list, only configurable fail. metrics using a port-scan in traffic its For specified ulates services port-scans. the of of indicators one as attempts tion .. Port-scanning 1.2.1 h r D [ IDS Bro The eetn otsaspoie ewr diitaosthe administrators network provides scans port Detecting be- victims potential identify to need attacks cyber Many port-scanning reducing of goals the features these using By this connections, BitTorrent the predicting to addition In h nr D [ IDS Snort The Network of that was literature in algorithm such first The l fteemtosfc ubro iclisi effec- in difficulties of number a face methods these of All h eodmto scneto-retdadis and connection-oriented is method second The . Z 11 seconds. ulso h bevto htscanners that observation the on builds ] unvl,C,USA CA, Sunnyvale, 12 T X szdtm idw[ window time -sized sstodffrn r-rcsosfor pre-processors different two uses ] ydAiKhayam Ali Syed ubro ot across ports of number LMrdInc. PLUMgrid 5 .I agdtoehssa mali- as hosts those flagged It ]. 9 ]. failed Y ubrof number N connec- events d cy N n s - Figure 1: A typical port scan tive port-scan detection. Temporal as well as spatial con- siderations impose limitations in the amount of connections that are tracked. With the explosive growth of Internet Figure 2: Outbound NAT connections many legitimate user activities also resemble behavior simi- lar to port-scans. For example, it is not uncommon for web users to initiate connections to dozens of different websites shows a NAT box receiving incoming data but does not know at once. While such usage is catered for by Bro’s strategy which internal [A, D] should the data be forwarded to. of tracking only failed connections, an issue remains in the Firewalls are devices which inspect traffic passing through case of P2P traffic. them and filter it based on a set of rules. In basic implemen- P2P clients generally contact a large number of hosts in tations unsolicited inbound connections are rejected while a short span of time and unlike other well-known services outbound connections are allowed. use arbitrary ports for their communication [6]. This makes Hosts which are behind NAT and/or firewalls have re- their behavior not only similar to port-scans but also harder stricted connectivity in terms of receiving inbound connec- to classify than other well-known services without the help of tions. In P2P terminology they are called unconnectable deep-packet inspection. An example of the Bro scan detector peers. The ratio of such peers is considerably high, ranging raising false alarm is shown below: from 60% to 90% of all P2P users [2, 3]. When a P2P client needs to contact other peers it uses $ bro -r .pcap scan.bro a number of methods to obtain a list of their addresses. 1317250200.756488 AddressScan 192.168.0.5 has scanned 20 hosts(32833/tcp) The most primitive of these methods is to contact a central 1317250222.395888 ShutdownThresh shutdown threshold reached tracker and request peer lists. Other de-centralized methods for 192.168.0.5 such as distributed hash tables also exist for storing peer 1317250222.425419 AddressScan 192.168.0.5 has scanned 101 hosts (45612/tcp) lists. 1317250288.527706 ScanSummary 192.168.0.5 scanned However, in popular protocols such as BitTorrent the peer a total of 326 hosts lists returned to a client via these methods usually contains 1317250288.527706 PortScanSummary 192.168.0.5 scanned a total of 224 ports information about other peers regardless of their connectiv- ity status. As a result, unconnectable peers are often present in these lists. 1.2.2 P2P Handshake Failures Unconnectable peers result in a large number of failed con- Network Address Translator (NAT) are devices that allow nection attempts which can negatively influence the false- multiple hosts on a private network to communicate with the positive rate of scan detectors such as Bro. The similarity Internet using a single public IP. This is achieved by modify- of connectivity behavior of P2P hosts and a scanner is shown ing network layer information on-the-fly. When an internal in Figure 4 host initiates a connection to an outsider a dynamic map- ping is created which allows the response to be forwarded 1.3 Solution to that particular host as shown in Figure 2. Unsolicited in- bound connections are dropped since NAT simply does not 1.3.1 BitTorrent Communication Analysis know which internal host should receive the data. Figure 3 BitTorrent is the most popular P2P file-sharing protocol, Figure 3: Inbound NAT connections

Figure 4: Connections of a P2P host and a port accounting for more than half of all P2P file sharing traffic scanner [10]. Files shared via BitTorrent are divided into equal-sized chunks. Peers download these pieces from each other while uploading them simultaneously to other peers as well. All tervals about the list of active peers they know about [13]. the peers which are sharing a particular file at a time are There is no official specification for the proto- collectively called a swarm. col in BitTorrent. The de facto standard for Peer Exchange Before peers start to download a file however, they need to is µTorrent PEX. coordinate first in order to locate each other. The BitTorrent All these methods of peer discovery can be interpreted by coordination lines in Figure 5 take place before the actual a traffic analyzer, with the exception of trackers that serve traffic and hence can be used to predict the successful and on HTTPS. By reading the peer lists returned via HTTP failed connections originating from a peer. tracker, DHT or PEX it is possible to “predict” the BitTor- The earliest method for BitTorrent coordination was via rent connection attempts that are going to originate from centralized trackers. Centralized trackers keep track of which a given source IP. This shall in turn make it possible to la- peers are downloading a file. When a new peer wants to bel failed connection attempts and differentiate them from join the swarm it requests a peer list from the tracker. The port-scans. tracker responds with a list of k randomly picked peers, con- taining both connectable and unconnectable hosts. Con- 2. IMPLEMENTATION nection attempts to unconnectable hosts then subsequently results in failed TCP connections [8]. 2.1 Bro IDS In order to alleviate the load on trackers as well as increase resilience against take-down attempts two de-centralized meth- The Bro IDS was used to develop and test the traffic ods exist for obtaining peer lists: Distributed analyze along with its scan detector for providing baseline (DHT) and Peer Exchange (PEX). statistics. During the course of development Bro’s core was A DHT is a distributed database overlayed over a network modified to provide the following helper functions: of computers called nodes. All nodes can store (key,value) • raw_ns_bytes_to_uint16 pairs on the DHT and search for values via key-lookups. In Listing 1: order to join a DHT a peer has to first contact a bootstrap Interpret the first two bytes of a string as a 2-byte in that DHT. Currently, the BitTorrent community unsigned integer in network-byte order. uses two major DHTs: the Azureus DHT and the Main- • Listing 2: raw_nl_bytes_to_uint32 line DHT. The Azureus DHT is in use only by the Azureus Interpret the first four bytes of a string as a 4-byte clients while the Mainline DHT is used by other clients such unsigned integer in network-byte order. as µTorrent, BitComent and Mainline. Both DHTs are im- plementations of the protocol [1]. • Listing 3: sub_bytes_sane The second de-centralized method of peer discovery is Bro’s default sub_bytes function had the idiosyncratic PEX. In PEX peers “gossip” with each other in regular in- behavior in that it if the indices were greater than Figure 7: Central tracker

ets to extract peer lists whenever a specific regular expres- sion is matched in the packet. This regular expression con- firms the presence of peer list in the benc format that used by BitTorrent. Any peers that are found in the peer lists are added to a global peer_mappings table which is indexed by source IPs and yields the (IP, Port) tuple for targets extracted from the peer list. Figure 5: BitTorrent coordination For example, if client X gets a peer list in which a peer A is listed as listening on port 6668, (A, 6668) is added as a possible target for X. Listing 6 shows the code used by the HTTP analyzer to process each TCP packet. 2.2.2 UDP Tracker The BitTorrent ecosystem is moving on to UDP track- ers because of less overhead compared to HTTP trackers. Many major tracker websites such as and have already switched to UDP trackers. Bro did not have a built-in analyzer for the UDP tracker commu- nication so we analyzed UDP traffic for packets of particu- lar lengths and matched the action fields of the packet for Figure 6: Various methods for BitTorrent coordina- annount_response. Once the UDP tracker communication tion was identified, announce_response helper function is called which in turn extracts the peer list and stores it in a global peer_mapping table. 0 it decremented them by 1 before slicing the string. Listing 7 shows the processing of each UDP packet by the This function works around this behavior by retaining UDP tracker analyzer. uniformity of indices. 2.3 Distributed Hash Tables • Listing 4: raw_byte Returns the raw byte at a particular index in a string 2.3.1 Azureus DHT as an unsigned integer. Table 3 and Table 4 show the packet structures for head- ers of ADHT packets. ADHT requests start with a random • Listing 5: raw_bytes connection ID and do not have a fixed length. To identify Returns the raw bytes in a string as a vector of un- these requests, the 16th byte of each UDP packet is checked signed integers. for a valid ADHT protocol version. Based on the found pro- tocol version, various offsets are added to a running counter 2.2 Central Trackers corresponding to different fields. A central tracker keeps track of peers that are currently Eventually, the originating port of the packet should ap- downloading a file and transfers this list to new peers that pear in network-byte order at a specific offset. If that check are trying to join the swarm. does succeed, the packet is a valid ADHT request and bytes There are two types of central tracker used by BitTorrent [8, 12] are checked for ACTION field of the request. clients. Upon finding a valid ACTION, either a helper find_*_request function is called which in turn extract the TRANSACTION_ID 2.2.1 HTTP Tracker and store it in a global table. A built-in analyzer was available in Bro for the HTTP Similarly, all packets are checked for valid TRANSACTION_IDs tracker communication. However, using the peer lists ex- from requests that have already been seen. If one is found, tracted from this analyzer did not yield any connection hits. a helper find_*_response function is called which extracts Therefore we manually analyzed the contents of TCP pack- the peer list out of the DHT values. Figure 8: Peer Exchange

Listing 8 shows the code used by the ADHT analyzer to process each UDP packet. Figure 9: A horizontal scan 2.3.2 Mainline DHT For extracting Mainline DHT peer lists regular expres- sions are matched inside UDP packets for“nodes”and“values” responses. Upon matching the regular expressions the num- ber of peers is determined and then each peer’s IP address and port number is extracted in a sequential manner. List- ing 9 shows the code used by the MDHT analyzer to extract peers out of DHT respones. 2.3.3 Other DHTs During the course of development we observed that certain UDP traffic flows between BitTorrent hosts before they ini- tiate TCP connections. Matching this UDP traffic allows us to predict the TCP connections since the destination port remains same on both transport layer protocols. The sig- natures for matching miscellaneous DHT traffic were taken from libprotoident’s database, which is an open-source li- brary for detecting application layer protocols from length of a payload combined with its first four bytes. Listing 10 shows the code used by BTUDP analyzer to Figure 10: A vertical scan match signatures across each UDP packet. Once a BTUDP packet is detected, the target IP and port are added to safe peer_mappings. The second scan type is “vertical” in which the attack probes a single host for multiple ports as shown in Figure 10. 2.4 Peer Exchange The motivation for such a scan lies in attacker targetting a Compatible BitTorrent clients “gossip” with each other specific victim to find any vulnerability he can find. about the clients they know exist in their swarm. This form A hybrid or block scan is a combination of both horizontal of coordination is called Peer Exchange and is shown in Fig- and vertical scans and is shown in Figure 11. ure 8. In contrast, a P2P host contacts a large number of hosts on ports that are fairly random and almost unique since most of 2.4.1 UTPEX the hosts are behind NAT and have set up port-forwarding µTorrent PEX or UTPEX is the de facto standard of Peer on non-standard ports. This is shown in Figure 12 Exchange for BitTorrent clients. The packets are searched To use these observations, Port/Peer Ratio P P R of each for regular expressions matching “added” responses which host is kept track of. Ports correspond to unique ports while are sent by a client to its peers existing when it learns about peers correspond to number of unique IP addresses that a new clients in the swarm. Listing 11 shows the code used hose has probed. From the discussion above: by UTPEX analyzer for processing each TCP packet. • If P P R is close to 0, it indicates a horizontal scan. 2.5 Port/Peer Ratio • If P P R is close to 1, it indicates P2P behavior. Port/Peer Ratio is a feature which relies on behavioral dif- ferences between port scanners and BitTorrent downloaders. • If P P R is greater than 1, it indicates a vertical or Before explaining these differences it is important to outline hybrid scan. various categories in which a scanner’s behavior can be clas- Listing 12 shows code modifications to Bro’s scan detector sified. for suppressing alarms when P P R of the attacker lies be- There are three kinds of port scans [7]. The first scan type tween a lower and upper threshold. By default these thresh- is “horizontal” in which the attacker probes multiple hosts olds are specified as 0.75 and 1 respectively. on the same port as shown in Figure 9. The motiviation for such a scan stems from the fact that a known vulnerability would exist on that port. 3. RESULTS Scanners = 51 BitTorrent Hosts = 49 55 5 5 5

50

45 100

40

100 100

35 True Positives True

30 1000

1000

25 Bro Scan Detector Figure 11: A hybrid scan BitTorrent Analyzer 1000 BitTorrent Analyzer (with Port/Peer Ratio) 20 0 10 20 30 40 50 60 False Positives

Figure 13: ROC curves for port scanner perfor- mance

were flagged.

Conclusion Figure 12: Port connectivity behavior of a P2P The original Bro scanner had a curve which bent signifi- client cantly towards the lower-right corner of the graph. Adding the BitTorrent analyzer engine improved the performance of scan detector significantly without any loss of IDS accuracy 3.1 Controlled Experiment since the curves do not intersect at any threshold. At the For this experiment labeled data sets containing port scan- default Bro threshold, i.e., 100, the false positive rate is de- 44 1 ners and BitTorrent hosts were used to gauge the IDS per- creased from 49 = 89% to 49 = 2%. By using the Port/Peer formance. With support of Dr. Ali Khayam we were able to Ratio feature, the false positive rate is further improved to 0 obtain traces for 41 TCP port scanners which were gener- 49 = 0%. ated at various rates for the RAID ’10 paper [4] on impact of P2P on anomaly detection. In addition to the RAID data sets, nmap was used to generate traffic for 10 further 3.2 Live Traffic scanners. The combined traces had 51 port scanners which For live traffic experiments a 400 GB trace was used which scanned at rates varying from 0.01/s to 1000/s. was captured at a Nayatel B-RAS over a duration of 24 In order to process P2P traffic, BitTorrent traces were hours. generated on 49 different virtual hosts. Various BitTor- We first ran BitTorrent Analyzer to figure out which con- rent clients including Azureus, BitComet, , µTorrent, nections could have been “predicted” by looking at the Bit- Transmission and were used to generate the traf- Torrent coordination traffic. Then we calculated the du- fic on both Windows and Linux platforms. For each client ration each BitTorrent host took to contact 100 predicted application the default settings were used which resulted in connections before being flagged as a scanner. So if host unencrypted traffic for all hosts. It was ensured that the A was flagged as a scanner at 9:05 AM and 100 predicted hosts were not infected at the time of traffic generation. BitTorrent connections originated from A between 9:00 and The traffic for both the scanners and BitTorrent hosts 9:05 AM the duration would be 5 minutes. A histogram was were aligned temporally to start at the same time while IP generated with each bin denoting 15 minutes of time. As addresses were rewritten to avoid spatial collisions. The IP shown in Figure 14 most of the BitTorrent hosts generated addresses were reassigned according to the criterion which 100 predicted BitTorrent connections in the first 15 minutes assigned BitTorrent hosts and port scanners separate net- bin before being flagged as a scanner. The results indicate a works for ease of labeling. strong influence of predicted BitTorrent connections on the To compare the performance of scan detector and its im- scan flags. provements ROC curves were used. The results are shown in Bro originally detected 202 scanners in the 400 GB trace. Figure 13. Each point on the graph shows different thresh- By filtering connections through the BitTorrent Analyzer olds for the number of addresses a host can probe within the the number of alarms was reduced to 32 as 170 flags were 15 minute time window before being flagged as a scanner. At suppressed. We manually analyzed the remaining 32 hosts more relaxed thresholds, e.g., 1000, the points concentrated to observe that 26 of them were actual scanners while 6, in the lower-left corner of the graph. Similarly, at stricter or 3% of the original flags, were false alarms that were still thresholds, e.g., 5, the points moved towards the upper-right being raised. These results are summarized in the pie-chart corner of the graph as more scanners and BitTorrent hosts shown in Figure 15. Time taken by hosts to initiate 100 predicted BitTorrent connections before scan flag 120 [4] Haq, I. U., Ali, S., Khan, H., and Khayam, S. A. What is the impact of P2P traffic on anomaly detec- 100 tion? In Proceedings of the 13th international confer- 80 ence on Recent advances in intrusion detection (Berlin,

60 Heidelberg, 2010), RAID’10, Springer-Verlag, pp. 1–17. Number of hosts 40 [5] Heberlein, L. T., Dias, G. V., Levitt, K. N.,

20 Mukherjee, B., Wood, J., and Wolber, D. A monitor. Security and Privacy, IEEE 0 0 15 30 45 60 75 90 105 120 135 150 Time (minutes) Symposium on (1990), 296. [6] Karagiannis, T., Broido, A., Brownlee, N., Claffy, K. C., and Faloutsos, M. Is P2P dying or Figure 14: Histogram depicting duration taken by just hiding? In Proceedings of the GLOBECOM 2004 BitTorrent hosts to initiate 100 predicted connec- Conference (Dallas, Texas, November 2004), IEEE tions before being flagged as a scanner Computer Society Press. [7] Lee, C. B., Roedel, C., and Silenok, E. Detection Hosts originally flagged as scanners: 202 and characterization of port scan attacks, 2003. [8] Liu, Y., and Pan, J. The impact of NAT on BitTorrent-like P2P systems. In Proceedings of 2009 9th IEEE International Conference on Peer-to-Peer Com- Suppressed (170) puting (2009), pp. 242–251. 84% [9] Northcutt, S., and Novak, J. Network Intrusion Detection: An Analyst’s Handbook, 3rd ed. New Riders 3% False Alarms (6) Publishing, Thousand Oaks, CA, USA, 2002.

13% [10] opaque. Internet study 2008/09, 2009.

Scanners (26) [11] Paxson, V. Bro: A system for detecting network in- truders in real-time. Computer Networks 31 (December 1999), 2435–2463. [12] Roesch, M. Snort: Lightweight intrusion detection for networks. In Proceedings of LISA ’99: 13th Systems Administration Conference (1999), USENIX, pp. 229– 238. Figure 15: Analysis of scan flags on the 400 GB live [13] Wu, D., Dhungel, P., Hei, X., Zhang, C., and traffic trace Ross, K. W. Understanding peer exchange in Bit- Torrent systems. In Proceedings of 2010 10th IEEE International Conference on Peer-to-Peer Computing Conclusion (2010), pp. 1–8. We concluded that almost all BitTorrent hosts generate false alarms because of their aggressive connectivity behavior. The BitTorrent Analyzer was able to reduce the number APPENDIX of false alarms significantly, however about 3% of the total flags still generated false positives. A. BRO MODIFICATIONS

1 function raw n s bytes t o uint16%(b: string%): 4. References count %{ 3 uint32 u = 0; [1] Crosby, S. A., and Wallach, D. S. An analysis of BitTorrent’s two Kademlia-based DHTs, 2007. 5 if ( b−>Len() < 2 ) builtin error(”too short a string as input to [2] D’Acunto, L., Meulpolder, M., Rahman, R., r a w bytes t o uint16()”) ; 7 Pouwelse, J., and Sips, H. Modeling and analyz- else ing the effects of firewalls and NATs in P2P swarming 9 { systems. In IEEE IPDPS 2010 (HotP2P 2010) (April const u char∗ bp = b−>Bytes() ; 11 u = (bp[0] << 8) | bp[1]; 2010), pp. 1–8. } 13 [3] D’Acunto, L., Pouwelse, J., and Sips, H. A mea- return new Val(u, TYPE COUNT) ; surement of NAT & firewall characteristics in peer-to- 15 %} peer systems. In Proc. 15-th ASCI Conference (June 2009), Advanced School for Computing and Imaging Listing 1: raw_ns_bytes_to_uint16 helper function for (ASCI), pp. 1–5. Bro Offset Size Name

1 function raw nl bytes t o uint32%(b: string%): 0 32-bit integer action 4 32-bit integer transaction_id count 8 32-bit integer interval %{ 12 32-bit integer leechers 3 uint32 u = 0; 16 32-bit integer seeders 20 + 6 × n 32-bit integer IP address 24 + 6 × n 16-bit integer TCP port 5 if ( b−>Len() < 4 ) builtin error(”too short a string as input to r a w bytes t o uint32()”); Table 1: Packet structure for a UDP tracker an- 7 nounce_response containing n peers else 9 { const u char∗ bp = b−>Bytes() ; 11 u = (bp[0] << 24) | (bp[1] << 16) | (bp[2] << if ( i%6==0 ) 8) | bp[3]; 13 { local peer compact: string = sub bytes sane( } peers compact , i − 6, 6); 13 15 local peer: TrackerPeer; return new Val(u, TYPE COUNT) ; 17 peer$h = raw bytes t o v 4 addr(peer compact ); 15 %} peer$p = raw ns bytes t o uint16(sub bytes sane( peer compact, 4, 2)); 19 Listing 2: raw_nl_bytes_to_uint32 helper function for local orig: addr = ( is orig ? c$id$resp h : c$id$orig h ); Bro 21 if ( orig !in peer mappings ) 23 { peer mappings[orig] = set(); 1 function sub bytes sane%(s: string , start: count, 25 } n: int%): string add peer mappings [ orig ][ peer ]; %{ 27 } 3 BroString ∗ ss = s−>AsString ()−>GetSubstring( 29 i = i + 1; start , n); } 31 } } 5 if ( ! ss ) ss = new BroString(””); 7 Listing 6: tcp_contents event handler for HTTP return new StringVal(ss); analyzer 9 %}

Listing 3: sub_bytes_sane helper function for Bro A.2 UDP Tracker A.2.1 Packet structure 1 function raw byte%(s: string , index: count%): count The following information was retrieved from BitTorrent %{ Enhancement Proposal 15 1. 3 return new Val(s−>Bytes() [index] , TYPE COUNT) ; %} A.2.2 Code

Listing 4: raw_byte helper function for Bro event udp contents(u: connection , is orig: bool, contents: string) 2 { local len: count = byte len(contents); function raw bytes%(s: string%): index vec 4 2 %{ if ( ( len − 20 )%6 != 0 ) VectorVal ∗ result v = 6 { return ; 4 new VectorVal(new VectorType(base type( 8 } TYPE COUNT) ) ) ; 10 if ( sub bytes sane(contents , 0, 4) != ”\ x00\x00 \x00 \01” ) 6 int i; { for ( i =0; i < s−>Len(); ++i ) 12 return ; } 8 { 14 result v −>Assign(i , new Val(s−>Bytes()[ i ], announce response(u, is orig , contents); TYPE COUNT) , 0) ; 16 } 10 }

12 return result v ; Listing 7: udp_contents event handler for UDP %} Tracker analyzer

Listing 5: raw_bytes helper function for Bro A.3 Azureus DHT A.1 HTTP Tracker A.3.1 Packet structures The following information was retrieved from the 1 event tcp contents(c: connection , is orig: bool, seq: 2 count , contents: string) Wiki . { 3 for ( s in find all(contents , /5 \ : peers[0 −9]+\:/) ) { A.3.2 Code 5 local sv: string vec = str split(s, vector(7)); local len: count = to count(sub bytes sane(sv[2], 0, byte len(sv[2]) − 1)); event udp contents(u: connection , is orig: bool, contents: 7 string) local i: count = 0; 2 { 9 local peers compact : string = sub bytes sane(contents , 1 strstr(contents , s) + byte len(s) − 1, len); http://bittorrent.org/beps/bep 0015.html for ( dummy in peers compact ) 2 11 { http://wiki.vuze.com/w/Distributed hash table Name Width Note byte 1 byte Single byte short 2 bytes Big endian int 4 bytes Big endian long 8 bytes Big endian boolean 1 byte False = 0; True = 1 address 7 bytes or 19 bytes First byte indicates length of the IP address (4 for IPv4, 16 for IPv6); next comes the address in network byte order; the last value is port number as short contact 9 bytes or 21 bytes First byte indicates contact type, which must be UDP (1); second byte indicates the contact’s protocol version; the rest is an address

Table 2: Serialization for ADHT values

Name Type Protocol Version Note CONNECTION_ID long always Random number with most significant bit set to 1 ACTION int always Type of the packet TRANSACTION_ID int always Unique number used through the communication; it is randomly generated at the start of the application and increased by 1 with each sent packet PROTOCOL_VERSION byte always Version of protocol used in this packet VENDOR_ID byte ≥VENDOR_ID ID of the DHT implementator; 0 = Azureus, 1 = ShareNet, 255 = unknown NETWORK_ID int ≥NETWORKS ID of the network; 0 = stable version; 1 = CVS version LOCAL_PROTOCOL_VERSION byte ≥FIX_ORIGINATOR Maximum protocol version this node supports; if this packet’s protocol version is

Table 3: Header for ADHT requests

local protocol version: count; { 4 local action: string; 3 for ( s in find all(contents , /6 \ : valuesl6 \ :/) ) { 6 protocol version = raw byte(contents , 16); 5 local offset: count = strstr(contents , s) + 8; if ( protocol version in valid protocol versions ) 8 { 7 for ( c in contents ) local offset: count = 17; { 10 if ( protocol version >= protocol v e r s i o n s [ ”VENDOR ID 9 if ( sub bytes sane(contents , offset , 2) != ”6:” ) ”] ) { { 11 break ; 12 offset += 1; } } 13 offset += 2; 14 if ( protocol version >= protocol v e r s i o n s [ ”NETWORKS”] ) 15 local peer compact: string = sub bytes sane (contents { , offset , 6); 16 offset += 1; local peer: DhtPeer; } 17 18 if ( protocol version >= protocol versions [” peer$h = raw bytes t o v 4 addr(peer compact ) ; FIX ORIGINATOR ”] ) 19 peer$p = raw ns bytes t o uint16(sub bytes sane( { peer compact, 4, 2)); 20 offset += 4; } 21 if ( u$id$orig h !in peer mappings ) 22 { offset += 5; 23 peer mappings [ u$id$orig h] = set(); 24 } local p: count; 25 add peer mappings [ u$id$orig h ][ peer ]; 26 p = raw ns bytes t o uint16(sub bytes sane(contents , offset , 2)); 27 offset += 6; if ( p == port t o count(u$id$orig p ) ) if ( offset > byte len(contents) ) 28 { 29 { action = sub bytes sane(contents , 8, 4); break ; 30 31 } if ( action == ”\ x00\x00 \x04 \x04” ) } 32 { 33 } find node request(u, is orig , contents); } 34 } if ( action == ”\ x00\x00 \x04 \x06” ) 36 { find value request(u, is orig , contents); Listing 9: mdht_find_values helper function for 38 } MDHT analyzer } 40 }

42 if ( sub bytes sane(contents , 4, 4) in transaction ids ) { A.5 Other DHTs 44 local a: DhtAction = transaction ids [sub bytes sane( contents , 4, 4)]; event udp contents(u: connection , is orig: bool, contents: string) 46 action = sub bytes sane(contents , 0, 4); 2 { local matched: bool = F; 48 if ( a == FindNode && action == ”\ x00 \x00\ x04\x05” ) 4 local msb4: string = sub bytes sane(contents , 0, 4); { local len: count = byte len(contents); 50 find node response(u, is orig , contents); 6 } if ( msb4 == /d1:[areqt]/ ) 52 if ( a == FindValue && action == ”\ x00 \x00\x04 \x07” ) 8 { { matched = T; 54 find value response(u, is orig , contents); 10 } } 56 } 12 if ( msb4 == /d1.:/ ) } { 14 matched = T; } Listing 8: udp_contents event handler for ADHT 16 # other checks for signature matching ommitted for analyzer brevity ... 18 if ( !matched ) 20 { A.4 Mainline DHT return ; 22 }

1 function parse values(u: connection , is orig: bool, 24 local peer : BTUDPPeer; contents: string) local orig: addr; Name Type Protocol Version Note ACTION int always Type of the packet TRANSACTION_ID int always Must be equal to TRANSACTION_ID from the request CONNECTION_ID long always Must be equal to CONNECTION_ID from the request PROTOCOL_VERSION byte always Version of protocol used in this packet VENDOR_ID byte ≥VENDOR_ID Same meaning as in the request NETWORK_ID int ≥NETWORKS Same meaning as in the request INSTANCE_ID int always Instance id of the node that replies to the request

Table 4: Header for ADHT replies

26 ! outbound && orig !in Site::neighbor nets && if ( is orig ) 14 ( pn < port peer ratio thresh lower | | pn > 28 { port peer ratio thresh upper) ) orig = u$id$orig h ; { 30 peer$h = u$id$resp h ; 16 shut down thresh reached[orig] = T; peer$p = port t o count(u$id$resp p); local msg = fmt(”shutdown threshold reached for %s”, 32 } orig); else 18 NOTICE([ $note=ShutdownThresh , $src=orig , 34 { $identifier=fmt(”%s”, orig) , orig = u$id$resp h ; 20 $p=service , $msg=msg]) ; 36 peer$h = u$id$orig h ; } peer$p = port t o count(u$id$orig p); 22 38 } # lines omitted for brevity ... 24 40 if ( orig !in peer mappings ) } { 42 peer mappings[orig] = set(); } 44 add peer mappings [ orig ][ peer ]; Listing 12: check_scan alarm suppression for PPR }

Listing 10: udp_contents helper function for BTUDP analyzer

A.6 UTPEX

event tcp contents(c: connection , is orig: bool, seq: count , contents: string) 2 { for ( s in find all(contents , /5 \ : added[0 −9]+\:/) ) 4 { local sv: string vec = str split(s, vector(7)); 6 local len: count = to count(sub bytes sane(sv[2], 0, byte len(sv[2]) − 1));

8 local i: count = 0; local peers compact : string = sub bytes sane(contents , strstr(contents , s) + byte len(s) − 1, len); 10 for ( dummy in peers compact ) { 12 if ( i%6==0 ) { 14 local peer compact: string = sub bytes sane( peers compact , i − 6, 6); local peer: PexPeer; 16 peer$h = raw bytes t o v 4 addr(peer compact ) ; 18 peer$p = raw ns bytes t o uint16(sub bytes sane( peer compact, 4, 2));

20 local orig: addr = ( is orig ? c$id$resp h : c$id$orig h );

22 if ( orig !in peer mappings ) { 24 peer mappings[orig] = set(); } 26 add peer mappings [ orig ][ peer ]; } 28 i = i + 1; 30 } } 32 }

Listing 11: tcp_contents event handler for UTPEX analyzer

A.7 Port/Peer Ratio

function check scan(c: connection , established: bool , reverse: bool): bool 2 {

4 # lines omitted for brevity ...

6 local n = | distinct peers [orig ] | ; local p = orig in distinct ports ? | distinct ports [orig ] | ∗ 1.0 : 1.0; 8 local pn = p / n;

10 # Check for threshold if not outbound. if ( ! shut down thresh reached[orig ] && 12 n >= shut down thresh &&