A Methodology for P2P File-Sharing Traffic Detection ∗

Angelo Spognardi,† Alessandro Lucarelli, Roberto Di Pietro
Università di Roma “La Sapienza”, Dipartimento di Informatica
Via Salaria, 113, 00198-Roma, Italy
{spognardi, dipietro}@di.uniroma1.it, ale [email protected]

Abstract

Since the widespread adoption of peer-to-peer (P2P) networking during the late ’90s, P2P applications have multiplied. Their diffusion and adoption are witnessed by the fact that P2P traffic accounts for a significant fraction of Internet traffic. Further, there are concerns regarding the use of these applications, for instance when they are employed to share copyright-protected material. Hence, in many situations there are good reasons to detect P2P traffic. In the late ’90s, P2P traffic was easily recognizable, since P2P protocols used application-specific TCP or UDP port numbers. However, P2P applications were quickly given the ability to use arbitrary ports in an attempt to go undetected. Nowadays, P2P applications explicitly try to camouflage the traffic they originate.

Despite the presence of rules to detect P2P traffic, no methodology exists to extract such rules from applications without the use of reverse engineering. In this paper we develop a methodology to detect P2P traffic. It is based on the analysis of the protocol used by a P2P application, the extraction of specific patterns unique to the protocol, the coding of such patterns in rules to be fed to an Intrusion Detection System (IDS), and the validation of the patterns via network traffic monitoring with SNORT (an open source IDS) fed with the devised rules. In particular, we present a characterization of the P2P traffic originated by the OpenNap and WPN protocols (implemented in the WinMx application) and by the FastTrack protocol (used by KaZaA), obtained using our methodology, which shows the viability of our proposal. Finally, we conclude the paper by describing our ongoing efforts to extend the methodology to exploit differences between centralized and decentralized P2P protocols, as well as the characterization of encrypted traffic, and we highlight a new research direction in the identification of P2P traffic.

∗ This work was partially funded by the PRIN 2003 Web-based Management and Representation of Spatial and Geographic Data project, supported by the Italian MIUR, and by the WEB-MINDS project, supported by the Italian MIUR under the FIRB program. Roberto Di Pietro is also with CNR-ISTI, WNLab-Pisa. Angelo Spognardi is the contact author.
† Authors are in reverse alphabetical order.

1 Introduction

P2P networking can be seen as a network of computers that does not use the client/server paradigm but is based on the notion of peers. Peers may differ in processing capabilities, connection speed, local network configuration or operating systems. P2P networks can offer the functionalities required to implement a generic application, as in [3, 12]. The lack of centralized authorities in P2P networks is reflected in a totally distributed configuration of directly connected peers. Some P2P networks also have a small set of special nodes, known as super nodes [9, 8], that usually perform special tasks, such as query handling, which typically require more resources. One common application of P2P networks is file sharing among users. Download operations typically involve two phases:

Signaling phase: a peer searches for the content and determines which peers are eligible to provide the desired content. In many protocols this phase does not involve any direct communication with the peer which will eventually provide the content.

Download phase: the requester contacts one or multiple peers among the eligible ones to directly download the desired content.

Detecting P2P file sharing traffic can be required in several contexts. For instance, in an enterprise network, administrators would like to provide a degraded service (via rate-limiting, service differentiation, blocking) to P2P traffic to ensure good performance for enterprise critical applications, and/or enforce corporate rules regulating the P2P usage [17]. Broadband ISPs would like to limit the P2P traffic to limit the cost they are charged by upstream ISPs. All

these activities require the capability to accurately identify P2P network traffic. Further, identification of the users performing file sharing inside a network can be useful to support forensic investigations. However, application identification inside IP networks can, in general, be difficult. First-generation P2P applications used well-defined port numbers to send file sharing traffic, hence the identification of P2P traffic was a relatively easy task. In response to this, P2P applications acquired the capability to use any port number. Furthermore, recent P2P networks tend to intentionally camouflage their generated traffic [11] to circumvent both filtering firewalls and possible legal litigation. There are also some P2P applications that support encryption, while others adopt file fragmentation; these applications split the file to be sent into chunks, where each chunk is eventually sent by a different peer.

Proceedings of the 2005 Second International Workshop on Hot Topics in Peer-to-Peer Systems (HOT-P2P'05) 0-7695-2417-6/05 $20.00 © 2005 IEEE 0-7695-2426-5/05 $20.00 © 2005 IEEE Authorized licensed use limited to: Universita degli Studi di Roma La Sapienza. Downloaded on July 15, 2009 at 13:29 from IEEE Xplore. Restrictions apply.

There are some projects in the area of P2P traffic detection: the SNORT project group itself proposes some rules for the detection of P2P traffic, and there exist some commercial applications (like p2pwatchdog [13]) whose only purpose is to catch and monitor P2P traffic. However, neither the SNORT community nor the p2pwatchdog developers say how to write rules for all P2P file-sharing programs; p2pwatchdog, furthermore, is neither open-source nor free. What is lacking, then, is a methodology to write IDS rules for P2P traffic detection or, more generally, a flexible methodology to identify any application-specific traffic.

1.1 Main Contributions and Road-map

In this paper, we provide a methodology to identify P2P traffic. The methodology is based on the following steps: analysis of the protocol of interest; identification of patterns specific to the P2P protocol that can be revealed by an IP packet level analysis; coding of these patterns in rules that can be fed to an IDS; network monitoring of the identified patterns with an effective IDS fed with the devised rules. Note that following the IDS-like approach does not introduce any delay in the network, and requires only little overhead on the checking-point where it is installed. Further, the proposed methodology is shown to be extensible to the analysis of P2P protocols that encrypt their generated traffic, and to efficiently leverage characteristics introduced by decentralized P2P file sharing applications. Our P2P traffic detection tool has been successfully deployed and is currently running in a corporate LAN.

The remainder of this paper is organized as follows. Section 2 reports related work in the field. Section 3 describes the working hypothesis as well as the methodology to deal with P2P traffic detection. Section 4 highlights the technical issues involved in identifying P2P traffic in real time inside the network. The methodology is applied to the OpenNap, WPN and FastTrack protocols, run by the WinMx and KaZaA applications. In this section we also report the rules for the SNORT IDS to catch the protocol signature patterns. Section 5 reports our conclusions and a few research directions.

2 Related work

Early research on P2P traffic characterization was based on the addressing of default network ports [18],[16]. Recent work [7] uses application signatures to characterize the workload of KaZaA downloads, while in [17] signatures for a wide range of P2P applications are provided. However, these studies do not evaluate the accuracy, scalability or robustness of their signatures, or fail to describe the methodology adopted, or do not consider some interesting protocols. Signature based traffic classification has been mainly performed in the context of network security, such as intrusion and anomaly detection (e.g. [2], [1]), where one typically seeks to find a signature for an attack.

In [19], [1], [10] research focuses on aggregated data traffic, to distinguish regular traffic from the traffic originated by P2P applications. These works provide a view of local P2P usage, while [18] reports a complementary backbone view, that is, the analysis of data gathered from a tier-1 Internet Service Provider.

Our approach is similar to that reported in [4],[17] in the sense that, as a final result, we provide a set of signatures to identify P2P file sharing traffic. Our approach differs from [4],[17] in that the proposed methodology is clearly described and combines both signature and intrusion detection techniques.

3 Methodology

In this section we provide the methodology employed to detect P2P file sharing traffic. Note that the proposed methodology is general enough to be easily adapted to any P2P file sharing protocol. To show its flexibility, we have applied the methodology to the following P2P protocols: OpenNap, WPN and FastTrack. Once specific patterns for the protocol of interest have been found, it is possible to feed any IDS with the appropriate rules to identify such patterns. In our specific case, we have expressed such patterns in terms of SNORT rules. Note that SNORT is only one of the possible IDSs: we chose SNORT because it is the most popular IDS, owing to its history, because it is open source and also because its rules are easy to understand. Moreover, SNORT has a large community of developers, it is extensible with plug-ins and add-ons and it works on every operating system (Windows, Unix/Linux

and Solaris). However, other sniffers (like Bro [14]) can equally be used for the phase of writing rules. Also, the focus of this paper is not to write SNORT rules, but to propose a methodology to study applications that can lead to the writing of IDS rules, within different peer operating contexts such as encryption and decentralization.

3.1 Working Hypothesis

Our focus is on the protection of a company LAN from unauthorized network traffic, in particular network traffic originated by P2P file sharing applications. Our operating testbed was composed of different systems running Linux and Windows operating systems. The network was partitioned using layer-3 switches, and Internet connections were filtered by a firewall. To simulate the network traffic, we set up two different systems. The first one was a computer running Linux Red Hat7, equipped with a network adapter Eth0 configured in promiscuous mode so as to intercept all the traffic on the portion of LAN it was connected to. Installed on this machine we had WinDump [22] to analyze incoming data packets, as well as SNORT [20, 15] to check whether our protocol analysis had correctly identified patterns of P2P file sharing. The second system was running Windows XP. On this second system we installed WinMX [23] and KaZaA [9] as well as Ethereal [5]. The data collected from this system were meant to reveal the P2P protocol architecture.

3.2 Approach

There exist different approaches to characterize the traffic of a protocol. One of these is to reverse engineer an application that uses the protocol. An attempt to reverse engineer the KaZaA application has been performed by the giFT-FastTrack community [6], to identify the proprietary schemes of the FastTrack protocol. However, this approach can disclose details that are not relevant to traffic detection, while overlooking features that allow a direct characterization of P2P traffic.

We maintain a high level approach, that is, we focus on the interface provided by the client adopting the protocol of interest. In particular, we analyse the messages generated by user-triggered actions. Analysing the network traffic originated by triggering the client interface, it was possible (using the netstat command) to acquire information about the network protocols used, open connections, IP addresses and ports. In this way, we limited the search space to only those protocols used by the client. With the use of Ethereal, WinDump and netstat, we observed for example that OpenNap uses TCP/IP protocols and sometimes the UDP protocol. At the Application Level, we observed a modified version of HTTP1.1 and the DNS protocol, used to resolve server names to IP addresses. Analyzing the payloads, we could catch, for instance, the lists of shared files as well as the login and the welcome messages triggered by the OpenNap protocol. Since some of these elements are recurrent and fixed, they were used to generate IDS rules to recognize traffic generated by file-sharing clients. However, recognition of clear text is not always possible. For instance, FastTrack and WPN use some techniques to encrypt messages. Nevertheless, the analysis of the generated traffic is still possible and effective, as will be seen in Section 4.2.2.

3.2.1 Objective of traffic generation

From the analysis of the generated traffic, we strove to understand in which way a client acquires knowledge of other peers in the network and which type of connections it establishes with them. We used “what-caused-what” relations: for every action we requested the client to perform, there was a subsequent analysis of the triggered messages. The message analysis was done step-by-step: first, we tried to understand the structure of every packet composing a message; then, we tried to discover the effective content of the whole message; as last step (Section 4.1.2), we tried to understand the rationale that caused its generation.

To trigger the client, we analysed the client graphical interface. Then, with netstat, we looked for the established connections. Finally, we used the sniffer to analyse the operations of these connections: for every client action (like client start-up or query submission) we sniffed the traffic to identify packet format and message payloads.

With this type of analysis, we developed a model describing how the client works and studied the different protocols used by the same client (for example, WinMx uses both the OpenNap and WPN protocols). We were able to categorize the client actions in these recurrent phases:

Discovering and Booting: in this phase a starting client finds a network to log in to and searches for active peers on it. Information about active peers is obtained by consulting a pool of central servers.

Sharing: in this phase, clients send the list of their files.

Querying and Lookup: in this phase, a peer searches for a peer storing a file of interest. The result of a search is the address of those clients that share the requested file.

Downloading: in this phase, two clients exchange a file.

3.3 Issues

3.3.1 Encryption

As introduced, file-sharing clients can use message encryption techniques. In this way, the clients can hide shared or requested files. To overcome this difficulty, the analysis focused on the identification of recurrent sizes of TCP

and UDP payloads, bound to some behavior of the applications. Another technique was to change some user information and watch how the packet sizes changed. For instance, for the FastTrack protocol we noticed a recurrent string of 26 bytes when establishing a connection to the network. Further, modifying the username and observing the resulting change in packet size makes it possible to bind a specific packet, among those sent by the client, to the notification of the username. Note that a successful way to recognize this type of traffic is to catch sequences of recurrent strings of fixed sizes.

3.3.2 Firewall

P2P protocols behave differently depending on whether the P2P client is firewall protected or not. The analysis therefore has to take both possibilities into account. That is what we did for the OpenNap, WPN and FastTrack protocols.

4 Experimental Results

In this section we show the results of our analysis, that is, we describe the OpenNap, WPN and FastTrack protocols during their execution. Moreover, we report how to identify specific network traffic patterns that reveal file-sharing activities originated by these protocols and how to write IDS rules that detect such activities.

4.1 OpenNap

4.1.1 Protocol analysis

The OpenNap protocol is based on a pool of central servers: all the peers that want to join an OpenNap network establish a TCP connection with one of these servers. A Central Server maintains a list of all the files shared by users, but does not store any file. Following a client-server model, every user can ask the server which peers store the requested file, while the download is performed between peers (the requesting peer and the storing peer), via a direct connection. The protocol was reconstructed step-by-step, triggering a single action on the client and analysing the generated traffic.

Client→Server: connection and login
Before starting a download session, the user has to specify some information, such as user name, password and in particular the central server list. To establish a TCP connection with a server, the OpenNap protocol sends a login message; this message contains information about the user: user nick-name, password, listening port, client type and connection-line speed. An example of a captured fragment can be found in [21]. The traffic generated by this phase contains the name and the version of the software used. This information can be used to devise ad hoc IDS rules (Rule 2, Section 4.1.2).

Server→Client: answer to login message
To answer a client login request, the server replies with a message that contains the strings VERSION, SERVER and other information (like the string Welcome and some statistics on active users and shared files). This information is sent over several packets, because the TCP protocol used on Ethernet limits the MSS (Maximum Segment Size) to 1460 bytes.

“statistics”

A fragment of traffic can be found in [21]. The first Ethernet packet of the answer has a well defined structure and can be used by an IDS as a recurrent element, to identify an OpenNap connection over the network. In the next section, in fact, we report a rule for SNORT that searches the payload of the TCP messages for the two strings VERSION and SERVER.

Client→Server: list of shared files
After the reply of the server, the client sends the list of its own shared files, according to this simple format:

An example of this kind of traffic is reported in [21]. This kind of message can be used to write IDS rules meant to show the filenames of the files shared by a peer (see Rule 3, Section 4.1.2).

Client→Server: search request
To submit a query to the server, the user must fill in a form of the graphical interface of WinMx with a few words concerning the requested file: those words are the criteria used by the server to perform the lookup in the file list. The server, in fact, returns every file with the requested words included in the name of the file (e.g. Vasco Rossi Generale). Other search criteria can be specified, like information about the performance of the storing peers. The structure of the query sent by the client to the server follows.

The structure of these messages (see [21] for an example) can be used to write IDS rules to detect OpenNap protocol traffic (see Rule 4, Section 4.1.2).

Server→Client: search response
The answer to a query is a list of all the known files that satisfy the search criteria. In addition to the file name, a list element also contains: the IP address of the storing peer, the complete remote path, the file format, its size and other information on the file type (for instance: bit-rate, frequency and duration for an mp3 file). The structure of the server response is the following:

<00..>

A traffic fragment is reported in [21]. Observe that the server does not filter the returned list: the list contains all the files which have the requested words in their names. All the refinements on the search (like on the file type .iso, .mp3, .doc, ...) are performed by the requesting client. We believe this strategy is adopted to decrease the overhead on the server side.

Client→Server: download notification
When the client receives the server response, the list provided by the server is shown on the WinMx graphical interface. If the user selects one of the elements, the client begins the download request phase for the selected file. The first operation is the generation of a message for the central server with this format:

The information required to generate such a message is taken from the list provided by the server.

Server→Client: storing peer complete IP address
The response to the download notification is a message that uniquely identifies the complete address of the peer that stores the requested file: IP address and port on which a requesting peer can establish a TCP connection.

Firewall
When a storing peer is on a firewall-protected network, the protocol differs from the one followed when the same peer is not firewall-protected. Hence, we have analysed both scenarios. On one hand, a peer that is not firewall-protected can receive every incoming TCP connection. On the other hand, if a peer is firewall-protected, the firewall will possibly block all the incoming TCP connections (including the file download). To solve this problem, the two peers benefit from the help of the server. The server notifies the firewall-protected peer to establish a TCP connection with the requesting peer: the firewall-protected peer performs a “passive connection open” toward the requesting peer and an upload of the requested file. Since the TCP connection is outgoing, the firewall is bypassed and the file exchange is possible. Details are in Paragraph 4.1.1. Note that if both peers are firewall protected, the file exchange is impossible.

Download without firewall
In the absence of a firewall, the requesting peer can establish a direct TCP connection with the storing peer, using the IP address received from the server. After the 3-way-handshake, the storing peer sends over the connection one byte, containing the value “1” (Figure 1). When the requesting peer receives this byte, it sends a byte-string that contains the word “GET” followed by the name of the requested file and the offset (in bytes) from where to start the download. After this exchange, the transmission of the file starts. This is used in Section 4.1.2 to write an IDS rule.

[Figure 1. OpenNap: download without the presence of a firewall. Steps between the Requesting Peer and the Storing Peer: send the byte "1"; send 3 bytes "GET"; file exchange.]

[Figure 2. OpenNap: Download with the presence of a firewall. Panel (a): messages 1 to 7 among the Requesting Peer, the Server and the Storing Peer (messages 2 and 3 are labelled Port:0; message 7 is "Connection open"). Panel (b): steps between the Storing Peer and the Requesting Peer: send the byte "1"; send 4 bytes "SEND"; file transfer.]

Download with firewall: TCP connection establishment
In this scenario, a requesting peer wants to download a file stored on a firewall-protected peer. The presence of the firewall prevents the opening of a TCP connection from the requesting peer to the storing peer. Note that the information that the storing peer is firewall-protected is notified to the server by the peer itself during the Booting Phase (Section 4.1.1). This information is sent by the server to the requesting peer along with the complete address of the storing peers (Section 4.1.1, Figure 2 (a), messages 1 and 2). The information that the storing peer is firewall-protected is coded with a value 0 assigned to the port number (Figure 2 (a), messages 1 and 2). A captured fragment with these values is shown in [21].

Once the requesting peer is notified by the server that the storing peer is firewall-protected, it sends back to the server a copy of the received message (Figure 2 (a), message 3). Then, the server sends to the storing peer a message with the nickname of the requesting peer and the name of the requested file (Figure 2 (a), message 4). The storing peer answers the server with a copy of this last message (Figure 2 (a), message 5). Finally (Figure 2 (a), message 6), the server sends to the storing peer the complete IP address of the requesting peer. This makes it possible to establish a TCP connection between the storing and the requesting peers.

Download with firewall: file exchange
When the connection is established, the requesting peer sends over the connection one byte with value “1” (Figure 2 (b)). The storing peer answers by sending a byte-string that contains the word “SEND”, followed by the name and the size of the requested file. When the requesting peer receives this information, it sends the offset from where to start the file transfer. After this message, the file exchange starts. The sequence of the first two messages of Figure 2 (b) can be used to write a rule to identify the traffic generated by the OpenNap protocol (Rule 6 in Section 4.1.2).

4.1.2 SNORT rules

As a result of the previous analysis of the OpenNap protocol, this section contains the rules for SNORT.

Rule 1
# catch the server welcome answer
alert tcp $HOME_NET any -> $EXTERNAL_NET any (content:"VERSION"; offset:4; depth:12; content:"SERVER"; offset:11; depth:18; flow:from_server; flags:A*PA; msg:"OpenNap Server Connection "; classtype:policy-violation;)

This rule identifies any software that uses the OpenNap protocol. It catches the message sent by the server as an answer to the client login (Section 4.1.1). With this rule, the IDS looks for the strings VERSION and SERVER in the TCP payloads incoming from the server, with flags PUSH and ACK set.

Rule 2
# catch the client login message
alert tcp $HOME_NET any -> $EXTERNAL_NET any (content:"WinMX"; offset:4; nocase; flow:established; flags:PA; msg:"WinMx Connection to OpenNap Server "; classtype:policy-violation;)

This rule analyses the network traffic to check whether there are TCP payloads containing the word “WinMx”, with flags PUSH and ACK set. This rule tries to capture the login messages (Section 4.1.1) sent from the client to the server, originated by the WinMx application with default parameters.

Rule 3
# catch the name of the files shared by the client
alert tcp $HOME_NET any -> $EXTERNAL_NET any (msg:"Shared file list, Client --> ServerOpenNap "; flow:established; flags:!S; flags:!SA; content:"|22 43 3a 5c|"; nocase; offset:4; depth:9; tag:host,5,packets,src;)

This rule extracts from the network traffic the list of the files shared by the client. This rule must be activated after the alert raised by Rule 1 and must be written for every hard-disk letter to be monitored (like C:\). It looks for TCP payloads containing the byte sequence “22 43 3a 5c” in hexadecimal (i.e. the ASCII string "C:\, including the opening double quote), with flags ACK or PUSH set, sent to the server. The rule tells the IDS to save the ID number of the recognized frame and the next 5 frames: these frames contain part of the list of shared files.

Rule 4
# alert on query submit
alert tcp $HOME_NET any -> $EXTERNAL_NET any (content:"FILENAME CONTAINS"; offset:4; depth:18; flow:established; flags:PA; msg:"Query submitting "; )

This rule alerts when an entity submits a request to a central server. It recognizes the words “FILENAME CONTAINS” in the TCP payloads, with flags PUSH and ACK set.

Rule 5
# alert on download requests
alert tcp $HOME_NET any <-> $EXTERNAL_NET any (content:"GET"; offset:0; depth:3; dsize:3; flow:established; flags:PA; msg:"GET OpenNap Downloading "; tag:session,2,packets;)

This rule must be activated after the alert raised by Rule 1: it catches the name of the file to be stored and the address of the peer at the other end of the established TCP connection. It looks for the word “GET” in the TCP payloads with flags PUSH and ACK set and it saves the next 2 frames, which contain the name of the file to download.

Rule 6
# alert on upload requests
alert tcp $HOME_NET any <-> $EXTERNAL_NET any (content:"SEND"; offset:0; depth:4; dsize:4; flow:established; flags:PA; msg:"SEND OpenNap Downloading "; tag:session,2,packets;)

This rule is similar to Rule 5: it looks for the string “SEND”, used in the request of a file stored by a client behind a firewall (Section 4.1.1). When this rule is matched, it reveals the name of the requested file.

4.2 WinMX Peer Network (WPN)

4.2.1 Protocol analysis

The WPN protocol is more complex than OpenNap: it is decentralized and it uses encrypted messages. To analyze WPN, we studied the WinMx application, version 3.11. In the following, we present how the protocol works. The first option required of the user is the type of connection to use

[Figure 3. Primary Connection peer→Cache Server. Messages between the PC and the Cache Server: 1. Hello (16 byte UDP packet); 2. Hello reply (17 byte UDP packet); 3. List request (17 byte UDP packet); 4. List of PC (145 byte UDP packet); 5. UDP flooding toward other PCs.]

[Figure 4. WPN: Primary Connection peer→Active Primary Connection peer. Messages between the PC and the active PC (PCa): 1. Four UDP packets exchange; 2. TCP connection open; 3. Send the byte "1"; 4. Encrypted TCP dialog.]
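The size-based exchange shown in Figure 3 lends itself to a small illustration. The sketch below is not the authors' tool (the paper relies on SNORT rules); it only shows, under stated assumptions, how a fixed sequence of UDP payload sizes (16, 17, 17 and 145 bytes, sent alternately by the two endpoints) could be searched for in a list of already captured packets. The `UdpPacket` record and the `find_boot_exchanges` function are hypothetical names introduced for this example, and the sketch assumes that the four messages are consecutive within the conversation between the two hosts.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable, List, Sequence, Tuple

# Payload sizes (bytes) of the four booting messages listed in Figure 3.
BOOT_SIZES: Tuple[int, ...] = (16, 17, 17, 145)


@dataclass(frozen=True)
class UdpPacket:
    """Hypothetical record describing one captured UDP datagram."""
    src: str          # sender address
    dst: str          # receiver address
    payload_len: int  # UDP payload size in bytes


def find_boot_exchanges(packets: Iterable[UdpPacket],
                        sizes: Sequence[int] = BOOT_SIZES) -> List[Tuple[str, str]]:
    """Return (initiator, responder) pairs whose traffic shows the size pattern.

    Packets are grouped per pair of hosts; within each conversation the
    function looks for consecutive packets whose payload sizes equal
    `sizes` and whose direction alternates, starting from the initiator.
    """
    conversations = defaultdict(list)
    for pkt in packets:
        conversations[frozenset((pkt.src, pkt.dst))].append(pkt)

    matches = []
    for convo in conversations.values():
        for start in range(len(convo) - len(sizes) + 1):
            window = convo[start:start + len(sizes)]
            peer, server = window[0].src, window[0].dst
            if all(
                pkt.payload_len == size
                and (pkt.src, pkt.dst) == ((peer, server) if i % 2 == 0 else (server, peer))
                for i, (pkt, size) in enumerate(zip(window, sizes))
            ):
                matches.append((peer, server))
                break  # one match per conversation is enough to raise an alert
    return matches
```

Calling `find_boot_exchanges` on a capture that contains the four messages between a booting peer and a Cache Server returns that pair of addresses, while unrelated UDP traffic between other hosts is ignored. Matching on sizes and direction only, without inspecting the payload, is what makes this kind of check applicable to encrypted messages; a real deployment would also need to tolerate retransmissions and interleaved datagrams, which this sketch does not.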

(primary or secondary). The peers connected with a primary connection:
• are the backbone of the WPN networks and receive connections from other peers;
• manage user queries, lists of shared files and chat;
• use a mechanism of mapping of the TCP and UDP ports (defaults are respectively 6699 and 6257).

An essential condition for a primary connection is that the peer is not firewall protected. Moreover, a primary connected (PC) peer should have a fast Internet connection (cable, DSL) and good processing performance.

The peers active with a secondary connection:
• are connected to peers with a primary connection;
• operate as file repositories;
• do not map TCP and UDP ports;
• can join a network even if they are firewall protected, or use a Network Address Translator (NAT) mechanism.

Secondary connected (SC) peers use primary connected peers to perform queries and share their list of files with other peers.

4.2.2 WPN: primary connections (PC) analysis

PC-peer→Cache Server: booting sequence
When a PC wants to join a WPN network, it makes an exchange of UDP packets with a Cache Server, known by the application (Figure 3). This exchange is a static sequence of four messages of size (in order) 16, 17, 17 and 145 bytes, sent alternately by the booting PC-peer and by the Cache Server. An example of this sequence is reported in [21]. The payload of the messages is encrypted, hence we cannot directly inspect its content. However, analysing the exchange sequence, we argue that the messages forward the PC-peer credentials toward the Cache Server and forward a list of other active PC-peers from the Cache Server to the joining PC-peer.

We noticed that a booting PC-peer makes a DNS query to resolve the hostname “.com” or “frontcode.com”, to receive the IP address of a Cache Server.

PC-peer→PC-peer: booting sequence
After the UDP exchange with the Cache Server, a booting PC-peer exchanges a sequence of four UDP packets with a variable number (between 4 and 6) of other PC-peers active in the network. This sequence of messages is similar to the sequence exchanged with the Cache Server: it is composed of packets of size 15, 17, 17, while the last message is at least 140 bytes. These messages are sent alternately by the booting PC-peer and by the contacted PC-peer. If a contacted PC-peer answers all the messages sent by the booting PC-peer, then the booting PC-peer establishes a TCP connection with it. Using the WinMx application with the default settings, the UDP port used was 6257 and the TCP port was 6699. The listening ports of the Cache Servers were 7940 and 7941 respectively.

The booting phase ends when the PC-peer has established at least four TCP connections with four other PC-peers. After opening the connection (Figure 4), the contacted PC-peer sends the byte “1”, receives a string of 16 bytes from the other peer and sends 16 bytes again. After this exchange, the two peers continue to exchange encrypted messages: we suppose that these messages contain information about queries and lists of shared files, obtained from secondary connected peers, in a way that will be explained in Section 4.2.3.

PC-peer→PC-peer: search request
When a PC-peer wants to perform a search for a shared file, it asks for it, through the encrypted TCP connections, to the PC-peers it is connected with. The peers that receive a search request also propagate the search toward their own connected PC-peers. All the PC-peers that receive the request and that know a peer with the requested file send its address to the searching peer.

Download
The download phase of the WPN protocol is the only phase that uses plain text messages. During our analysis we noticed that the download process is identical to that of the OpenNap protocol (Section 4.1.1): establishment of a TCP connection between requesting and storing peers, forwarding of a GET or SEND message (depending on the presence of a firewall) and file exchange. Since the PC-peer is not firewall protected, the download session can always be performed.

4.2.3 WPN: secondary connections (SC) analysis

SC-peer→Cache Server: Booting Sequence
When a new peer wants to join a WPN network with a

secondary connection, it opens a TCP connection with the dedicated Cache Server and sends it an encrypted login message (message 1, Figure 5). The Cache Server answers with a TCP message carrying an encrypted payload of 149 bytes. As with the UDP packet exchange in the Booting Sequence of the primary connection, we argue that the joining peer receives from the server a list of active peers with a primary connection. The last step performed by the SC-peer is to send 16 bytes to the Cache Server, followed by a TCP packet with the FIN and ACK flags set. The Cache Server closes the connection and sends back a TCP packet with the RST flag set. An example of the Booting Sequence can be found in [21].

Figure 5. WPN: Secondary connection peer→Cache Server. Steps between the SC-peer and the Cache Server: 1. TCP Connection Open; 2. Primary Conn. List (149 bytes); 3. TCP Connection Close (16 bytes).

SC-peer→PC-peer: booting sequence
After the communication with the Cache Server, an SC-peer holds a list of primary connected peers and can establish a TCP connection with one of them (Figure 6). The scheme for establishing the connection is similar to the Booting Sequence of a PC-peer: the joining SC-peer opens a TCP connection and receives the byte “1” from the contacted PC-peer. After this exchange, the SC-peer sends to and receives from the PC-peer an encrypted message of 16 bytes. Notice that this kind of message exchange is recurrent and can be used to write IDS rules. An example of this message exchange is shown in [21].

Figure 6. WPN: Connection SC→PC. Steps between the SC-peer and the primary connected peer: 1. TCP Connection Open; 2. "1" (1 byte); 3. Encrypted data (16 bytes); 4. Encrypted data (16 bytes).

SC-peer→PC-peer: search request/response
The queries within the WPN network are similar to those of the OpenNap protocol, with the notable difference that the messages are encrypted. Moreover, instead of sending the search query to the Central Server, an SC-peer forwards the request to the PC-peer it is connected with. The backbone network of PC-peers performs the search for the file and answers the requesting SC-peer with the addresses of those peers that share the searched file.

Download
This phase is almost identical to that for the primary connection. The only difference is that if both peers involved in the download are firewall-protected, the download is impossible.

4.2.4 SNORT rules

As for the OpenNap protocol, in this section we present some rules for SNORT that can be used to detect WPN traffic.

Rule 1
    # alert on primary connection contacting WPN cache servers
    alert udp $WINMXCACHESERVER any -> $HOME_NET any (msg:"WPN Primary connection detected"; dsize:145; classtype:policy-violation;)
This rule detects a primary connection that tries to contact the Cache Server of a WPN network. It matches the last of the four packets exchanged between a booting PC-peer and a Cache Server, that is, the one with the fixed size of 145 bytes. The rule uses the variable $WINMXCACHESERVER, which contains the list of IP addresses of the Cache Servers. Moreover, it is possible to write a less restrictive rule by also considering the third packet of the sequence and raising the alarm on the detection of both the third and the fourth UDP packets.

Rule 2
    # alert on secondary connection contacting WPN cache servers
    alert tcp $WINMXCACHESERVER any -> $HOME_NET any (msg:"Probably WPN Secondary connection detected"; flow:established; flags:PA; tag:host,1,packets,src;)
This rule alerts on detecting TCP traffic between a peer and a Cache Server.

4.3 FastTrack

4.3.1 Protocol analysis

FastTrack is one of the most widely used P2P protocols for file sharing: its network has a number of users greater than the network at its apogee. Its diffusion is due to the capability to download different parts of the same file from several different peers, and to stop a download and resume it later. The basic structure of a FastTrack network is quite similar to that of WPN: there are some peers (called supernodes) that, besides sharing their files, also build the backbone of the network. All the other peers only share their files. The supernodes are required to have good performance in terms of both bandwidth availability and computational speed. They can correctly operate even if they are firewall protected. Moreover, the supernodes operate as a distributed server for all the peers of the network, with the task of managing the list of the shared files and answering the queries. To test the FastTrack protocol, we studied the KaZaA file-sharing application. The protocol steps follow.

UDP packet flooding: booting sequence
To join a FastTrack network, a new peer sends several UDP

packets toward different IP hosts. The port used for this exchange can be chosen by the user. Each of these packets (called ping) is in plain text, is 12 bytes long, begins with the hexadecimal value 0x27 and contains the word “KaZaA” (the name of the application that uses the FastTrack protocol). The contacted hosts that answer the peer can send back two different kinds of UDP packets (called pong): one begins with a byte with value 0x28 and the other begins with the value 0x29. The first type of packet (pong1) is 17 bytes long and also contains the word “KaZaA”. The peers that answer with pong1 are able to establish a TCP connection with the joining peer, in the way described in the next paragraph. The second type of message (pong2) is 21 bytes long. Upon reception of a pong2 message, the joining peer exchanges another unpredictable sequence of UDP packets with the remote peer, but does not establish a TCP connection with it. The fixed structure of this UDP message exchange will be the object of an IDS rule in Section 4.3.2.

TCP connection: booting sequence
The joining peer tries to establish a TCP connection with all the peers that answer its ping with a pong1. At the end of this phase only one connection is kept, while all the others are reset.
After the opening of the TCP connection, the joining peer sends a message of 12 bytes and receives back an answer of 14 bytes (both messages are encrypted). After this exchange, the joining peer sends a message to the other peer, communicating its username. To derive this knowledge, we changed the username several times and analysed the change in the size of the message. In response to the username, the remote peer sends back a message of 1460+298 bytes, not in plain text. We argue that this message is a fresh list of active supernodes.

Searching peer→Supernode: search request
The search request is performed by sending an encrypted request to the connected supernode. We recognized a fixed size for the request: 21 bytes plus the number of letters in the name of the requested file. We recognized this pattern by changing the length of the requested file name several times and observing the change in the size of the message.
The supernode answers a search request with a series of frames, which we suppose contain the list of IP addresses of the active peers that share the requested file. After these messages, in fact, the peer is able to start a download session with another peer without receiving other messages from the supernode.

Download
We noticed that the download request is the only phase of the FastTrack protocol that is performed in plain text. As in the OpenNap and WPN protocols, the download procedure changes if one of the peers involved is firewall-protected. The download request of a shared file causes the establishment of a TCP connection between the requesting and the storing peer: if one of the peers is firewall-protected, that peer opens the connection. If both peers are firewall protected, the transfer is not possible.
If the storing peer is not firewall protected, the connection is opened by the requesting peer. This peer sends a message similar to an HTTP request (see frame in [21]): the GET ... HTTP/1.1 command followed by the hash value of the name of the searched file and some other information such as the username, the version of the software, the IP address of the peer and the address of the supernode.
If the storing peer is firewall protected (Figure 7), then it receives an encrypted message from the supernode. We argue that this message contains the IP address of the requesting peer since, after receiving this message, the storing peer establishes a TCP connection with the requesting peer and sends a message with the word GIVE. Then, the requesting peer sends back to the storing peer a GET ... HTTP/1.1 request.

Figure 7. FastTrack: download from a firewall protected peer. Steps among the requesting peer, the supernode and the storing peer: 1. Download request from storing peer; 2. Informations about storing peer ("Port:0"); 3. Download request of requesting peer; 4. Connection open: "GIVE"; 5. HTTP "GET"; 6. File transfer.

After the opening of the TCP connection and the request, the storing peer performs the same operations in both cases (with or without firewall). It sends a reply similar to an HTTP/1.1 ... OK response (as shown in [21]): the status code of the answer, followed by meta-information about the requested data (such as the size of the data and the content-type), as well as information about the storing peer. After this exchange the file transfer starts.

4.3.2 SNORT rules

Rule 1
    # alert on supernodes that answer to joining peer
    alert udp $EXTERNAL_NET any -> $HOME_NET any (dsize:17; content:"|28|"; offset:0; depth:1; rawbytes; content:"|4b 61 5a 61 41 00|"; offset:11; depth:17; msg:" Supernode Download Response";)
This rule catches the UDP flooding of answering supernodes. It alerts on catching the pong1 from an active supernode. The rule searches for the value 0x28 in the first byte and for the string “KaZaA” starting at byte offset 11.

Rule 2
    # alert on sending a positive response to a request for a shared file
    alert tcp $EXTERNAL_NET any -> $HOME_NET any (flow:from_client; content:"|48 54 54 50 2f 31 2e 31 20 32 30 30 20 4f 4b|"; offset:0; depth:15; content:"KazaaClient"; session:printable; msg:"Request of a shared file with KaZaA";)
This rule alerts when a TCP connection receives the message containing the string HTTP/1.1 200 OK, that is, when the peer is starting a download session. To reduce false positives, the rule also searches for the string “KazaaClient”.

5 Conclusion and future work

In this paper we have presented a methodology to detect P2P file sharing traffic based on: analysis of the P2P protocol; identification of patterns specific to the P2P protocol that can be revealed by an IP packet level analysis; coding of these patterns in rules that can be fed to an IDS; verification of the patterns identified via network monitoring with the IDS fed with the devised rules. The preliminary results presented in this paper lead to a complete characterization of the traffic generated by the OpenNap, the WPN and FastTrack protocols. The devised rules make it possible to identify the IP addresses of the systems inside a network that are performing file sharing. Note how this can be helpful in the accountability process required by a judicial dispute or, better, in discouraging unlawful behavior. Further, the identification of the P2P traffic does not introduce any delay in the network.
The proposed methodology has shown its flexibility: we have been able to analyse standard protocols (OpenNap) as well as protocols that encrypt their traffic and are fully decentralized (WPN and FastTrack). Finally, note that a new research area is still to be addressed: traffic detection in multipath protocols.

Acknowledgements

The authors would like to thank Prof. Luigi V. Mancini for his insightful comments and valuable discussions.

References

[1] P. Barford, J. Kline, D. Plonka, and A. Ron. A signal analysis of network traffic anomalies. In IMW ’02: Proceedings of the 2nd ACM SIGCOMM Workshop on Internet Measurement, pages 71–82. ACM Press, 2002.
[2] P. Barford and D. Plonka. Characteristics of network traffic flow anomalies. In IMW ’01: Proceedings of the 1st ACM SIGCOMM Workshop on Internet Measurement, pages 69–73. ACM Press, 2001.
[3] F. Dabek, M. F. Kaashoek, D. Karger, R. Morris, and I. Stoica. Wide-area cooperative storage with CFS. In Proceedings of the 18th ACM Symposium on Operating Systems Principles, 2001.
[4] C. Dewes, A. Wichmann, and A. Feldmann. An analysis of internet chat systems. In IMC ’03: Proceedings of the 3rd ACM SIGCOMM conference on Internet measurement, pages 51–64. ACM Press, 2003.
[5] http://www.ethereal.com/.
[6] http://gift-fasttrack.berlios.de/.
[7] K. P. Gummadi, R. J. Dunn, S. Saroiu, S. D. Gribble, H. M. Levy, and J. Zahorjan. Measurement, modeling, and analysis of a peer-to-peer file-sharing workload. In SOSP ’03: Proceedings of the 19th ACM symposium on Operating systems principles, pages 314–329. ACM Press, 2003.
[8] A. Gupta, B. Liskov, and R. Rodrigues. One hop lookups for peer-to-peer overlays. In Proceedings of the 9th Workshop on Hot Topics in Operating Systems (HotOS-IX), pages 7–12, Lihue, Hawaii, May 2003.
[9] http://kazaa.com.
[10] A. Klemm, C. Lindemann, M. K. Vernon, and O. P. Waldhorst. Characterizing the query behavior in peer-to-peer file sharing systems. In IMC ’04: Proceedings of the 4th ACM SIGCOMM conference on Internet measurement, pages 55–67. ACM Press, 2004.
[11] N. Leibowitz, M. Ripeanu, and A. Wierzbicki. Deconstructing the kazaa network. In Proceedings of the 3rd IEEE Workshop on Internet Applications (WIAPP’03), June 2003.
[12] A. Mei, L. V. Mancini, and S. Jajodia. Secure dynamic fragment and replica allocation in large-scale distributed file systems. IEEE Trans. on Parallel and Distributed Systems, 14(9):885–896, 2003.
[13] http://www.p2pwatchdog.com/.
[14] V. Paxson. Bro: a system for detecting network intruders in real-time. Computer Networks (Amsterdam, Netherlands: 1999), 31(23–24):2435–2463, 1999.
[15] R. Rehman. Intrusion Detection with SNORT: Advanced IDS Techniques Using SNORT, Apache, MySQL, PHP, and ACID. Prentice Hall, 2003.
[16] S. Saroiu, K. P. Gummadi, R. J. Dunn, S. D. Gribble, and H. M. Levy. An analysis of internet content delivery systems. SIGOPS Oper. Syst. Rev., 36(SI):315–327, 2002.
[17] S. Sen, O. Spatscheck, and D. Wang. Accurate, scalable in-network identification of p2p traffic using application signatures. In WWW ’04: Proceedings of the 13th international conference on World Wide Web, pages 512–521. ACM Press, 2004.
[18] S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. IEEE/ACM Trans. Netw., 12(2):219–232, 2004.
[19] S. Sen and J. Wang. Analyzing peer-to-peer traffic across large networks. In ACM SIGCOMM Internet Measurement Workshop. Proceedings, November 2002.
[20] http://www.snort.org/.
[21] A. Spognardi, A. Lucarelli, and R. Di Pietro. TR-WEBMINDS-46: A methodology for p2p file-sharing traffic detection. Technical report, Web-Minds, CINI-Unit of Rome, May 2005.
[22] http://windump.polito.it/.
[23] http://www.winmx.com/.
