SPAM over Internet Telephony and how to deal with it

Diploma thesis - Rachid El Khayari

Supervisor: Prof. Dr. Claudia Eckert, Dr. Andreas U. Schmidt, Nicolai Kuntze

Fraunhofer Institute for Secure Information Technology

’O misery, misery, mumble and moan!
Someone invented the telephone,
And interrupted a nation’s slumbers,
Ringing wrong but similar numbers.’

Ogden Nash (1902 - 1971 / USA)

Acknowledgements

I want to thank

• Prof. Dr. Claudia Eckert for giving me the opportunity to work on this thesis.

• Dipl. Inform. Nicolai Kuntze and Dr. Andreas U. Schmidt for their great support and trust in my work.

• my whole family including my parents Mohamed and Yamina, my brother Soufian, my brother Samir and his wife Nadya, my little niece Sara and last but not least my best friend Inesaf and all others who supported me on my way.

Affidavit

I hereby declare that the following diploma thesis "SPAM over Internet Telephony and how to deal with it" has been written only by the undersigned and without any assistance from third parties. Furthermore, I confirm that no sources have been used in the preparation of this thesis other than those indicated in the thesis itself.

Place, Date Signature

Introduction

In our modern society telephony has developed into an omnipresent service. People are available at any time and anywhere. Furthermore, the Internet has emerged as an important communication medium. These facts and the rising availability of broadband Internet access have led to the fusion of these two services. Voice over IP, or VoIP for short, is the keyword that describes this combination. The advantages of VoIP in comparison to classic telephony are location independence, simplification of transport networks, the ability to establish multimedia communications and low costs.

Nevertheless, one can easily see that combining two technologies always brings up new challenges and problems that have to be solved. It is undeniable that one of the most annoying facets of the Internet nowadays is email spam. According to different sources, email spam accounts for 80 to 90 percent of all email traffic. Security experts suspect that this will spread to VoIP too. The threat of so-called voice spam or Spam over Internet Telephony (SPIT) is even more serious than the threat that arose with email spam, because the annoyance and disturbance factor is much higher. For instance, an email that hits the inbox at 4 p.m. is useless but will not disturb the user much. In contrast, a ringing phone at 4 p.m. will lead to a much higher disturbance. From the provider’s point of view, both email spam and voice spam produce unwanted traffic and a loss of customers’ trust in the service.

In order to mitigate this threat, different approaches from different parties have been developed. This thesis focuses on state-of-the-art anti voice spam solutions, analyzes them to the core and reveals their weak points. In the end a SPIT-producing benchmark tool will be implemented that attacks the presented anti voice spam solutions. With this tool it is possible for an administrator of a VoIP network to test how vulnerable his system is.

Contents

Acknowledgements

Affidavit

Introduction

1 Basics
  1.1 The history of telecommunication
  1.2 Voice over IP
  1.3 User Datagram Protocol
  1.4 Real-time Transport Protocol
    1.4.1 RTP Control Protocol
  1.5 Session Initiation Protocol
    1.5.1 SIP Transport
    1.5.2 SIP Messages
    1.5.3 Client/Server
    1.5.4 SIP URIs
    1.5.5 SIP Requests
    1.5.6 SIP Responses
    1.5.7 SIP session establishment
    1.5.8 SIP transactions/ dialogs
    1.5.9 SIP Message layout
    1.5.10 Session Description Protocol
    1.5.11 User Agent
    1.5.12 Registrar
    1.5.13 Proxy Server
    1.5.14 SIP security mechanisms
      1.5.14.1 SIP Digest Authentication
      1.5.14.2 SIPS (SIP Security)
      1.5.14.3 S/MIME
      1.5.14.4 IPSec

2 SPAM over Internet Telephony
  2.1 SPIT versus SPAM
  2.2 Intuitive SPIT definition
  2.3 SPIT analysis
    2.3.1 Information gathering
    2.3.2 SPIT session establishment
    2.3.3 SPIT media sending
    2.3.4 SPIT summary

3 SPIT countermeasures and their weaknesses
  3.1 Device Fingerprinting
    3.1.1 Passive Fingerprinting
    3.1.2 Active Fingerprinting
    3.1.3 Weakness of Device Fingerprinting
  3.2 White Lists, Black Lists, Grey Lists
    3.2.1 Weaknesses of White Lists, Black Lists, Grey Lists
  3.3 Reputation Systems
    3.3.1 Weakness of Reputation Systems
  3.4 Turing tests, Computational Puzzles
    3.4.1 Weakness of Turing tests and Computational Puzzles
  3.5 Payments at risk
    3.5.1 Weakness of Payment at risk
  3.6 Intrusion Detection Mechanisms, Honey phones
    3.6.1 Weakness of Intrusion Detection Mechanisms, Honey phones
  3.7 Summary

4 SIP XML Scenario Maker
  4.1 Technical Basis
    4.1.1 Message Editor
      4.1.1.1 SIPp message format
    4.1.2 Scenario Editor
    4.1.3 Shoot Mode

5 Using SXSM as attack tool
  5.1 Device Spoofing
  5.2 SIP Identity Spoofing
  5.3 SIP Header Spoofing
  5.4 Call Rate Adaption
  5.5 Account Switching
  5.6 Reputation Pushing or Pulling
  5.7 SIP Identity Hijacking
  5.8 CAPTCHA Relay Attack

6 Conclusions and Outlook

Glossary

List of figures

List of tables

References

1 Basics of presented technology

1.1 The history of telecommunication

People have always searched for ways to communicate over long distances. Optical telegraphs are viewed as the first practical applications of communication over distance and can be dated back to prehistoric times [22]. In order to send messages, optical signals like light or smoke were sent with a specified code, so that the recipient could see them from afar. The electric telegraph was based on that principle and was used to transmit messages over electric wires. In the mid 1800s Samuel Morse and Alfred Vail invented a telegraph system in combination with an easy to use code (Morse code)[27]. This led to the success of telegraphy in America, and long distance lines were constructed and spread over the country [9].

Only a few decades after telegraphy revolutionized telecommunications, telephony began its history in the early 1870s with the invention of the telephone[29]. The forefathers of the telephone, Antonio Meucci[18], Johann Philipp Reis, Alexander Graham Bell[8] and Elisha Gray, amongst others, shared a clear vision of people being able to talk to each other over distance. Philipp Reis’ first prototype of a telephone was built as an attachment to the existing telegraphy network. The telegraphy network was the common data communication network, and with Reis’ invention it became possible to alternatively transport voice through the same electrical wires[29].

Analog telephony is as old as the invention of the telephone itself. The first devices were physically connected through a wire. The voice was transported through modulation of electric signals on this wire. The first telephone exchange started in 1878 in New Haven[29]. The central office had a very simple switchboard and the connections had to be set manually by an operator. In central offices with manual switching, the operator asked the caller for the destination of the call and connected the lines of caller and callee.
Switching the connections manually soon reached its limit as the number of participants grew. This led to the development of automated switching systems at the turn of the century[22]. The automated switching systems replaced the operators and had to fulfil the same tasks. The caller signalled call initiation by picking up the phone and dialling the number of the destination. According to the pulses generated by the dialled numbers, the electromechanical switches selected which lines had to be connected to establish the call[46]. This type of negotiation is referred to as in-band signalling, because the signalling for call establishment and the voice are sent over the same wire.

Parallel to the analog telephone network, telex (teleprinter exchange) systems were developed. With this technology written messages could be transported over wire lines. The telephone network and the telex network coexisted, and in Germany, for example, end users had to have two connections, one for telephone and one for telex. The further evolution of the telephone network proceeded from electromechanical switching systems to digital electronic switching systems in the late 1970s[9]. The transition from analog to digital techniques in telephony led to the development of ISDN (Integrated Services Digital Network), a telephone network system which upgraded the existing analog system. End-to-end digital transmission could be realized, and voice and data services could be transmitted over the same network. Nevertheless, the Public Switched Telephone Network (PSTN) remained a circuit switched network as far as the communication channels are concerned. A fixed bandwidth channel was reserved between the communication partners, as if they were physically connected through a wire [9]. As Internet technology arose, telephony made the step from the circuit switched to the packet switched communication paradigm, and this led to the development of Voice over IP.

1.2 Voice over IP

Voice over IP is a generic term for multimedia services that perform signalling and media transport over the Internet Protocol[31]. Multimedia sessions are communication sessions like Internet Telephony, conferences and similar applications, where different media like audio, video, text messages or data are transmitted. A multimedia session via the Internet or other IP-based networks (an IP-based communication) can only be achieved with the transmission of IP packets via the Internet Protocol. The main challenge in that scenario is that the Internet Protocol works connectionless, whereas telephony is connection oriented by definition. This means that in order to enable two or more participants to communicate with each other, a session has to be established, then media has to be exchanged and in the end the session has to be terminated. This can only be achieved with the aid of other protocols for media transport and session handling.

A complete (vertical) communication stack covers all layers of the Open Systems Interconnection Basic Reference Model (OSI Reference Model). Typically, these architectures will include protocols such as the Real-time Transport Protocol (RTP) (RFC 1889, 3550)[42][43], User Datagram Protocol (UDP) (RFC 768)[30], Internet Protocol (IP) (RFC 791)[31] and at least one layer 2 and layer 1 protocol. As far as call signalling and bearer control are concerned, additional protocols are needed. In our scenario the Session Initiation Protocol (SIP) (RFC 3261)[39] and the Session Description Protocol (SDP) (RFC 2327, 4566)[20] for describing multimedia sessions are included in the communication stack. The orchestration of all the protocols above (which will be discussed in detail later) is called the SIP-Protocol-Stack, as displayed in figure 1.1. As an analogy, the figure also shows applications based on the Hyper Text Transfer Protocol (HTTP) (RFC 2616)[17].

1.3 User Datagram Protocol

Figure 1.1: SIP-Protocol-Stack

The User Datagram Protocol (RFC 768) [30] is a simple connectionless transport protocol on top of the Internet Protocol. As a transport protocol it can be assigned to the Transport Layer of the OSI Reference Model. UDP datagrams are transported as fast as possible without guarantee of delivery or of delivery in the correct order [47]. It is therefore especially useful for real-time communication: in telephony, for example, dropped packets are preferable to delayed packets. Looking back at figure 1.1, we can see that RTP is set on top of UDP, which means that media transport is carried out by RTP via UDP. We can also see that SIP can be used with UDP or alternatively with TCP. UDP is in fact the better choice, because SIP already provides techniques for retransmission and sequence control, so even call-signalling and bearer control messages are sent with SIP via UDP[47].

The main tasks of UDP are the partitioning of data into datagrams, checksumming of header and payload, and session multiplexing. Session multiplexing is achieved with port numbers. We distinguish three types of ports: Well Known Ports (ports that are fixed to protocols of higher layers, e.g. port 53 corresponds to the Domain Name System (DNS)), Registered Ports (ports that can be registered by companies) and Dynamic Ports (ports that are not bound to a special protocol and can be used dynamically). Well Known Ports are only valid on the server side. This means, for example, that a DNS server listens on the Well Known Port 53 (UDP), so a client that wants to send a request to a DNS server sends it to UDP port 53 of the server. In order to receive the response, the client includes a dynamically bound port number as source port in the request, so that the server sends its response to that port of the client. This makes it possible for a client to handle several parallel connections to the same server [47]. The server uses the IP address as a differentiating factor to distinguish between different clients [47].

In figure 1.2 we can see how a UDP datagram is built. The datagram contains 4 header elements:

Figure 1.2: UDP Datagram

• Source Port: The first and second octet are reserved for the source port of the sending process. Replies will be sent to this port in the absence of any other information.

• Destination Port: Octet 3 and 4 are reserved for the destination port of the targeted machine.

• Length: Octet 5 and 6 are reserved for the length of the whole UDP Datagram including the headers. The length is computed in numbers of octets.

• Checksum: Octet 7 and 8 are reserved for a calculated checksum. The checksum value is computed over a pseudo header, which contains a part of the IP Header, followed by the whole UDP Datagram (header and payload).
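The four header elements above can be illustrated with a short Python sketch. It is only a minimal illustration, assuming IPv4 addresses given as 4 raw octets; the function names `internet_checksum`, `build_udp_datagram` and `parse_udp_header` are hypothetical helpers and not part of any library.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's complement of the one's complement sum of all 16-bit words."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF


def build_udp_datagram(src_ip: bytes, dst_ip: bytes,
                       src_port: int, dst_port: int, payload: bytes) -> bytes:
    length = 8 + len(payload)  # 8 header octets plus payload
    # Pseudo header: source IP, destination IP, zero octet, protocol 17 (UDP), UDP length
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, length)
    header = struct.pack("!HHHH", src_port, dst_port, length, 0)
    # A computed checksum of zero is transmitted as all ones (RFC 768)
    checksum = internet_checksum(pseudo + header + payload) or 0xFFFF
    return struct.pack("!HHHH", src_port, dst_port, length, checksum) + payload


def parse_udp_header(datagram: bytes) -> dict:
    # Octets 1-2: source port, 3-4: destination port, 5-6: length, 7-8: checksum
    src_port, dst_port, length, checksum = struct.unpack("!HHHH", datagram[:8])
    return {"src_port": src_port, "dst_port": dst_port,
            "length": length, "checksum": checksum}
```

A receiver can verify a datagram by running `internet_checksum` over the pseudo header plus the received datagram; for an undamaged datagram the result is zero.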

1.4 Real-time Transport Protocol

The Real-time Transport Protocol (RFC 3550) [43] is a connectionless transport protocol. As a transport protocol it can be assigned to the Transport Layer of the OSI Reference Model. Since it typically runs on top of UDP and is tightly linked to the application, it is often assigned to the Application Layer instead. RTP provides end-to-end delivery services for data with real-time characteristics, such as interactive audio and video, and is therefore well suited for media transport in VoIP scenarios. Those services include payload type identification, sequence numbering, timestamping and delivery monitoring [43]. Nevertheless, RTP does not provide any mechanism that guarantees in-order delivery or any other quality aspect. RTP just helps the receiver to detect in which order the datagrams were initially sent, so that the receiving application can put them back in the correct order.

With RTP it is possible to transfer data between one sender and one receiver (unicast) as well as between one sender and several receivers (multicast). Therefore it is simple to establish audio or video conferences with RTP. For every direction of transfer a so-called RTP session is established, which is characterized by an identifier called the Synchronization Source (SSRC) and a UDP port[47]. RTP does not use a special Well Known Port, but only a Dynamic Port with an even number. In figure 1.3 we can see that an RTP datagram contains the following header information:

Figure 1.3: RTP Datagram

• Version (V): The first 2 bits contain information about the used RTP version. The correct value for RFC 3550 RTP is 2 (decimal).

• Padding (P): This one-bit value shows, if the payload is followed by padding bytes.

• Extension (X): The extension bit indicates, if the RTP header is followed by an optional extension header.

• CSRC Count (CC): This 4 bit value contains the number of Contributing Sources that follow in the CSRC Identifier header (0...15).

• Marker (M): The interpretation of the marker is defined by a profile. It is intended to allow significant events such as frame boundaries to be marked in the packet stream [43].

• Payload Type (PT): This field indicates of which type the transported payload data is. It is necessary for the receiver to know of which type the payload is in order to decode it in the right way. Some formats are predefined in RFC 3551 [41], e.g. the Payload Type 8 corresponds to ’PCMA: A-law’ coded voice with 64 kbit/s.

• Sequence Number: This field contains a randomly generated number at the beginning of an RTP session and is incremented by 1 with every sent packet. It is used for the detection of packet loss or packet delivery in false order.

• Timestamp: This header reflects the sampling instant of the first octet in the RTP data packet.

• Synchronization Source (SSRC) Identifier: This header contains an identifier, that is randomly generated at the beginning of an RTP session.

• Contributing Source (CSRC) Identifier: This header field is optional and usually empty (in a unicast scenario). If several streams are combined by an RTP Mixer (e.g. in a conference), the CSRC field contains information about the participating entities, while the SSRC header contains only information about the RTP Mixer.
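To make the bit layout of the fields above more concrete, the following Python sketch decodes the fixed 12-octet RTP header and the optional CSRC list. It is a simplified illustration that ignores header extensions and padding removal; `parse_rtp_header` is a hypothetical helper name.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    if len(packet) < 12:
        raise ValueError("the fixed RTP header needs at least 12 octets")
    first, second, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = first & 0x0F            # lower 4 bits of the first octet
    csrc_end = 12 + 4 * csrc_count       # each CSRC identifier is 32 bits long
    if len(packet) < csrc_end:
        raise ValueError("packet is shorter than its CSRC list")
    return {
        "version": first >> 6,           # first 2 bits, 2 for RFC 3550
        "padding": bool(first & 0x20),   # P bit
        "extension": bool(first & 0x10), # X bit
        "csrc_count": csrc_count,
        "marker": bool(second & 0x80),   # M bit
        "payload_type": second & 0x7F,   # e.g. 8 for PCMA
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrc": list(struct.unpack(f"!{csrc_count}I", packet[12:csrc_end])),
    }
```

A receiver could sort incoming packets by `sequence_number` to restore the original order, since RTP itself does not reorder anything.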

1.4.1 RTP Control Protocol

The RTP Control Protocol (RTCP) (RFC 3550)[43] is a protocol that complements RTP with Quality of Service (QoS) information. As QoS aspects are not relevant in our scenario, RTCP will not be discussed.

1.5 Session Initiation Protocol

SIP is a standardized signalling protocol specified in Request for Comments (RFC) 3261[39], which was developed by the Internet Engineering Task Force (IETF) and replaces its predecessor RFC 2543[40]. It is an application layer protocol and is used for the establishment, termination, management and coordination of multimedia sessions over the Internet or other IP-based networks[39]. It establishes a connection between two or more participating User Agents (UA). Text-based messages are exchanged between clients and servers in order to establish connections.

1.5.1 SIP Transport

It is possible to transport SIP Messages via UDP or TCP. In most implementations transport via UDP is preferred, as SIP itself provides handshake, retransmission and timeout functions in order to keep communication up. For that reason it is possible to reduce time and overhead by using the connectionless UDP as transport protocol instead of TCP[47].

1.5.2 SIP Messages

As SIP is a text-based protocol, session establishment and the negotiation of session constraints are carried out by sending so-called SIP Messages. The signalling information is exchanged according to the client-server principle. Two types of SIP Messages are distinguished: SIP Requests and SIP Responses. Both types of messages consist of a start line, one or more header fields, an empty line indicating the end of the header fields, and an optional message body. The difference is that a request starts with a request line, while a response starts with a status line.
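The message structure described above can be sketched in a few lines of Python. This is a simplified illustration that assumes well-formed messages and ignores header lines folded over several lines; `parse_sip_message` is a hypothetical helper and the URI in the usage note is a made-up example.

```python
def parse_sip_message(raw: str) -> dict:
    # The first empty line separates the header fields from the optional body
    head, _, body = raw.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = []
    for line in header_lines:
        name, _, value = line.partition(":")
        headers.append((name.strip(), value.strip()))
    # A status line begins with the SIP version, a request line with the method
    kind = "response" if start_line.startswith("SIP/") else "request"
    return {"kind": kind, "start_line": start_line,
            "headers": headers, "body": body}
```

For example, `parse_sip_message("OPTIONS sip:bob@example.com SIP/2.0\r\nMax-Forwards: 70\r\n\r\n")` is classified as a request, because its start line begins with a method name rather than with the SIP version.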

1.5.3 Client/Server

Requests are sent from a client to a server. Responses, in contrast, are generated by a server and sent to a client. A communication endpoint can act as a User Agent Client (UAC) or as a User Agent Server (UAS)[47]. In other words, every UA must be able to generate requests and responses. The terms User Agent Client and User Agent Server therefore do not refer to network elements; they define the role in which an endpoint acts in the communication.

1.5.4 SIP URIs

A SIP URI (Uniform Resource Identifier) describes the contact address of a SIP endpoint. The syntax of a SIP URI corresponds to the following scheme: sip:User@Host. The user part of the SIP URI consists of an individual user name, and the host part of the URI is an IP address or a domain name[39]. We can distinguish two types of SIP URIs: temporary SIP URIs and permanent SIP URIs. The temporary SIP URI corresponds to the address where the SIP endpoint can be reached directly. Therefore the host part of the temporary URI depends on the network where the endpoint resides, so the temporary URI can be something like sip:[email protected]. A permanent URI, in contrast, is independent of the network where the endpoint resides and is usually generated by a SIP provider. When a user registers with a SIP Registrar, a permanent URI is generated, e.g. sip:[email protected]. The relation between permanent and temporary SIP URI is usually stored in a Location Server. If, for example, a SIP Proxy needs to know the address where an endpoint can be reached directly, it gets the information from the Location Server and can then send SIP Messages directly to the endpoint[47].
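The relation between permanent and temporary SIP URIs can be pictured as a simple lookup table. The following Python sketch is only an illustration: the URIs, the dictionary `location_server` and the functions `parse_sip_uri` and `resolve` are hypothetical and do not reflect how a real Location Server is implemented.

```python
def parse_sip_uri(uri: str) -> tuple:
    """Split a URI of the form sip:User@Host into (user, host)."""
    scheme, _, rest = uri.partition(":")
    if scheme != "sip" or "@" not in rest:
        raise ValueError(f"not a sip:User@Host URI: {uri!r}")
    user, _, host = rest.partition("@")
    return user, host


# Hypothetical Location Server content: permanent URI -> temporary URI
location_server = {
    "sip:alice@example.com": "sip:alice@192.0.2.10",
}


def resolve(permanent_uri: str):
    """Return the temporary URI where the endpoint is reachable, or None."""
    return location_server.get(permanent_uri)
```

A proxy that receives a request for `sip:alice@example.com` would call `resolve` and forward the message to the returned temporary URI, whose host part is the current network address of the endpoint.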

1.5.5 SIP Requests

SIP Requests are SIP Messages that introduce the transactions necessary for a communication and are characterized by special methods. The following list gives an overview of the main methods defined in RFC 3261 [39]:

• INVITE: The ’INVITE’ method initiates the establishment of a SIP session between two communication endpoints. In combination with the SDP body, it carries information about session parameters such as the preferred codec. Sending an ’INVITE’ request initiates the process that leads to session establishment via sending and receiving of further SIP Messages. Sending an ’INVITE’ request during an already established session is a common technique for changing session parameters within a communication.

• BYE: Sending a ’BYE’ request terminates an existing session.

• OPTIONS: With an ’OPTIONS’ request it is possible to ask for an endpoint’s abilities without establishing a session.

• CANCEL: The ’CANCEL’ method can be used for cancelling any SIP transaction while the transaction is being established.

• ACK: The ’ACK’ (Acknowledgement) method in fact isn’t really a request, because it is used for confirming the receipt of a final status information, that has answered an initial INVITE. It is the only request that is never answered.

• REGISTER: The ’REGISTER’ method is used by a SIP UA for registering itself with a SIP Registrar.

In order to complete the methods, that are supported by SIP, the following list shows the extended methods, that are not part of RFC 3261:

• SUBSCRIBE: The ’SUBSCRIBE’ method is described in RFC 3265[35] and is used to request current state and state updates from a remote node [35]. A subscription can be used e.g. for presence functions (determining the online status of users).

• NOTIFY: The ’NOTIFY’ method is also described in RFC 3265[35]. It is the logical answer to a ’SUBSCRIBE’ or ’REFER’ request and contains the current state of the requested remote node.

• REFER: The ’REFER’ method is described in RFC 3515[44] and indicates that the recipient (identified by the Request-URI) should contact a third party using the contact information provided in the request (Third Party Call Control, 3PCC) [44].

• MESSAGE: The ’MESSAGE’ method is described in RFC 3428[10] and can be used for sending a short text message to the communication partner. The main purpose is Instant Messaging (IM).

• PRACK: The ’PRACK’ method is described in RFC 3262[38] and is the short form for Provisional Response Acknowledgement. It is used to acknowledge Provisional Responses.

• UPDATE: The ’UPDATE’ method is described in RFC 3311[36] and is used for changing session parameters while the session initiation has not yet been finished.

• INFO: The ’INFO’ method is described in RFC 2976[12] and is used for communicating mid-session signalling information along the signalling path for the call. The ’INFO’ request is not used in order to change the state of SIP calls, nor does it change the state of sessions initiated by SIP. Rather, it provides additional optional information which can further enhance the application using SIP [12]. One of the potential uses of the ’INFO’ request is carrying mid-call PSTN signalling messages between PSTN gateways.

• PUBLISH: The ’PUBLISH’ request, that is described in RFC 3903[28], can be used for publishing status changes of remote nodes without an initial subscription.
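The two lists above can be condensed into a small lookup table that maps each method to the RFC named in the text. The following Python sketch is only a convenience illustration; `SIP_METHOD_RFC` and `is_core_method` are hypothetical names.

```python
# Method name -> RFC number in which the method is described (as listed above)
SIP_METHOD_RFC = {
    "INVITE": 3261, "BYE": 3261, "OPTIONS": 3261,
    "CANCEL": 3261, "ACK": 3261, "REGISTER": 3261,
    "SUBSCRIBE": 3265, "NOTIFY": 3265, "REFER": 3515,
    "MESSAGE": 3428, "PRACK": 3262, "UPDATE": 3311,
    "INFO": 2976, "PUBLISH": 3903,
}


def is_core_method(method: str) -> bool:
    """True if the method belongs to the main methods of RFC 3261."""
    return SIP_METHOD_RFC.get(method) == 3261
```

A tool that inspects SIP traffic could use such a table to decide whether an endpoint needs to support an extension before a given request makes sense.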

1.5.6 SIP Responses

SIP responses are the answers to SIP requests: a response contains the information that was requested and acknowledges the receipt of a request. In contrast to SIP requests, SIP responses are not characterized by a method but by a three-digit status code. In addition to the status code, SIP responses contain a standard reason phrase that expresses the information in words[47]. SIP responses are categorized into six different types, which are distinguished by the first digit of the status code. The following listings give an overview of status codes that can be used within SIP responses.

1xx status codes (provisional responses): Responses of this type are sent as answers to requests that have been initiated but are not yet finished[39].

Status code   Reason phrase
100           Trying
180           Ringing
181           Call is being forwarded
182           Queued
183           Session progress

Table 1.1: 1xx status codes

2xx status codes (successful): Responses of this type are sent as answers to requests that were received and handled successfully[39].

Status code   Reason phrase
200           OK
202           Accepted

Table 1.2: 2xx status codes

3xx status codes (redirection): Responses of this type are sent as answers to requests that could not be fulfilled completely. The status information may contain additional information about the user’s location[39].

Status code   Reason phrase
300           Multiple choices
301           Moved permanently
302           Moved temporarily
305           Use proxy
380           Alternative service

Table 1.3: 3xx status codes

4xx status codes (request failure): If a request could not be fulfilled by a UAS because of the content of the request, 4xx responses are used as answers[39].

Status code   Reason phrase
400           Bad Request
401           Unauthorized
402           Payment required
403           Forbidden
404           Not found
405           Method not allowed
406           Not acceptable
407           Proxy Authentication required
408           Request timeout
410           Gone
413           Request Entity too large
414           Request URI too long
415           Unsupported Media Type
416           Unsupported URI Scheme
420           Bad Extension
421           Extension required
423           Interval too brief
480           Temporarily unavailable
481           Call/Transaction does not exist
482           Loop detected
483           Too many Hops
484           Address incomplete
485           Ambiguous
486           Busy here
487           Request terminated
488           Not acceptable here
489           Bad Event
491           Request pending
493           Undecipherable

Table 1.4: 4xx status codes

5xx status codes (Server Failure): Responses of this type are sent as answers to requests that could not be fulfilled successfully because of an internal server failure[39].

Status code   Reason phrase
500           Server internal error
501           Not implemented
502           Bad Gateway
503           Service unavailable
504           Server Timeout
505           Version not supported
513           Message too large

Table 1.5: 5xx status codes

6xx status codes (Global Failure): If the contacted UAS has knowledge, that the request cannot be fulfilled at any server a 6xx response is generated[39].

Status code   Reason phrase
600           Busy everywhere
603           Decline
604           Does not exist anywhere
606           Not acceptable

Table 1.6: 6xx status codes
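Because the response type is determined by the first digit of the status code alone, a receiver can classify a response without knowing every individual code. The following Python sketch illustrates this; the names `STATUS_CLASSES`, `classify_status` and `is_final` are hypothetical.

```python
# First digit of the status code -> response type, following tables 1.1 to 1.6
STATUS_CLASSES = {
    1: "provisional",
    2: "successful",
    3: "redirection",
    4: "request failure",
    5: "server failure",
    6: "global failure",
}


def classify_status(code: int) -> str:
    if not 100 <= code <= 699:
        raise ValueError(f"not a SIP status code: {code}")
    return STATUS_CLASSES[code // 100]


def is_final(code: int) -> bool:
    """Every response that is not provisional (1xx) is a final response."""
    return classify_status(code) != "provisional"
```

With this sketch, `classify_status(486)` yields "request failure" and `is_final(180)` is false, which matches the distinction between provisional and final responses made in the text.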

1.5.7 SIP session establishment

The typical SIP session establishment is fulfilled in a three way handshake manner.

Figure 1.4: SIP Three Way Handshake

As you can see in figure 1.4, User Agent A initiates the session establishment by sending an ’INVITE’ request to User Agent B. The ’INVITE’ request is the first element of the three way handshake. User Agent B reacts and sends the provisional response ’100 Trying’ back to User Agent A, followed by the provisional response ’180 Ringing’, which indicates that the phone of user B is ringing. As ’100 Trying’ and ’180 Ringing’ are both provisional (optional) responses, they are not considered to be part of the three way handshake[47]. As soon as user B picks up the phone, the response ’200 OK’ is generated by User Agent B and sent to User Agent A. User Agent A answers by sending an ’ACK’, which indicates that it is still willing to communicate. The messages ’200 OK’ and ’ACK’ are the second and third element of the three way handshake; once they have been exchanged, all session parameters are known to both sides and the session is established. In our example User Agent B terminates the session with a ’BYE’ request, which is answered by User Agent A with a ’200 OK’ response.
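The call flow described above can be written down as a list of messages, from which the three handshake elements are obtained by skipping the provisional responses. The Python sketch below is a toy model of that idea, not a SIP implementation; `call_flow`, `is_provisional` and `handshake_elements` are hypothetical names.

```python
# Call flow of the example: (sending User Agent, message)
call_flow = [
    ("A", "INVITE"),
    ("B", "100 Trying"),
    ("B", "180 Ringing"),
    ("B", "200 OK"),
    ("A", "ACK"),
    ("B", "BYE"),
    ("A", "200 OK"),
]


def is_provisional(message: str) -> bool:
    # Provisional responses start with a three-digit status code of the form 1xx
    return message[:3].isdigit() and message.startswith("1")


def handshake_elements(flow: list) -> list:
    """Collect the messages up to the ACK, leaving out provisional responses."""
    elements = []
    for sender, message in flow:
        if is_provisional(message):
            continue
        elements.append((sender, message))
        if message == "ACK":
            break  # the session is established, the handshake is complete
    return elements
```

Applied to `call_flow`, the function returns the ’INVITE’ from A, the ’200 OK’ from B and the ’ACK’ from A, i.e. exactly the three elements named in the text.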

1.5.8 SIP transactions/ dialogs

We can distinguish two main types of communication relations between SIP entities: transactions and dialogs. A SIP transaction is a sequence of SIP messages that is exchanged between SIP entities and includes one SIP request and all responses to that request. The initiator of a SIP transaction sends a SIP request. The targeted UA reacts and sends one or more responses. The transaction is usually terminated with a final response [47]. SIP network elements group requests and responses into a SIP transaction by means of special transaction identifiers. The differentiators are the ’branch’ parameter in the ’Via’ header field and the value of the ’CSeq’ header field (header fields and their meanings will be discussed later).

The SIP session establishment as displayed in figure 1.4 is a special case regarding SIP transactions. As we can see in figure 1.5, the ’ACK’ request is not part of the transaction that was initiated by the ’INVITE’ request if the ’INVITE’ was answered by a ’2xx’ response. In other words, if a transaction is initiated with an ’INVITE’ request and the response is a ’200 OK’, the ’200 OK’ is considered to be a final response and the transaction is terminated. The mandatory ’ACK’ that follows forms a transaction of its own. In contrast to this special case, the ’ACK’ request is considered to be part of the transaction initiated by an ’INVITE’ if session establishment is not successful and the ’INVITE’ is answered e.g. with a ’486 Busy here’ response, as we can see on the right side of figure 1.5.

Figure 1.5: SIP transactions

A SIP dialog represents a peer-to-peer SIP relationship between two User Agents that persists for some time. The dialog facilitates sequencing of messages between the User Agents and proper routing of requests between both of them [39]. This means that a SIP dialog is a connection oriented communication state between two UAs, and that a dialog is initiated with one or more transactions and can be manipulated and terminated with SIP transactions. SIP messages are grouped into dialogs by means of special dialog identifiers. The differentiators are the value of the ’Call-ID’ header, the value of the ’tag’ parameter in the ’From’ header and the value of the ’tag’ parameter in the ’To’ header.
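The grouping of messages into transactions and dialogs can be sketched as follows. The example assumes that the header fields of a message are already available as a Python dictionary and that the header values contain no semicolons other than those introducing parameters; `header_param`, `transaction_id` and `dialog_id` are hypothetical helper names, and the header values in the usage note are made up.

```python
def header_param(value: str, name: str):
    """Return the value of a ;name=value parameter of a header, or None."""
    for part in value.split(";")[1:]:
        key, _, val = part.partition("=")
        if key.strip() == name:
            return val.strip()
    return None


def transaction_id(headers: dict) -> tuple:
    # Differentiators named in the text: 'branch' of 'Via' and the 'CSeq' value
    return (header_param(headers["Via"], "branch"), headers["CSeq"])


def dialog_id(headers: dict) -> tuple:
    # Differentiators named in the text: 'Call-ID', 'From' tag and 'To' tag
    return (headers["Call-ID"],
            header_param(headers["From"], "tag"),
            header_param(headers["To"], "tag"))
```

Two messages whose `transaction_id` tuples are equal belong to the same transaction. A request that has not yet been answered carries no ’tag’ in its ’To’ header, so the third element of `dialog_id` is `None` until the callee has responded.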

1.5.9 SIP Message layout

In general SIP requests and SIP responses have the same structure. Both of them consist of the start line, the message header and the message body, as we can see in figure 1.6.

Figure 1.6: SIP message structure

The start line of a message is either a request line (in case of a SIP request) or a status line (in case of a SIP response). As examples we will take a look at the start line of the ’INVITE’ request in comparison to the start line of a ’200 OK’ response.

Method   Request URI             SIP version   crlf
INVITE   sip:[email protected]   SIP/2.0       crlf

Table 1.7: Request line of an INVITE

SIP version   Status Code   Reason Phrase   crlf
SIP/2.0       200           OK              crlf

Table 1.8: Status line of a 200 OK

The start lines are constructed as shown in table 1.7 and table 1.8: the entries of the start line are separated by a single space and the start line ends with a carriage return line feed (crlf). As an example we will take a deeper look at the start line of the ’INVITE’ request. The first entry of the request line is the method (in our case ’INVITE’), which indicates the purpose of the request. The second entry of the request line is the Request URI (in our case [email protected]). The Request URI is the URI where the targeted SIP entity is available and may be a SIP URI, a domain name or an IP address[47]. The last entry of the request line is the SIP version, which is in our case SIP/2.0 (the correct version for RFC 3261 SIP).

The SIP message header is constructed with the following syntax:

field-name: field-value *(;parameter-name=parameter-value)

Table 1.9: SIP header syntax

Note that a parameter is optional and that the semicolon is only used if a parameter is appended to the header. Table 1.10 gives an overview of the basic header fields that are defined in RFC 3261.

Accept               Content-Encoding     Min-Expires          Route
Accept-Encoding      Content-Language     MIME-Version         Server
Accept-Language      Content-Length       Organization         Subject
Alert-Info           Content-Type         Priority             Supported
Allow                CSeq                 Proxy-Authenticate   Timestamp
Authentication-Info  Date                 Proxy-Authorization  To
Authorization        Error-Info           Proxy-Require        Unsupported
Call-ID              Expires              Record-Route         User-Agent
Call-Info            From                 Reply-To             Via
Contact              In-Reply-To          Require              Warning
Content-Disposition  Max-Forwards         Retry-After          WWW-Authenticate

Table 1.10: SIP header fields
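The layout of the start line (tables 1.7 and 1.8) and the header syntax (table 1.9) can be illustrated with a minimal parsing sketch. The function names and the example URI are hypothetical, and the sketch is deliberately simplified: it does not handle quoted display names or URIs in angle brackets that themselves contain a semicolon.

```python
def parse_start_line(line):
    """Split a start line into its three entries (separated by single spaces)."""
    first, second, third = line.rstrip("\r\n").split(" ", 2)
    if first.startswith("SIP/"):
        # Status line: SIP version, status code, reason phrase
        return {"kind": "status", "version": first,
                "code": int(second), "reason": third}
    # Request line: method, Request URI, SIP version
    return {"kind": "request", "method": first,
            "uri": second, "version": third}


def parse_header(line):
    """Split 'field-name: field-value;name=value' into name, value, parameters."""
    name, _, rest = line.partition(":")
    value, *raw_params = rest.strip().split(";")
    parameters = {}
    for raw in raw_params:
        key, _, val = raw.partition("=")
        parameters[key.strip()] = val.strip()
    return name.strip(), value.strip(), parameters


# Hypothetical Request URI, used only for illustration
print(parse_start_line("INVITE sip:person-b@example.com SIP/2.0\r\n"))
print(parse_start_line("SIP/2.0 200 OK\r\n"))
print(parse_header("Via: SIP/2.0/UDP 192.168.0.5:5060;branch=z9hG4bK-2468"))
```

Only the first colon separates the field name from the field value, which is why the port in the ’Via’ value stays intact.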

As an example, and because it is the most important request in our scenario, we will take a deeper look at the ’INVITE’ request in table 1.11 and its header fields. As we already discussed the start line, we pass directly to the first line of the message header.

The ’Via’ header consists of the SIP version number, the transport protocol used (e.g. UDP), the socket (IP:Port) or domain name of the sending entity, to which responses are returned, and the additional ’branch’ parameter. The ’branch’ parameter consists of the magic cookie (z9hG4bK) followed by a random number. The ’branch’ parameter serves as transaction identifier. Every SIP network entity that the SIP message passes (on its way to the destination) adds its own ’Via’ header above the last one. With this technique it is guaranteed that all messages of a transaction take the same path[39].

The ’From’ header includes information about the initiator of a SIP message (SIP URI). The SIP URI can be preceded by an optional name (in our example ’Person-A’) and is followed by the ’tag’ parameter. The ’tag’ parameter is a randomly generated number and serves as dialog identifier.

The ’To’ header contains the SIP URI of the targeted UA. When a UA responds to a request, it copies the ’To’ header and appends the ’tag’ parameter as dialog identifier.

The ’Call-ID’ header is a random number (often followed by the host name of the initiator) and serves as dialog identifier. All messages of the same dialog contain the same ’Call-ID’.

The Command Sequence header ’CSeq’ consists of a sequence number followed by the method name of the initial request and serves as transaction identifier. All messages of a transaction carry the same sequence number; each new request within a dialog increments it by one.

Start-Line
    INVITE sip:[email protected] SIP/2.0
Message header
    Via: SIP/2.0/UDP 192.168.0.5:5060;branch=z9hG4bK-2468
    From: "Person-A" ;tag=4532
    To: "Person-B"
    Call-ID: [email protected]
    CSeq: 1 INVITE
    Max-Forwards: 70
    Contact:
    Content-Type: application/sdp
    Content-Length: 210
Blank line
Message Body (e.g. SDP)
    v=0
    o=Person-A 2345 0 IN IP4 192.168.0.5
    s=Test
    c=IN IP4 192.168.0.5
    t=0 0
    m=audio 2410 RTP/AVP 0 8 3 4
    a=rtpmap:0 PCMU/8000
    a=rtpmap:8 PCMA/8000
    a=rtpmap:3 GSM/8000
    a=rtpmap:4 G723/8000

Table 1.11: INVITE message

The ’Max-Forwards’ header defines how many hops a SIP message may pass on its way to the destination. Every SIP network entity reduces the value by one; when the value reaches zero, the message is discarded. With this technique it is possible to avoid infinite loops[47].

The ’Contact’ header contains the address where the initiating UA can be reached (its temporary SIP URI). The ’Content-Type’ header states what kind of information is presented in the message body (in our case application/sdp). The ’Content-Length’ header contains the length of the message body in bytes. In case of a SIP message without message body, the value is set to zero. The message body (SDP) will be discussed later.

The following tables give an overview of all existing message header fields and their relation to messages and proxies as presented in [39]. The ’where’ column describes the request and response types in which the header field can be used. Values in this column are:

• R: header field may only appear in requests.

• r: header field may only appear in responses.

• 2xx, 4xx, etc.: A numerical value or range indicates response codes with which the header field can be used.

• c: header field is copied from the request to the response.

An empty entry in the ’where’ column indicates that the header field may be present in all requests and responses. The ’proxy’ column describes the operations a Proxy may perform on a header field:

• a: A proxy can add or concatenate the header field if not present.

• m: A proxy can modify an existing header field value.

• d: A proxy can delete a header field value.

• r: A proxy must be able to read the header field, and thus this header field cannot be encrypted.

The next six columns relate to the presence of a header field in a method:

• c: Conditional: requirements on the header field depend on the context of the message.

• m: The header field is mandatory.

• m*: The header field should be sent, but clients/servers need to be prepared to receive messages without that header field.

• o: The header field is optional.

• t: The header field should be sent, but clients/servers need to be prepared to receive messages without that header field. If a stream-based protocol (such as TCP) is used as a transport, then the header field must be sent.

• *: The header field is required if the message body is not empty.

• -: The header field is not applicable.

Header field         where            proxy  ACK  BYE  CANCEL  INV  OPT  REG
Accept               R                       -    o    -       o    m*   o
Accept               2xx                     -    -    -       o    m*   o
Accept               415                     -    c    -       c    c    c
Accept-Encoding      R                       -    o    -       o    o    o
Accept-Encoding      2xx                     -    -    -       o    m*   o
Accept-Encoding      415                     -    c    -       c    c    c
Accept-Language      R                       -    o    -       o    o    o
Accept-Language      2xx                     -    -    -       o    m*   o
Accept-Language      415                     -    c    -       c    c    c
Alert-Info           R                ar     -    -    -       o    -    -
Alert-Info           180              ar     -    -    -       o    -    -
Allow                R                       -    o    -       o    o    o
Allow                2xx                     -    o    -       m*   m*   o
Allow                r                       -    o    -       o    o    o
Allow                405                     -    m    -       m    m    m
Authentication-Info  2xx                     -    o    -       o    o    o
Authorization        R                       o    o    o       o    o    o
Call-ID              c                r      m    m    m       m    m    m
Call-Info                             ar     -    -    -       o    o    o
Contact              R                       o    -    -       m    o    o
Contact              1xx                     -    -    -       o    -    -
Contact              2xx                     -    -    -       m    o    o
Contact              3xx              d      -    o    -       o    o    o
Contact              485                     -    o    -       o    o    o
Content-Disposition                          o    o    -       o    o    o
Content-Encoding                             o    o    -       o    o    o
Content-Language                             o    o    -       o    o    o
Content-Length                        ar     t    t    t       t    t    t
Content-Type                                 *    *    -       *    *    *
CSeq                 c                r      m    m    m       m    m    m
Date                                  a      o    o    o       o    o    o
Error-Info           300-699          a      -    o    o       o    o    o
Expires                                      -    -    -       o    -    o
From                 c                r      m    m    m       m    m    m
In-Reply-To          R                       -    -    -       o    -    -
Max-Forwards         R                amr    m    m    m       m    m    m
Min-Expires          423                     -    -    -       -    -    m
MIME-Version                                 o    o    -       o    o    o
Organization                          ar     -    -    -       o    o    o
Priority             R                ar     -    -    -       o    -    -
Proxy-Authenticate   407              ar     -    m    -       m    m    m
Proxy-Authenticate   401              ar     -    o    o       o    o    o
Proxy-Authorization  R                dr     o    o    -       o    o    o
Proxy-Require        R                ar     -    o    -       o    o    o
Record-Route         R                ar     o    o    o       o    o    -
Record-Route         2xx,18x          mr     -    o    o       o    o    -
Reply-To                                     -    -    -       o    -    -
Require                               ar     -    c    -       c    c    c
Retry-After          404,413,480,486         -    o    o       o    o    o
Retry-After          500,503                 -    o    o       o    o    o
Retry-After          600,603                 -    o    o       o    o    o
Route                R                adr    c    c    c       c    c    c
Server               r                       -    o    o       o    o    o
Subject              R                       -    -    -       o    -    -
Supported            R                       -    o    o       m*   o    o
Supported            2xx                     -    o    o       m*   m*   o
Timestamp                                    o    o    o       o    o    o
To                   c1               r      m    m    m       m    m    m
Unsupported          420                     -    m    -       m    m    m
User-Agent                                   o    o    o       o    o    o
Via                  R                amr    m    m    m       m    m    m
Via                  rc               dr     m    m    m       m    m    m
Warning              r                       -    o    o       o    o    o
WWW-Authenticate     401              ar     -    m    -       m    m    m
WWW-Authenticate     407              ar     -    o    -       o    o    o

Table 1.12: Message headers and their appearance
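As a small illustration of how table 1.12 can be read, the following sketch collects the header fields that are marked ’m’ in the INV column for requests and checks whether a raw ’INVITE’ contains all of them. The function name and the sample message (including all addresses) are hypothetical; compact header forms are not considered.

```python
# Header fields marked 'm' in the INV column of table 1.12 for requests
MANDATORY_IN_INVITE = ("Call-ID", "Contact", "CSeq", "From",
                       "Max-Forwards", "To", "Via")


def missing_mandatory_headers(raw_invite):
    """Return the mandatory header fields that are absent from an INVITE."""
    head = raw_invite.split("\r\n\r\n", 1)[0]   # part before the blank line
    header_lines = head.split("\r\n")[1:]       # skip the request line
    # Header field names are compared case-insensitively
    present = {line.partition(":")[0].strip().lower()
               for line in header_lines if ":" in line}
    return [name for name in MANDATORY_IN_INVITE
            if name.lower() not in present]


# Hypothetical INVITE without message body; all addresses are placeholders
SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:person-b@example.com SIP/2.0",
    "Via: SIP/2.0/UDP 192.168.0.5:5060;branch=z9hG4bK-2468",
    'From: "Person-A" <sip:person-a@example.com>;tag=4532',
    'To: "Person-B" <sip:person-b@example.com>',
    "Call-ID: 7531@192.168.0.5",
    "CSeq: 1 INVITE",
    "Max-Forwards: 70",
    "Contact: <sip:person-a@192.168.0.5>",
    "Content-Length: 0",
    "", "",
])

print(missing_mandatory_headers(SAMPLE_INVITE))
```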

1.5.10 Session Description Protocol

The Session Description Protocol (SDP) is defined in RFC 4566[20] and is used in SIP message bodies for the purpose of media description. SDP was not developed for SIP, but it is well suited to describing the media parameters that need to be exchanged and negotiated between SIP UAs[47]. To stay with our example, we take a look back at the message body in table 1.11.

The ’v’ (Protocol Version) parameter stands for the SDP version used (0 for RFC 4566 SDP).

The ’o’ (Origin) parameter contains information about the UA that initiated the media session. The first part of the ’o’ parameter is the name of the initiating UA, followed by a random number that identifies the session. The ’0’ that follows the random number indicates the session version and is incremented every time the session is changed. The value ’IN’ indicates that the initiating UA is addressable via an IP based network address; it is followed by the ’IP4’ qualifier, which describes the type of address (IPv4), and by the network address itself.

The ’s’ (Session Name) parameter is a simple name that can be used to describe the session.

The ’c’ (Connection Data) parameter describes the address at which the initiating UA accepts media data and is composed analogously to the address part of the ’o’ parameter.

The ’t’ (Timing) parameter describes the start and end time of a media session. As this parameter is not useful for phone calls, it is set to ’0 0’.

The ’m’ (Media Descriptions) parameter contains information about the elements that are part of the media session. The first part of the ’m’ parameter is the media type (in our case audio), followed by the port that should be used[47]. The next part describes the transport protocol (in our case ’RTP/AVP’, Real-time Transport Protocol/Audio Video Profile). The last component of the ’m’ parameter is the codec list. The numbers (in our case ’0 8 3 4’) represent the codecs that are supported by the initiating UA, in order of preference.

Payload Type  Encoding name  Media type  Clock rate  Channels
0             PCMU           A           8000        1
1             reserved       A
2             reserved       A
3             GSM            A           8000        1
4             G723           A           8000        1
5             DVI4           A           8000        1
6             DVI4           A           16000       1
7             LPC            A           8000        1
8             PCMA           A           8000        1

Table 1.13: rtpmap parameters for RTP/AVP

The last entries in the message body are the ’a’ (Attributes) parameters, which serve as a detailed description of the codec list of the ’m’ parameter. In table 1.13 we can see an excerpt of the ’rtpmap’ table of ’RTP/AVP’, which shows the mapping of the most commonly used codecs.
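To make the relation between the ’m’ parameter and the ’a=rtpmap’ attributes concrete, the following minimal sketch parses the SDP body of table 1.11 and resolves the codec list. The function name is hypothetical, and the sketch assumes a body with exactly one ’m’ line.

```python
# SDP body as shown in table 1.11
SDP_BODY = "\r\n".join([
    "v=0",
    "o=Person-A 2345 0 IN IP4 192.168.0.5",
    "s=Test",
    "c=IN IP4 192.168.0.5",
    "t=0 0",
    "m=audio 2410 RTP/AVP 0 8 3 4",
    "a=rtpmap:0 PCMU/8000",
    "a=rtpmap:8 PCMA/8000",
    "a=rtpmap:3 GSM/8000",
    "a=rtpmap:4 G723/8000",
]) + "\r\n"


def parse_audio_offer(sdp):
    """Return media type, port, transport and codecs in preferred order."""
    media = {}
    rtpmap = {}
    for line in sdp.splitlines():
        key, _, value = line.partition("=")
        if key == "m":
            kind, port, transport, *payload_types = value.split()
            media = {"type": kind, "port": int(port), "transport": transport,
                     "payload_types": [int(pt) for pt in payload_types]}
        elif key == "a" and value.startswith("rtpmap:"):
            pt, encoding = value[len("rtpmap:"):].split(" ", 1)
            name, clock_rate = encoding.split("/")[:2]
            rtpmap[int(pt)] = (name, int(clock_rate))
    # Resolve each payload type of the 'm' line via the rtpmap attributes
    media["codecs"] = [rtpmap.get(pt) for pt in media["payload_types"]]
    return media


offer = parse_audio_offer(SDP_BODY)
print(offer["port"], offer["codecs"])
```

The order of the resulting codec list follows the ’m’ line and not the order of the ’a’ lines, since the ’m’ line expresses the preference of the initiating UA.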

1.5.11 User Agent

The term User Agent has been used several times as a term for SIP communication endpoints. In fact a SIP UA is either a SIP software telephone (softphone) or a hardware telephone (hardphone)[47]. As we will see in the later chapters that discuss SPIT in detail, hardphones are more interesting than softphones in the SPIT scenario. The most common hardphone in Germany is a combination of a DSL modem, a (W)LAN router and a VoIP/PSTN Private Branch Exchange (e.g. AVM Fritz!Box Fon) with an analogue telephone attached as VoIP UA. As we can see in figure 1.7, the analogue phone is attached to the ’Phone 1’ port of the Fritz!Box and the Fritz!Box acts as a VoIP/PSTN PBX. If a VoIP call is initiated, the integrated PBX initiates the SIP session.

1.5.12 Registrar

A Registrar Server is a SIP network entity that enables location- and device-independent availability. A SIP UA sends a request to a SIP Registrar Server in order to generate a binding between the temporary SIP URI of the UA (e.g. sip:[email protected]) and the permanent SIP URI that is given by the provider (e.g. sip:[email protected]). The User Agent includes this information in a ’REGISTER’ request.

Figure 1.7: An analogue telephone attached to a Fritz!Box Fon as SIP UA

Start-Line
    REGISTER sip:example.com SIP/2.0
Message header
    Via: SIP/2.0/UDP 77.77.77.77:5060;branch=z9hG4bK-2468
    From: "Someone" ;tag=4532
    To: "Someone"
    Call-ID: [email protected]
    CSeq: 1 REGISTER
    Max-Forwards: 70
    Expires: 1800
    Contact:
    Content-Length: 0
Blank line

Table 1.14: REGISTER message

We can see in table 1.14 that the ’REGISTER’ request contains several special cases. The first one is that the Request URI in the request line does not contain a user part, as it is just the domain name of the targeted Registrar Server (sip:example.com). The permanent SIP URI that the UA wants to register is found in the ’To’ and in the ’From’ header, while the temporary SIP URI is found in the ’Contact’ header. With this information the Registrar Server can establish a binding between the temporary and the permanent SIP URI, which it sends to the so-called Location Server. The Location Server is not considered to be a SIP network entity because it does not process SIP messages. The Location Server usually is a database that administrates the bindings and is accessible via server protocols (e.g. LDAP)[47], as we can see in figure 1.8.

Figure 1.8: Registration process of a SIP UA
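The binding that the Registrar Server hands to the Location Server can be pictured as a simple mapping from the permanent to the temporary SIP URI with a limited lifetime (compare the ’Expires’ header in table 1.14). The following is only a minimal in-memory sketch; the class name, the method names and both URIs are hypothetical, and a real Location Server would typically be a database.

```python
class LocationStore:
    """In-memory stand-in for the Location Server database (illustration only)."""

    def __init__(self):
        self._bindings = {}   # permanent URI -> (temporary URI, valid until)

    def register(self, permanent_uri, contact_uri, expires, now):
        """Store the binding taken from the 'To' and 'Contact' headers."""
        self._bindings[permanent_uri] = (contact_uri, now + expires)

    def lookup(self, permanent_uri, now):
        """Return the temporary URI, or None if unknown or expired."""
        binding = self._bindings.get(permanent_uri)
        if binding is None:
            return None
        contact_uri, valid_until = binding
        return contact_uri if now < valid_until else None


store = LocationStore()
# Hypothetical URIs; 'expires' corresponds to the 'Expires: 1800' header
store.register("sip:someone@example.com", "sip:someone@77.77.77.77",
               expires=1800, now=0)
print(store.lookup("sip:someone@example.com", now=60))
print(store.lookup("sip:someone@example.com", now=1800))
```

Time is passed in explicitly as seconds so that the expiry of a binding can be followed without waiting.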

1.5.13 Proxy Server

A SIP Proxy Server is a network entity that is responsible for routing SIP messages to their destinations. We generally distinguish two types of Proxy Server, the so-called Stateless Proxies and the Stateful Proxies.

A Stateless Proxy forwards every request it receives downstream and every response it receives upstream [39]. A Stateless Proxy can neither generate SIP messages nor resend them, and it does not save any information about SIP messages, states or transactions.

A Stateful Proxy, in contrast, saves information about SIP messages and acts both as a UAC and as a UAS. This means that a Stateful Proxy does not just forward requests and responses, but also generates SIP messages [47].

In figure 1.9 we can see how a Stateless Proxy and a Stateful Proxy behave when receiving an ’INVITE’ request from User Agent A to User Agent B. The Stateless Proxy simply forwards any SIP message from User Agent A to User Agent B and vice versa. The Stateful Proxy, in contrast, generates a ’100 Trying’ response directly after receiving the first ’INVITE’ request and then forwards the ’INVITE’. The Stateful Proxy does not forward the ’100 Trying’ response that User Agent B generates, because it has already sent a ’100 Trying’ response itself. The messages that follow the ’100 Trying’ are simply forwarded by the Stateful Proxy.

Figure 1.9: Stateless vs. Stateful Proxy
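The difference in behaviour shown in figure 1.9 can be sketched with two small classes. This is only an illustration of the ’100 Trying’ handling described above: messages are reduced to plain strings, the transaction identifier is passed in by the caller, and all class and method names are hypothetical.

```python
class StatelessProxy:
    """Forwards everything and keeps no state."""

    def on_request(self, request, transaction_id):
        return [("downstream", request)]

    def on_response(self, response, transaction_id):
        return [("upstream", response)]


class StatefulProxy:
    """Remembers transactions and generates messages of its own."""

    def __init__(self):
        self._trying_sent = set()   # transactions already answered with 100

    def on_request(self, request, transaction_id):
        actions = []
        if request == "INVITE":
            # Answer the caller immediately, then forward the request
            actions.append(("upstream", "100 Trying"))
            self._trying_sent.add(transaction_id)
        actions.append(("downstream", request))
        return actions

    def on_response(self, response, transaction_id):
        if response == "100 Trying" and transaction_id in self._trying_sent:
            return []               # already sent by the proxy itself
        return [("upstream", response)]


stateless, stateful = StatelessProxy(), StatefulProxy()
print(stateless.on_request("INVITE", "t1"))
print(stateful.on_request("INVITE", "t1"))
print(stateful.on_response("100 Trying", "t1"))
print(stateful.on_response("180 Ringing", "t1"))
```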

1.5.14 SIP security mechanisms

In the following sections we will discuss the basics of SIP security mechanisms as displayed in figure 1.10. The focus will be set on the security of signalling messages (SIP), as media security (e.g. security of transported voice) is out of scope for the SPIT scenario. The fundamental security services that are required for the SIP protocol are described in RFC 3261[39] as the following:

• preserving the confidentiality and integrity of messaging,

• preventing replay attacks

• preventing message spoofing

• providing for the authentication and privacy of the participants in a session

• preventing denial-of-service attacks

• providing confidentiality, integrity, and authentication of message bodies

As described in RFC 3261[39], full encryption of messages is viewed as the best technique to preserve the confidentiality of signalling, and it guarantees that messages are not modified by any malicious intermediaries. Due to the concept of SIP, requests and responses cannot be naively encrypted end-to-end in their entirety, because message fields such as the Request-URI, ’Route’ and ’Via’ headers need to be visible to proxies in most network architectures so that SIP requests are routed correctly. Some proxy servers even need to modify some components of a message (such as adding ’Via’ header field values). As a result, lower-layer security mechanisms for SIP are recommended, which encrypt the entire SIP requests or responses on a hop-by-hop basis[39]. In figure 1.10 we can see an overview of the security mechanisms that we will discuss in the following sections.

Figure 1.10: SIP security mechanisms

1.5.14.1 SIP Digest Authentication

SIP provides an authentication method based on the authentication in HTTP (RFC 2617 [19]). With this stateless, challenge-based authentication method a Proxy Server, a Registrar or any UA can verify the identity of the initiator of an incoming request [39]. Since the release of RFC 3261 the only allowed authentication method derived from HTTP is the so-called Digest authentication scheme. The predecessor RFC 2543 adopted both the Basic and the Digest scheme of HTTP, but the Basic scheme was dropped in RFC 3261 due to its weakness; therefore we will not discuss it.

When a UA sends e.g. a ’REGISTER’ request to a SIP Registrar or an ’INVITE’ to a SIP Proxy, the Registrar/Proxy rejects the request with a request failure response (a Proxy with a ’407 Proxy Authentication Required’ response and a Registrar with a ’401 Unauthorized’ response). The request failure response contains a challenge in an additional header field (in case of a Proxy in the ’Proxy-Authenticate’ header and in case of a Registrar in the ’WWW-Authenticate’ header)[16]. The header fields contain the following parameters:

• Authentication Scheme: The Authentication Scheme is in our case Digest, as we only take a look at the scheme presented in RFC 3261.

• Realm: The realm is the validity scope of the authentication (usually a domain name e.g. example.com).

• Nonce: The nonce is a decimal or base64[24] coded value that is generated by the entity that issues the challenge (e.g. Registrar or SIP Proxy).

The authentication header fields may have more parameters, e.g. information about which hash algorithm to use. If no algorithm is specified, MD5[33] is assumed. The UA that initiated the request answers the challenge by calculating a response based on the scheme in figure 1.11. The following components are used for the calculation of the response:

Figure 1.11: Calculated response for Digest Authentication

• Username: The username is a name that is known to the User Agent that initiates the request and to the SIP server that triggers the authentication. Usually it is the username part of the permanent SIP URI of the UAC (e.g. if the SIP URI is sip:[email protected], the authentication username could be ’someone’ or ’[email protected]’).

• Realm: The realm is the validity scope of the authentication (usually a domain name e.g. example.com) and is generated by the server that initiated the challenge.

• Password: The password is the preshared secret known by both parties of the challenge-response authentication. The password has to be agreed upon out of band, which means that SIP does not provide a mechanism for predefining a password.

• Nonce: The nonce is the decimal or base64 coded value that is generated by the entity that issues the challenge (e.g. Registrar or SIP Proxy).

• SIP method: The method of the initial SIP request that triggered the authentication is used for the calculation of the response (e.g. ’INVITE’ or ’REGISTER’).

• Request URI: The Request URI is the URI of the targeted end point. If e.g. the request is a ’REGISTER’ the Request-URI usually is the domain name of the Registrar.

The response is calculated by the client and sent back to the challenging server. The server computes the response with the same parameters and compares the results.

The SIP Digest mechanism (as described above) is a common technique in inter domain scenarios, where user and provider have already exchanged a preshared secret, such as the combination of username and password. Nevertheless the SIP Digest mechanism does not guarantee integrity and authenticity of a whole SIP message, but only of the elements (’URI’ and ’Method’) that are included in the calculation of the response[39].
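The calculation can be sketched as follows. The sketch assumes the basic Digest variant of RFC 2617 with MD5 and without the optional ’qop’ parameter, in which the response is MD5(HA1:nonce:HA2) with HA1 = MD5(username:realm:password) and HA2 = MD5(method:Request-URI). The function names and all credential values are hypothetical.

```python
import hashlib
import hmac


def md5_hex(text):
    """MD5 digest of a string as 32 hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username, realm, password, nonce, method, request_uri):
    """Response of the basic Digest scheme (MD5, no 'qop')."""
    ha1 = md5_hex(f"{username}:{realm}:{password}")   # secret part
    ha2 = md5_hex(f"{method}:{request_uri}")          # request part
    return md5_hex(f"{ha1}:{nonce}:{ha2}")


def server_accepts(received, username, realm, password, nonce, method,
                   request_uri):
    """The server repeats the calculation and compares the results."""
    expected = digest_response(username, realm, password, nonce, method,
                               request_uri)
    return hmac.compare_digest(received, expected)


# Hypothetical values for a REGISTER at example.com
params = dict(username="someone", realm="example.com", nonce="a1b2c3d4",
              method="REGISTER", request_uri="sip:example.com")
response = digest_response(password="secret", **params)
print(response)
print(server_accepts(response, password="secret", **params))
print(server_accepts(response, password="wrong", **params))
```

Since only the method and the Request URI enter the calculation, a change to any other header field or to the message body leaves the response valid, which is exactly the limitation stated above.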

1.5.14.2 SIPS (SIP Security)

SIPS or SIP over TLS (Transport Layer Security, RFC 2246)[11] provides a mechanism for the hop-by-hop transfer of encrypted SIP messages and is adopted from HTTPS (RFC 2818). TLS acts on the transport layer above TCP and consists of several parts [16]:

• Handshake Protocol: Negotiation and coordination of communication parameters (encryption- or hash algorithm, symmetric key)

• Record Protocol: Fragmentation and compression of datagrams, hashing and encryption

During a TLS connection a UAC demands a valid certificate from the UAS, and this certificate must be issued by a trusted Certificate Authority (CA). It is recommended that certificates are verified with a root certificate on the client side; therefore a PKI (Public Key Infrastructure) is needed[16]. As the whole message is encrypted in the TLS scenario, the communication can only be secured hop-by-hop. This means that a client that uses SIPS enters a secured communication session with the first SIP Proxy. The SIP message is transferred encrypted between them. The Proxy then decrypts the received message and enters a secured session with the next SIP network element. The network elements need to decrypt the messages in order to access header fields such as the ’Via’, ’From’ and ’To’ headers; otherwise the message could not be routed. SIPS therefore guarantees confidentiality and integrity of SIP messages only between two network elements and not on an end-to-end basis[16].

Figure 1.12: SIP over TLS

SIPS URI Scheme

With the SIPS URI Scheme (analogous to HTTPS) a UAC can demand a secured transmission throughout the whole signalling path. The SIPS URI is constructed analogously to the SIP URI in the following way:

sips:[email protected]

Table 1.15: SIPS URI syntax

If a requested URI is a SIPS URI, it is guaranteed that any network element that handles the request transfers the data securely. Once the request has reached the requested domain, it is handled according to local security and routing policies, but it is not mandatory that TLS is used[16].

1.5.14.3 S/MIME

Secure/Multipurpose Internet Mail Extensions (S/MIME) was developed as a security extension of the MIME standard for the encryption and signature of email message bodies and is described in RFC 3851 (version 3.1)[32]. As SIP messages, like email messages, can contain MIME bodies (MIME type ’application/sdp’), and as S/MIME is not bound to email, it can also be used with SIP. S/MIME allows a SIP UA to encrypt the MIME message bodies of SIP requests and responses on an end-to-end basis without affecting the message headers, thus keeping the messages routable while preventing network entities from manipulating the message body content. With S/MIME it is therefore possible to provide end-to-end confidentiality and integrity for the message body. The usage of S/MIME in the SIP context is described in RFC 3261[39].

Figure 1.13: SIP and S/MIME

The message body is signed with the private key of the sender and encrypted with the public key of the recipient [47]. S/MIME can use different encryption algorithms like RSA[34] or 3DES and different hash algorithms for the signature like MD5[33] or SHA-1[14]. In order to make encryption and signature possible, the sender must know the public key of the recipient. Additionally the recipient must know the public key of the sender in order to verify the signature, so private keys and certificates are needed. Although SIP already includes methods for key exchange, a PKI is needed, as the sender must know the public key of the recipient in order to encrypt the message[16]. Note that S/MIME secured message bodies can only be decrypted by the targeted recipient and by no other SIP element (e.g. proxy servers). Some network elements (not typical proxy servers) that rely on viewing or modifying the bodies of SIP messages (especially SDP) are prevented from doing so by the usage of S/MIME [39].

Another possible S/MIME scenario is the so-called SIP Tunneling, where the whole SIP message is copied and encapsulated within the S/MIME body. With this method it is possible to secure the whole SIP message, as the recipient can decrypt the message body, which contains a copy of the header fields, compare the protected header fields with the unprotected ones and detect manipulations[16].

1.5.14.4 IPSec

IPSec (Internet Protocol Security) is a set of network layer protocol tools described in RFC 4301 [25] that collectively can be used as a secure replacement for IP[39]. IPSec allows confidential and authenticated transport of IP packets[15] and is most commonly used in architectures in which a set of hosts or administrative domains have an existing trust relationship with one another. IPSec is usually implemented at the operating system level in a host, or on a security gateway that provides confidentiality and integrity for all traffic it receives from a particular interface (as in a VPN architecture). IPSec can also be used on a hop-by-hop basis[39]. The security services are provided by two IPSec protocols, the Authentication Header Protocol (AH) and the Encapsulating Security Payload Protocol (ESP)[15]. AH guarantees the authenticity of the origin of a packet and the integrity of a connectionless packet, and optionally provides protection against replays. ESP enables confidential connectionless communication and contains simple mechanisms against traffic analysis. AH and ESP can be used in combination or on their own[15].

In [39] it is stated that IPSec is perhaps best suited to deployments in which adding security directly to SIP hosts would be difficult. UAs that have a pre-shared keying relationship with their first-hop proxy server are also good candidates to use IPSec. Any deployment of IPSec for SIP would require an IPSec profile describing the protocol tools that would be required to secure SIP, and such a profile is not given.

2 SPAM over Internet Telephony

In the following chapter we will discuss the problem of SPAM over Internet Telephony. The first section of this chapter deals with a general SPIT explanation and classification, followed by a scientific SPIT threat analysis.

2.1 SPIT versus SPAM

The first aspect to mention is that although SPIT contains the phrase ’SPAM’ and has some parallels with email spam, it also has major differences. The similarity of email spam and SPIT is that in both cases senders (or callers) use the Internet to target recipients (or callees) or a group of users, in order to place bulk unsolicited calls [13].

The main difference between email spam and SPIT is that an email arrives at the email server before it is accessed by the user. This means that structure and content of an email can be analyzed at the server before it arrives at the recipient, and so SPAM can be detected before it disturbs the recipient. As delays of call establishment are not desired in VoIP scenarios, session establishment messages are forwarded immediately to the recipients. Besides this, the content of a VoIP call is not exchanged until the session is already established. In other words, once the phone rings it is too late for SPIT prevention, and the phone rings immediately after session initiation, while an email can be delayed and, even if it is not delayed, the recipient can decide whether he wants to read the email immediately or not.

In addition to these aspects, another main difference between email spam and SPIT is the fact that the single email itself contains information that can be used for spam detection. The header fields contain information about sender, subject and content of the message. A single SPIT call, in contrast, is technically indistinguishable from a call in general. A SPIT call is initiated and answered with the same set of SIP messages as any other call.

2.2 Intuitive SPIT definition

SPIT is described very similarly in different publications, and the descriptions can be summarized as ’unwanted’, ’bulk’ or ’unsolicited’ calls. In [21], for example, SPIT is defined as ’unsolicited advertising calls’, which is of course already a special form of SPIT (namely advertising calls). In [13] SPIT is defined as ’transmission of bulk unsolicited messages and calls’, which is a more general definition than the first one, as it does not characterize the content and also includes messages. Note that with this definition it is not clear whether the term ’messages’ is used in order to generalize the type of messages that are sent (e.g. ’INVITE’ or ’OPTIONS’ messages) or whether it is used in order to include SPAM that is sent over Instant Messages (SPIM = SPAM over Instant Messages). Nevertheless the most precise definition is found in [37], where ’Call SPAM’ (as the authors call it) is defined as ’a bulk unsolicited set of session initiation attempts (e.g., ’INVITE’ requests), attempting to establish a voice, video, instant messaging, or other type of communications session’. The authors of [37] go even one step further and specify that ’if the user should answer, the spammer proceeds to relay their message over the real-time media.’ and state that this ’is the classic telemarketer spam, applied to SIP’. We can easily see that the definitions presented so far are very similar but differ in their depth.

2.3 SPIT analysis

The problem with the definitions above is that they are either too specific or too general. In order to find a more precise definition, we have to take one step back and analyze how SPIT is put into execution. What is the goal of the initiator of SPIT and how does he or she achieve this goal? In practice the initiator of SPIT has the goal to establish a communication session with as many victims as possible in order to transfer a message to any available endpoint. The attacker can accomplish this in three steps. The first step is the systematic gathering of the contact addresses of victims. The second step is the establishment of communication sessions with these victims, and the third step is the sending of the message. In the following we will not only discuss why the process of information gathering is part of the SPIT process, but we will also see that it is the basis of any SPIT attack.

Figure 2.1: Three steps of SPIT

2.3.1 Information gathering

In order to contact a victim, the attacker must know the SIP URI of the victim. As we already know, we can distinguish two types of SIP URIs: permanent SIP URIs like ’sip:[email protected]’ and temporary SIP URIs like ’sip:[email protected]’.

At first we will take a look at information gathering of permanent SIP URIs. If an attacker wants to reach as many victims as possible, he must catalogue valid assigned SIP URIs. The prerequisites for the scan attack are the possession of at least one valid account and knowledge about the scheme of SIP URIs of the targeted platform (e.g. provider). Let us assume the attacker has a valid SIP account at SIP provider ’example.com’ and wants to scan the provider’s network in order to obtain a list of as many valid permanent SIP URIs as possible. Let us also assume that the provider ’example.com’ distributes SIP URIs that correspond to the following scheme: the user name of the SIP URI is a phone number that begins with the digits ’555’ followed by 4 more random digits. All phone numbers from ’5550000’ to ’5559999’ are valid user names of this provider. As the attacker now knows all valid user names, he must find out which of them are already assigned to customers and which of them are still unassigned. The attacker can now step through the whole list of valid SIP URIs, send adequate SIP messages to each URI and receive information about the status of the tested URI. The simplest way is to send an ’INVITE’ message to each SIP URI and analyze the answer of the SIP Proxy. If the SIP URI is not assigned, the SIP Proxy may answer with a ’404 Not found’ response; if the SIP URI is assigned but the user is not registered at the moment, the SIP Proxy may answer with a ’480 Temporarily unavailable’ response; and if the SIP URI is assigned and the user is registered, the call will be established and answered with a ’200 OK’ response. When the attacker has stepped through the whole list and marked all possible SIP URIs on the basis of the Proxy responses, he has a list of assigned SIP URIs that can be used for future attacks.

Figure 2.2: Three cases in information gathering

Note that the scan attack does not have to be carried out with an ’INVITE’ message; we discussed this way as the simplest one because it already leads to the desired session establishment. The attacker could also use an ’OPTIONS’ request or a ’REGISTER’ request and analyze the reaction of the Proxy. It is mainly the implementation of the targeted Proxy that decides which message will yield the desired information. Some Proxies, for example, respond to all ’OPTIONS’ requests with a ’200 OK’ message, even in case of an invalid or unassigned SIP URI.

Now we will take a look at the gathering of temporary SIP URIs. Temporary SIP URIs consist of the user name part and the host part. The user name part is usually a string or a phone number, and the host part is the IP address where the endpoint can be reached directly. If an attacker has already generated a list of valid assigned SIP URIs, he now additionally needs the corresponding IP addresses of the SIP URIs. In some Proxy implementations the temporary SIP URIs are published in the ’Contact’ header of the response message to a request. In this case the desired information is obtained in the same way as the permanent SIP URIs.

If the Proxy does not provide the IP address in the SIP responses, the attacker must use a more complex method to obtain the desired information. Let us assume this time that ’example.com’ is an Internet Service and VoIP provider. The provider assigns IP addresses of the range 192.0.2.5-192.0.2.155 to his customers and SIP URIs with the same scheme as described above (555XXXX). Let us assume the customers have hardphones (e.g. an analog telephone attached to a VoIP ready router or an Analog Telephony Adapter). With this knowledge the attacker can step through the list of IP addresses, try sending an adequate SIP request (’INVITE’, ’OPTIONS’) directly to the endpoint (e.g. to UDP port 5060) and analyze the responses in the same way as described above. With this method the attacker can populate a list of temporary SIP URIs of the customers of the targeted provider. Note that the temporary SIP URIs are only valid for a short time period (max. 24 hours), as customers are usually forced to disconnect their Internet connection after a certain period. Although this procedure is harder to carry out than the first one, it has the major advantage that the attacker doesn’t need a valid account as a prerequisite. Because the Proxy is not involved and SIP messages are sent directly to the victim, the attacker can use any SIP identity he wishes as source address. The client can’t verify the identity, as nearly all existing client implementations accept SIP messages from any source.

A third way of obtaining information about valid assigned accounts can be called Passive Scanning. In this case the attacker doesn’t actively search for users, but collects information about them by being called. An attacker could e.g. set up a website providing any kind of information service and attach an info hotline number to this service. He can then log information about the calling accounts and collect valid assigned accounts without risking detection. As long as the provided service is interesting enough, he will sooner or later collect a large list of targets that he can use for future attacks.

Now that we have seen how lists of permanent or temporary SIP URIs can be obtained, we will discuss their usage.
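The bookkeeping behind the scan of permanent SIP URIs described in this section can be summarized in a few lines. The following is only a minimal sketch: it performs no network communication and merely interprets response codes that have already been observed. The function names, the `UriStatus` type and the dictionary layout are assumptions made for illustration; the number scheme (555XXXX) and the three response codes are taken from the example above.

```python
from enum import Enum


class UriStatus(Enum):
    UNASSIGNED = "unassigned"                   # proxy answered 404 Not Found
    ASSIGNED_OFFLINE = "assigned, unregistered"  # 480 Temporarily Unavailable
    ASSIGNED_ONLINE = "assigned, registered"     # 200 OK
    UNKNOWN = "unknown"                          # any other response


def candidate_user_names(prefix="555", digits=4):
    """Enumerate the assumed number space, here 5550000 .. 5559999."""
    return [f"{prefix}{n:0{digits}d}" for n in range(10 ** digits)]


def classify_response(status_code):
    """Map an observed proxy response code to one of the three cases."""
    mapping = {
        404: UriStatus.UNASSIGNED,
        480: UriStatus.ASSIGNED_OFFLINE,
        200: UriStatus.ASSIGNED_ONLINE,
    }
    return mapping.get(status_code, UriStatus.UNKNOWN)


def assigned_uris(observed, domain="example.com"):
    """Return the sorted SIP URIs whose observed response marks them as assigned.

    `observed` maps a user name to the response code seen for it (hypothetical
    input format).
    """
    assigned = (UriStatus.ASSIGNED_OFFLINE, UriStatus.ASSIGNED_ONLINE)
    return sorted(
        f"sip:{user}@{domain}"
        for user, code in observed.items()
        if classify_response(code) in assigned
    )
```

As noted above, the mapping depends on the Proxy implementation; a Proxy that answers every request with ’200 OK’ would make this classification meaningless.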

2.3.2 SPIT session establishment

When the attacker has collected a large number of contact addresses, he can begin establishing sessions with the victims. Which list he must use (temporary or permanent URIs) depends on the communication infrastructure he wants to use. We can distinguish two possible ways of session establishment: the attacker can establish a session by sending an ’INVITE’ message via the Proxy, which we can call SPIT via Proxy, or he can establish a session by sending an ’INVITE’ message directly to the endpoint without involving the Proxy, which we can call Direct IP Spitting. For SPIT via Proxy the attacker only needs a list of permanent SIP URIs, and for Direct IP Spitting he needs the list of temporary SIP URIs. Furthermore, for SPIT via Proxy the attacker needs at least one valid user account, whereas for Direct IP Spitting he doesn’t need a valid account at all.

2.3.3 SPIT media sending

The last step of the SPIT process is the media sending after the session has been established. Which type of media is sent depends on the scenario in which the SPIT attack takes place. The best scenario classification can be found in [21] and defines three types of SPIT scenarios:

• Call Centers: In Call centers a computer establishes a call to an entry of the catalog and then dispatches the call to a call center agent who will then talk to the callee.

• Calling Bots: A calling bot steps through the list of gathered information, establishes a session and then sends a prerecorded message.

• Ring tone SPIT: Some VoIP telephones come pre-configured in a way that they accept a special SIP header called ’Alert-Info’, which may contain a URL pointing to a prerecorded audio file somewhere on the Internet. Obviously, this can be used to play advertising messages before the call has even been accepted by the user, just as the phone is ringing. An adaptation of this method could be a SPIT attack where the attacker just wants to let the victim’s phone ring in order to disturb the victim. In this special case no media is sent at all and the session is terminated as soon as the phone rings (e.g. when a ’180 Ringing’ is received). Obviously this is the most annoying facet of SPIT.

2.3.4 SPIT summary

As we can see now, the SPIT process is very complex and has different aspects which have to be considered in order to develop countermeasures. The general definitions that we discussed in the first section are insufficient as a basis of discussion and do not cover all facets of the problem. In general we can say that Spitting describes the systematic scanning of a VoIP network with the goal of gathering information about available user accounts, followed by systematic session establishment attempts to as many users as possible in order to transfer any kind of message.

3 SPIT countermeasures and their weaknesses

In the following sections we will discuss state of the art SPIT prevention mechanisms in order to point out their advantages and disadvantages. The countermeasures are ordered by type and not by publication. As a matter of fact, most publications define a set of countermeasures as a solution to mitigate SPIT. Nevertheless, we will discuss every method on its own and not the orchestration of different mechanisms. Note that only those techniques are listed that have crystallized in research.

3.1 Device Fingerprinting

The technique of active and passive device fingerprinting is presented in [50] and is based on the following assumption: knowing the type of User Agent that initiates a call helps to find out whether a session initiation attempt can be classified as SPIT or not. The assumption is based on the analogy to e.g. HTTP based worms. As described in [50], these types of worms have different sets of HTTP headers and different response behavior when compared to typical Web browsers. So if we compare the header layout and order or the response behavior of a SIP User Agent with those of a typical User Agent, we can determine whether the initiated session establishment is an attack or a normal call. The authors describe two types of techniques that can be used for that purpose: ’Passive and Active Device Fingerprinting’.

3.1.1 Passive Fingerprinting

The message of a session initiation (e.g. the ’INVITE’ message) is compared with the ’INVITE’ messages of a set of ’standard’ SIP clients. If the order or presence of the header fields does not match any of the standard clients, the call is classified as SPIT. The fingerprint in this case is the presence and the order of the SIP header fields. The authors of [50] present a list of collected fingerprints of ’INVITE’ messages of standard hard- and softphones:

| SIP Component | Header Fields in ’INVITE’ Message |
|---|---|
| Cisco Phone | Via, Record-Route, Via, From, To, Call-ID, Date, CSeq, User-Agent, Contact, Expires, Content-Type, Content-Length, Accept |
| Pingtel Phone | Via, Record-Route, From, To, Call-ID, CSeq, Contact, Content-Type, Content-Length, Accept-Language, Allow, Supported, User-Agent, Date, Via |

Table 3.1: Passive fingerprints hardphones

| SIP Component | Header Fields in ’INVITE’ Message |
|---|---|
| Adore Softphone | Via, Max-Forwards, From, To, Call-ID, CSeq, Contact, User-Agent, Content-Type, Content-Length |
| Express Talk | Via, To, From, Call-ID, CSeq, Max-Forwards, User-Agent, Contact, Allow, Supported, Content-Type, Content-Length |
| eyeBeam | Via, Max-Forwards, Contact, To, From, Call-ID, CSeq, Allow, Content-Type, Supported, User-Agent, Content-Length |
| KPhone | Via, CSeq, To, Content-Type, From, Call-ID, Subject, Content-Length, User-Agent, Contact |
| LinPhone | Via, From, To, Call-ID, CSeq, Max-Forwards, User-Agent, Subject, Expires, Allow, Content-Length |
| Phoner | Via, From, To, Call-ID, CSeq, Contact, Max-Forwards, User-Agent, Allow, Content-Type, Content-Length |
| Sipps | Via, From, To, Call-ID, CSeq, User-Agent, Expires, Accept, Content-Type, Content-Length, Contact, Max-Forwards, Allow |
| sipXphone | From, To, Call-ID, CSeq, Contact, Content-Type, Content-Length, Date, Max-Forwards, User-Agent, Accept-Language, Allow, Supported, Via |
| SJPhone | Via, Content-Length, Contact, Call-ID, Content-Type, CSeq, From, Max-Forwards, To |
| WinSip | Via, Max-Forwards, From, To, User-Agent, Call-ID, CSeq, Contact, Allow, Accept, Accept-Language, Content-Type, Content-Disposition, Content-Length |
| Yate | Max-Forwards, Via, From, To, Call-ID, CSeq, User-Agent, Allow, Content-Type, Content-Length |

Table 3.2: Passive fingerprints softphones
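To make the comparison step of passive fingerprinting concrete, the following minimal sketch extracts the order of header fields from a raw ’INVITE’ message and compares it with a small fingerprint database. Only three entries of Table 3.2 are included to keep the example short. The function names and the parsing approach are assumptions for illustration and are not taken from [50]; in particular, compact header forms are not normalized and folded header lines are skipped.

```python
# Header order fingerprints copied from Table 3.2 (subset).
KNOWN_FINGERPRINTS = {
    "Phoner": ("Via", "From", "To", "Call-ID", "CSeq", "Contact",
               "Max-Forwards", "User-Agent", "Allow", "Content-Type",
               "Content-Length"),
    "SJPhone": ("Via", "Content-Length", "Contact", "Call-ID",
                "Content-Type", "CSeq", "From", "Max-Forwards", "To"),
    "Yate": ("Max-Forwards", "Via", "From", "To", "Call-ID", "CSeq",
             "User-Agent", "Allow", "Content-Type", "Content-Length"),
}


def header_order(raw_message):
    """Return the header field names of a SIP message in order of appearance."""
    head = raw_message.split("\r\n\r\n", 1)[0]   # drop the message body
    names = []
    for line in head.split("\r\n")[1:]:          # skip the request line
        if not line or line[0] in " \t":         # skip folded continuation lines
            continue
        name, sep, _ = line.partition(":")
        if sep:
            names.append(name.strip())
    return tuple(names)


def classify_invite(raw_message, fingerprints=KNOWN_FINGERPRINTS):
    """Return the matching client name, or None if no fingerprint matches."""
    order = header_order(raw_message)
    for client, fingerprint in fingerprints.items():
        if order == fingerprint:
            return client
    return None
```

A return value of `None` corresponds to the case in which the call would be classified as SPIT.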

3.1.2 Active Fingerprinting

User Agents are probed with ’OPTIONS’ requests and the responses are analyzed and compared with the response behavior of standard clients. The fingerprint in this case is the returned response code and the value of the ’Allow’ header field. If the fingerprint doesn’t match any of the standard clients, the call is classified as SPIT. The authors recommend sending specially crafted standard compliant and non compliant ’OPTIONS’ requests in order to analyze the response behavior of a client, but note that the technique is not bound to ’OPTIONS’ requests. The ’OPTIONS’ requests are manipulated in the following ways:

• Invalid Version: The SIP Version number is set to an invalid value like ’99.99’ instead of the correct one ’2.0’.

• Invalid Via Address: Instead of the valid IP of the remote host, the ’Via’ header is filled with the value ’localhost’.

• Incorrect Content-Length: As the ’OPTIONS’ request doesn’t contain a message body, the ’Content-Length’ header should be zero if present. The manipulated ’OPTIONS’ request contains a non-zero value instead.

• Malformed CSeq: Normally the ’CSeq’ header is constructed with a decimal sequence number followed by the keyword ’OPTIONS’, but the manipulated request only contains a sequence number.

• Missing Call-ID: The mandatory ’Call-ID’ header is missing.

• Incompatible Transport Protocol: In a standard SIP message the ’Via’ header contains the used Transport Protocol (UDP, TCP). The manipulated request uses one protocol (e.g. UDP) but claims to use another one in the ’Via’ header instead (e.g. TCP).

If e.g. an attacker sends an ’INVITE’ message to a victim, he is probed with one standard compliant ’OPTIONS’ request and then, one after the other, with the manipulated requests. The responses are recorded and compared to the database of standard client fingerprints. The fingerprints that the authors of [50] constructed from the answers of standard SIP clients are listed below:

| Component | RFC-Compliant | Invalid Version | Incorrect Via | Incorrect Content Length | Malformed CSeq | Missing Call-ID | Incorrect Transport |
|---|---|---|---|---|---|---|---|
| 3Com SIP Proxy | 405 | 405 | NR | 405 | NR | NR | NR |
| Cisco Voice Gateway | 200 | 400 | 200 | NR | NR | 400 | 400 |
| Cisco Voice Gateway | 400 | 400 | 400 | NR | NR | 400 | 400 |
| Cisco SIP proxy | NR | NR | 400 | 400 | NR | 400 | 400 |
| MCI SIP Proxy | 302 | 400 | NR | 302 | NR | NR | 400 |
| Microappliances SIP Proxy | 403 | 400 | 403 | 403 | NR | NR | 400 |
| SIP Express Router Proxy | 404 | NR | 404 | 404 | NR | 404 | NR |

Table 3.3: Active fingerprints servers

| Component | RFC-Compliant | Invalid Version | Incorrect Via | Incorrect Content Length | Malformed CSeq | Missing Call-ID | Incorrect Transport |
|---|---|---|---|---|---|---|---|
| Cisco Phone (cisco.com) | 200 | NR | 200 | 400 | NR | 400 | NR |
| Pingtel Phone (pingtel.com) | 200 | 505 | 200 | 200 | NR | NR | NR |
| Adore Softphone (adoresoftphone.com) | 200 | 481 | NR | 400 | NR | 400 | NR |
| Express Talk (nch.com.au) | 200 | 200 | 200 | 200 | NR | 200 | 200 |
| eyeBeam (counterpath.com) | 200 | 200 | 200 | 200 | 405 | NR | 200 |
| KPhone (wirlab.net) | 200 | 200 | NR | 200 | 200 | 200 | NR |
| LinPhone (linphone.org) | 200 | 200 | 200 | 200 | NR | NR | NR |
| Phoner (phoner.de) | 200 | 200 | 200 | 200 | NR | NR | 200 |
| Sipps (nero.com) | 200 | 200 | 200 | 200 | 400 | NR | NR |
| sipXphone (sipfoundry.org) | 200 | 505 | 200 | 200 | NR | NR | 200 |
| SJPhone (sjlabs.com) | 405 | NR | 405 | NR | NR | NR | NR |
| WinSip (touchstone-inc.com) | 200 | NR | 200 | 200 | NR | 481 | 481 |
| Yate (yate.null.ro) | 501 | NR | NR | 501 | NR | 501 | 501 |

Table 3.4: Active fingerprints hard- and softphones

If a client is probed and its response behavior does not match any of the above, it will be ignored. As described, in addition to the response behavior the ’Allow’ header of the response to an ’OPTIONS’ request is analyzed. A list of the ’Allow’ headers of standard clients has been created:

| Component | ’Allow’ header field in ’OPTIONS’ response |
|---|---|
| Cisco Voice Gateway | INVITE, OPTIONS, BYE, CANCEL, ACK, PRACK, COMET, REFER, SUBSCRIBE, NOTIFY, INFO, UPDATE, REGISTER |
| Cisco Phone (cisco.com) | OPTIONS, INVITE, BYE, CANCEL, REGISTER, ACK, NOTIFY, REFER |
| Pingtel Phone (pingtel.com) | INVITE, ACK, CANCEL, BYE, REFER, OPTIONS, NOTIFY, REGISTER, SUBSCRIBE |
| Adore Softphone (adoresoftphone.com) | INVITE, BYE, OPTIONS, MESSAGE, ACK, CANCEL, NOTIFY, SUBSCRIBE, INFO, REFER |
| Express Talk (nch.com.au) | INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY |
| eyeBeam (counterpath.com) | INVITE, ACK, CANCEL, OPTIONS, BYE, REFER, NOTIFY, MESSAGE, SUBSCRIBE, INFO |
| KPhone | INVITE, OPTIONS, ACK, BYE, MSG, CANCEL, MESSAGE, SUBSCRIBE, NOTIFY, INFO, REFER |
| LinPhone (linphone.org) | INVITE, ACK, OPTIONS, CANCEL, BYE, SUBSCRIBE, NOTIFY, MESSAGE, INFO |
| Phoner (phoner.de) | INVITE, ACK, CANCEL, BYE, NOTIFY |
| Sipps (nero.com) | INVITE, ACK, CANCEL, BYE, REFER, OPTIONS, NOTIFY, INFO |
| SJPhone (sjlabs.com) | INVITE, ACK, CANCEL, BYE, REFER, NOTIFY |
| WinSip (touchstone-inc.com) | INVITE, ACK, BYE, CANCEL, OPTIONS, MESSAGE, INFO |
| Yate (yate.null.ro) | ACK, INVITE, BYE, CANCEL |

Table 3.5: Allow header fields
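The matching step of active fingerprinting can be sketched in the same spirit. The example below assumes that the seven probe responses have already been recorded as strings (with ’NR’ copied verbatim from the tables) and that the ’Allow’ value is available as a comma separated string. The entries are taken from Tables 3.4 and 3.5 for three clients only. Whether the order of methods in the ’Allow’ header is part of the fingerprint is not stated here; the sketch assumes that it is, and all function names are hypothetical.

```python
# Order of the probes as in the columns of Tables 3.3 and 3.4.
PROBES = ("rfc_compliant", "invalid_version", "incorrect_via",
          "incorrect_content_length", "malformed_cseq",
          "missing_call_id", "incorrect_transport")

# Response codes per probe, copied from Table 3.4 (subset).
RESPONSE_FINGERPRINTS = {
    "Express Talk": ("200", "200", "200", "200", "NR", "200", "200"),
    "eyeBeam": ("200", "200", "200", "200", "405", "NR", "200"),
    "Phoner": ("200", "200", "200", "200", "NR", "NR", "200"),
}

# 'Allow' header values, copied from Table 3.5 (subset).
ALLOW_FINGERPRINTS = {
    "Express Talk": ("INVITE", "ACK", "CANCEL", "OPTIONS", "BYE", "REFER",
                     "NOTIFY"),
    "eyeBeam": ("INVITE", "ACK", "CANCEL", "OPTIONS", "BYE", "REFER",
                "NOTIFY", "MESSAGE", "SUBSCRIBE", "INFO"),
    "Phoner": ("INVITE", "ACK", "CANCEL", "BYE", "NOTIFY"),
}


def parse_allow(value):
    """Split an 'Allow' header value into a tuple of method names."""
    return tuple(m.strip().upper() for m in value.split(",") if m.strip())


def match_client(responses, allow_value):
    """Return the client whose fingerprint matches both parts, else None."""
    if len(responses) != len(PROBES):
        raise ValueError("one response per probe is required")
    allow = parse_allow(allow_value)
    for client, expected in RESPONSE_FINGERPRINTS.items():
        if tuple(responses) == expected and allow == ALLOW_FINGERPRINTS[client]:
            return client
    return None
```

For instance, `match_client(("200", "200", "200", "200", "NR", "NR", "200"), "INVITE, ACK, CANCEL, BYE, NOTIFY")` yields `"Phoner"`, while any deviation in one probe response or in the ’Allow’ value yields `None`.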

3.1.3 Weakness of Device Fingerprinting

The weakness of passive fingerprinting is described by the authors of [50] themselves. As passive fingerprinting only analyzes the order and existence of the header fields of an ’INVITE’ message, an attacker simply needs to order the header fields in the same way as one standard client. In that case the passive fingerprinting mechanism can’t detect the attack. We can state nearly the same for active fingerprinting, as an attacker only needs to behave like one standard client when receiving unexpected or non standard compliant SIP messages. It is very simple for an attacker to develop an attacking SIP client that behaves exactly like a standard client, as he can use the same SIP Stack or imitate the behavior of the SIP Stack of a standard client. We can call this attack Device Spoofing, and any attacker who is able to spoof a device can’t be identified by the Device Fingerprinting technique. As Device Fingerprinting is discussed as a server side anti SPIT mechanism, it is useless against Direct IP Spitting, because the clients don’t have any chance to verify the fingerprint of the attacking client.

Finally, we will take a look at practical issues of Device Fingerprinting. When we look at today’s VoIP universe, we find a vast variety of hard- and softphones. Each of these phones has its own SIP Stack, and even within a product family header layouts and behavior differ between two versions of the same device. The result is that an administrator who uses Device Fingerprinting in order to protect his system must always keep the list of fingerprints up to date. Comparing the ’INVITE’ message of a caller with an old or incomplete fingerprint list can lead to blocking the call although it is not a SPIT call. Let us e.g. assume that a caller uses a standard client and that the manufacturer sends out a firmware upgrade that makes major changes to the SIP Stack. Any calls of this user are blocked or marked as SPIT until the administrator of the VoIP network updates the fingerprint list, and this procedure will repeat any time a new firmware version is rolled out or new clients are released. Taking it one step further, we can see that as more and more clients and versions are released, the fingerprint list will grow longer and longer, and in the end nearly any combination of e.g. header fields will be present in the list. The main problem of device fingerprinting is that it is derived from an HTTP security technique. In that scenario only few clients (web browsers) from few developers exist, in contrast to the VoIP world.

3.2 White Lists, Black Lists, Grey Lists

The White List technique is presented e.g. in [21] [37] and works as follows: each user has a list of users that he accepts calls from, and any caller who is not present in the list will be blocked. In addition, the private White Lists can be distributed to other users. If e.g. a caller is not present in the White List of the callee, the White Lists of other trusted users can be consulted, as well as those of their trusted users (up to a certain level); however, this technique needs additional mechanisms. Black Lists are the opposite of White Lists and contain only identities that are already known as spammers. Any call from a caller whose identity is present in the callee’s Black List is blocked. Black Lists, too, can be implemented as distributed Black Lists, where a callee can consult the Black Lists of other users. Grey listing works as follows: on the initial request of an unknown user (not in the White List) the call is rejected and the identity is put on the Grey List. As stated in [21], in case the caller tries calling back within a short time period, the call will be accepted. An adaptation of this technique is described in [37] as Consent Based Communication. In case of Consent Based Communication the call of an unknown caller is initially blocked and the caller is put on the Grey List. The callee can consult the Grey List and decide whether he will accept future calls from this identity or block it permanently (e.g. put it on the Black List).
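The interplay of the three list types for a single callee can be sketched as follows. This is a simplified illustration and not the implementation of [21] or [37]: the class and function names, the order in which the lists are checked and the length of the retry window (`RETRY_WINDOW_SECONDS`) are assumptions, and distributed lists as well as Consent Based Communication are left out.

```python
from dataclasses import dataclass, field

RETRY_WINDOW_SECONDS = 300  # hypothetical length of the "short time period"


@dataclass
class CalleeLists:
    white: set = field(default_factory=set)   # identities always accepted
    black: set = field(default_factory=set)   # identities always blocked
    grey: dict = field(default_factory=dict)  # identity -> time of rejected attempt


def decide(lists, caller, now):
    """Return 'accept' or 'reject' for a call from `caller` at time `now` (seconds)."""
    if caller in lists.black:
        return "reject"
    if caller in lists.white:
        return "accept"
    first_attempt = lists.grey.get(caller)
    if first_attempt is not None and now - first_attempt <= RETRY_WINDOW_SECONDS:
        return "accept"          # caller retried within the window
    lists.grey[caller] = now     # unknown caller: remember the attempt, reject
    return "reject"
```

Note that every decision in this sketch is keyed on the identity claimed by the caller, which is the only information available to the mechanism.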

3.2.1 Weaknesses of White Lists, Black Lists, Grey Lists

Black Lists cannot really be viewed as a SPIT countermeasure, because additional methods are needed to classify a caller as a Spitter. A Black List on server side would require e.g. statistical methods for classifying a caller as a Spitter. In case of a client side Black List, the user must mark a caller as a Spitter, e.g. after receiving an initial SPIT call from this caller. Both server side and client side Black Lists are useless against Direct IP Spitting, for different reasons. Server side Black Lists are bypassed by Direct IP Spitting because the SIP messages are sent directly to the client. Client side Black Lists are circumvented by Direct IP Spitting because the caller can take on any identity in order to place calls. So if one identity is blocked, he can simply switch to another identity. We can call this attack SIP Identity Spoofing, and any attacker who can spoof SIP identities can easily bypass Black Lists.

White Lists are at first sight harder to circumvent than Black Lists, because the attacker has no knowledge about the entries of the victim’s White List. So even if he wants to spoof an identity, the attacker doesn’t know which identity he must take on in order to place a successful call. In case of Direct IP Spitting the attacker could simply try out all existing accounts with a brute force attack until he finds out which identities are not blocked. A less exhausting procedure can be performed in case of distributed or imported White Lists [21]. In that scenario the attacker needs one valid account. After adding the victim to his own White List, the attacker can select to import the White List of the victim. In this way he gets access to all entries of the victim’s White List and can spoof these identities, e.g. in a Direct IP Spitting attack. The Grey List mechanism can be bypassed in the same way as White List mechanisms, as it just represents a mechanism that allows first time contact. All in all we can say that any attacker who is able to perform SIP Identity Spoofing can bypass Black Lists, White Lists and Grey Lists.

Finally, we will again take a look at the practical side of the presented mechanisms. The concepts of Black, White and Grey Listing are derived from the Instant Messaging world, where it is a matter of course that users first ask for permission before they are added to another user’s buddy list, and only buddies can communicate with each other. When a user receives a communication request, he receives the profile of the other user containing e.g. nick name, email address, full name or even a profile photo. On the basis of this information the user is able to decide whether he wants to accept future messages from that party or not. Transferred to the VoIP scenario, this mechanism seems very impractical, as the introduction problem has to be solved. Let us assume e.g. that an employee of a bank wants to call one of his customers. In case of white listing the call cannot be successfully routed to its target, as customers usually don’t have the phone numbers of the employees of their bank in their White List. The decision basis for accepting or rejecting a call is simply the phone number that is sent by the caller. If the call is rejected at first (Grey listing), the callee must decide whether he wants to accept future calls, and he must base this decision on the phone number alone. We can easily see that this is very impractical.

3.3 Reputation Systems

Reputation based mechanisms are described in [45] or in [37] and can be summarized as follows: after receiving a call, the callee can set a reputation value for the caller that marks this caller as a Spitter or not. This reputation value must be assigned to the identity of the caller and can be used for future session establishment requests. This technique can be used e.g. as an addition to Grey listing [37] in order to provide a better decision basis. The authors of [45] explain that the user feedback can be used additionally for calls that were not detected by other SPIT preventing components. The way the reputation value is conveyed can differ. The SPIT value can be e.g. an additional SIP header, included in a special error response code, or distributed via the SIP event notification mechanism. Reputation systems can be based on either negative or positive reputation values. This means that in the first case only Spitters are marked with negative values, and in the second case ’normal’ callers are marked with positive values.

An adaptation of this method can be found in [49], where user feedback is combined with statistical values in order to calculate a reputation value. The reputation value is e.g. composed of a value representing the number of times an identity occurs in other users’ Black Lists, call density, call length or similar statistical values. The assumption behind this approach is that the calculated value will differ considerably between ’normal’ users and Spitters.
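How such a combined value could be calculated is illustrated by the following sketch. It is not the formula of [49]: the weighted sum, the weights, the normalization constants and all parameter names are assumptions chosen purely for illustration. Only the three inputs (occurrences in other users’ Black Lists, call density and call length) follow the description above.

```python
def spit_score(blacklist_hits, total_users, calls_per_hour, mean_call_seconds,
               max_calls_per_hour=60.0, typical_call_seconds=120.0,
               weights=(0.5, 0.3, 0.2)):
    """Combine three statistics into a score in [0, 1]; higher means more SPIT-like.

    All constants are hypothetical and would have to be tuned on real traffic.
    """
    # Share of users who have this identity on their Black List.
    blacklist_ratio = blacklist_hits / total_users if total_users else 0.0
    # Call density, capped at an assumed maximum rate.
    density = min(calls_per_hour / max_calls_per_hour, 1.0)
    # Short calls count against the caller; calls of typical length do not.
    shortness = 1.0 - min(mean_call_seconds / typical_call_seconds, 1.0)
    w_blacklist, w_density, w_shortness = weights
    return (w_blacklist * blacklist_ratio
            + w_density * density
            + w_shortness * shortness)
```

Under these assumptions a caller with one long call per hour and no Black List entries receives a score close to zero, whereas a caller with many very short calls who appears in many Black Lists receives a clearly higher one.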

3.3.1 Weakness of Reputation Systems

Reputation systems that are based on negative reputation can be bypassed in the same way as Black Lists [37]. A user with a negative reputation can be viewed as globally blacklisted, as his calls are blocked e.g. for any user (this depends on the policy that is used). Nevertheless, an attacker who is blacklisted simply needs to gain access to a new ’clean’ account, either by changing to an unused valid account or, in case the SPIT value is carried as a SIP header, by spoofing the header value (e.g. with Direct IP Spitting). This attack can be called SIP Header Spoofing. The attacker can simply set or change values of header fields when he uses Direct IP Spitting. In addition, an attacker can create several accounts with the aim of pushing the SPIT value of one account up or down (depending on the implementation). This attack can be called Reputation Pushing or Pulling and is referred to in [1] as ’ballot stuffing’. In this scenario a seller (caller) colludes with a group of buyers (callees) in order to be given unfairly high ratings by them.

Again we will also take a closer look at the practical issues of this anti SPIT mechanism. First, we must admit that Reputation systems are auxiliary features rather than SPIT blocking mechanisms. The reason is that the user must classify a call as SPIT via a button or by entering a value. This value is used for future decisions on that SIP identity, so initially SPIT is not prevented by this technique. Then the SPIT value of an identity has to be shown to callees, so that they can decide about accepting or rejecting the call. Let us assume a Spitter has reached a SPIT value or SPIT probability of e.g. thirty percent and then calls a victim. What should happen now? When the call is forwarded to the user and the value is e.g. shown in the display of the callee’s phone, he can decide to accept or reject the call on a better decision basis. The problem is that his phone rings anyhow, and that is exactly what should be prevented. He could just as well have picked up the call and listened to the first 5 seconds to know that it is SPIT. So the SPIT value did not add even one percent of benefit. On top of this, attackers could misuse the scoring system and create enough accounts to threaten ’normal’ users with collectively giving them negative reputation [37]. This attack is referred to in [1] as ’bad mouthing’ and can be adapted to the VoIP scenario, as callees can simply give negative reputation values to callers without any reason.

3.4 Turing tests, Computational Puzzles

Turing tests are tests where the caller is given a challenge that a human can solve easily and that is hard to solve for a machine. Therefore Turing tests or CAPTCHAs (Completely Automated Public Turing test to tell Computers and Humans Apart) are tests that counter Calling Bot attacks in VoIP scenarios. Turing tests in a VoIP scenario work as follows: on the initial call establishment attempt, the caller is transferred to an interactive system, where he is challenged with a task, e.g. dialing 5 digits that he is hearing (so called Audio CAPTCHA). While the numbers are read out, background music or any other kind of noise is played, so that speech recognition systems can’t be used to solve the task. A human caller, in contrast, will solve the task without difficulties, and only if the task is solved will the call be forwarded to its destination. Turing tests can be used in combination with White Lists, solving the introduction problem as described in [48].

Computational Puzzles seem at first sight very similar to the Turing test concept. As described in [23], a SIP Proxy or User Agent Server can request from a User Agent Client (caller) to compute the solution to a puzzle. The goal of this method is to raise the CPU costs of a call and so reduce the number of undesirable messages that can be sent. Turing tests, in contrast, have the goal to block non-human callers, as described above. According to [23], the puzzle that has to be solved could be finding a pre-image that will SHA1 [14] hash to the correct image. This means that the UAC will be challenged with a SHA1 hash of a value, and the UAC must find out (by computing it) which value has been hashed.
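The pre-image puzzle can be illustrated with a small sketch. It is not the exact puzzle format of [23]: the encoding of the challenge (a random nonce followed by a 4 byte secret), the parameter `difficulty_bits` and the function names are assumptions. The sketch only shows the principle that the server picks a value from a limited search space, publishes its SHA1 hash, and the client has to search for the value.

```python
import hashlib
import os
import secrets


def _image(nonce, value):
    """SHA1 hash over the nonce and the 4 byte big endian encoding of `value`."""
    return hashlib.sha1(nonce + value.to_bytes(4, "big")).hexdigest()


def make_puzzle(difficulty_bits=16):
    """Server side: choose a secret below 2**difficulty_bits and publish its hash."""
    if not 1 <= difficulty_bits <= 32:
        raise ValueError("difficulty_bits must be between 1 and 32")
    nonce = os.urandom(8)                              # makes every puzzle unique
    secret = secrets.randbelow(1 << difficulty_bits)   # value the client must find
    return nonce, _image(nonce, secret)


def solve_puzzle(nonce, image, difficulty_bits=16):
    """Client side: brute force search for the pre-image; cost grows with the bits."""
    for candidate in range(1 << difficulty_bits):
        if _image(nonce, candidate) == image:
            return candidate
    return None


def verify(nonce, image, solution):
    """Server side: a single hash computation is enough to check the answer."""
    return _image(nonce, solution) == image
```

Each additional bit of `difficulty_bits` doubles the expected work of the caller, while the verification cost for the server stays constant.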

3.4.1 Weakness of Turing tests and Computational Puzzles

Turing tests seem at first sight very effective for SPIT prevention in combination with White Lists, but nonetheless have weak points. The first approach to bypassing Audio CAPTCHA is relaying the CAPTCHA to human solvers. An attacker could pay cheap workers who are hired only to solve Audio CAPTCHAs. In countries with cheap labour this would raise the costs per call only marginally [37]. In order to reduce the costs, an attacker could even e.g. set up an adult hotline and dispatch Audio CAPTCHAs to the customers of this service. This technique is known from visual CAPTCHA, where the images from CAPTCHA protected sites are copied and relayed to a high traffic site owned by the attacker. All in all we can state that an attacker who can detect a CAPTCHA and relay it to human solvers is able to bypass Turing tests, and we can call this attack CAPTCHA Relay Attack.

Computational Puzzles cannot really be viewed as SPIT prevention mechanisms. It is obvious that attackers usually possess high computational power. So circumventing a system protected by Computational Puzzles doesn’t even demand a special attack. The attacker just needs sufficient CPU power.

Finally, we will again take a look at some practical issues of the described techniques. As far as Turing tests are concerned, we can see that this method is very intrusive. User interaction is forced every time a caller is not present in the White List of a callee. The difficulty with Computational Puzzles is that different VoIP endpoints differ in computational power. So if the task is too hard to solve (consumes too much CPU power), session establishment will be delayed considerably for e.g. a low-end cell phone, while attackers with powerful PCs won’t be affected much. For this reason Computational Puzzles are very ineffective and counterproductive, as they only bother ’normal’ users.

3.5 Payments at risk

Payments at risk mechanisms can be used in order to demand payment from an unknown caller. In [37] this technique is described as follows: if user A wants to call user B, he must first send a small amount of money to user B. When user B accepts the call and confirms that the call is not a SPIT call, the amount will be refunded to user A. With this technique it is possible to raise the costs for SPIT callers while keeping ’normal’ calls cheap. In [37] it is described as an auxiliary technique that solves the introduction problem of White Lists; this means that payment is only required from callers who are not on the White List of the callee. In general the payment could be demanded for every call, but this would make the telephony service more expensive.

An adaptation of this method is described in [26]. Here the payment technique is used in combination with a SPIT prediction value that is computed at server side. If the SPIT likelihood is high, the call is rejected; if the SPIT likelihood is low, the call is forwarded to the callee; and if the SPIT likelihood lies in between, payment is demanded automatically. Only if the payment is made will the call be forwarded to its target. The difference between the two approaches is that in the first case the paid amount is only refunded for non SPIT calls, and in the second case callers who refuse payment are treated as Spitters.
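The three-way decision of the second approach can be written down in a few lines. The thresholds `low` and `high` and the function name are hypothetical, since concrete values are not given here; the sketch only mirrors the rule of rejecting at high likelihood, forwarding at low likelihood and demanding payment in between.

```python
def route_call(spit_likelihood, payment_made=False, low=0.2, high=0.8):
    """Return 'reject', 'forward' or 'demand_payment' for one call attempt.

    `low` and `high` are assumed thresholds on a SPIT likelihood in [0, 1].
    """
    if not 0.0 <= spit_likelihood <= 1.0:
        raise ValueError("spit_likelihood must lie in [0, 1]")
    if spit_likelihood >= high:
        return "reject"            # likely SPIT: never forwarded
    if spit_likelihood <= low:
        return "forward"           # likely a normal call: no payment needed
    # Uncertain region: the call is only forwarded once payment has been made.
    return "forward" if payment_made else "demand_payment"
```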

3.5.1 Weakness of Payment at risk

How Payment at risk can be bypassed depends mainly on the way it is implemented. As described, demanding payment for each call won’t be very realistic, because this would require a high administrative overhead and cause additional costs for service providers. Let us assume Payment at risk combined with White listing as in the first example, so that payment is only required from callers that are not present in the callee’s White List. In this case a caller could simply spoof an identity as described in the section about White Lists. In the second scenario, where Payment at risk is combined with a Reputation system, the attacker just needs to achieve an adequate reputation value, as described in the corresponding section. Let us even assume that Payment at risk is used for every call. Even in that case an attacker could circumvent it by impersonating another user, so that he can establish calls and shift the costs on to ’normal’ customers. How this kind of SIP Identity Hijacking attack is carried out is another question and out of scope for now.

Besides the technical aspects, the practical issues of Payment at risk are numerous. First, the relatively high costs that are required for micropayment must be considered, then the inequities in the value of currency between sender and recipient [37], and finally the additional interactions that a user must perform (e.g. confirming a call from an unknown party as non SPIT).

3.6 Intrusion Detection Mechanisms, Honey phones

Intrusion Detection Systems are, generally speaking, systems that detect abnormal behavior within e.g. a network and thereby reveal attacks. An implementation of this technique is presented in [2], based on Bayesian inference combined with network monitoring of VoIP specific traffic. The Intrusion Detection System is designed as a defense mechanism against different VoIP specific attacks including scan attacks and SPIT attacks. For every attack a conditional probability table (CPT) is defined for variables such as request intensity, error response intensity, parsing error intensity, number of different destinations, maximum number of dialogs in waiting state, number of opened RTP ports, request distribution and response distribution. Consider e.g. the CPT for the number of different destinations: for a SPIT attack the likelihood of having more than 7 different destinations is set to 1 and the likelihood of having up to 7 different destinations is set to 0. The concept behind this technique is that different attacks affect these variables in different ways, e.g. a SPIT attack usually has a higher probability of a high number of destinations than normal traffic. The belief of a network trace can thus be calculated with the aid of the likelihood vectors defined in the CPTs. In the end the trace can be categorized as an attack or as a normal trace (refer to [2] for a detailed description). Honey phones can be used as part of an Intrusion Detection System as described in [21] [3] and can be viewed as VoIP specific Honeypots. A Honeypot represents a part of a network that is not accessible by ’normal’ users, and therefore any access to the Honeypot can be viewed as an attack. VoIP specific Honeypots can be used to detect scan attacks or SPIT attacks.
As described in [3] the Honeypot is implemented as a complete parallel VoIP infrastructure that is logically and physically separated from the normal network and thus simulates a whole VoIP network. Let us assume a scan attack as described earlier. When the attacker sends e.g. ’OPTIONS’ or ’INVITE’ requests to valid, assigned permanent URIs, they are forwarded through the normal SIP network (Proxy, UAC), but when the attacker sends an ’OPTIONS’ request to an unassigned or invalid SIP URI, the request is forwarded to the Honeypot, where it can be monitored and treated adequately. As adequate treatment the authors of [3] propose call analysis in order to determine attack characteristics, interaction with the originator in order to determine the source of the attack, and blocking of the calls. The monitoring system of this approach works as follows: a day is divided into sections of a specified length (e.g. one hour). For each section a predefined metric is calculated (e.g. number of calls, number of different recipients, average duration of a call) for predefined events (e.g. call). In the learning phase (e.g. a month), daily statistics are built to extract a long term account profile (e.g. the daily average of the number of calls for each section). In the detection stage (e.g. a day), a short term profile is compared to the long term one by using an appropriate distance function (e.g. Euclidean distance, quadratic distance, Mahalanobis distance). A recent profile that differs considerably from the long term one indicates possible misuse. Another method is to study non stationary features of an account, for example the distribution of calls over all callees or the shape of the callees’ list size over all dialed calls. By comparing the changes of a distribution over time with an appropriate distance function (e.g. Hellinger distance), sudden bursts may be detected and treated as abnormalities [3].
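The profile comparison described above can be sketched as follows. This is a minimal illustration and not the implementation of [3]: the function names, the threshold and the sample values are assumptions, and only the Euclidean and Hellinger distances are shown.

```python
import math
from typing import List, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Distance between two profiles with one metric value per time section."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same number of sections")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def normalize(counts: Sequence[float]) -> List[float]:
    """Turn raw counts (e.g. calls per callee) into a discrete distribution."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive value")
    return [c / total for c in counts]


def hellinger_distance(p: Sequence[float], q: Sequence[float]) -> float:
    """Hellinger distance between two discrete distributions (range 0 to 1)."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    squared = sum((math.sqrt(x) - math.sqrt(y)) ** 2 for x, y in zip(p, q))
    return math.sqrt(squared / 2.0)


def is_suspicious(long_term: Sequence[float], short_term: Sequence[float],
                  threshold: float) -> bool:
    """Flag a short term profile that is far away from the long term one."""
    return euclidean_distance(long_term, short_term) > threshold


# Hypothetical hourly call counts for three sections of a day.
long_term_profile = [2.0, 3.0, 1.0]
todays_profile = [40.0, 55.0, 38.0]
print(is_suspicious(long_term_profile, todays_profile, threshold=10.0))
```

The choice of the threshold is exactly the open question discussed for such systems: it has to separate normal from abnormal usage, and the sketch does not answer where that borderline lies.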

3.6.1 Weakness of Intrusion Detection Mechanisms, Honey phones

Intrusion Detection Systems are based on the assumption that the characteristics of attacks differ considerably from the characteristics of normal calls. At first sight this assumption seems logical, as e.g. within a SPIT attack the attacker calls hundreds or thousands of victims within an hour, while a normal user would not even send out one percent of this number of calls. Nevertheless the attacker has two possibilities to bypass detection by an Intrusion Detection System. The first is to align his behavior with the behavior of normal users, e.g. by adjusting the call rate to 5 calls per hour. This makes an attack less efficient, as it consumes more time, but the goal of a Spitter is not necessarily to reach as many users as possible within the shortest time period. Reaching e.g. one thousand users with a call rate of 5 calls per hour would take approximately 8 days. We call this technique Call Rate Adaption, meaning that an attacker is able to adjust his call rate (e.g. number of calls per time slot, number of simultaneous calls). As the call rate is not the only variable used to detect abnormal behavior, an attacker can use a second technique to avoid detection by Intrusion Detection Systems. The attacker can use different accounts for his attacks, so that the statistical values are spread over several accounts. Let us assume that an attacker has one hundred valid user accounts. With these accounts he can partition the targeted user accounts into one hundred groups and use only one account per group. The users from group one are only called with account one and so on. It is harder for a monitoring system to detect attacks that originate from different sources, as there must be a technique to correlate the partial attacks to one complete attack. This technique can be called Account Switching, as the attacker switches the account in use while he is performing an attack.
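The figures used in this argument can be checked with a short calculation, seen from the point of view of a monitoring system that keeps per-account statistics. The function names are hypothetical; the numbers are the ones from the text above.

```python
import math


def campaign_duration_days(targets: int, calls_per_hour: float) -> float:
    """Time needed to call all targets once at a constant call rate."""
    if calls_per_hour <= 0:
        raise ValueError("calls_per_hour must be positive")
    return targets / calls_per_hour / 24.0


def calls_seen_per_account(targets: int, accounts: int) -> int:
    """Upper bound of calls a per-account monitor observes when the
    targets are split evenly over the given number of accounts."""
    if accounts <= 0:
        raise ValueError("accounts must be positive")
    return math.ceil(targets / accounts)


# 1000 targets at 5 calls per hour: 200 hours, i.e. a little more than 8 days.
print(round(campaign_duration_days(1000, 5), 2))

# 1000 targets spread over 100 accounts: at most 10 calls per account.
print(calls_seen_per_account(1000, 100))
```

The second value shows why correlation across accounts is needed: each individual account stays well within the range of a normal user.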
Honeypots are very effective against scan attacks, as anyone who tries to reach invalid or unassigned identities will be trapped, and they are therefore also effective against SPIT. When the Spitter cannot scan the network for assigned and unassigned numbers, he is forced to view all numbers as assigned. When he views all numbers as assigned, he will sooner or later step into the trap, because he will establish calls to endpoints that are part of the Honeypot. Nevertheless attackers can trick the Honeypot mechanism with SIP Identity Hijacking. When an attacker impersonates the accounts of normal users and then performs SPIT attacks with these normal accounts, he will access endpoints in the Honeypot system with normal accounts. The possibility that attackers infect other machines to act as so called ’zombies’ [37] and generate SPIT can likewise lead to accesses to the Honeypot system with ’normal’ accounts. So the assumption that accesses to the Honeypot are only made by attackers no longer holds. Finally we take another look at the practical issues of the presented solutions. The practical problem with intrusion detection systems in general is that they are based on statistical assumptions that are not verified. The question that has to be answered is: where is the borderline between normal usage and abnormal usage? The publishers state that statistical values are assumed or derived from attack characteristics, but in order to reduce the rate of false negative and false positive classifications, the knowledge base must be precise. So we can say that what we lack is knowledge of SPIT characteristics, as we currently cannot really distinguish SPIT from normal traffic unless the SPIT attacks are excessive. Honeypots have the disadvantage that they only detect access to invalid or unassigned accounts, which means that an attacker who only accesses valid accounts will not be handled by a Honeypot.

3.7 Summary

In summary, we have seen SPIT countermeasures with different weak points. All of the presented ideas have technical and practical weak points that attackers can exploit in order to circumvent them. An attacker who is able to perform the following techniques has a good repertoire that enables him to bypass any of the presented techniques or combinations of them:

Device Spoofing
SIP Identity Spoofing
SIP Header Spoofing
Reputation Pushing or Pulling
CAPTCHA Relay Attack
SIP Identity Hijacking
Call Rate Adaption
Account Switching

Table 3.6: SPIT attack techniques

An attacker now needs a tool that aggregates the presented attacks.

4 SIP XML Scenario Maker

This chapter presents the main concept of our SIP benchmark tool, which facilitates the execution of SIP scenarios. As the presented solutions are all based on the Session Initiation Protocol, our tool is also based on SIP and can be viewed as a custom SIP scenario generator, which means that we can produce any kind of SIP scenario and execute it against any SIP endpoint or against multiple endpoints.

4.1 Technical Basis

Our tool is named SIP XML Scenario Maker (SXSM) and is based on SIPp developed by HP [5]. SIPp is an Open Source test tool and traffic generator for SIP and was developed as a performance testing tool for the SIP protocol. It has a few SIP test scenarios integrated, such as a scenario that simulates the call establishment from the point of view of the UAC and one from the point of view of the UAS, but SIPp can also read XML scenario files describing any SIP testing configuration. It features the dynamic display of statistics about running tests (call rate, round trip delay, and message statistics), periodic CSV statistics dumps, TCP and UDP over multiple sockets or multiplexed with retransmission management, regular expressions and variables in scenario files, and dynamically adjustable call rates [5].

Figure 4.1: SXSM and SIPp

SXSM extends SIPp with the ability to quickly create custom SIP scenarios via a graphical user interface (GUI), execute created scenarios as a batch and evaluate the result of the execution. The power lies in the simplicity of creating custom SIP messages, putting them into a sequence as a complete SIP scenario and executing the created scenarios. This functionality is provided by two editing modes and one execution mode. In the following we take a closer look at these modes.

4.1.1 Message Editor

Figure 4.2: SXSM Message Editor

The message editor delivers the basic functionality, the ability to create and organize custom SIP messages. SIP messages are the smallest elements of SIP scenarios, as SIP scenarios are sequences of SIP messages. The SIP messages can be grouped in type folders, which means, that e.g. response codes of type 1xx (e.g. 100,180,181) can be clustered to the 1xx type. Additionally the message types can be grouped into message sets, so that the user can create e.g. two differently composed ’INVITE’ messages and put one into set A and one into set B. Later the user can distinguish the two ’INVITE’ messages, because they are in different sets. Within the message editor, the user can configure the layout of each and every SIP message, that can be used in the scenario editor (explained later). The standard procedure of creating a new SIP message works as follows:

• Create a set for the new message or select an existing one.

• Select a type for the new message within the chosen set or create a new one.

• Create a message by choosing a name for the message and entering message details into the message editor’s text field.

4.1.1.1 SIPp message format

SXSM comes preconfigured with a set of standard messages that can serve as orientation when composing your own messages. A message consists of one of the following six SIPp commands and follows a special syntax:

Command | Description
<send> | Initiates the sending of a SIP request or response.
<recv> | Lets a scenario wait for the receiving of a SIP request or response.
<pause> | Lets a scenario pause (for a defined time).
<nop> | The nop command doesn’t do anything at SIP level. It is only there to specify an action to execute. Possible actions are discussed later.
<recvCmd> | Receive data in SIPp 3rd Party Call Control mode.
<sendCmd> | Send data in SIPp 3rd Party Call Control mode.

Table 4.1: SIPp commands

Note that SIPp offers more than the six commands presented above, but only these are important in our context. Before we look at examples for each command as it can be used within the message editor, we first introduce an excerpt of the keywords that can be used within messages, as provided in [5]:

Keywords

Keyword | Default value | Usage description
[service] | service | Service field, as passed in the service field in shoot mode.
[remote_ip] | - | Remote IP address, as passed in the remote IP field in shoot mode.
[remote_port] | 5060 | Remote IP port, as passed in the remote port field in shoot mode.
[transport] | UDP | Transport specified in the Transport dropdown in shoot mode.
[local_ip] | Primary host IP address | Local IP specified in shoot mode.
[local_ip_type] | - | Depending on the address type of local_ip (IPv4 or IPv6).
[local_port] | Random | Will take the value of the local port selected in shoot mode.
[len] | - | Computed length of the SIP body. To be used in the ’Content-Length’ header.
[call_number] | - | Starts from ’1’ and is incremented by 1 for each call.
[cseq] | - | Can be used to generate a CSeq number.
[call_id] | - | Generates a SIP call id.
[media_ip] | - | Same as the local IP specified in shoot mode.
[media_ip_type] | - | Returns the media IP type (IPv4/v6).
[media_port] | - | Returns the value of the media port.
[last_*] | - | The ’[last_*]’ keyword is replaced automatically by the specified header if it was present in the last message received, e.g. [last_from].
[field0-n] | - | Used to inject values from an external CSV file.
[$n] | - | Used to inject the value of call variable number n.
[authentication] | - | Used to put the authentication header. This field can have parameters, with the following form: [authentication username=myusername password=mypassword]. If no username is provided, the value from the service parameter in shoot mode is used. If no password is provided, the value from the -ap command line option is used.
[pid] | - | Provide the process ID (pid) of the main SIPp thread.
[branch] | - | Provide a branch value.
[msg_index] | - | Provide the message number in the scenario.
[cseq] | - | Provides the CSeq value of the last request received. This value can be incremented (e.g. [cseq+1]).

Table 4.2: Message keywords
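How such bracketed keywords are replaced before a message is sent can be illustrated with a small sketch. It is not SIPp’s implementation: the function `expand_keywords`, the regular expression and the sample values (192.0.2.10 is a documentation address) are assumptions, and special forms such as [last_*], [$n] or [cseq+1] are deliberately not handled.

```python
import re
from typing import Mapping

# Matches simple keywords such as [local_ip] or [call_id].
KEYWORD_PATTERN = re.compile(r"\[([A-Za-z_][A-Za-z0-9_]*)\]")


def expand_keywords(template: str, values: Mapping[str, str]) -> str:
    """Replace every known [keyword]; unknown keywords are left untouched."""

    def substitute(match: "re.Match[str]") -> str:
        name = match.group(1)
        return values.get(name, match.group(0))

    return KEYWORD_PATTERN.sub(substitute, template)


template = "Contact: sip:sxsm@[local_ip]:[local_port]"
values = {"local_ip": "192.0.2.10", "local_port": "5060"}
print(expand_keywords(template, values))
```

Leaving unknown keywords untouched makes it easy to spot a misspelled keyword in the resulting message.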

Having seen the special keywords that can be used within messages, we can now look at the syntax of the SIPp commands.

send command

As described above the send command can be used to send requests or responses and the example shows an ’ACK’ request. The header fields are enclosed between the opening and closing XML tags within a CDATA section:

<send>
  <![CDATA[
    ...;tag=[call_number]
    To: sut [peer_tag_param]
    Call-ID: [call_id]
    Cseq: 1 ACK
    Contact: sip:sxsm@[local_ip]:[local_port]
    Max-Forwards: 70
    Subject: SXSM test scenario
    Content-Length: 0
  ]]>
</send>

Table 4.3: send command example

The send command can carry several optional attributes within the opening XML tag. An excerpt of the most useful attributes is listed below:

• retrans: Used for UDP transport only: it specifies the T1 timer value, as described in SIP RFC 3261 [39]. E.g. <send retrans="500"> will set the T1 timer to 500 milliseconds (RFC 3261 default).

• crlf: Displays an empty line after the arrow for the message in the main SIPp screen. E.g. <send crlf="true">.

• next: Go to another part of the scenario when the current message has been sent. E.g. <send next="12"> jumps to label 12.

• test: Used together with the ’next’ attribute to indicate that the branch to the label specified with ’next’ should only happen if the variable specified in ’test’ is set. E.g. <send next="6" test="4"> jumps to label 6 if variable 4 is set.

• start_txn: Records the branch ID of this sent message, so that responses can be properly matched (without this attribute the transaction matching is done based on the CSeq method, which is imprecise). E.g. <send start_txn="invite"> stores the branch ID of this message in the transaction named ’invite’.
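To make the effect of the T1 value more tangible, the following sketch computes when an unanswered ’INVITE’ is retransmitted over UDP according to RFC 3261 (Timer A starts at T1 and doubles after every retransmission, Timer B ends the transaction after 64*T1). The function name is hypothetical, and a tool such as SIPp may additionally cap the number of retransmissions, which the sketch ignores.

```python
from typing import List


def invite_retransmit_offsets(t1_ms: int = 500) -> List[int]:
    """Offsets in ms (relative to the first transmission) at which an
    unanswered INVITE is retransmitted before Timer B (64 * T1) fires."""
    if t1_ms <= 0:
        raise ValueError("t1_ms must be positive")
    timeout = 64 * t1_ms   # Timer B
    interval = t1_ms       # Timer A starts at T1
    elapsed = interval
    offsets = []
    while elapsed < timeout:
        offsets.append(elapsed)
        interval *= 2      # Timer A doubles after each retransmission
        elapsed += interval
    return offsets


print(invite_retransmit_offsets(500))
```

With the default T1 of 500 ms the retransmissions occur 0.5, 1.5, 3.5, 7.5, 15.5 and 31.5 seconds after the first transmission, and the transaction times out after 32 seconds.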

recv command

The recv command is a little bit simpler than the send command, as we don’t have to specify header fields of received messages, because they are generated by a remote endpoint. The example shows a ’180 Ringing’ response which only consists of the opening and closing XML tags and doesn’t contain a CDATA section:

<recv response="180"> </recv>

Table 4.4: recv command example

The recv command can also carry several attributes that influence the behavior of the element:

• response: Indicates that a SIP response is expected. A ’recv’ element is either a response or a request; in the first case the response attribute is mandatory. E.g. <recv response="200"> will expect a SIP message with response code ’200’.

• request: Indicates that a SIP request is expected. A ’recv’ element is either a response or a request; in the latter case the request attribute is mandatory. E.g. <recv request="ACK"> will expect an ’ACK’ request.

• optional: Indicates whether the message to receive is optional. If an optional message is actually received, it is not treated as an unexpected message; if it is not received, the scenario is not aborted. When an unexpected message is received, SIPp checks whether it matches an optional message defined in the previous step of the scenario. If optional is set to ’global’, SIPp checks all previous steps of the scenario.

• crlf: Displays an empty line after the arrow for the message in the main SIPp screen. E.g. <recv crlf="true">.

• auth: If this attribute is set to ’true’, the ’Proxy-Authenticate:’ header of the received message is stored and used to build the [authentication] keyword. E.g. <recv auth="true">.

• rrs: Record Route Set. If this attribute is set to ’true’, the ’Record-Route’ header of the received message is stored and can be recalled using the [routes] keyword. E.g. <recv rrs="true">.

• next: Go to another part of the scenario when the defined message is received. E.g. <recv response="403" next="5"> jumps to label ’5’ when receiving a ’403’ message.

• test: Used together with the ’next’ attribute to indicate that the branch to the label specified with ’next’ should only happen if the variable specified in ’test’ is set. E.g. <recv response="403" next="5" test="3"> jumps to label ’5’ when receiving a ’403’ response only if variable 3 is set.

• regexp_match: Indicates whether ’request’ is given as a regular expression. If so, the recv command will match against the regular expression. This makes it possible to catch several cases in the same receive command. Example of a recv command that matches ’MESSAGE’ or ’PUBLISH’ or ’SUBSCRIBE’ requests: <recv request="MESSAGE|PUBLISH|SUBSCRIBE" regexp_match="true">.

• response_txn: Indicates that this is a response to a transaction that was previously started. To match, the branch ID of the first Via header must match the stored transaction ID. E.g. <recv response_txn="invite"> matches only responses to the message sent with the start_txn="invite" attribute.
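The difference between a literal ’request’ value and one interpreted as a regular expression can be sketched as follows. This only approximates the behavior described for regexp_match with Python’s `re` module; SIPp’s own regular expression engine may differ in details, and the function name is hypothetical.

```python
import re


def request_matches(expected: str, method: str, regexp_match: bool = False) -> bool:
    """Decide whether a received request method satisfies a recv step."""
    if regexp_match:
        # The expected value is treated as a regular expression.
        return re.search(expected, method) is not None
    # Without regexp_match only an exact method name matches.
    return expected == method


pattern = "MESSAGE|PUBLISH|SUBSCRIBE"
print(request_matches(pattern, "PUBLISH", regexp_match=True))
print(request_matches(pattern, "PUBLISH"))
```

Without the flag the pattern is compared literally, so a single recv step could not cover the three request types.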

pause command

The pause command is used in order to let the scenario pause for a defined time and is constructed in its simplest form as follows:

<pause/>

Table 4.5: pause command example

More complex pause commands can be constructed with additional attributes, which are described below:

• milliseconds: Specifies the pause delay in milliseconds. When this delay is not set, the value of the -d command line parameter is used. E.g. <pause milliseconds="5000"/> pauses the scenario for 5 seconds.

• variable: Sets the length of the pause to the value of a specified variable. E.g. <pause variable="1"/> pauses for the number of milliseconds specified by call variable 1.

• crlf: Displays an empty line after the arrow for the message in the main SIPp screen. E.g. <pause crlf="true"/>.

• next: Jumps to the specified label after a pause. E.g. <pause milliseconds="4000" next="7"/> jumps to label ’7’ after pausing 4 seconds.

nop command

As described in table 4.1 on page 56, the nop command doesn’t do anything at SIP level but is useful for executing special actions within a scenario. Possible actions are the execution of internal SIPp commands, the execution of external (system) commands, audio replay, video replay and the use of regular expressions [5]. Note that the ’action’ tag can be present not only within a ’nop’ tag but also within a ’recv’ or ’recvCmd’ tag. This means that you can execute an action unconditionally within a ’nop’ command or conditionally, when a specific message is received.

internal/external commands: There are three internal commands that can be used within a scenario: the ’stop_call’ command, the ’stop_call_gracefully’ command and the ’stop_now’ command. E.g. the syntax for a ’stop_now’ command would be as follows:

<nop>
  <action>
    <exec int_cmd="stop_now"/>
  </action>
</nop>

Table 4.6: nop internal command example

External commands can be any system command that is available on the machine where SIPp runs, and are built analogously to internal commands:

Table 4.7: nop external command example

audio/video replay: The ’play_pcap’ command can be used at any point within a scenario to replay a prerecorded audio or video file in pcap format. In our scenario especially the audio replay is useful. As an example the syntax of the ’play_pcap_audio’ command is listed below (the ’play_pcap_video’ command is constructed analogously):

Table 4.8: nop play command example

regular expressions: A regular expression can be used within a ’nop’ command or within a ’recv’ command in an ’action’ tag. As it is more useful within ’recv’ commands, we will look at an example within a ’recv’ command:

Table 4.9: ereg example

The example above analyzes a received ’200 OK’ response, extracts the contact header and assigns it to call variable 6.

label command

The label command is a very simple command, because it carries only an ’id’ attribute, that specifies the label id. A label message can be constructed with the following syntax and can be used as jump target:

Table 4.10: label command example

sendCmd/recvCmd command

The ’recvCmd’ and the ’sendCmd’ commands can be used in 3PCC scenarios where two SIPp (and therefore SXSM) instances are launched and need to pass information to each other. The ’sendCmd’ contains a CDATA section with the data that should be sent, as we can see in the example below:

Table 4.11: sendCmd command example

In this example the call variable ’$1’ is sent and the call id is included, as it is mandatory in every ’sendCmd’. Additionally the ’sendCmd’ tag can contain a ’dest’ attribute that states to which destination the data should be sent.

The ’recvCmd’ can be used to receive data that has been sent via a ’sendCmd’, and its syntax follows this example:

Table 4.12: recvCmd command example

The example shows the usage of ’recvCmd’ where an action is specified that searches via a regular expression for a specified message content and assigns it to the call variable ’$2’, so that it can later be re-injected.

4.1.2 Scenario Editor

Figure 4.3: SXSM Scenario Editor

The scenario editor is the core element of SXSM. In this mode the user can create SIP scenarios based on the building blocks created in the message editor. The scenarios can be grouped in different sets. The user must choose a name for a scenario and can then select from the list of messages the ones he wants to add to the scenario and the order in which they should appear. Afterwards the user can edit the scenario, which is presented as an XML file, in detail. Creating a SIP scenario consists of the following steps:

• Select messages from the desired set and type and add them to the message list

• Select a name for the new scenario

• Select a scenario set, where the scenario should be placed or create a new scenario set

• Save the scenario

Let us assume the user wants to create a scenario where an ’INVITE’ message is sent, then a ’100 Trying’ is received, then a ’180 Ringing’ is received and finally a ’200 OK’ is received. In this case the user simply selects these messages, adds them to the scenario and saves the scenario file, and the work is done. In this mode complex SIP scenarios can be created with a few clicks. Here we can see that the modular composition enables us to create any kind of SIP scenario for any kind of SIP testing purpose.
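The modular composition can be illustrated with a short sketch that concatenates stored message fragments into one scenario document and checks that the result is well-formed XML. It is not SXSM’s code: the message library `MESSAGES`, the function `build_scenario` and the message texts are assumptions, the ’INVITE’ carries no SDP body, and the XML declaration and DOCTYPE that SIPp scenario files usually start with are omitted.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List
from xml.sax.saxutils import quoteattr

# Hypothetical INVITE fragment as it might be stored in the message editor.
INVITE = """<send retrans="500">
<![CDATA[
INVITE sip:[service]@[remote_ip]:[remote_port] SIP/2.0
Via: SIP/2.0/[transport] [local_ip]:[local_port];branch=[branch]
From: sxsm <sip:sxsm@[local_ip]:[local_port]>;tag=[call_number]
To: sut <sip:[service]@[remote_ip]:[remote_port]>
Call-ID: [call_id]
CSeq: 1 INVITE
Contact: sip:sxsm@[local_ip]:[local_port]
Max-Forwards: 70
Content-Length: 0

]]>
</send>"""

# Hypothetical message library keyed by message name.
MESSAGES: Dict[str, str] = {
    "INVITE": INVITE,
    "100 Trying": '<recv response="100"></recv>',
    "180 Ringing": '<recv response="180"></recv>',
    "200 OK": '<recv response="200"></recv>',
}


def build_scenario(name: str, message_names: List[str]) -> str:
    """Put the selected message fragments into one scenario, in order."""
    parts = [f"<scenario name={quoteattr(name)}>"]
    parts.extend(MESSAGES[message] for message in message_names)
    parts.append("</scenario>")
    return "\n".join(parts)


scenario_xml = build_scenario(
    "basic call", ["INVITE", "100 Trying", "180 Ringing", "200 OK"])
root = ET.fromstring(scenario_xml)  # raises ParseError if not well-formed
print([child.tag for child in root])
```

Because each message is a self-contained XML fragment, reordering or exchanging messages only changes the list passed to `build_scenario`.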

4.1.3 Shoot Mode

Figure 4.4: SXSM Shoot mode

The shoot mode represents the execution mode of SXSM. In this mode the user can put previously created SIP scenarios into a sequence, execute them one after the other and evaluate the results presented. The user selects scenarios from the scenario list and adds them to the shoot list, then he configures the parameters that are set specifically for each scenario. The first scenario specific parameter is the call rate, which is composed of three values. The first one indicates how many times the scenario should be played. The second value defines how many times per time period the scenario should be played, and the third one defines how long a time period is. If the user wants e.g. the scenario to be played 100 times with a rate of 10 times per minute, he must choose 100 for the first parameter, 10 for the second parameter and 60000 for the last parameter. Note that the time unit for the last parameter is milliseconds. As mentioned above the call rate can be set individually for every scenario that is put into the shoot list. The user can additionally set SIPp specific command line arguments, which are not handled by SXSM but passed directly to SIPp. The following table gives an overview of useful SIPp parameters:

SIPp Parameter | Usage
-aa | Enable automatic ’200 OK’ answer for ’INFO’, ’UPDATE’ and ’NOTIFY’ messages.
-auth_uri | Force the value of the URI for authentication. By default, the URI is composed of remote_ip:remote_port.
-base_cseq | Start value of ’cseq’ for each call.
-bind_local | Bind socket to local IP address, i.e. the local IP address is used as the source IP address. If SIPp runs in server mode it will only listen on the local IP address instead of all IP addresses.
-buff_size | Set the send and receive buffer size.
-deadcall_wait | How long the Call-ID and final status of calls should be kept to improve message and error logs (default unit is ms).
-default_behaviors | Set the default behaviors that SIPp will use. Possible values are: all (use all default behaviors), none (use no default behaviors), bye (send byes for aborted calls), abortunexp (abort calls on unexpected messages), pingreply (reply to ping requests). If a behavior is prefaced with a -, then it is turned off. Example: all,-bye
-inf | Inject values from an external CSV file during calls into the scenarios. The first line of this file says whether the data is to be read in sequence (SEQUENTIAL), random (RANDOM), or user (USER) order. Each line corresponds to one call and has one or more ’;’ delimited data fields. Those fields can be referred to as [field0], [field1], ... in the XML scenario file. Several CSV files can be used simultaneously (syntax: -inf f1.csv -inf f2.csv ...)
-l | Set the maximum number of simultaneous calls. Once this limit is reached, traffic is decreased until the number of open calls goes down. Default: (3 * call_duration (s) * rate).
-master | 3pcc extended mode: indicates the master number.
-max_retrans | Maximum number of UDP retransmissions before call ends on timeout. Default is 5 for ’INVITE’ transactions and 7 for others.
-max_invite_retrans | Maximum number of UDP retransmissions for invite transactions before call ends on timeout.
-max_non_invite_retrans | Maximum number of UDP retransmissions for non-invite transactions before call ends on timeout.
-nd | No Default. Disable all default behavior of SIPp, which is the following: on UDP retransmission timeout, abort the call by sending a ’BYE’ or a ’CANCEL’; on receive timeout with no ontimeout attribute, abort the call by sending a ’BYE’ or a ’CANCEL’; on unexpected ’BYE’, send a ’200 OK’ and close the call; on unexpected ’CANCEL’, send a ’200 OK’ and close the call; on unexpected PING, send a ’200 OK’ and continue the call; on any other unexpected message, abort the call by sending a ’BYE’ or a ’CANCEL’.
-rtp_echo | Enable RTP echo. RTP/UDP packets received on the port defined by -mp are echoed to their sender. RTP/UDP packets coming on this port + 2 are also echoed to their sender (used for sound and video echo).
-slave | 3pcc extended mode: indicates the slave number.
-slave_cfg | 3pcc extended mode: indicates the file where the master and slave addresses are stored.
-trace_msg | Displays sent and received SIP messages in ’scenario file name’_’pid’_messages.log
-trace_shortmsg | Displays sent and received SIP messages as CSV in ’scenario file name’_’pid’_shortmessages.log
-trace_stat | Dumps all statistics in ’scenario_name’_’pid’.csv file.
-trace_counts | Dumps individual message counts in a CSV file.
-trace_rtt | Allow tracing of all response times in ’scenario file name’_’pid’_rtt.csv.
-ap | Set the password for authentication challenges. Default is ’password’.
-tls_cert | Set the name for TLS Certificate file. Default is ’cacert.pem’.
-tls_key | Set the name for TLS Private Key file. Default is ’cakey.pem’.
-tls_crl | Set the name for Certificate Revocation List file. If not specified, X509 CRL is not activated.
-3pcc | Launch the scenario in 3pcc mode (’Third Party call control’). The passed IP address depends on the 3PCC role. When the first twin command is ’sendCmd’, this is the address of the remote twin socket. SIPp will try to connect to this address:port to send the twin command (this instance must be started after all other 3PCC scenarios). When the first twin command is ’recvCmd’, this is the address of the local twin socket. SIPp will open this address:port to listen for the twin command.

Table 4.13: SIPp optional parameters
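The call rate triple of the shoot mode can be translated into an effective rate and an expected run time with a few lines of arithmetic. The sketch below also shows one plausible mapping to the SIPp options -m (number of calls), -r (calls per period) and -rp (rate period in milliseconds); that SXSM uses exactly this mapping is an assumption, as are all names in the code.

```python
from typing import List, NamedTuple


class CallRate(NamedTuple):
    total_calls: int       # how many times the scenario is played
    calls_per_period: int  # how many times per time period
    period_ms: int         # length of one time period in milliseconds


def calls_per_second(rate: CallRate) -> float:
    """Effective call rate in calls per second."""
    return rate.calls_per_period / (rate.period_ms / 1000.0)


def expected_duration_seconds(rate: CallRate) -> float:
    """Time needed to start all calls at the configured rate."""
    return rate.total_calls / calls_per_second(rate)


def sipp_arguments(rate: CallRate) -> List[str]:
    """Assumed mapping of the triple to SIPp command line options."""
    return ["-m", str(rate.total_calls),
            "-r", str(rate.calls_per_period),
            "-rp", str(rate.period_ms)]


# The example from the shoot mode description: 100 plays, 10 per minute.
rate = CallRate(total_calls=100, calls_per_period=10, period_ms=60000)
print(round(expected_duration_seconds(rate)))
print(sipp_arguments(rate))
```

For the example values the scenario is started roughly once every six seconds, and all 100 plays are started within about ten minutes.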

After all additional parameters are set, the user has to enter information about the target (targeted username, remote IP, remote port) and about himself (local IP, local port). Afterwards the user can execute the shoot list. SXSM then feeds SIPp with the input data and waits until the scenarios are executed. When the execution has finished, SXSM evaluates the exit codes that were generated by SIPp.

Figure 4.5: SXSM Shoot Mode after execution

The following exit codes are considered:

• 0: All calls were successful

• 1: At least one call failed

• 97: exit on internal command. Calls may have been processed. Also exit on global timeout

• 99: Normal exit without calls processed

• -1: Fatal error

After the execution of the whole shoot list, the results are presented. Based on the exit codes a success rate is calculated and displayed. The success rate relates the number of scenarios that finished successfully to the total number of scenarios, so if e.g. 5 out of 10 scenarios finished with exit code 0, the success rate will be 50 percent. The user can view the log files that were generated for debugging purposes, both as live output while the scenarios are playing and after the execution.
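The success rate calculation can be sketched in a few lines. The function and dictionary names are hypothetical; the exit code meanings are taken from the list above. Note that on POSIX systems an exit code of -1 is typically reported to the parent process as 255, which a real implementation would have to take into account.

```python
from typing import Dict, Iterable

# Exit codes considered by the evaluation, as listed above.
EXIT_CODE_MEANINGS: Dict[int, str] = {
    0: "All calls were successful",
    1: "At least one call failed",
    97: "Exit on internal command or global timeout",
    99: "Normal exit without calls processed",
    -1: "Fatal error",
}


def success_rate(exit_codes: Iterable[int]) -> float:
    """Percentage of scenarios that finished with exit code 0."""
    codes = list(exit_codes)
    if not codes:
        raise ValueError("at least one scenario result is required")
    successes = sum(1 for code in codes if code == 0)
    return 100.0 * successes / len(codes)


# Five of ten scenarios finished with exit code 0.
print(success_rate([0, 0, 0, 0, 0, 1, 1, 97, 99, -1]))
```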

5 Using SXSM as attack tool

In chapter 2 on page 36 we saw, that SPIT consists of three steps:

• The systematic gathering of the contact addresses

• The establishment of communication sessions with the victims

• The media transmission

These steps can be carried out in a naive way with SXSM by creating two custom scenarios. The first scenario would be used for a scan attack. First we would create a csv file with all targeted accounts, which would look like the following:

SEQUENTIAL
5550000
5550001
...
5559999

Table 5.1: victim.csv for Scan Attack

Then we would create a simple scenario file e.g. as follows:

• Send an ’OPTIONS’ request to the target (target injected from csv)

• If the response is a ’200 OK’ echo user into ’valid_assigned.csv’

• Extract the IP address of the victim if included in Contact header of the answer and echo user and IP into ’directip.csv’

• If the response is a ’480 Temporarily unavailable’ echo user into ’valid_offline.csv’

• otherwise echo user into ’unassigned.csv’

As a result we would get four csv files. The ’valid_offline.csv’ and the ’unassigned.csv’ could be used for the next scan attack in order to update the results. The ’valid_assigned.csv’ can be used as target list for SPIT via Proxy attacks and the ’directip.csv’ can be used as target list for Direct IP Spitting as described in chapter 2. The session establishment process would again be a very simple scenario:

• Send an ’INVITE’ to the target (injected from ’valid_assigned.csv’ or ’directip.csv’) 69

• Wait for a ’100 Trying’ response (optional)

• Wait for a ’180 Ringing’ response (optional)

• Wait for a ’200 OK’ response

• Send an ’ACK’

• Play prerecorded audio

• Terminate call with a ’BYE’ request
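
The handling of optional and mandatory responses in this scenario can be made explicit with a short Python sketch. It checks a list of received status codes against the flow described above (optional ’100 Trying’, optional ’180 Ringing’, mandatory ’200 OK’). The function name `call_established` is hypothetical and the sketch ignores retransmissions and other provisional responses.

```python
def call_established(status_codes):
    """Check a response sequence against the expected flow.

    Expected: an optional 100, then an optional 180, then exactly one 200.
    """
    remaining = list(status_codes)
    for optional in (100, 180):
        # Skip the optional provisional response if it is present.
        if remaining and remaining[0] == optional:
            remaining.pop(0)
    # Only the mandatory final response may be left.
    return remaining == [200]
```

Only if the function returns True would the scenario continue with the ’ACK’, the audio playback and the ’BYE’.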

In chapter 3 we discussed different SPIT countermeasures and derived attack techniques from their weaknesses. In the next sections of this chapter we will see how these attack techniques can be put into practice with SXSM, so that the naive SPIT attacks described above won’t be detected. The goal of this chapter is not only to show that the presented weaknesses can be exploited easily, but also that it is absolutely necessary for an administrator of a VoIP network to put his system under structured tests. As SXSM is implemented within a very broad and modular context, it can be used for all SIP testing purposes and in particular as a SPIT producing attack tool. SXSM can be extended as new anti SPIT mechanisms evolve.

5.1 Device Spoofing

The Device Spoofing attack has two facets. As we discussed earlier, device fingerprints can be derived from the layout of the SIP messages or from the behavior of the device. The layout of SIP messages can be manipulated within the message editor of SXSM. If a user wants to imitate the message layout (presence and order of SIP headers) of a device, he has to follow a simple procedure. Let us assume the user wants to imitate the layout of an ’INVITE’ message of the softphone KPhone[4]. The first thing to do is to find out how SIP messages sent by KPhone are laid out. This task can be accomplished with a network protocol analyzer such as Wireshark[7]. What the user has to do is:

• Start KPhone.

• Start a Wireshark trace.

• Initiate a call with KPhone to any target.

• Stop the trace and filter out the ’INVITE’ message generated by KPhone.

• Save the message to plain text.

• Open SXSM and start the message editor.

• Create a new message set and call it ’KPhone’.

• Create a new message type within the message set and call it ’sendRequest’.

• Create a new message and call it ’INVITE’.

• Copy the previously traced INVITE into the message editor.

• Adapt the syntax so that it matches the send command syntax of SIPp.

• Save the message.
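
Whether an imitated message really matches the layout of the traced one can be checked mechanically. The following Python sketch extracts the layout (presence and order of headers) from a SIP message saved as plain text, so that two messages can be compared. The function name `header_layout` is hypothetical and the sketch deliberately ignores header values and compact header forms.

```python
def header_layout(raw_message):
    """Return the header names of a SIP message in order of appearance."""
    # Normalize line endings and cut off the message body after the empty line.
    head = raw_message.replace("\r\n", "\n").split("\n\n", 1)[0]
    names = []
    for line in head.split("\n")[1:]:  # skip the request line or status line
        if line[:1] in (" ", "\t"):
            # Folded continuation of the previous header, not a new header.
            continue
        name, separator, _ = line.partition(":")
        if separator:
            names.append(name.strip())
    return names
```

If `header_layout` returns the same list for the traced KPhone ’INVITE’ and for the message sent by the custom scenario, the two messages cannot be told apart by presence and order of headers alone.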

The steps can be repeated for any other SIP message or other devices. Furthermore, SIP devices can be probed with SIP scenarios so that they generate the desired requests or responses, which can then be traced. E.g. if we want to see how a ’180 Ringing’ response is laid out, we can initiate a call between two instances of KPhone and trace the messages exchanged with Wireshark. Or we can send a standard INVITE from SXSM to KPhone and trace the exchanged messages. With this procedure the user can find out the message layout of any UA and copy it to the message editor. Within the message editor he can create a message set for each UA and organize the messages in types within sets. Later he can use these messages as bricks in the scenario editor to create scenarios that send messages identical to those of any existing UA.

The second part of device spoofing is the active part, where SXSM has to react to unexpected messages. The behavior of the client can be manipulated within the scenario editor. The ability to create scenarios that contain branch points eases this process. A scenario can then contain a section for every message that can be received. The user only has to put tests into the scenario following the scheme ’if message x is received, jump to section y, handle it and go back to the main scenario’. This can be achieved with scenarios that contain conditional branching and regular expressions. By analogy with the previous example, we assume that the attacker wants to simulate the behavior of KPhone. The scenario should be constructed in the following way:

• Send a standard KPhone ’INVITE’ via UDP.

• Wait for the receiving of an optional ’OPTIONS’ request.

• Extract transport and SIP version from the ’Via’ header with a regular expression.

• If the SIP version does not equal ’2.0’ or the transport does not match the selected transport protocol, jump to a ’nop’ section.

• Otherwise send ’200 OK’ response with correct ’Allow’ header.

• Jump back to the point where the ’OPTIONS’ request is expected.

• If no more ’OPTIONS’ requests are generated, wait for receiving of ’100 Trying’, ’180 Ringing’, ’200 OK’.

• Start sending media.
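
The test on the ’Via’ header in this scenario can be illustrated with a regular expression in Python. The sketch decides whether an incoming ’OPTIONS’ request would be answered with ’200 OK’ or silently ignored, which stands in for the ’nop’ section. The function name `options_reply`, the return value None for the ’nop’ case and the restriction to the long header name ’Via’ are assumptions of this sketch.

```python
import re

# Via header format: "Via: SIP/<version>/<transport> <host>..."
VIA_PATTERN = re.compile(r"^Via:\s*SIP/(\d+\.\d+)/(\w+)", re.IGNORECASE | re.MULTILINE)


def options_reply(raw_request, selected_transport="UDP"):
    """Return "200 OK" if the request would be answered, otherwise None."""
    match = VIA_PATTERN.search(raw_request)
    if match is None:
        return None
    version, transport = match.groups()
    if version != "2.0" or transport.upper() != selected_transport.upper():
        # Invalid SIP version or transport: behave like the 'nop' section.
        return None
    return "200 OK"
```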

The above procedure is very effective as only two special cases had to be checked, because KPhone answers ’OPTIONS’ requests with ’200 OK’ except those with invalid transport or SIP version number. The procedure can be adapted for any client behavior or probe technique, as attackers always have the ability to trace received messages, even unexpected ones and adapt the behavior.

5.2 SIP Identity Spoofing

Identity Spoofing in its simple form is achieved by inserting the desired SIP URI into the "From" and "Contact" headers of the SIP messages. The user can either create messages for each identity in the message editor and use them as bricks in the scenario editor, or simply create a scenario for each identity and set the values manually within the scenario. If the user wants to inject the SIP URI from an external csv file, he must specify this in the scenario file. The user simply needs to put the expression "[fieldn]", where n is the number of the csv column (counted from 0), at the position where the value is usually placed in the "From" or "Contact" header, according to the following scheme: From: [field0 file="caller.csv"] ; In this example the first column of the "caller.csv" file contains the name of the identity, the second column contains the user name part of the SIP URI, and the ’file’ attribute contains the path to the csv file (relative or absolute). Injecting values from a csv file is useful when a lot of accounts should be used. The csv file should be constructed as follows:

SEQUENTIAL
Alice;443447
Bob;765443
Chuck;987554

Table 5.2: caller.csv example

The keyword ’SEQUENTIAL’ indicates that the list should be processed sequentially. Alternatively, a csv file that contains the keyword ’RANDOM’ at the beginning will be read in random order.
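
To illustrate how such a file is consumed, the following Python sketch parses the content of a caller.csv and selects the row for a given call. It is a model of the behavior described above and not the parser of SIPp. The function names are hypothetical, and the wrap-around to the first row after the last one in sequential mode is an assumption of this sketch.

```python
import random


def parse_injection_file(text):
    """Split a csv injection file into its mode keyword and its rows."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("injection file is empty")
    mode = lines[0].upper()
    if mode not in ("SEQUENTIAL", "RANDOM"):
        raise ValueError(f"unsupported mode keyword: {lines[0]}")
    # Columns are separated by ';' as in Table 5.2.
    rows = [line.split(";") for line in lines[1:]]
    return mode, rows


def pick_row(mode, rows, call_index, rng=random):
    """Return the row used for the call with the given index."""
    if mode == "SEQUENTIAL":
        # Assumption: after the last row the list starts again from the top.
        return rows[call_index % len(rows)]
    return rng.choice(rows)
```

With the file from Table 5.2, `pick_row` returns the row of Alice for the first call and the row of Bob for the second call, so that field0 is the display name and field1 the user name of the spoofed identity.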

5.3 SIP Header Spoofing

SIP Header Spoofing can be achieved with two different approaches. The first one is to create custom SIP messages with the message editor and set headers and header values as desired. The second is to use standard SIP messages and change the values of headers with the detail view of the scenario editor. The first approach should be preferred if the user wants to create a lot of scenarios with the same header value, and the second should be used if the user wants to tweak values only once in a while. Any desired header value can be set in this way.

5.4 Call Rate Adaption

The Call Rate Adaption attack can be carried out within the shoot mode. The user just needs to add a scenario to the shoot list and adjust the values for the call rate. He can e.g. put "Scenario X" into the shoot list, set the call rate to 10 calls per second and stop as soon as 100 calls have been finished. With this method it is possible to control in detail how many times and in what time period a scenario is executed. As one and the same scenario can be present several times in the shoot list, the user can define the behavior very precisely. He can e.g. determine that ’Scenario X’ should be executed 100 times with a call rate of 10 calls per second and then 20 times with a call rate of 2 calls per second. Note that the term ’call’ means one pass of a scenario from beginning to end and does not mean that a call is actually placed. A scenario could e.g. consist of sending an ’OPTIONS’ request and receiving the answer.
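
The timing that results from such a shoot list can be estimated with a short calculation. The following Python sketch computes the start offset of every call for a list of phases, where each phase is a pair of call count and call rate. The function name `launch_times` is hypothetical, and the sketch assumes that the calls of a phase are spaced evenly and that a phase begins immediately after the previous one has launched its last call interval.

```python
def launch_times(phases):
    """Return the start offsets in seconds of all calls in a shoot list.

    phases: list of (calls, calls_per_second) pairs, processed in order.
    """
    times = []
    phase_start = 0.0
    for calls, rate in phases:
        if rate <= 0:
            raise ValueError("call rate must be positive")
        # Calls of one phase are spaced 1/rate seconds apart.
        times.extend(phase_start + index / rate for index in range(calls))
        phase_start += calls / rate
    return times


if __name__ == "__main__":
    # 100 calls at 10 calls per second, then 20 calls at 2 calls per second.
    schedule = launch_times([(100, 10), (20, 2)])
    print(len(schedule), schedule[100], schedule[-1])
```

For the example from the text, the first phase occupies the first 10 seconds and the second phase the following 10 seconds.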

5.5 Account Switching

The Account Switching attack is a special form of SIP Identity Spoofing and can be carried out by providing external csv files with appropriate data. Let us assume the user wants to place 100 calls to 100 different targets with 10 different SIP identities. The csv file for the callee should contain 100 rows and each row should contain the user name of the targeted URI. The csv file for the caller should contain 10 rows and each row should contain one of the 10 user names that should be used as source. The SIP identities of the caller can be included with the mechanism described in the section about SIP Identity Spoofing. The SIP identities of the targets can be configured in the shoot mode by selecting the external csv file as target.
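
The resulting distribution of identities over targets can be sketched in Python. The example assumes that both csv files are processed sequentially and that the shorter caller list starts again from the top when it is exhausted; the function name `assign_callers` and the sample user names are hypothetical.

```python
from itertools import cycle


def assign_callers(targets, callers):
    """Pair every target with a caller identity, reusing callers cyclically."""
    if not callers:
        raise ValueError("at least one caller identity is required")
    # zip stops after the last target; cycle repeats the callers as needed.
    return list(zip(targets, cycle(callers)))


if __name__ == "__main__":
    targets = [str(5550000 + index) for index in range(100)]
    callers = [f"user{index}" for index in range(10)]
    pairs = assign_callers(targets, callers)
    print(pairs[0], pairs[10], len(pairs))
```

Under these assumptions each of the 10 identities places exactly 10 of the 100 calls.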

5.6 Reputation Pushing or Pulling

Reputation Pushing or Pulling depends strongly on the implementation, but can be carried out in a generic way. The user simply needs to create two scenarios: one that sends out a call and one that receives a call. The receiving scenario should e.g. include a positive reputation value in the BYE message. Note that this is the point where it becomes implementation specific, as it depends on the implementation of the Reputation System where the reputation value must be put. Then the user must launch two instances of SXSM and shoot out the calling scenario with one instance and the receiving scenario with the other instance. The Proxy is set as target of both, so that the call passes through the proxy and does not go directly from client to client. Combining this technique with Account Switching for the call receiving side can lead to the desired effect of Reputation Pushing or Pulling.

5.7 SIP Identity Hijacking

SIP Identity Hijacking is again very implementation dependent, but we can take a look at a simple attack derived from [6]. The registration hijacking attack is presented as follows:

1. Disable the legitimate user’s registration. This can be done by:

• performing a DoS attack against the user’s device

• waiting for the deregistration of the user

• generating a registration race condition in which the attacker repeatedly sends REGISTER requests in a shorter timeframe (such as every 15 seconds) in order to override the legitimate user’s registration request.

2. Send a REGISTER request with the attacker’s IP address instead of the legitimate user’s

With SXSM this process could be put into practice by creating a scenario for each of the presented steps and executing them one after the other.

5.8 CAPTCHA Relay Attack

The CAPTCHA Relay Attack can be carried out with the Third Party Call Control (3PCC) mechanism. With this mechanism it is possible for SIPp (and therefore for SXSM) to create a communication session with several remote endpoints and thus to relay e.g. calls. The procedure is as follows:

• Attacker calls victim. Victim sends Audio CAPTCHA.

• Attacker calls human solver.

• Attacker "REFER"s victim to human solver.

• Victim accepts. Human solver solves CAPTCHA.

As this attack is only as good as the CAPTCHA detecting algorithm used, the technique must be adapted to future implementations of both CAPTCHA generators and detectors.

6 Conclusions and Outlook

What we saw in this thesis is that all presented state of the art anti SPIT mechanisms contain conceptual weaknesses that can be exploited by attackers. These weaknesses arise because the countermeasures were either derived from contexts that don’t match the SPIT context entirely (e.g. email spam countermeasures), or were based on assumptions that were not verified or proven, or didn’t consider the capabilities of attackers due to an imprecise definition of SPAM over Internet Telephony.

As an attacker we acted only with simple methods on the Application layer, but could nonetheless find ways to bypass countermeasures, without even exploiting SIP protocol weaknesses or deploying methods of lower layer protocols. This showed us that, in order to mitigate threats like SPAM over Internet Telephony, developers and administrators need to challenge their solutions. SXSM closes a small gap in that context by providing a tool that can be used to test systems for weaknesses. As new countermeasures evolve, the scenario portfolio of SXSM can be extended with new bricks, and new tests can be formed in that way.

The question of how to solve the SPIT problem must remain unanswered for now, but what we can already state is that the easier it is for an attacker to collect SIP URIs of victims, to take on different identities and to manipulate message headers, the more likely it is that SPIT will grow massively. Making it hard or nearly impossible for attackers to act in that way will reduce the threat. The results of this thesis can therefore be used to rethink the countermeasures, maybe adapt some and discard others.

Glossary

3PCC Third Party Call Control

AH Authentication Header Protocol

CA Certificate Authority
CAPTCHA Completely Automated Public Turing test to tell Computers and Humans Apart
CRLF Carriage Return Line Feed

DNS Domain Name System
DSL Digital Subscriber Line

ESP Encapsulating Security Payload Protocol

HTTP Hyper Text Transfer Protocol

IETF Internet Engineering Task Force
IM Instant Messaging
IP Internet Protocol
IPSec Internet Protocol Security
ISDN Integrated Services Digital Network

LDAP Lightweight Directory Access Protocol

OSI Model Open Systems Interconnection Basic Reference Model

PBX Private Branch Exchange
PKI Public Key Infrastructure
POTS Plain Old Telephone Service
PSTN Public Switched Telephone Network

QoS Quality of Service

RFC Request for Comments
RTCP RTP Control Protocol
RTP Real-time Transport Protocol
RTP/AVP Real-time Transport Protocol/Audio Video Profile

S/MIME Secure/Multipurpose Internet Mail Extensions
SDP Session Description Protocol
SIP Session Initiation Protocol
SIP URI SIP-Uniform Resource Identifier
SPIT Spam over Internet Telephony
SXSM SIP XML Scenario Maker

Telex Teleprinter exchange
TLS Transport Layer Security

UA User Agent
UAC User Agent Client
UAS User Agent Server
UDP User Datagram Protocol

VoIP Voice over Internet Protocol

WLAN Wireless Local Area Network

List of Figures

1.1 SIP-Protocol-Stack ...... 10
1.2 UDP Datagram ...... 11
1.3 RTP Datagram ...... 12
1.4 SIP Three Way Handshake ...... 19
1.5 SIP transactions ...... 20
1.6 SIP message structure ...... 21
1.7 An analogue telephone attached to a Fritz!Box Fon as SIP UA ...... 28
1.8 Registration process of a SIP UA ...... 29
1.9 Stateless vs. Stateful Proxy ...... 30
1.10 SIP security mechanisms ...... 31
1.11 Calculated response for Digest Authentication ...... 32
1.12 SIP over TLS ...... 33
1.13 SIP and S/MIME ...... 34
2.1 Three steps of SPIT ...... 37
2.2 Three cases in information gathering ...... 38
4.1 SXSM and SIPp ...... 54
4.2 SXSM Message Editor ...... 55
4.3 SXSM Scenario Editor ...... 63
4.4 SXSM Shoot mode ...... 64
4.5 SXSM Shoot Mode after execution ...... 67

List of Tables

1.1 1xx status codes ...... 16
1.2 2xx status codes ...... 16
1.3 3xx status codes ...... 17
1.4 4xx status codes ...... 17
1.5 5xx status codes ...... 18
1.6 6xx status codes ...... 18
1.7 Request line of an INVITE ...... 21
1.8 Status line of a 200 OK ...... 21
1.9 SIP header syntax ...... 22
1.10 SIP header fields ...... 22
1.11 INVITE message ...... 23
1.12 Message headers and their appearance ...... 26
1.13 rtpmap parameters for RTP/AVP ...... 27
1.14 REGISTER message ...... 28
1.15 SIPS URI syntax ...... 34
3.1 Passive fingerprints hardphones ...... 41
3.2 Passive fingerprints softphones ...... 42
3.3 Active fingerprints servers ...... 43
3.4 Active fingerprints hard- and softphones ...... 44
3.5 Allow header fields ...... 45
3.6 SPIT attack techniques ...... 53
4.1 SIPp commands ...... 56
4.2 Message keywords ...... 57
4.3 send command example ...... 58
4.4 recv command example ...... 59
4.5 pause command example ...... 60
4.6 nop internal command example ...... 61
4.7 nop external command example ...... 61
4.8 nop play command example ...... 61
4.9 ereg example ...... 62
4.10 label command example ...... 62
4.11 sendCmd command example ...... 62
4.12 sendCmd command example ...... 63
4.13 SIPp optional parameters ...... 66
5.1 victim.csv for Scan Attack ...... 68
5.2 caller.csv example ...... 71

Bibliography

[1] Immunizing Online Reputation Reporting Systems Against Unfair Ratings and Discriminatory Behavior, 2000.

[2] Intrusion detection mechanisms for VoIP applications, 2007.

[3] Holistic VoIP Intrusion Detection and Prevention System, 2008.

[4] Kphone at sourceforge, http://sourceforge.net/projects/kphone, 2008.

[5] Sipp at sourceforge, http://sipp.sourceforge.net/index.html, 2008.

[6] Two attacks against voip, http://www.securityfocus.com/infocus/1862, 2008.

[7] Wireshark homepage, http://www.wireshark.org/, 2008.

[8] A.G. Bell. Telephone patent - u.s. patent no. 174,465, 1876. http://inventors.about.com/library/inventors/bltelephone1.htm.

[9] J. Bellamy. Digital Telephony. John Wiley and Sons, Inc., 2000.

[10] B. (Ed.) Campbell, J. Rosenberg, H. Schulzrinne, C. Huitema, and D. Gurle. Rfc 3428 - session initiation protocol (sip) extension for instant messaging. Technical report, IETF, 2002.

[11] T. Dierks and C. Allen. Rfc 2246 - the tls protocol - version 1.0. Technical report, IETF, 1999.

[12] S. Donovan. Rfc 2976 - the sip info method. Technical report, IETF, 2000.

[13] S. Dritsas, J. Mallios, M. Theoharidou, G. F. Marias, and D. Gritzalis. Threat analysis of the session initiation protocol regarding spam. Technical report, IEEE, 2007.

[14] D. Eastlake and P. Jones. Rfc 3174 - us secure hash algorithm 1 (sha1). Technical report, IETF, 2001.

[15] C. Eckert. IT-Sicherheit, Konzepte-Verfahren-Protokolle. Oldenbourg Verlag, 2006.

[16] E. Eren and K. Detken. VoIP Security. Hanser Verlag, 2007.

[17] R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, and T. Berners-Lee. Rfc 2616 - hypertext transfer protocol - http/1.1. Technical report, IETF, 1999.

[18] V. Fossella. Resolution in the house of representatives - h. res. 269, 2001.

[19] J. Franks, P. Hallam-Baker, J. Hostetler, S. Lawrence, P. Leach, A. Luotonen, and L. Stewart. Rfc 2617 - http authentication: Basic and digest access authentication. Technical report, IETF, 1999.

[20] M. Handley, V. Jacobson, and C. Perkins. Rfc 4566 - sdp: Session description protocol. Technical report, IETF, 2006.

[21] M. Hansen, M. Hansen, J. Mueller, T. Rohwer, C. Tolkmit, and H. Waack. Developing a Legally Compliant Reachability Management System as a Countermeasure against SPIT. 2007.

[22] S. Haykin and M. Moher. Introduction to analog and digital communications. John Wiley & Sons, Inc., 2007.

[23] C. Jennings. Computational Puzzles for SPAM Reduction in SIP. Internet-draft. 2008.

[24] S. Josefsson. Rfc 4648 - the base16, base32, and base64 data encodings. Technical report, IETF, 2006.

[25] S. Kent and K. Seo. Rfc 4301 - security architecture for the internet protocol. Technical report, IETF, 2005.

[26] S. Liske, K. Rebensburg, and B. Schnor. Spit-erkennung, -bekanntgabe und -abwehr in sip-netzwerken. Master’s thesis, University of Potsdam, 2007.

[27] S. Morse. Telegraph patent - u.s. patent no. 1,647, 1840. http://inventors.about.com/od/mstartinventors/ig/Samuel-Morse—Patent/.

[28] A. (Ed.) Niemi. Rfc 3903 - session initiation protocol (sip) extension for event state publication. Technical report, IETF, 2004.

[29] J. Petersen. The telecommunications illustrated dictionary. CRC Press, 2002.

[30] J. Postel. Rfc 768 - user datagram protocol. Technical report, IETF, 1980.

[31] J. Postel. Rfc 791 - internet protocol. Technical report, IETF, 1981.

[32] B. (Ed.) Ramsdell. Rfc 3851 - secure/multipurpose internet mail extensions (s/mime) version 3.1 message specification. Technical report, IETF, 2004.

[33] R. Rivest. Rfc 1321 - the md5 message-digest algorithm. Technical report, IETF, 1992.

[34] R. L. Rivest, A. Shamir, and L. Adleman. Rsa patent - u.s. patent no. 4,405,829, 1983.

[35] A. B. Roach. Rfc 3265 - session initiation protocol (sip)-specific event notification. Technical report, IETF, 2002.

[36] J. Rosenberg. Rfc 3311 - the session initiation protocol (sip) update method. Technical report, IETF, 2002.

[37] J. Rosenberg and C. Jennings. Rfc 5039 - the session initiation protocol (sip) and spam. Technical report, IETF, 2008.

[38] J. Rosenberg and H. Schulzrinne. Rfc 3262 - reliability of provisional responses in the session initiation protocol (sip). Technical report, IETF, 2002.

[39] J. Rosenberg, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, M. Handley, and E. Schooler. Rfc 3261 - sip: Session initiation protocol. Technical report, IETF, 2002.

[40] J. Rosenberg, H. Schulzrinne, M. Handley, and E. Schooler. Rfc 2543 - sip: Session initiation protocol. Technical report, IETF, 1999.

[41] H. Schulzrinne and S. Casner. Rfc 3551 - rtp profile for audio and video conferences with minimal control. Technical report, IETF, 2003.

[42] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. Rfc 1889 - a transport protocol for real-time applications. Technical report, IETF, 1996.

[43] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson. Rfc 3550 - a transport protocol for real-time applications. Technical report, IETF, 2003.

[44] R. Sparks. Rfc 3515 - session initiation protocol (sip) refer method. Technical report, IETF, 2003.

[45] M. Stiemerling, S. Niccolini, and S. Tartarelli. Requirements and methods for SPIT identification using feedbacks in SIP. Internet-draft. 2008.

[46] A. Strowger. Strowger switch patent - u.s. patent no. 447,918, 1891.

[47] U. Trick and F. Weber. SIP, TCP/IP und Telekommunikationsnetze. Oldenbourg Verlag, 2007.

[48] H. Tschofenig, E. Leppanen, S. Niccolini, and M. Arumaithurai. Automated Public Turing Test to Tell Computers and Humans Apart (CAPTCHA) based Robot Challenges for SIP. Internet-draft. 2008.

[49] F. Wang, Y. Mo, and B. Huang. P2P-AVS: P2P Based Cooperative VoIP Spam Filtering. 2007.

[50] H. Yan, K. Sripanidkulchai, H. Zhang, Z. Shae, and D. Saha. Incorporating Active Fingerprinting into SPIT Prevention Systems. 2007.