Secure Session Establishment with SIP in Environments containing Middleboxes

KLAUS UMSCHADEN(1,2), IGOR MILADINOVIC(1,2), JOHANNES STADLER(1,2)
(1) Institute of Communication Networks, Technical University of Vienna, Favoritenstrasse 9 / 388, 1040 Vienna, AUSTRIA
(2) Forschungszentrum Telekommunikation Wien, Donau-City-Str. 1, 1220 Vienna, AUSTRIA

Abstract: - The Session Initiation Protocol (SIP) is an application layer protocol for session establishment, modification and teardown. For end-to-end security of the session information, SIP uses Secure/Multipurpose Internet Mail Extensions (S/MIME). However, in many environments SIP has to traverse middleboxes with firewall or NAT service. These middleboxes break either the end-to-end confidentiality or the end-to-end significance of SIP messages, and with it end-to-end security. This paper describes the problem and two potential solutions. One of them, the MIDCOM-aware SIP proxy, which can be authorized to apply S/MIME to SIP, is particularly interesting: it offers end-to-end security services without increasing existing firewall and middlebox security risks.

Key-Words: - SIP, Security, VoIP, Middlebox, Firewall, NAT, MIDCOM

1 Introduction

The Session Initiation Protocol (SIP) [1, 2] is a protocol for session control over packet-based networks. The 3GPP uses SIP for signaling in Universal Mobile Telecommunications System (UMTS) [3] networks. End-to-end security is clearly essential for next generation networks and therefore for SIP. Packet-switching protocols such as the Internet Protocol version 4 (IPv4) are inherently insecure, so security services like confidentiality, message authentication and integrity have to be applied to upper-layer protocols. SIP provides security in two different manners: hop-by-hop and end-to-end. A very good overview of SIP security issues can be found in [4].

Applying hop-by-hop security to SIP secures the signaling channel between two adjacent entities in the signaling path. Each SIP entity is still capable of inspecting and modifying the transported signaling information. If end users need stronger protection, they must apply end-to-end security. In section 3, we point out that middleboxes may prevent end-to-end security. As section 2 outlines, recent publications [5, 6, 7] concentrate on improving middleboxes for dynamic VoIP traffic. This paper focuses on improvements at the application level. We assume that the involved middleboxes understand the MIDCOM protocol [8]. Section 4 discusses potential solutions that allow end users to securely control their communication. First we present the terminal intelligence approach of Interactive Connectivity Establishment (ICE) [19], which is based on the Simple Traversal of UDP through NATs (STUN) protocol [16]. Then we describe in more detail our approach. Here the user can authorize its SIP proxy to apply Secure/Multipurpose Internet Mail Extensions (S/MIME) [10] to SIP once the middlebox has been prepared for the media streams via MIDCOM. This enables end-to-end security and centralizes middlebox control in the SIP proxy. H.323 and other signaling protocols face the same problem of secure middlebox traversal, so the proposed solutions are applicable to these protocols too. However, we describe the solution on the basis of SIP. We conclude the paper in section 5.

2 Related work

Due to the different kinds of middleboxes and cascading scenarios, middlebox traversal for SIP is very complex. Several drafts have been published at the IETF to provide a common way to traverse all NAT types, firewalls, and middleboxes that implement both firewall and NAT service. Today, most of these drafts have expired. The most recent approach to middlebox traversal is Interactive Connectivity Establishment (ICE) [19]. Among other protocols, it relies on STUN [16], Traversal Using Relay NAT (TURN) [20] and MIDCOM [8, 9]. ICE solves the problem of middlebox traversal by giving the terminals the capability to control their middlebox. We describe the mechanisms, advantages and disadvantages of ICE in section 4.1.

Because of the dynamic behavior of Internet telephony, static firewalls cannot handle dynamic call setup and termination. Consequently, several publications focus on firewall improvements. One suggestion is to introduce a general protocol parser into the firewall to support flexible implementation and adaptation of application level gateways (ALGs) [15]. This protocol parser has an adaptation layer to the firewall in order to keep the parser portable between different firewall vendors. Another proposal is the Secure Telephony Enabled Middlebox (STEM) [5]. STEM is also a new firewall architecture composed of several components. Besides an enhanced firewall, its most important element is the security manager, which controls the firewall via the MIDCOM protocol. The architecture also allows SIP-controlled calls to and from the Public Switched Telephone Network (PSTN). Our solution complies with this architecture: the security manager can be deployed on the security proxy described in section 4.2. The authors of STEM also describe a mechanism to detect telephony denial-of-service (DoS) attacks at the transport and application levels using the cumulative sum method [7]. For VoIP, the most important gain with respect to firewalls is the MIDCOM protocol [8, 9], which enables dynamic middlebox control. Both solutions described in section 4 rely on it.

3 End-to-end security vs. middlebox traversal

For end users that require a secure call setup, end-to-end security is a prerequisite. For the confidentiality and integrity security services, SIP therefore specifies S/MIME. Note that end-to-end authentication across domain borders is not easy to realize. For this purpose SIP provides digest authentication [11]. Since digest authentication is based on a shared secret, each end user would need one secret for each potential communication partner, i.e. one secret for each address book entry.

Fig. 1: Secure session setup in SIP (message flow: INVITE (IP, ports); 200 OK (IP, ports); ACK)

Confidentiality and integrity of SIP signaling messages can be tunneled end-to-end using S/MIME. Integrity protection prevents network intermediaries and SIP proxy servers from modifying the messages undetected. Confidentiality results in signaling messages that network intermediaries cannot inspect. This security benefit has a major drawback: if a middlebox implementing firewall service sits between the two terminals, SIP can no longer perform session setup. Figure 1 illustrates an S/MIME-encrypted session setup without middleboxes. Each call participant can extract the call establishment information, which is usually enclosed in the Session Description Protocol (SDP) [12] body, and the call is established.

Fig. 2: Secure instant messaging using SIP (message flow: MESSAGE; 200 OK)

Figure 2 illustrates an instant messaging example. The SIP extension for instant messaging [13] defines one additional method, MESSAGE, for the pager model of instant messaging. In this example, it makes no difference whether the terminals reside behind a middlebox or not. The exchanged information does not include any network-specific information such as IP addresses or TCP/UDP ports for media channels. Therefore, the exchanged information (for example, an end user's credit card number) keeps its end-to-end significance.

Fig. 3: Problematic session setup (message flow: INVITE (IP, ports); 200 OK (IP, ports); ACK)

The example in figure 3 is different. Two users want to establish a session protected by S/MIME, and at least one of them resides behind a middlebox. If that middlebox implements a firewall service, confidentiality cannot be applied to the SIP message, because the firewall requires signaling information to open pinholes for the media session that is to be established. The middlebox must therefore be capable of inspecting the signaling data; otherwise, the media session cannot pass the firewall. If one of the call participants in figure 3 resides behind a middlebox implementing NAT service, it is not even possible to apply integrity to the exchanged SIP messages. NAT usually operates on OSI layers 3 and 4 (network and transport layer) and translates IP addresses and TCP/UDP ports between two domains; for further details, see [14]. SIP usually needs an ALG [15] to substitute IP addresses and TCP/UDP ports at the application level, for example in a SIP address. To work correctly, the NAT-aware middlebox therefore has to modify the exchanged SIP messages, which breaks end-to-end integrity.

In summary, end-to-end confidentiality cannot be applied if one of the terminals resides behind a firewall, and end-to-end integrity cannot be applied if there is a NAT in the path of the signaling messages. Two goals seem to contradict each other: end-to-end security and middlebox traversal.
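The NAT argument of section 3 can be made concrete with a small sketch. It is not the paper's mechanism. As a stand-in for an S/MIME signature it uses an HMAC over the SDP body with a hypothetical shared key, and nat_alg_rewrite is a hypothetical, simplified ALG that rewrites only the c= and m=audio lines. Because the ALG has to substitute the public address and port, the callee's integrity check fails even though nothing malicious happened.

```python
import hashlib
import hmac

# Stand-in for an S/MIME signature: an HMAC with a key known only to the
# two end points (real S/MIME uses certificates and CMS signatures).
END_TO_END_KEY = b"hypothetical-shared-key"

SDP = "\r\n".join([
    "v=0",
    "o=alice 2890844526 2890844526 IN IP4 10.0.0.5",
    "s=-",
    "c=IN IP4 10.0.0.5",
    "t=0 0",
    "m=audio 49170 RTP/AVP 0",
]) + "\r\n"


def protect(sdp: str) -> bytes:
    """Integrity tag the caller attaches to the SDP body."""
    return hmac.new(END_TO_END_KEY, sdp.encode(), hashlib.sha256).digest()


def verify(sdp: str, tag: bytes) -> bool:
    """Callee-side check: any change to the body makes it fail."""
    return hmac.compare_digest(protect(sdp), tag)


def nat_alg_rewrite(sdp: str, private: tuple, public: tuple) -> str:
    """Hypothetical, simplified ALG: replace the private address and port
    in the c= and m=audio lines with the public NAT binding."""
    (priv_ip, priv_port), (pub_ip, pub_port) = private, public
    lines = []
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            line = line.replace(priv_ip, pub_ip)
        elif line.startswith("m=audio "):
            line = line.replace(f" {priv_port} ", f" {pub_port} ", 1)
        lines.append(line)
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    tag = protect(SDP)
    print(verify(SDP, tag))  # True: unmodified body
    translated = nat_alg_rewrite(SDP, ("10.0.0.5", 49170), ("192.0.2.10", 62000))
    print(verify(translated, tag))  # False: the NAT broke end-to-end integrity
```

If the SDP were encrypted instead of only signed, the ALG could not even locate the c= line. That is the confidentiality side of the same conflict.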
4 Potential Solutions

The problem identified in the last section is that end users do not want to give network intermediaries, which may be hostile, the ability to read or even modify the signaling messages. Hop-by-hop security is therefore not sufficient. On the other hand, middleboxes such as firewalls and NATs have to inspect and modify the signaling data. We describe two solutions that overcome these problems. Both rely on dynamic middlebox control [8, 9].

4.1 Terminal Intelligence

The need to establish a session arises when end users instruct their user agents (UAs) to initiate a media session. Since these terminals perform the session setup, it is possible to add some kind of "intelligence" to them so that they can control the middlebox. They inform their middlebox of the parameters of the sessions that are to be established. The MIDCOM protocol [8] can be used for this purpose, to dynamically open and close firewall pinholes in a middlebox. ICE [19] additionally relies on STUN [16], TURN [20] and other protocols to traverse middleboxes. Because ICE connection setup is highly complex, we describe only a session setup using the STUN protocol. ICE falls back on other protocols when a STUN-based connection setup would fail, for example because of an incompatible middlebox type. Figure 4 illustrates the message flow of an outgoing SIP INVITE request that relies on STUN.

Fig. 4: Secure middlebox traversal using the STUN protocol (entities: SIP terminal, middlebox, STUN server; legend: STUN protocol, MIDCOM protocol, encrypted SIP signaling, data traffic / RTP; steps (1) to (4))

The terminals use the STUN protocol, which enables terminals that do not know whether they reside behind a NAT to discover their publicly reachable IP addresses and ports. The STUN server simply replies with the IP address and TCP/UDP port the STUN request came from (1). The pinholes created for the STUN requests are reused for the media of the session. The terminal therefore sends this information to the communication partner via SDP (3). If the middlebox implements NAT service, the SIP terminal has to create NAT bindings at the middlebox using the MIDCOM protocol (2). The SIP terminal secures the SIP INVITE request that carries the SDP with the confidential information end-to-end. Intermediaries on the path between the two terminals can therefore neither read nor alter the exchanged session information. The media can pass the middlebox (4), since the pinholes are open due to the STUN requests and the requested NAT bindings. For an incoming INVITE request, the terminal again uses STUN to obtain its public IP address and ports, and then creates the response with the appropriate encrypted SDP.

This solution is very scalable, but it has some drawbacks. First, if the middlebox implements NAT service, the solution relies on the middlebox permitting outgoing UDP traffic. This is not favorable, since malicious programs such as Trojan horses can use this functionality to traverse the middlebox too. Second, as discussed in [16], it does not work with all NAT types. This is not a major drawback, since TURN [20] can be used in most call setup scenarios. Third, and most important, each terminal must be capable of controlling the middlebox to dynamically open and close the firewall pinholes the media stream requires. Firewall control usually resides at one (or a few well-known) central point(s) in the protected LAN. It loses its function when several end-user terminals in the domain are permitted to open and close pinholes at will. The open security issues of STUN are an additional handicap.

4.2 MIDCOM Proxy and Authorization

The major drawback of the terminal intelligence approach is that it softens the security concept of a centrally configured and controlled firewall. Another solution is to instruct a trustworthy party to perform the middlebox control. We have described the technical details of this solution in [17]. We call the trusted party that manages the middlebox the security proxy, as it is a SIP proxy. This security proxy usually resides in the same domain as the terminal. Figure 5 illustrates the message flow for an outgoing INVITE request. The security proxy inspects the SDP of the signaling messages in order to control the firewall. For NAT service, it also modifies the SDP of the SIP request. With the gathered information, it prepares the middlebox for the intended media session (2). Before it forwards the SIP message, it applies S/MIME tunneling to it (3). This prevents other network intermediaries from gaining insight into crucial signaling data. After a successful response from the communication partner, the involved terminals establish the media sessions, which pass the prepared middlebox (4).

Fig. 5: Message flow for outgoing INVITE request (entities: SIP terminal, middlebox, SIP security proxy; legend: SIP signaling, MIDCOM protocol, encrypted SIP signaling, data traffic / RTP; steps (1) to (4))

Using an outgoing INVITE request as an example, figure 6 shows how the authorization of the security proxy, and thereby end-to-end security, is guaranteed. In the example, both terminals are located behind a middlebox and therefore need a security proxy for secure session establishment. First, the sending terminal adds an Encr-Src SIP header field to the request. The header field value contains the name of the security proxy that is authorized to encrypt the valuable information carried in the SDP. Additionally, the UA uses S/MIME to tunnel and integrity-protect this header field together with the Date header, which protects against replay attacks. These integrity-protected Encr-Src and Date header fields constitute the authorization for the security proxy to encrypt the SDP. The SIP terminal sends the SIP request with the SDP, which contains the valuable information, to its security proxy over a secure connection. TLS or IPSec can be used for this purpose. The negotiation of a secure connection between two adjacent SIP entities is specified in [18].

Fig. 6: End-to-end secured INVITE request (entities: SIP UA1, SIP proxy1, SIP proxy2, SIP UA2; message labels include Encr-Src, PrivUA1(Encr-Src), SDP and Privproxy1(Pubproxy2(SDP)))

The security proxy inspects and/or modifies the plain SDP of the request to prepare the middlebox for the media streams, as illustrated in figure 5. It then replaces this plain SDP with an encrypted one. The Encr-Src header field is not protected from modification once the request leaves the trusted domain of the sender. Only the integrity-protected, signed header field value of the request is therefore relevant. After these modifications, the security proxy of the sender domain forwards the request downstream. All entities on the signaling path can inspect the SIP header, but the sensitive session setup information is kept secret inside the encrypted SDP. Note that all header field values that are not integrity-protected are vulnerable to tampering attacks.

When the security proxy of the callee receives the request, it adds an Encr-Src header field containing the name of the server that encrypted and signed the SDP. This information can be extracted from the signature of the encrypted SDP. After successfully decrypting the SDP, the proxy inspects and/or modifies it to control the middlebox. The security proxy then transmits the request securely to the terminal of the callee; TLS and IPSec can be used for this purpose, as described in [18]. The terminal of the receiver compares the outer and the inner Encr-Src header field values. The inner header field value contains the name of the server that the caller authorized to encrypt the SDP. The outer Encr-Src header field value was added by the security proxy of the receiver domain and contains the name of the server that actually encrypted the SDP. If the two values differ, the SDP was encrypted by an entity that has not been authorized to do so. In that case, the terminal displays a message to the callee, who can accept or decline the incoming session invitation. The participating parties transmit the response containing SDP upstream in the same manner.

Fig. 7: Secured INVITE request with no security proxy in the receiver domain (entities: SIP UA1, SIP proxy1, SIP UA2; message labels include PrivUA1(Encr-Src), SDP and Privproxy1(PubUA2(SDP)))

Figure 7 illustrates the scenario in which only the sender resides behind a middlebox. In this case, the security proxy of the sender controls the middlebox and encrypts the SDP directly for the callee's terminal, which must support this SIP protocol extension. The request contains the encrypted SDP, its signature and the authorization. The UA of the callee, which does not reside behind a middlebox, receives the encrypted SDP and decrypts it. The UA can compare the inner and the outer Encr-Src header field values. If they differ, an unauthorized server has encrypted the SDP, and the callee can decide whether to accept or decline the call. If the callee accepts, the terminal constructs an appropriate response and encrypts its SDP for the security proxy of the sender domain.

Using a MIDCOM-aware security proxy to traverse middleboxes provides end-to-end security while a single point, the SIP security proxy, opens and closes the firewall. This solution requires either that the terminals are capable of decrypting SDP that has been encrypted by a SIP proxy server (instead of an end-user terminal), or that the terminals use a security proxy in their domain.
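The authorization check performed by the receiving terminal can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure specified in [17]. S/MIME processing is assumed to have already happened, so the function receives two kinds of input. The inner Encr-Src and Date values come from the integrity-protected part. The outer Encr-Src value is the one the receiver-side security proxy derived from the SDP signature. The 60-second replay window, the case-insensitive name comparison and the STALE verdict are hypothetical policy choices. The paper only states that a mismatch leads to a prompt to the user.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from enum import Enum


class Verdict(Enum):
    ACCEPT = "accept"            # the authorized proxy encrypted the SDP
    PROMPT_USER = "prompt_user"  # unauthorized encryptor: let the callee decide
    STALE = "stale"              # Date outside the window (hypothetical replay policy)


@dataclass
class ReceivedRequest:
    inner_encr_src: str  # from the S/MIME-tunneled, integrity-protected part
    inner_date: str      # tunneled SIP Date header, e.g. "Sun, 01 Jun 2003 12:00:00 GMT"
    outer_encr_src: str  # added by the receiver-side proxy from the SDP signature


MAX_AGE = timedelta(seconds=60)  # hypothetical replay window


def check_authorization(req: ReceivedRequest, now: datetime) -> Verdict:
    """Compare the authorized encryptor with the actual one (sketch)."""
    sent = parsedate_to_datetime(req.inner_date)  # timezone-aware for "GMT"
    if abs(now - sent) > MAX_AGE:
        return Verdict.STALE
    # Host names are compared case-insensitively (assumption).
    if req.inner_encr_src.strip().lower() != req.outer_encr_src.strip().lower():
        return Verdict.PROMPT_USER
    return Verdict.ACCEPT


if __name__ == "__main__":
    now = datetime(2003, 6, 1, 12, 0, 30, tzinfo=timezone.utc)
    date = "Sun, 01 Jun 2003 12:00:00 GMT"
    ok = ReceivedRequest("proxy1.example.com", date, "proxy1.example.com")
    forged = ReceivedRequest("proxy1.example.com", date, "rogue.example.net")
    print(check_authorization(ok, now))      # Verdict.ACCEPT
    print(check_authorization(forged, now))  # Verdict.PROMPT_USER
```

The same comparison applies on the caller side in the figure 9 scenario. There the terminal compares the tunneled Encr-Src value with the signer of the encrypted SDP.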

Fig. 8: Caller without middlebox secures INVITE request (entities: SIP UA1, SIP proxy2, SIP UA2; message label PrivUA1(PubUA2(SDP)))

Figures 8 and 9 describe the scenario in which the sender does not reside behind a middlebox. In this case, the terminal secures the SIP INVITE request with the confidential SDP end-to-end and sends it to the callee's terminal. The security proxy of the receiver domain has no insight into the session data and therefore cannot prepare the middlebox.

Fig. 9: Secure 200 OK if the caller does not reside behind a middlebox (entities: SIP UA1, SIP proxy2, SIP UA2)

The SIP terminal of the callee applies the technique described above to the 200 OK. It appends the Encr-Src header containing the name of its security proxy, tunnels this header to the caller and sends the message to the security proxy. The message carries the confidential session information in the SDP and is secured with TLS or IPSec. The security proxy inspects the SDP and prepares the middlebox accordingly. If the callee resides behind a NAT, the security proxy creates the NAT bindings at the middlebox and changes the SDP. It then encrypts the SDP for the caller and forwards the message, which still carries the tunneled authorization, to the caller. The terminal of the caller compares the tunneled Encr-Src header with the signer of the encrypted SDP. If the two values differ, a server that has not been authorized encrypted the session information, and the terminal asks its user whether to accept the call.

5 Conclusion

End-to-end security is a precondition for next generation networks, which will partially rely on SIP. Middleboxes break the end-to-end security of SIP messages. Since many Internet users reside behind firewalls or NATs, a solution to this problem is mandatory. Our solution slightly increases the complexity of the exchanged SIP messages but ensures the important end-to-end security. In contrast to ICE, it also preserves centralized firewall control, which is essential from a security perspective. We published the technical details of this solution in an IETF draft [17]. This draft omits the scenario in which the caller does not reside behind a middlebox. We described this missing scenario in this paper and will add it to the draft soon.

References:
[1] J. Rosenberg et al., SIP: Session Initiation Protocol, IETF RFC 3261, 2002
[2] H. Schulzrinne, J. Rosenberg, The Session Initiation Protocol: Internet-centric signaling, IEEE Communications Magazine, Vol.38, No.10, 2000, pp. 134-141
[3] K.W. Richardson, UMTS overview, Electronics & Communication Engineering Journal, Vol.12, No.3, 2000, pp. 93-100
[4] Stefano Salsano, Luca Veltri, SIP Security Issues: The SIP Authentication Procedure and its Processing Load, IEEE Network Magazine, Vol.16, No.6, 2002, pp. 38-44
[5] B. Reynolds and D. Ghosal, STEM: Secure Telephony Enabled Middlebox, IEEE Communications Magazine, Vol.40, No.10, 2002, pp. 52-58
[6] U. Roedig, R. Ackermann and R. Steinmetz, Evaluating and Improving Firewalls for IP-Telephony Environments, Proceedings of the 1st IP-Telephony Workshop (IPTel2000), ISSN 1435-2702, 2000, pp. 161-166
[7] B. Reynolds and D. Ghosal, Secure IP Telephony using Multi-layered Protection, Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, 2003
[8] P. Srisuresh et al., Middlebox communication architecture and framework, IETF RFC 3303, 2002
[9] R. P. Swale et al., Middlebox Communications (midcom) Protocol Requirements, IETF RFC 3304, 2002
[10] B. Ramsdell, Ed., S/MIME Version 3 Message Specification, IETF RFC 2633, 1999
[11] J. Franks et al., HTTP Authentication: Basic and Digest Access Authentication, IETF RFC 2617, 1999
[12] M. Handley, V. Jacobson, SDP: Session Description Protocol, IETF RFC 2327, 1998
[13] B. Campbell et al., Session Initiation Protocol (SIP) Extension for Instant Messaging, IETF RFC 3428, 2002
[14] W. R. Cheswick and S. M. Bellovin, Firewalls and Internet Security: Repelling the Wily Hacker, Addison-Wesley Publishing Company, 1994
[15] R. Zalenski, Firewall Technologies, IEEE Potentials, Vol.21, No.1, 2002, pp. 24-29
[16] J. Rosenberg et al., STUN - Simple Traversal of User Datagram Protocol (UDP) Through Network Address Translators (NATs), IETF RFC 3489, 2003
[17] K. Umschaden et al., End-to-end Security for Firewall/NAT Traversal within the Session Initiation Protocol (SIP), IETF Internet draft, 2003, work in progress
[18] J. Arkko et al., Security Mechanism Agreement for the Session Initiation Protocol (SIP), IETF RFC 3329, 2003
[19] J. Rosenberg, Interactive Connectivity Establishment (ICE): A Methodology for Network Address Translator (NAT) Traversal for the Session Initiation Protocol (SIP), IETF Internet draft, 2003, work in progress
[20] J. Rosenberg et al., Traversal Using Relay NAT (TURN), IETF Internet draft, 2003, work in progress