Secure Session Establishment with SIP in Environments Containing Middleboxes
KLAUS UMSCHADEN 1,2, IGOR MILADINOVIC 1,2, JOHANNES STADLER 1,2
1) Institute of Communication Networks, Technical University of Vienna, Favoritenstrasse 9 / 388, 1040 Vienna, AUSTRIA
2) Forschungszentrum Telekommunikation Wien, Donau-City-Str. 1, 1220 Vienna, AUSTRIA

Abstract: - The Session Initiation Protocol (SIP) is an application layer protocol for session establishment, modification and teardown. For end-to-end security of the session information, SIP uses the Secure Multipurpose Internet Mail Extension (S/MIME). However, in many environments SIP has to traverse middleboxes with firewall or NAT service, which breaks either the end-to-end confidentiality or the end-to-end significance of SIP messages and therefore the end-to-end security. This paper describes this problem and two potential solutions. One of them in particular, the MIDCOM aware SIP proxy, which can be authorized to apply S/MIME to SIP, is an interesting approach that offers end-to-end security services without increasing existing firewall and middlebox security risks.

Key-Words: - SIP, Security, VoIP, Middlebox, Firewall, NAT, MIDCOM

1 Introduction

The Session Initiation Protocol (SIP) [1, 2] is a protocol for session control over packet based networks. The 3GPP uses SIP for the signaling in Universal Mobile Telecommunications System (UMTS) [3] networks. It is evident that end-to-end security is essential for next generation networks and, in consequence, for SIP. Because packet switching protocols such as the Internet Protocol (IP) in version 4 are inherently insecure, it is necessary to apply security services like confidentiality, message authentication and integrity to upper-layer protocols. SIP provides security in two different manners, hop-by-hop and end-to-end. A very good overview of SIP security issues can be found in [4].

Applying hop-by-hop security to SIP secures the signaling channel between two adjacent entities in the signaling path. Each SIP entity on the path is still capable of inspecting and modifying the transported signaling information. If end-users need stronger protection, they must apply end-to-end security. In section 3, we will point out that middleboxes may prevent end-to-end security. As section 2 outlines, recent publications [5, 6, 7] concentrate on improving middleboxes for dynamic VoIP traffic. This paper focuses on improvements on application level. We assume that the involved middleboxes understand the MIDCOM protocol [8].

Section 4 discusses potential solutions that allow end users to securely control their communication. First we will present the terminal intelligence approach of Interactive Connectivity Establishment (ICE) [19], which is based on the Simple Traversal of UDP through NATs (STUN) protocol [16]. Then we describe in more detail our approach, where the user can authorize its SIP proxy to apply Secure Multipurpose Internet Mail Extension (S/MIME) [10] on SIP after the middlebox has been prepared for the media streams via MIDCOM. This enables end-to-end security and centralizes the middlebox control in the SIP proxy. Note that H.323 and other signaling protocols also face the problem of secure middlebox traversal, and therefore the proposed solutions are applicable to these protocols too. However, we describe the solution on the basis of SIP. We conclude the paper in section 5.

2 Related work

Due to different kinds of middleboxes and cascading scenarios, middlebox traversal for SIP is very complex. Several drafts have been published at the IETF to provide a common way to traverse all NAT types, firewalls, and middleboxes that implement both firewall and NAT service. Today, most of these drafts have expired. The most recent way to traverse middleboxes is Interactive Connectivity Establishment (ICE) [19]. It relies, among other protocols, on STUN [16], Traversal Using Relay NAT (TURN) [20] and MIDCOM [8, 9]. ICE solves the problem of middlebox traversal by providing the terminals with the capability to control their middlebox. We describe the mechanisms, advantages and disadvantages of ICE in section 4.1.

Because of the dynamic behavior of Internet telephony, static firewalls cannot serve the dynamic call setup and termination. Consequently, some publications focus on firewall improvements. One suggestion is to introduce a general protocol parser into the firewall to support flexible implementation and adaptation of application level gateways (ALGs) [15]. This protocol parser has an adaptation layer to the firewall in order to keep the parser portable between different firewall vendors. Another proposal is the Secure Telephony Enabled Middlebox (STEM) [5]. STEM is also a new firewall architecture comprised of several components. Besides an enhanced firewall, the most important element is the security manager, which controls the firewall via the MIDCOM protocol. The architecture also allows SIP controlled calls to and from the Public Switched Telephony Network (PSTN). Our solution complies with this architecture. It is possible to deploy the security manager on the security proxy described in section 4.2. The authors of STEM also describe a mechanism to detect telephony denial-of-service (DoS) attacks on transport and on application level using the cumulative sum method [7]. The most important gain for VoIP regarding firewalls is the MIDCOM protocol [8, 9], which enables dynamic middlebox control. Both of the solutions described in section 4 rely on it.

3 End-to-end security vs. middlebox traversal

For end-users that require a secure call setup, end-to-end security is a prerequisite. For the confidentiality and integrity security services, SIP therefore specifies S/MIME. It is necessary to state that end-to-end authentication crossing domain borders is not easy to realize. For this purpose SIP foresees digest authentication [11]. As long as digest authentication is based on a shared secret, each end-user would need one secret for each of his potential communication partners, which means one secret for each address book entry.

[Figure 1 message sequence: INVITE (IP, ports); 200 OK (IP, ports); ACK]
Fig. 1: Secure session setup in SIP

Confidentiality and integrity of SIP signaling messages can be tunneled end-to-end by using S/MIME. Providing integrity for SIP messages prevents network intermediaries and SIP proxy servers from modifying the messages. Applying confidentiality results in signaling messages that cannot be inspected by network intermediaries. This security benefit has a major drawback: if there is a middlebox implementing firewall service between the two terminals, SIP can no longer perform session setup. Figure 1 illustrates an S/MIME encrypted session setup without middleboxes. Each call participant can extract the call establishment information, which is usually enclosed in the Session Description Protocol (SDP) [12]. The call will be established.

[Figure 2 message sequence: MESSAGE; 200 OK]
Fig. 2: Secure instant messaging using SIP

Figure 2 illustrates an instant messaging example. SIP identifies one additional method for instant messaging following the pager model [13]. In the given example, it does not make any difference whether the terminals reside behind a middlebox or not. The exchanged information does not include any network specific information like IP addresses or TCP/UDP ports for media channels. Therefore, the exchanged information (for example, the credit card number of an end-user) keeps its end-to-end significance.

[Figure 3 message sequence: INVITE (IP, ports); 200 OK (IP, ports); ACK]
Fig. 3: Problematic session setup

The example in figure 3 is somewhat different. Two users that reside behind a middlebox want to establish a session using S/MIME. If one of them resides behind a middlebox that implements a firewall service, it is not possible to apply confidentiality to the SIP message. The firewall requires signaling information to open the firewall pinholes for the media session that is to be established. Therefore, the middlebox must be capable of inspecting the signaling data. Otherwise, the media session cannot pass the firewall.

If one of the call participants in figure 3 resides behind a middlebox implementing NAT service, it is not even possible to apply integrity to the exchanged SIP messages. NAT usually operates on OSI layers 3 and 4 (network and transport layer). It translates IP addresses and TCP/UDP ports between two domains. For further details, please refer to [14]. SIP usually needs an ALG [15] to substitute IP addresses and TCP/UDP ports on application level, for example in a SIP address. For correct behavior, the NAT aware middlebox has to modify the exchanged SIP messages, which obviously breaks end-to-end integrity.

In summary, it is not possible to apply end-to-end confidentiality if one of the terminals resides behind a firewall, nor end-to-end integrity if there is a NAT in the path of the signaling messages. There are two targets that seem to contradict each other: end-to-end security and middlebox traversal.

4 Potential Solutions

The problem indicated in the last section is that end-users do not want to offer network intermediaries, which may be hostile, the ability to read or even modify the signaling messages. Therefore hop-by-hop security is not sufficient. On the other hand, middleboxes like firewalls and NATs have to inspect and modify the signaling data. We will describe two solutions that overcome the outlined problems. Both solutions rely on dynamic middlebox control [8, 9].

4.1 Terminal Intelligence

The need to establish a session arises when end-users instruct their user agents (UAs) to initiate a media

information to the communication partner via SDP. (3) If the middlebox implements NAT service, the SIP terminal has to create NAT bindings at the middlebox using the MIDCOM protocol. (2) The SIP terminal secures the SIP INVITE request that carries the SDP with the confidential information end-to-end. Consequently, intermediaries that reside on the path between the two terminals cannot read or alter the exchanged session information. The media can bypass the middlebox (4), since the pinholes are open due to the STUN requests and the requested NAT bindings. In the case of an incoming INVITE request, the terminal again uses STUN to receive information about IP address and ports.
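
The interplay between NAT traversal and end-to-end integrity described in sections 3 and 4.1 can be illustrated with a small sketch. It is not the S/MIME mechanism itself: an HMAC over the SDP body stands in for an S/MIME signature, and the key, the addresses and all function names (`sdp`, `protect`, `verify`, `alg_rewrite`) are assumptions made only for this illustration. The sketch compares a terminal that protects an SDP body containing its private address, which an ALG then rewrites, with a terminal that already knows its public address (for example via STUN) before protecting the body.

```python
import hashlib
import hmac

# Hypothetical key shared by the two terminals. It is only a stand-in for
# S/MIME credentials, which are certificate-based in practice.
KEY = b"end-to-end-demo-key"


def sdp(ip: str, port: int) -> str:
    """Build a minimal SDP body announcing one audio stream."""
    return (
        "v=0\r\n"
        f"o=alice 1 1 IN IP4 {ip}\r\n"
        "s=-\r\n"
        f"c=IN IP4 {ip}\r\n"
        "t=0 0\r\n"
        f"m=audio {port} RTP/AVP 0\r\n"
    )


def protect(body: str) -> bytes:
    """Integrity tag over the body (HMAC-SHA256 standing in for a signature)."""
    return hmac.new(KEY, body.encode(), hashlib.sha256).digest()


def verify(body: str, tag: bytes) -> bool:
    """Check the received body against the tag computed by the sender."""
    return hmac.compare_digest(protect(body), tag)


def alg_rewrite(body: str, private_ip: str, public_ip: str) -> str:
    """Mimic a NAT ALG that substitutes the private address in the body."""
    return body.replace(private_ip, public_ip)


PRIVATE, PUBLIC = "10.0.0.5", "192.0.2.10"  # illustrative addresses

# Case 1: the terminal protects its private address; the ALG rewrites it.
body = sdp(PRIVATE, 49170)
tag = protect(body)
rewritten_ok = verify(alg_rewrite(body, PRIVATE, PUBLIC), tag)

# Case 2: the terminal learns its public address first, then protects the
# body, so no intermediary has to touch it.
body2 = sdp(PUBLIC, 49170)
tag2 = protect(body2)
untouched_ok = verify(body2, tag2)

print(rewritten_ok, untouched_ok)  # False True
```

In the first case the receiver rejects the message because the ALG changed the protected body; in the second case the body arrives unmodified and verification succeeds. This mirrors the argument that a terminal must obtain its externally visible address and ports before securing the INVITE end-to-end.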