Information Security Issues in Voice Over Internet Protocol

Satya Bhan Jonathan Clark Joshua Cuneo Jorge Mejia-Ramirez

CS 4235 Fall 2006

Table of Contents

I. Introduction

II. An Overview of VoIP

VoIP Protocols

III. Common VoIP Security Threats

Denial of Service Attacks

Eavesdropping

Spoofing

Theft of Service

Spam over Internet Telephony (SPIT)

IV. VoIP Encryption Algorithms

PGPfone

Motivation

Technical Details

Secure Real-time Transport Protocol

ZRTP and Zfone

ZRTP

Zfone

Skype

V. Research and Development to Improve VoIP Security

Locating Users in a Secure and Reliable Way

Current State and Motivation to Change

Proposed Scheme

Monitoring VoIP Networks

Motivation

Current State

Proposed Idea

Intrusion Detection and Prevention on SIP

The Prototype

VI. Concluding Remarks

VII. Works Cited

VIII. Glossary

I Introduction

Voice over Internet Protocol (VoIP) is the routing of voice communications over any kind of digital, IP-based network. Although VoIP has existed for a long time, it has become a prominent technology only within the past few years, as users realized its advantages and companies started offering cheap, easy-to-use VoIP-based services. However, as with any new technology, the rise of VoIP has been accompanied by new problems, and because the technology is still in its infancy, there are a number of competing methods for dealing with them.

This paper will examine many of the most common VoIP-related security issues and some existing and proposed solutions. The discussion begins with an overview of VoIP and its related protocols to provide the necessary technical background, followed by a summary of common security vulnerabilities and cryptographic techniques for securing voice communications. The last section lists some standard VoIP security measures proposed at an IEEE conference earlier this year.

II An Overview of VoIP

VoIP is a general term referring to the digitization of an analog voice-generated signal, the transmission of that signal over any IP network, and the transformation back to an analog voice signal at the receiving end. It includes any software, hardware, or protocols related to this transformation, such as H.323 and SIP, which are discussed later in this paper (Vagle). Although voice communication travels over the network in packets just like any other data, VoIP cannot be protected using existing network architecture alone. The nature of VoIP adds a number of information security and other complications to the network (Kuhn), as discussed in Section III.

The concept of carrying voice over a packet network dates back to 1973 and the experimental Network Voice Protocol developed for the ARPANET, the world’s first packet-switching network and the precursor to the internet. For many years, VoIP remained a technological prospect for future development. Within the past half decade, however, technology companies have started offering a variety of VoIP services, including a digital interface with a traditional telephone handset, conferencing units that provide VoIP-based conference calls, mobile VoIP units, and PC or “softphone” units that require only a headset and computer (Kuhn).

These new services offer users many advantages. VoIP offers increased functionality and can facilitate tasks that are more difficult using public switched telephone network (PSTN) lines. VoIP is extremely mobile and allows users to travel anywhere in the world and still make and receive phone calls. Furthermore, because VoIP bypasses long-distance telephone networks by using existing IP networks, users can make global phone calls at local rates or less. As a result, users have discovered that VoIP offers a cheaper and clearer alternative to traditional PSTN systems, and more organizations and individuals have been moving voice transactions to VoIP systems (Kuhn).

However, VoIP technology is still new, so its widespread use means that many data networks are open to a host of new security vulnerabilities that VoIP developers have not yet corrected. A proper examination of these vulnerabilities must begin with an analysis of the dominant protocols used in VoIP. The next section provides an overview of these protocols.

VoIP Protocols

The establishment of a VoIP communication channel requires a complex series of packet exchanges. Computer systems are addressed using IP addresses, so when the user dials a recipient’s number, several protocols help resolve this number into the corresponding IP address. Once the recipient answers, an analog-to-digital converter transforms the voice communication into digital form. The VoIP system then splits the voice data into packets that use the Real-time Transport Protocol (RTP), since RTP has special header fields that hold the data needed to reassemble the packets into a continuous voice stream on the recipient’s end. These packets are carried over the internet using UDP so that network nodes can process them as ordinary data packets. On the recipient’s end, the process is reversed: the data is extracted from the RTP packets and reassembled, and a digital-to-analog converter transforms it back into analog sound. Figure 1 illustrates this process.

Figure 1. Voice data processing in a VoIP system. (Kuhn)
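The packetization step described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the 12-byte fixed RTP header layout (version, marker, payload type, sequence number, timestamp, and source identifier) with no optional fields; the function names are hypothetical, and the reassembly ignores sequence-number wraparound and lost packets.

```python
import struct

RTP_VERSION = 2

def pack_rtp(payload_type, seq, timestamp, ssrc, payload, marker=False):
    """Build a minimal RTP packet: 12-byte fixed header plus payload."""
    byte0 = RTP_VERSION << 6  # version 2, no padding, no extension, no CSRC entries
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return header + payload

def unpack_rtp(packet):
    """Split a packet built by pack_rtp back into header fields and payload."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "payload": packet[12:],
    }

def reassemble(packets):
    """Order packets by sequence number and join their payloads.

    Simplification: assumes no wraparound of the 16-bit sequence number.
    """
    parsed = sorted((unpack_rtp(p) for p in packets), key=lambda p: p["seq"])
    return b"".join(p["payload"] for p in parsed)
```

Because UDP does not guarantee ordering, the receiver relies on the sequence number to restore the order of the voice frames, which is what `reassemble` imitates.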

A given VoIP network most likely uses one of two protocols that govern the overall transmission of voice communications: H.323 and SIP.

H.323 is an International Telecommunication Union (ITU) umbrella specification that defines a series of protocols for audio-visual communication sessions on any packet network. For instance, it uses one standard protocol to negotiate the establishment of a connection and another protocol to create a channel for the flow of RTP packets and to establish the audio codecs used for the voice data. H.323 provided some of the first available standards that satisfied the requirements of VoIP, so it has been widely adopted in a number of VoIP networks (H323 Overview).

The standard specifies the four components of a complete network necessary for multimedia communications: terminals, gateways, gatekeepers, and multipoint control units (MCUs). These components can be seen in Figure 2.

Figure 2. Components necessary for multimedia communication (Kuhn)

The terminal is the end user device, such as a PC or analog telephone. The gatekeeper provides address resolution and bandwidth control on the H.323 network and may use a Back End Service (BES) to maintain data about the network’s users. The gateway functions as a bridge between the H.323 network and the outside world, enabling the transmission of voice data over non-H.323 devices. An MCU is an optional device that allows voice conferencing between more than two end users (Kuhn).

The Session Initiation Protocol (SIP) is a protocol and proposed standard for handling interactive multimedia user sessions through a variety of media, including VoIP. In contrast to H.323, a SIP user is not bound to a specific host but instead reports his or her location to a registrar, which in turn stores it in a location server. When a user wishes to establish a line of communication with another user, a message is sent to a proxy or redirect server, which resolves the specified destination to an IP address using the location server. The server then sends the message to the recipient’s proxy server. This process can be observed in Figure 3.

Figure 3. The SIP process (Kuhn)
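The registration and lookup flow described above can be modeled with a toy in-memory sketch. The class names, method names, user names, and addresses below are hypothetical and exist only for illustration; a real SIP deployment exchanges REGISTER and INVITE messages over the network rather than calling methods directly.

```python
class LocationServer:
    """Stores the current IP address reported for each user."""

    def __init__(self):
        self._bindings = {}

    def store(self, user, ip_address):
        self._bindings[user] = ip_address

    def lookup(self, user):
        return self._bindings.get(user)


class Registrar:
    """Accepts location reports from users and records them in the location server."""

    def __init__(self, location_server):
        self._location = location_server

    def register(self, user, ip_address):
        self._location.store(user, ip_address)


class ProxyServer:
    """Resolves a destination user to an IP address and forwards the message."""

    def __init__(self, location_server):
        self._location = location_server

    def route(self, recipient, message):
        ip_address = self._location.lookup(recipient)
        if ip_address is None:
            return None  # recipient has not registered anywhere
        return (ip_address, message)


location = LocationServer()
registrar = Registrar(location)
proxy = ProxyServer(location)

registrar.register("bob@example.com", "192.0.2.20")
first_route = proxy.route("bob@example.com", "INVITE")

# Bob moves to another host and registers again; later calls follow him.
registrar.register("bob@example.com", "198.51.100.7")
second_route = proxy.route("bob@example.com", "INVITE")
```

The second registration replaces the first binding, which is the sense in which a SIP user is not bound to a specific host.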

During the setup process, the Session Description Protocol (SDP) helps communicate the appropriate logistical information, such as codecs. When a user contacts another user, the recipient replies with an “OK” message that includes the recipient’s call preferences in SDP format. All information is transferred through one port in a simple text format, as opposed to the complicated port switching found in H.323 networks.
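To give a sense of how simple the text format is, the following sketch parses the codec information out of an SDP body. The sample session description is made up for illustration (the addresses, port, and identifiers are assumptions), and the parser handles only the first audio line and its `rtpmap` attributes.

```python
SAMPLE_SDP = """v=0
o=bob 2890844527 2890844527 IN IP4 192.0.2.20
s=-
c=IN IP4 192.0.2.20
t=0 0
m=audio 49170 RTP/AVP 0 8
a=rtpmap:0 PCMU/8000
a=rtpmap:8 PCMA/8000
"""

def parse_audio_codecs(sdp_text):
    """Return (port, {payload_type: codec}) for the first audio media line."""
    port = None
    offered = []
    names = {}
    for line in sdp_text.splitlines():
        if line.startswith("m=audio") and port is None:
            fields = line[2:].split()  # e.g. ['audio', '49170', 'RTP/AVP', '0', '8']
            port = int(fields[1])
            offered = [int(pt) for pt in fields[3:]]
        elif line.startswith("a=rtpmap:"):
            payload_type, encoding = line[len("a=rtpmap:"):].split(" ", 1)
            names[int(payload_type)] = encoding
    return port, {pt: names.get(pt, "unknown") for pt in offered}

port, codecs = parse_audio_codecs(SAMPLE_SDP)
```

The same readability that makes SDP easy to parse also makes it easy for an eavesdropper to read, which is relevant to the threats discussed in Section III.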

III Common VoIP Security Threats

The prominence of cheap, readily deployable voice services has come at a massive cost to security and privacy, which attackers may exploit in the near future if a profitable motive arises. “Security administrators might assume that because digitized voice travels in packets, they can simply plug VoIP components into their already secured networks and get a stable and secure network” (Walsh and Kuhn 44). However, existing firewalls cannot efficiently handle new VoIP protocols, such as the aforementioned SIP and a wide range of vendor-proprietary protocols, since these protocols rely on dynamic port ranges and do not work well with Network Address Translation (NAT).

Some newer firewalls (such as Session Border Controllers, or SBCs) address most of these problems, but most firewalls, Intrusion Detection Systems (IDS), Intrusion Prevention Systems (IPS), and similar security devices rely on deep packet inspection techniques. These techniques introduce delay and jitter to the VoIP packet streams, thus degrading overall Quality of Service (QoS). In VoIP, the maximum acceptable packet delay is set at 150 ms (and even higher in some cases), but the multi-layer nature of security infrastructure could add significant delay and jitter that would make VoIP services unusable (Materna 3). Therefore, network administrators apply these common security techniques to VoIP networks only sporadically to avoid QoS issues, and this haphazard implementation has left VoIP vulnerable to traditional network threats. This section discusses some of these threats as they apply to VoIP.
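The effect of layered security on the delay budget can be illustrated with simple arithmetic. In the sketch below, only the 150 ms limit comes from the text; every per-stage delay is a hypothetical figure chosen for illustration, not a measurement.

```python
DELAY_BUDGET_MS = 150  # delay limit cited above

def total_delay_ms(stage_delays_ms):
    """Sum the delay contributed by each stage of the voice path."""
    return sum(stage_delays_ms.values())

def within_budget(stage_delays_ms, budget_ms=DELAY_BUDGET_MS):
    return total_delay_ms(stage_delays_ms) <= budget_ms

# Hypothetical per-stage delays in milliseconds.
baseline = {"codec": 30, "network": 80, "jitter_buffer": 30}
secured = dict(baseline, firewall_inspection=10, ids_inspection=10, encryption=5)

baseline_total = total_delay_ms(baseline)  # 140 ms
secured_total = total_delay_ms(secured)    # 165 ms
```

Under these assumed figures, a call that fits the budget on an unprotected path exceeds it once three modest security stages are added, which is the trade-off that leads administrators to deploy such measures sparingly.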

Denial of Service Attacks

Service availability attacks are viewed as the most harmful to VoIP because of their direct impact on customers, resulting in loss of revenue and profit, system downtime, and loss of productivity. They are especially destructive to services such as E-911 (Emergency Response Service 911 on VoIP), where disruption could lead to catastrophic damage. “Latency turns traditional security measures into double-edged swords for VoIP” (Walsh and Kuhn 45). As discussed above, traditional security measures such as encryption and firewalls can protect VoIP networks, but they also introduce significant delay. Latency is not just a QoS issue but also a security issue, because it increases the system’s susceptibility to denial-of-service (DoS) attacks.

Unlike in data networks, where a partial DoS attack would only reduce bandwidth and slow down network traffic, delaying voice packets in a VoIP network for only a fraction of a second would make them unintelligible at the destination and render the service unusable. An attacker needs to add even less delay when latency-producing security devices are already slowing down traffic.

Another problem that makes VoIP extremely susceptible to DoS attacks is packet loss during transmission. Given VoIP’s real-time nature, data is never stored in a VoIP scenario, so lost packets cannot be retransmitted as they can in ordinary data networks. Fortunately, since the packets in voice networks are small (generally ten to 50 bytes), the loss of a single packet would hardly affect the voice transmission. However, in most traditional IP networks, buffered transmission generally results in the loss of all packets being delivered at the same time, so losses rarely involve only a single packet. “Packet losses as low as one percent can make a call unintelligible, depending on the compression scheme used. A five-percent loss is catastrophic, no matter how good the codec” (Walsh and Kuhn 46).

Computer worms could therefore easily disable VoIP networks, since a loss of bandwidth that might not disrupt a conventional IP network could knock out voice service. Also, the need for gateways between VoIP networks and the traditional PSTN has created soft spots for new attacks. These attacks may be aimed at either network and can include Destination Unavailable (DUNA) or Signaling Congestion (SCON) attacks.

Eavesdropping

“With conventional telephones, eavesdropping requires tapping a line or penetrating a switch” (Walsh and Kuhn 46). The need for physical access to the PSTN telephone cable also makes eavesdropping harder and more detectable. Furthermore, proprietary protocols and specialized software make the process very difficult. However, the convergent nature of VoIP and IP services, with voice and data often transmitted through the same logical network, gives attackers convenient and low-risk access for eavesdropping. Standardized protocols, along with readily available tools to monitor and control network packets, make this process almost trivial. “Good quality open source packages are available for such monitoring, including both SIP and H.323 plug-ins for packet sniffers such as the popular Ethereal analyzer (www.ethereal.com). Voice Over Misconfigured Internet Telephones (http://vomit.xtdnet.nl), a publicly available utility with an unfortunate acronym Vomit, converts standard tcpdump (http://sourceforge.net/projects/tcpdump) format files into .wav files that any computer can play” (Walsh and Kuhn 46). Other utilities like Tcpdump, available for both Unix-like systems and Windows, make VoIP eavesdropping accessible to anyone with a PC and an internet connection.

The software distributions for VoIP services (generally available for download via the provider’s website) also increase the potential for eavesdropping. A technically skilled attacker can modify a software update and host it for download on a rogue server. Using familiar TCP/IP attacks such as Address Resolution Protocol (ARP) cache-poisoning techniques (changing the MAC address associated with a particular IP address) to substitute a rogue server for the correct one, the attacker can cause users to download the hacked software. “An even easier attack is to set up a rogue server with modified configuration files containing the IP addresses of call managers. Victims’ calls are then routed through the attacker’s call manager, providing eavesdropping and traffic analysis opportunities” (Walsh and Kuhn 46).

The increasing use of VoIP services in the Critical Infrastructures (CI) sector has also made eavesdropping a critical issue. Confidentiality of conversations is required for many CI services. “IP telephony can open the doors for eavesdropping or sniffing on both signaling traffic and media traffic. Current IP telephony deployments provide very few protections from eavesdropping and sniffing, especially against inside intruders” (Feng and Cao 3).

Spoofing

Identity management is extremely complicated in the VoIP scenario because it is not necessary to have a physical device attached to a VoIP number. This issue is further complicated by some providers' use of Uniform Resource Identifiers (URIs) for user identification. “How to distribute the identification information linked together to different parties is another challenge for deploying IP telephony” (Feng and Malik 140). The lack of standards makes VoIP extremely susceptible to spoofing attacks. For example, attackers can spoof IP addresses as well as caller identification to deceive the callee in a VoIP session.

Another known spoofing vulnerability in VoIP is the ability to spoof the caller identification information that is displayed to the callee. Using SIP-enabled VoIP hardware such as the Cisco ATA 186 Analog Telephone Adaptor, the attacker only needs to call a regular phone line, place that call on hold, flash over to a dial tone using the three-way call feature, and then call a second party. The caller ID information that tends to show up is the first called party's telephone number, with either that party's name or "unknown name" showing on a conventional caller ID-enabled telephone (Wosnack). This attack is extremely dangerous, especially in corporations and the CI sector, where it could be used to break into voice mail accounts or to exploit a Private Branch eXchange (PBX) with the aim of gathering proprietary information. It also allows the attacker to use social engineering to commit telephone and toll fraud.

Theft of Service

In the recent Edwin Pena and Robert Moore VoIP fraud, the accused secretly routed more than ten million calls through unsuspecting companies while selling telephony service cheaply to customers, blatantly exposing the immaturity of VoIP security. By using dummy servers to conduct millions of scans for vulnerabilities on computer networks, Pena orchestrated a "brute force" attack to identify the prefixes needed to gain access to VoIP networks (Dunn). The attack could also have been used to conduct toll fraud by setting up a calling company in a third-world country with calling rates as high as $5 a minute, then placing calls to it using hacked VoIP networks or user accounts. The unsuspecting users and companies would be left with the bill while the attacker enjoyed pure profit.

Security vulnerabilities in the user’s software could also be targets for attackers. Sniffing user accounts and passwords would again give attackers the means to abuse VoIP networks for profitable frauds such as identity theft, long-distance fraud, or toll fraud. The clear trend shows that hacking the VoIP segment can be quite profitable, and companies should expect more attacks. Besides causing financial damage to unsuspecting parties, theft of service also severely impacts the availability of a system and the QoS of VoIP services.

Spam over Internet Telephony (SPIT)

Analogous to the email spam problem in data networks, security analysts have envisioned a major wave of unsolicited voice and video messages in VoIP networks. Even though advertising agencies have launched mass advertising campaigns on the regular PSTN, the complexity and cost of doing so are prohibitive for mass harassment. SPIT, however, becomes a major issue once traditional telephony lines are no longer required. Access to millions of internet phones and traditional PSTN phones via the internet at extremely low cost is a resource just waiting to be abused by attackers once VoIP services have gained significant market penetration. SPIT poses a potentially critical threat to VoIP services, as millions of unwanted voice messages (e.g., advertisements) could overwhelm customers. Although this attack seems extremely similar to email spamming, and advanced solutions such as blacklists and quarantines have been developed to combat email spam, applying those technologies to VoIP networks would be extremely hard given the real-time nature of voice traffic and the difficulty of deciphering the content of a message. SPIT attacks that target the PSTN from VoIP networks would be almost impossible to block.

There are also concerns of session hijacking in VoIP, whereby an attacker would be able to capture a video conference channel and transmit advertisements instead. Similar attacks would also be possible on voice conversations, which could be hijacked for impersonation or for broadcasting mass messages.

IV VoIP Encryption Algorithms

Due to the variety of threats that VoIP technology faces, there have been some attempts to better secure the technology. One very active topic of discussion and development is the use of encryption for voice over IP calls. While the area of encryption is active, it is also relatively new, and therefore not many solutions exist. Additionally, some of the solutions that do exist are still in development and are therefore not yet considered secure. However, development has continued, and some more standardized technologies do exist.

PGPfone

PGPfone was developed by Phil Zimmermann, who is also the creator of the original Pretty Good Privacy (PGP) email encryption, which is still widely used to encrypt email. However, other than name and creator, PGPfone and PGP have nothing in common.

PGPfone was released in 1995, long before broadband became widespread. It has since been abandoned in favor of Zimmermann's new voice over IP encryption, ZRTP. While it was written to work over the internet in a standard voice over IP configuration, it was designed with direct modem-to-modem connections in mind. According to Zimmermann’s PGPfone user's manual, PGPfone uses "biometric signatures (your voice) to authenticate the exchange, triple-DES, CAST, or Blowfish for encrypting the voice stream, and GSM for the speech compression" (Zimmermann, “PGPfone”).

Motivation

Before getting too far into the technical details of the encryption algorithms and protocols, it is important to explore the reasons for their existence. While every such reason comes back to the basics of confidentiality, integrity, and authenticity, the question is why some people feel that these are in doubt. In his PGPfone user's manual, Phil Zimmermann explains why he created PGPfone in the first place. As this was essentially the first encryption technique aimed specifically at voice over IP, his reasons are important.

Zimmermann states that in the United States, citizens have a right to privacy provided by the Constitution. "Privacy is as apple-pie as the Constitution" (Zimmermann, “PGPfone”). He says that "the right to privacy is spread implicitly throughout the Bill of Rights" (Zimmermann, “PGPfone”) and explains that it was not explicitly written in because, at the time of its writing, there was no need. The only way to communicate with someone was to have a conversation, and as long as there was no one within earshot, the conversation was private. The concept of long-distance voice communication was foreign to the founders. This changed with the invention of the telephone. In modern times, many conversations take place over wires that can be listened to without either speaker’s knowledge. For this reason, Zimmermann wrote and released PGPfone.

Zimmermann also explains why he felt it was necessary to release it when he did. First of all, he recalls that he released the original PGP email encryption because Senate Bill 266 (which was eventually defeated) was then under debate in Congress. If this bill had passed, it "would have forced manufacturers of secure communications equipment to insert special 'trap doors' in their products, so that the government could read anyone's encrypted messages" (Zimmermann, “PGPfone”). With this as motivation, he wrote the original PGP and released it for free, making it "available to the American public before it became illegal to use it" (Zimmermann, “PGPfone”).

His motivation for releasing PGPfone was similar. He discusses the 1994 Digital Telephony bill, which did pass and "mandated that phone companies install remote wiretapping ports in their central office digital switches" (Zimmermann, “PGPfone”). He calls this "point-and-click" wiretapping. In other words, federal agents no longer have to attach clips and other equipment to phone lines but can instead tap any line from their offices.

"Of course, the law still requires a court order for a wiretap. But while technology infrastructures tend to persist for generations, laws and policies can change overnight. Once a communications infrastructure optimized for surveillance becomes entrenched, a shift in political conditions may lead to abuse of this new-found power. Political conditions may shift with the election of a new government, or perhaps more abruptly from the bombing of a Federal building" (Zimmermann, “PGPfone”).

It could be argued that this passage was a prediction of things to come. It recently came to light that after the September 11 terrorist attacks, the Bush administration put in place a program to wiretap international calls without a warrant. While the legality of these taps is still being debated, it certainly is a policy shift of the sort Zimmermann thought could be coming, and it was in response to a threat he mentioned. His manual goes on to state a number of other reasons why he developed and released the software, but his point throughout is clear. He believes that everyone has a right to privacy, and he has taken it upon himself to protect that right for himself and others.

Technical Details

The first thing that a PGPfone connection must do is generate a secret key for its symmetric encryption. To accomplish this, it uses Diffie-Hellman, a standard key agreement algorithm. Since this algorithm requires both sides to agree upon a prime number, PGPfone includes a relatively secure way of selecting a prime. Note that the prime does not have to remain secret for Diffie-Hellman to be secure, but keeping it private adds more protection. However, while a decently secure exchange was implemented, the lists of primes in PGPfone were static, so the protection had little value because the primes were not random. Despite this, the manual states that "merely finding out that Alice and Bob share a private prime seems to be of such limited value to an attacker that it seems not worthwhile to defend against such attacks" (Zimmermann, “PGPfone”). After the prime is agreed upon, Diffie-Hellman continues as normal to let the two sides agree upon a key.
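The key agreement step can be illustrated with a toy Diffie-Hellman exchange. This is a sketch of the general algorithm, not of PGPfone's implementation: the prime and generator below are assumptions chosen for brevity and are far too small and too structured for real use, and the function names are hypothetical.

```python
import secrets

# Illustrative public parameters (assumptions, not PGPfone's actual primes).
P = 2**127 - 1  # a Mersenne prime; real deployments use much larger primes
G = 3

def dh_keypair(p=P, g=G):
    """Pick a random private exponent and compute the matching public value."""
    private = secrets.randbelow(p - 3) + 2  # uniform in [2, p - 2]
    public = pow(g, private, p)
    return private, public

def dh_shared(their_public, my_private, p=P):
    """Combine the other side's public value with our private exponent."""
    return pow(their_public, my_private, p)

# Each side generates a key pair and sends only the public value.
alice_private, alice_public = dh_keypair()
bob_private, bob_public = dh_keypair()

alice_secret = dh_shared(bob_public, alice_private)
bob_secret = dh_shared(alice_public, bob_private)
```

Both sides arrive at the same value because (g^a)^b and (g^b)^a are equal modulo the prime, while an eavesdropper sees only the two public values. The exchange by itself does not authenticate either party, which is why a separate defense against man-in-the-middle attacks is needed.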

Once a key is agreed upon, no more unencrypted information is sent across the connection. Everything from this point on is encrypted using one of three ciphers: Triple-DES, CAST, or Blowfish, all running in counter mode. Counter mode turns a block cipher into a sort of stream cipher by XORing the data with the result of block encryption performed on the combination of a nonce and a counter. The nonce is a value known to both parties that is not reused under the same key, and the counter is a number that advances with each block, so the cipher input never repeats. The counter and nonce can be combined through concatenation, addition, or an XOR. The combined value is then encrypted with the block cipher under the previously agreed-upon key, and the output of this operation is XORed with the plaintext to produce the ciphertext. To decrypt, the receiver performs exactly the same procedure with the received ciphertext in place of the plaintext, and the output is the original plaintext. This is secure because recovering the plaintext requires the block cipher output that was XORed with it, and computing that output requires the key.

Since counter mode "does not actually encrypt or decrypt any of the original data" (Zimmermann, “PGPfone”), the cipher output can be computed before the data is even ready. This makes it possible to use a block cipher as a stream cipher, because the XOR can be done as the bits become available and does not have to wait for a full block of data. Additionally, because the counter mode output can be precomputed, the overhead of encryption is reduced (“Block Cipher Modes”).
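The structure of counter mode can be sketched as follows. One substitution should be noted plainly: Python's standard library does not include Triple-DES, CAST, or Blowfish, so this sketch uses HMAC-SHA-256 as a stand-in for the keyed block encryption of the combined nonce and counter. The counter-mode structure is the same, but this is not the cipher PGPfone uses, and the function names are hypothetical.

```python
import hashlib
import hmac

BLOCK_SIZE = 32  # output size in bytes of the stand-in keyed function (SHA-256)

def keystream_block(key, nonce, counter):
    """Stand-in for E_key(nonce || counter).

    Assumption: a real implementation would call a block cipher here.
    """
    counter_bytes = counter.to_bytes(8, "big")
    return hmac.new(key, nonce + counter_bytes, hashlib.sha256).digest()

def ctr_xor(key, nonce, data):
    """Encrypt or decrypt: XOR the data with the keystream, block by block."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        pad = keystream_block(key, nonce, offset // BLOCK_SIZE)
        out.extend(b ^ k for b, k in zip(block, pad))
    return bytes(out)

key = b"0123456789abcdef"
nonce = b"call-001"
plaintext = b"a short frame of digitized voice data" * 3

ciphertext = ctr_xor(key, nonce, plaintext)
recovered = ctr_xor(key, nonce, ciphertext)
```

Note that `keystream_block` never touches the voice data, so its output could be computed ahead of time, and that the same function performs both encryption and decryption. The final block may be shorter than `BLOCK_SIZE`, in which case only part of the keystream block is used.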

Through these methods, the data is kept secure. Additionally, there is "biometric" protection from man-in-the-middle attacks. This is the simplest part of the whole algorithm. At the beginning of each call, a short authentication code is generated and displayed on each user's screen. One user reads that code aloud to the person on the other end. If both parties have the same code, then no man-in-the-middle attack has occurred. The manual recommends that each side read half of the displayed code.
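The idea behind the spoken authentication code can be sketched as follows. The derivation used here (a truncated SHA-256 hash of the agreed secret) is an assumption made for illustration, not PGPfone's actual method, and the secrets are arbitrary small numbers standing in for Diffie-Hellman results.

```python
import hashlib

def short_auth_code(shared_secret, length=8):
    """Derive a short hexadecimal code from the agreed secret.

    Hypothetical derivation: the leading characters of a SHA-256 digest.
    """
    size = (shared_secret.bit_length() + 7) // 8 or 1
    digest = hashlib.sha256(shared_secret.to_bytes(size, "big")).hexdigest()
    return digest[:length]

# No attacker: both ends hold the same secret, so the codes match.
honest_secret = 123456789
alice_code = short_auth_code(honest_secret)
bob_code = short_auth_code(honest_secret)

# Man in the middle: each victim unknowingly shares a different secret
# with the attacker, so the full-length codes on their screens differ.
alice_mitm_code = short_auth_code(1111, length=64)
bob_mitm_code = short_auth_code(2222, length=64)
```

A shorter code is easier to read aloud but gives an attacker a better chance of a coincidental match, so the code length is a trade-off between usability and assurance.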

Secure Real-time Transport Protocol

The Secure Real-time Transport Protocol (SRTP) was defined in Request for Comments (RFC) 3711 in March 2004 and is a secure version of the Real-time Transport Protocol (RTP) used to carry voice and video data, as discussed in Section II. The RFC also defines the Secure Real-time Transport Control Protocol (SRTCP), which is a secure version of the Real-time Transport Control Protocol (RTCP). Beyond this, the creators list a number of goals in the RFC, such as ensuring "the confidentiality of the RTP and RTCP payloads, and the integrity of the entire RTP and RTCP packets, together with protection against replayed packets" (Baugher).

The creators also state that "These security services are optional and independent from each other, except that SRTCP integrity protection is mandatory (malicious or erroneous alteration of RTCP messages could otherwise disrupt the processing of the RTP stream)" (Baugher). A user could run SRTP with almost no protection enabled, meaning that the user would be running roughly the equivalent of regular RTP. The creators also list a number of non-security related goals, most of which pertain to speed, memory, and size issues.

Another goal is "a framework that permits upgrading with new cryptographic transforms" (Baugher). In other words, the creators aim for it to be upgradeable if new and better encryption algorithms emerge. This should allow the protocol to remain current and secure in the future, which is a major issue in encryption technology. By being upgradeable, if a problem is ever found with their default algorithm, the creators can switch to a different one.

The first thing that SRTP does when a connection is made is derive its session keys. According to the RFC, "interoperable SRTP implementations MUST use the SRTP key derivation to generate session keys" (Baugher). However, the RFC does not fix the number of packets that may be sent before a new key is generated; that choice is left to the user. The key derivation itself is completely defined in the RFC and involves complicated math performed on a random master key and master salt. In the end, a secure key is generated for use with the cipher of choice. In addition to leaving open the number of packets sent under one key, the RFC notes that "The upper limit on the number of packets that can be secured using the same master key […] is independent of the key derivation" (Baugher).

Besides key derivation, the technology behind SRTP essentially amounts to standard encryption techniques applied to voice and video data, with a few relatively small changes and additions to make the process run smoothly. In the RFC, the authors state that although "there are numerous encryption and message authentication algorithms that can be used in SRTP, […] we define default algorithms in order to avoid the complexity of specifying the encodings for the signaling of algorithm and parameter identifiers" (Baugher). The default cipher used in SRTP is the Advanced Encryption Standard (AES), and it runs in one of two modes. The first is Segmented Integer Counter Mode AES (or just counter mode), and the second is AES in f8-mode. The implementation of each is quite complicated, but the main idea is to take the normal AES block cipher and turn it into a sort of stream cipher, which is suitable for RTP data. Of the two, counter mode is the default. Additionally, HMAC-SHA1, a keyed message authentication code rather than an encryption algorithm, is used to guarantee message authenticity, just as in PGPfone.
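The message authentication step can be sketched with Python's standard `hmac` module. This is a simplified illustration rather than a conforming SRTP implementation: it appends an HMAC-SHA1 tag truncated to 80 bits (the default tag length in RFC 3711) directly to the packet bytes and omits details such as the rollover counter that SRTP includes in the authenticated data. The function names and the key are hypothetical.

```python
import hashlib
import hmac

TAG_LENGTH = 10  # bytes; an 80-bit truncated HMAC-SHA1 tag

def auth_tag(auth_key, packet):
    """Compute the truncated HMAC-SHA1 tag over the packet bytes."""
    return hmac.new(auth_key, packet, hashlib.sha1).digest()[:TAG_LENGTH]

def protect(auth_key, packet):
    """Append the authentication tag to the packet."""
    return packet + auth_tag(auth_key, packet)

def verify(auth_key, protected):
    """Check the tag and return the packet, or raise ValueError if it was altered."""
    packet, tag = protected[:-TAG_LENGTH], protected[-TAG_LENGTH:]
    if not hmac.compare_digest(tag, auth_tag(auth_key, packet)):
        raise ValueError("authentication failed")
    return packet

session_auth_key = b"k" * 20  # placeholder key for illustration only
protected = protect(session_auth_key, b"rtp header and payload bytes")
original = verify(session_auth_key, protected)
```

Without the key, an attacker who alters any byte of the packet cannot produce a matching tag, so `verify` rejects the altered packet.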

ZRTP and Zfone

ZRTP

The newest addition to VoIP encryption technology is ZRTP. This technology was written by Phil Zimmermann and is his successor to PGPfone. It is not quite complete, so its specification is still in draft form, but many of the technical details are filled out. The most exact definition of the technology is given in the title of the draft: "Extensions to RTP for Diffie-Hellman Key Agreement for SRTP" (Zimmermann, “ZRTP”). ZRTP uses a Diffie-Hellman key exchange to obtain the key for SRTP instead of the key setup that SRTP would otherwise rely on. Among its features is protection against man-in-the-middle attacks without any reliance on a public key infrastructure (PKI). The lack of reliance on a PKI is important because "deploying centrally managed PKIs can be a painful and often futile experience" (Zimmermann, “ZRTP”). Instead, ZRTP uses a couple of different techniques to provide protection.

As stated, ZRTP uses Diffie-Hellman key exchange, which can be vulnerable to man-in-the-middle attacks. This problem is solved in ZRTP in two different ways.

The first is through authentication strings, which are similar to those used in PGPfone. Again, the scheme works by generating a short string that each party reads aloud and compares with the other's. If the strings are the same, then no attack has occurred. If an attack has occurred, the parties will have two different strings because there will really be two connections in place (party A to man-in-the-middle and man-in-the-middle to party B). One important aspect of this scheme is that the voice reading the string must be the voice of the person with whom the caller is having the conversation, or else the attacker could read the string to each side. In theory, a substituted voice should be easy to recognize and should not be an issue.
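The reason the two strings differ under attack can be seen in a toy Diffie-Hellman exchange. The sketch below is only an illustration: the 127-bit prime, the generator, the tiny fixed private values, and the way the short string is cut from a SHA-256 hash are all assumptions chosen for readability and are not the groups or the derivation that ZRTP actually specifies.

```python
import hashlib

P = 2**127 - 1  # toy prime modulus; far too small for real use
G = 3           # toy generator


def public_value(private: int) -> int:
    """The value a party sends to its peer."""
    return pow(G, private, P)


def shared_secret(own_private: int, peer_public: int) -> int:
    """The secret a party computes from its own private value and the peer's public value."""
    return pow(peer_public, own_private, P)


def short_auth_string(secret: int) -> str:
    """A short string derived from the secret, meant to be read aloud (hypothetical format)."""
    return hashlib.sha256(secret.to_bytes(16, "big")).hexdigest()[:4]


# Tiny fixed private values, for illustration only.
alice, bob = 5, 11
mallory_to_alice, mallory_to_bob = 7, 13

# Honest call: both ends compute the same secret, so the strings match.
honest_alice = shared_secret(alice, public_value(bob))
honest_bob = shared_secret(bob, public_value(alice))

# Man-in-the-middle: each end unknowingly completes an exchange with Mallory,
# so there are two different secrets and the strings no longer agree.
attacked_alice = shared_secret(alice, public_value(mallory_to_alice))
attacked_bob = shared_secret(bob, public_value(mallory_to_bob))

sas_alice = short_auth_string(honest_alice)
sas_bob = short_auth_string(honest_bob)
```

With a string this short, an attacker could occasionally get lucky and produce a matching value, which is one reason the comparison is only a spot check on the underlying secret and not a replacement for it.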

Unfortunately, the biggest risk here is a user's laziness. It is quite probable that users would simply not read the strings given to them because it is extra work. Additionally, it is possible that the call would be answered by an answering machine, which cannot authenticate the string. With these problems in mind, another protection was added. This is a "shared secret" generated after the first key exchange that is then cached on both sides and used in generating later keys. This secret changes with every session and is always used in generating a key for the next call. The only time there would be no shared secret would be during the first call between the parties or if one side lost its cache. To succeed, then, an attacker would have to be present for the first call, when the initial key is generated without the secret, and for every call after that in order to track changes to the secret. Therefore, reading the authentication strings is necessary only when a new key is generated without an existing secret (Zimmermann, “ZRTP”).
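The effect of the cached secret can be sketched as a simple hash chain. This is a loose illustration of the idea only: the function name, the labels, and the use of SHA-256 are assumptions made here, not the key derivation defined in the ZRTP draft, and the Diffie-Hellman results are replaced by fixed placeholder byte strings.

```python
import hashlib
from typing import Tuple


def advance(dh_result: bytes, cached_secret: bytes) -> Tuple[bytes, bytes]:
    """Mix this call's key-exchange result with the cached secret.

    Returns the session key for this call and the secret to cache for the next one.
    """
    material = dh_result + b"|" + cached_secret
    session_key = hashlib.sha256(b"session-key|" + material).digest()
    next_cached = hashlib.sha256(b"next-secret|" + material).digest()
    return session_key, next_cached


# Call 1: neither side has a cached secret yet.
key1, cache_after_call_1 = advance(b"dh-result-call-1", b"")

# Call 2: both sides mix in what they cached after call 1 and agree on a key.
key2_alice, _ = advance(b"dh-result-call-2", cache_after_call_1)
key2_bob, _ = advance(b"dh-result-call-2", cache_after_call_1)

# An attacker who somehow learned call 2's exchange result but missed call 1
# has no cached secret to mix in, so the key it derives does not match.
key2_attacker, _ = advance(b"dh-result-call-2", b"")
```

Each call's secret depends on every earlier one, so missing a single call leaves the attacker permanently out of step with the two parties.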

Another feature of ZRTP is backward compatibility with regular RTP. ZRTP uses an RTP header extension to perform all of the encryption handshaking and key exchanges at the same time that other RTP setup information is being exchanged. RTP is designed to ignore any header extensions that it does not recognize, so if a user tries to use ZRTP to connect to someone who has only standard RTP, that user will get a perfectly working but insecure connection. This will at least provide convenience, a smooth upgrade path, and possibly widespread adoption.
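How a plain RTP receiver steps over an extension it does not understand can be sketched as follows. The field layout follows the fixed RTP header (a 12-byte header, an X bit that signals an extension, and an extension header carrying a 16-bit identifier and a 16-bit length counted in 32-bit words). The function names, the identifier `0x1234`, and the extension contents are hypothetical and do not reproduce real ZRTP messages.

```python
import struct


def build_rtp(payload: bytes, extension: bytes = b"", profile_id: int = 0x1234) -> bytes:
    """Build a minimal RTP packet, optionally carrying a header extension."""
    has_extension = 1 if extension else 0
    first_byte = (2 << 6) | (has_extension << 4)  # version 2, no padding, X bit, no CSRCs
    # Payload type 0, sequence number 1, timestamp 160, arbitrary SSRC.
    header = struct.pack("!BBHII", first_byte, 0, 1, 160, 0xDEADBEEF)
    if extension:
        if len(extension) % 4:
            raise ValueError("extension must be a multiple of 4 bytes")
        header += struct.pack("!HH", profile_id, len(extension) // 4) + extension
    return header + payload


def rtp_payload(packet: bytes) -> bytes:
    """Return the payload, skipping any header extension without interpreting it."""
    first_byte = packet[0]
    csrc_count = first_byte & 0x0F
    has_extension = (first_byte >> 4) & 0x01
    offset = 12 + 4 * csrc_count
    if has_extension:
        _profile_id, length_words = struct.unpack_from("!HH", packet, offset)
        offset += 4 + 4 * length_words
    return packet[offset:]


plain_packet = build_rtp(b"voice")
extended_packet = build_rtp(b"voice", extension=b"ZRTPdata")
```

The receiver never looks inside the extension; it only uses the length field to find where the payload begins, which is why an endpoint without ZRTP still plays the audio.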

Zfone

Zimmermann has also implemented the ZRTP protocol in a program called Zfone. The program runs on Linux, Mac OS X, and Windows XP. The software "lets you turn your existing VoIP client into a secure phone" (Zfone Home Page). To secure a call, it "intercepts and filters all the VoIP packets as they go in and out of the machine, and secures the call on the fly" (Zfone Home Page). In other words, it does not implement a whole new voice system but simply detects any RTP-based VoIP calls being made and encrypts the traffic, provided both sides are running Zfone. It is essentially transparent to the user, though it does have a separate user interface to tell the user whether his or her call is secured and to display the short authentication string that he or she should read. Additionally, Zimmermann has created a ZRTP software development kit (SDK) that will be available to license "for VoIP developers to integrate this protocol into their VoIP applications, for both software and hardware VoIP clients" (Zfone Home Page).

Skype

The final popular encryption technique is the closed-source, closed-specification program Skype. Since the program is closed and its workings are hidden, good information on it is scarce. However, in April 2005, Skype invited Tom Berson of Anagram Laboratories to do a thorough evaluation of Skype's security. Berson was given "unimpeded access to Skype engineers and to Skype source code" (Tom Berson). He published his findings in a report on October 18, 2005. Before he begins, he makes it clear that he thinks Skype has a good security model, and he says, "I have found out a lot about Skype. The more I found out, the happier I became" (Tom Berson). However, he does point out a possible limitation of his evaluation when he states, "This report represents a four-month evaluation. A longer evaluation effort might uncover problems not yet seen," (Tom Berson) which leaves the door open for weaknesses that he could have missed.

Berson starts out with a summary of his findings. According to him, "the cryptographic systems engineered […] are well-designed and correctly implemented" (Tom Berson). He also says that "Skype uses only standard cryptographic primitives" (Tom Berson). Among these primitives are "the AES block cipher, the RSA public-key cryptosystem, the ISO 9796-2 signature padding scheme, the SHA-1 hash function, and the RC4 stream cipher," (Tom Berson) all of which are standard and widely used cryptographic algorithms. He also says that all of these are implemented correctly and conform to all standards. He goes through each primitive, explaining its specific purpose and how it conforms to standards. The implementation is similar to other protocols. For instance, AES is used to encrypt the actual communications. However, Skype also uses a central server for a variety of things, such as a public key infrastructure to verify authenticity. He then lists some attacks that are possible and how they may affect Skype. None of them appear to pose any serious threat according to his research. In conclusion he says,

The designers of Skype did not hesitate to employ cryptography widely and well in order to establish a foundation of trust, authenticity, and confidentiality for their peer-to-peer services. The implementers of Skype implemented the cryptographic functions correctly and efficiently. As a result, the confidentiality of a Skype session is far greater than that offered by a wired or wireless telephone call or by email and email attachments (Tom Berson).

He also says that he looked for any holes or malware in the Skype code and found no evidence of any (Tom Berson).

Berson's security evaluation answers quite a few questions about Skype and its security but still leaves some concerns. First of all, the evaluation was performed by one person and was sponsored by the company itself, so the credibility of the results is open to question, and it is difficult to trust the report in full. Also, Skype's insistence on staying closed, together with its focus on stopping any attempts at reverse engineering or otherwise testing its software, makes it difficult to confirm or deny anything that was claimed in the evaluation. While staying closed might provide some temporary protection, it could be only a matter of time until someone breaks it. In fact, a small Chinese company recently claimed that it had broken the protocol and would release an entirely compatible application in a short time (Al Sacco). Whether this is true is still in question, but the proprietary nature of Skype is clearly under attack. Even if this attempt was not successful, there is a definite commercial incentive to hack the system, so the company can expect more attacks.

V Research and Development to Improve VoIP Security

As the previous sections have demonstrated, the rise of VoIP technology presents a number of security and management challenges. Unfortunately, because VoIP is still in its infancy, there are no standard solutions for addressing these challenges. Therefore, these challenges demand new conceptual and pragmatic solutions from researchers in government, academic, and private organizations.

One of the most important workshops on VoIP technologies was the 1st IEEE Workshop on VoIP Management and Security, held in 2006 ("The 1st IEEE"). This large workshop united both private companies and major university research centers with the objective of creating the first collaborative research vision on the management of VoIP and the security of related infrastructures. The result of the workshop was an exploratory forum with researchers from all over the world proposing new solutions and alternatives to improve VoIP security. This section provides a concise discussion of the major developments and the most innovative proposals of this workshop, along with the technological problems that inspired those proposals.

Locating Users in a Secure and Reliable Way

Lei Kong, Vijay Arvind Balasubramaniyan, and Mustaque Ahamad proposed a new lightweight scheme for securely and reliably locating SIP users. These researchers are part of the Georgia Tech College of Computing and claim that one of the most important problems facing VoIP is locating the communicating parties via the Internet in a secure and reliable way.

Many companies are exploring a variety of security mechanisms and different algorithms that include the use of SIP. However, the authors claim that these algorithms are weak and expensive to deploy, and they propose a new, alternate scheme to protect the integrity of SIP contact addresses. They also point out that this would achieve a high availability of SIP services through replication. For this to happen, it is essential to have an end user public key distributed through the scheme that can also be used for end-to-end user authentication and for a session key exchange (Kong).

Current State and Motivation to Change

As discussed in previous sections, VoIP sessions are peer-to-peer connections; that is, one terminal is allowed to contact another terminal without intermediaries. However, when the sending terminal places a call, the recipient terminal needs to be located on the Internet before the session can start. This adds a level of complexity for large-scale public use because a static set-up becomes infeasible. Therefore, there is a need for infrastructure to discover and locate dynamic VoIP endpoints (Kong).

The integrity of the mapping from a SIP user to a contact address is critical to the security and reliability of VoIP. If an attacker can change the contact address, the attacker can launch a denial of service attack or, even more important, an impersonation attack. Moreover, the authors believe that it is also important to distribute information such as public keys to enable mutual authentication between users of VoIP (Kong).

It is no surprise, then, that we would want to establish the integrity of the callee's current contact address. Nonetheless, the "standard" SIP discussed so far offers little protection for the contact address. Indeed, if one were to modify the callee's contact address during call initialization, one would be able to redirect the call to a different location or simply stop the service.

Proposed Scheme

The authors propose that SIP phones stop relying on the registrar services for this purpose and instead sign their own contact address bindings on behalf of their users. This way, the integrity of the caller and the callee can be verified through the simple use of public keys, and this would also reduce the workload on the registrars. It is important to clarify that the authors are not proposing the use of end-user certificates but instead a change in the SIP architecture itself to distribute user public keys (Kong).

The authors make an important assumption that could be a weakness in their proposal: that all involved SIP servers have certificates issued by a well-known public authority. They also assume that the caller and the callee trust each other enough to correctly establish the contact’s identity and address bindings for their own domains, which will not always be the case. After all, not all the numbers dialed from telephones are "secure" numbers or are directed to "secure" entities.

Nevertheless, the authors report that they can protect SIP contact addresses through user signatures, which avoids reliance on a public key infrastructure by chaining trust among SIP entities across domains (Kong). A distributed public key scheme like the one they propose could be of great help to the industry’s security efforts, and while their preliminary experimental results look promising, the idea requires more research on the scalability and performance of a VoIP system that uses it.
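The core idea of a phone signing its own contact address binding can be sketched with a toy signature scheme. Everything below is an assumption made for illustration and is not the construction in the authors' paper: it uses textbook RSA over two small Mersenne primes with no padding, which is insecure, and the binding format, addresses, and function names are hypothetical.

```python
import hashlib

# Toy textbook RSA key; far too small and unpadded, for illustration only.
P_PRIME = 2**61 - 1
Q_PRIME = 2**89 - 1
N = P_PRIME * Q_PRIME
E = 65537                                      # public exponent
D = pow(E, -1, (P_PRIME - 1) * (Q_PRIME - 1))  # private exponent (needs Python 3.8+)


def digest_int(message: bytes) -> int:
    """Hash the message and reduce it into the range of the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """The user's phone signs a binding with the private exponent."""
    return pow(digest_int(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Anyone holding the user's public key (N, E) can check the binding."""
    return pow(signature, E, N) == digest_int(message)


# Hypothetical binding from a SIP identity to its current contact address.
binding = b"sip:alice@example.com -> sip:alice@192.0.2.10:5060"
signature = sign(binding)

# An attacker who rewrites the contact address cannot produce a valid signature.
tampered = b"sip:alice@example.com -> sip:mallory@198.51.100.7:5060"
```

A caller who already holds the callee's public key can reject the tampered binding without trusting whichever server delivered it, which is the property the integrity argument above depends on.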

Monitoring VoIP Networks

Toshiya Okabe, Tsutomu Kitamura, and Takayuki Shizuno are researchers for the System Platforms Research Laboratories of the NEC Corporation in Japan, and they are proposing a technique that aims to maintain communication confidentiality in VoIP networks (Okabe).

Motivation

As seen in previous sections, many of the possible threats to normal networks are also threats to VoIP, and if the industry does not start to take important steps to secure VoIP networks, there may be a rise in impersonation attacks in the near future.

One important component of securing VoIP is accounting for the emergence of impersonation traffic, P2P traffic, and Spam over Internet Telephony (SPIT), all of which misuse network resources to the detriment of consumers. Carrier networks should provide a better service by identifying and separating this traffic without peeking into the contents of the data packets.

To accomplish this goal, the authors have studied techniques to identify illegal traffic from limited information. This limited information could include headers or transmission patterns in the packets. The authors propose a traffic identification technique for a real-time application that uses statistical information such as the frequency of packet arrival. This technique is useful in preventing impersonation attacks by identifying the traffic generated by not only VoIP packets but also video applications that are more complex (Okabe).

Current State

The purpose of the authors’ proposal is to prevent illegal use of network resources by identifying real-time communication flows such as VoIP. However, there are already several conventional techniques for flow identification.

The first one is the host behavior approach. This technique infers which application is generating traffic by examining the relationships between a host and the other hosts it communicates with. However, it is rather difficult to maintain a high detection accuracy if two or more applications are running on one host. Moreover, this technique requires a lot of computational power, which makes it unsuitable for large networks.

The second one is the traffic behavior approach. This approach uses the behavior of the network traffic to locate an application generating that traffic. There are three ways of implementing it: transaction-level behavior, flow-level behavior, and packet-level behavior. Each of these techniques has a variety of weaknesses that have stopped their implementation on current networks (Okabe).

Proposed Idea

After evaluating the weaknesses of each approach, the authors propose the use of a flow identification technique that is based on flow-level behavior.

Figure 4. Flow Identification Process (Okabe).

As seen in Figure 4, the authors propose a multi-step process. First, the received traffic is divided into flows, or different streams of data. As the division is being made, a time stamp is given to each packet. The packet size is also measured and recorded. Next, the features of each flow are extracted. Usually, these features are statistical information obtained from each flow without checking the payload of any packet, which gives users confidence that their VoIP company is not “listening” to what they are saying. After that, the extracted data is checked against established reference patterns of illegal traffic. This verification against the reference patterns is repeated cyclically in order to avoid false negatives. Finally, flow control is performed on the traffic with the parameters established by the company (Okabe).
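The steps above can be sketched in a few lines. This is a simplified illustration and not the authors' implementation: the flow key (source and destination only), the two features, the reference values of 20 ms and 200 bytes, the 20% tolerance, and all names are assumptions made here, and the final flow-control step is reduced to attaching a label.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical reference pattern for a voice-like flow.
REFERENCE_VOIP = {"mean_gap": 0.020, "mean_size": 200.0}


def split_into_flows(packets):
    """Group (timestamp, src, dst, size) records by flow; payloads are never seen."""
    flows = defaultdict(list)
    for timestamp, src, dst, size in packets:
        flows[(src, dst)].append((timestamp, size))
    return flows


def extract_features(flow):
    """Statistical features of one flow: mean inter-arrival gap and mean packet size."""
    times = [timestamp for timestamp, _ in flow]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "mean_gap": mean(gaps) if gaps else 0.0,
        "mean_size": mean([size for _, size in flow]),
    }


def matches_reference(features, reference, tolerance=0.2):
    """True when every feature lies within the tolerance of the reference value."""
    return all(
        abs(features[name] - expected) <= tolerance * expected
        for name, expected in reference.items()
    )


def classify(packets, reference=REFERENCE_VOIP):
    """Label each flow so that a later flow-control stage could act on it."""
    return {
        flow_id: "voip-like" if matches_reference(extract_features(flow), reference) else "other"
        for flow_id, flow in split_into_flows(packets).items()
    }


# Synthetic traffic: a steady 20 ms voice-like stream and a bulk transfer.
voice = [(i * 0.020, "10.0.0.1", "10.0.0.2", 200) for i in range(50)]
bulk = [(i * 0.001, "10.0.0.3", "10.0.0.4", 1500) for i in range(50)]
labels = classify(voice + bulk)
```

Only timestamps and sizes enter the computation, which is what lets a carrier separate traffic classes without inspecting what the packets carry.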

One of the most interesting features of this project is that the authors were able to launch a prototype to monitor the flow of known VoIP programs, such as Skype, SIP softphone, and Microsoft NetMeeting. Moreover, they also tested their application with P2P programs like Kazaa. Their results are very promising, and the authors believe that this technique can also be used to grasp network trends and predict the degradation of the communication quality in VoIP traffic (Okabe).

Intrusion Detection and Prevention on SIP

A team of researchers from the University of Pisa in Italy and the Ecole d'Ingénieurs et de Gestion du Canton de Vaud in Switzerland has proposed what many have called the first intrusion detection system for VoIP. For this team, and for most of the scientists at the workshop, VoIP deployment is expected to grow, but with that growth, intrusion problems similar to those found in data networks will start appearing as well. The authors analyzed the VoIP requirements for intrusion detection and prevention systems and offered a prototype implementation (Niccolini).

The Prototype

One feature shown at the IEEE workshop was a working prototype of the SIP intrusion detection and prevention system, implemented using the popular Snort software.

Figure 5. Network Intrusion Detection on SIP Using Snort (Niccolini).

The basic scheme used for intrusion detection with Snort is depicted in Figure 5. This scheme is no different from the one used in many regular corporate networks for intrusion detection. The authors believe that using Snort is an essential part of their technique. These network-based techniques should be implemented in devices able to observe the traffic to be analyzed. Therefore, the entry point of a SIP network is best suited to host their system, which would be nothing other than a SIP-aware firewall (Niccolini).

In addition to filtering, their prototype was able to distinguish legitimate from illegitimate requests. It does so by

• Checking the SIP syntax of the message against the SIP rules in search of discrepancies.

• Checking the SIP mandatory fields for correct size and headers.

• Verifying the SIP state table. This is extremely important to prevent SPIT because this check performs a rate limitation on the number of transactions a particular user can initiate in a time period (Niccolini).

These techniques, combined with a regular network intrusion detection system for SIP, are quite revolutionary, and the authors were able to test their ideas successfully using a brute force generator that tried to sabotage their VoIP network. The implementation of Snort in VoIP could be an important step against future threats.

However, as the authors are quick to point out as well, there will be great challenges with trying to implement this system on a network that could have millions of people trying to place a call at a given moment.
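The three checks listed above (syntax, mandatory fields, and a per-user rate limit) can be sketched in plain Python. This is a hedged illustration rather than the authors' Snort-based prototype: the regular expression is a rough approximation of a SIP request line, compact header forms are ignored, the limit of two requests per ten seconds is arbitrary, and every name is hypothetical. The set of mandatory headers is the one SIP requires in every request.

```python
import re
from collections import defaultdict, deque

REQUEST_LINE = re.compile(r"^[A-Z]+ sips?:\S+ SIP/2\.0$")
MANDATORY_HEADERS = {"to", "from", "cseq", "call-id", "max-forwards", "via"}


def parse_request(message: str):
    """Split a SIP request into its request line and a dict of lower-cased headers."""
    lines = message.split("\r\n")
    headers = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, separator, value = line.partition(":")
        if not separator:
            return lines[0], None  # a header line without a colon is malformed
        headers[name.strip().lower()] = value.strip()
    return lines[0], headers


class RateLimiter:
    """Allow at most max_requests per user within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self._seen = defaultdict(deque)

    def allow(self, user: str, now: float) -> bool:
        recent = self._seen[user]
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False
        recent.append(now)
        return True


def check_request(message: str, limiter: RateLimiter, now: float) -> str:
    """Apply the syntax, mandatory-header, and rate checks in that order."""
    request_line, headers = parse_request(message)
    if headers is None or not REQUEST_LINE.match(request_line):
        return "bad syntax"
    if MANDATORY_HEADERS - set(headers):
        return "missing header"
    if not limiter.allow(headers["from"], now):
        return "rate limited"
    return "ok"


SAMPLE_INVITE = "\r\n".join([
    "INVITE sip:bob@example.com SIP/2.0",
    "Via: SIP/2.0/UDP pc33.example.com;branch=z9hG4bK776asdhds",
    "Max-Forwards: 70",
    "To: <sip:bob@example.com>",
    "From: <sip:alice@example.com>;tag=1928301774",
    "Call-ID: a84b4c76e66710",
    "CSeq: 314159 INVITE",
    "",
    "",
])
```

Keeping per-user state is what makes the third check costly at scale, since the table of recent requests grows with the number of active callers.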

VI Concluding Remarks

As seen in the three examples of research and development described in the preceding section, there is a great deal of effort to secure VoIP networks, and both companies and universities are taking a leadership role in planning for better management and security. Some of the ideas are quite revolutionary, and others are trying to implement old information security techniques on existing VoIP technology.

However, everybody shares the same objective of making VoIP a safer environment and providing a level of security that will be required soon by consumers.

There are, of course, many ways to make VoIP more secure. Among these solutions is encryption. In this space, many people have made an effort to find a proper encryption mechanism. Of course, the main issue for encryption (besides security) is speed, as there is no room for long delays, or the technology will become useless. However, researchers have made great strides such that it is now possible to have reliable and fast encryption on voice communications with little difficulty. One such mechanism is ZRTP, written by Phil Zimmermann, which works on top of standard and open protocols and is fully backward-compatible. While there are still issues with some approaches (Skype, for instance), the technology overall is advancing quite rapidly in all areas.

However, this effort is not being shared by all big VoIP providers. Indeed, companies like Vonage, Comcast, AT&T, and Time Warner offer VoIP products that do not include concrete solutions to the security problems explained in this paper.

This technology caught both security experts and hackers by surprise, and none of them were prepared to protect or exploit the flaws in the technology. Nevertheless, both researchers and companies know that this will not last forever, and it is very encouraging to see a great amount of effort to prevent future problems.

VII. Works Cited

Baugher, M., et al. The Secure Real-time Transport Protocol (SRTP). March 2004. The Internet Society. 13 July 2006.

Berson, Tom. Skype Security Evaluation. 18 October 2005. Anagram Laboratories. 13 July 2006.

“Block Cipher Modes of Operation.” Answers.com. 18 July 2006.

Dunn, Andrew. "Hackers in VoIP phone fraud stole 10 million free minutes." The Sydney Morning Herald 16 June 2006. 18 July 2006.

Feng Cao; Malik, S., "Vulnerability analysis and best practices for adopting IP telephony in critical infrastructure sectors," Communications Magazine, IEEE, vol. 44, no. 4, pp. 138-145, April 2006.

"H323 Overview, Tutorials/Resources." Telecomspace. 2006. Telecomspace.com. 16 July 2006.

Kong, L., Balasubramaniyan, V.B., and Ahamad, M. "A lightweight scheme for securely and reliably locating SIP users." IEEE Xplore. Georgia Tech Lib., Atlanta, GA. 12 July 2006.

Kuhn, D. Richard, Thomas J. Walsh, Steffen Fries. United States. National Institute of Standards and Technology, Technology Administration, Department of Commerce. Security Considerations for Voice Over IP Systems. Gaithersburg, MD: NIST, 2005.

Materna, Bogdan. A Proactive Approach to VoIP Security. Voipshield, 2006.

Niccolini, S. et al. "SIP intrusion detection and prevention: recommendations and prototype implementation." IEEE Xplore. Georgia Tech Lib., Atlanta, GA. 12 July 2006.

Okabe, T., Kitamura, T., and Shizuno, T. "Statistical traffic identification method based on flow-level behavior for fair VoIP service." IEEE Xplore. Georgia Tech Lib., Atlanta, GA. 12 July 2006.

Sacco, Al. “Chinese Company: Skype Protocol Cracked.” CIO Tech Informer 14 July 2006. 17 July 2006.

"The 1st IEEE Workshop on VoIP Management and Security-VoIP MaSe'06." IEEE Xplore. Georgia Tech Lib., Atlanta, GA. 12 July 2006.

Vagle, Jeffrey L. "How secure is VoIP?" IT Manager's Journal. 13 September 2005. 13 July 2006.

Walsh, T.J.; Kuhn, D.R., "Challenges in securing voice over IP," Security & Privacy Magazine, IEEE, vol. 3, no. 3, pp. 44-49, May-June 2005.

Wosnack, Nathan. "Bugtraq: A Vonage VOIP 3-way call CID Spoofing Vulnerability." seclists.org 13 August 2003. 18 July 2006.

Zimmermann, Philip R. PGPfone Owner’s Manual. 8 July 1996. Phil’s Pretty Good Software. 13 July 2006.

---, et al. ZRTP: Extensions to RTP for Diffie-Hellman Key Agreement for SRTP. 5 March 2006. The . 13 July 2006.

Zfone Home Page. Phil Zimmermann & Associates LLC. 13 July 2006.

VIII. Glossary

AES - "The Advanced Encryption Standard (AES), also known as Rijndael, is a block cipher adopted as an encryption standard by the US government."

ARP - "In computer networking, the Address Resolution Protocol (ARP) is the method for finding a host's hardware address when only its IP address is known"

ARPANET - the world’s first packet switching network and the precursor to the internet

CI - Critical infrastructure is a term used by governments to describe material assets that are essential for the functioning of a society and economy

DES - "The Data Encryption Standard (DES) is a cipher (a method for encrypting information) selected as an official Federal Information Processing Standard (FIPS) for the United States in 1976, and which has subsequently enjoyed widespread use internationally"

GSM - The Global System for Mobile Communications (GSM) is the most popular standard for mobile phones in the world

H.323 - an International Telecommunication Union (ITU) umbrella specification that defines a series of protocols for audio-visual communication sessions on any packet network

HMAC - "A keyed-hash message authentication code, or HMAC, is a type of message authentication code (MAC) calculated using a cryptographic hash function in combination with a secret key."

IDS - "An Intrusion Detection System (or IDS) generally detects unwanted manipulations to systems."

IEEE - "The Institute of Electrical and Electronics Engineers or IEEE (pronounced as eye-triple-e) is an international non-profit, professional organization for the advancement of technology related to electricity"

IPS - "An intrusion prevention system (a computer security term) is any device which exercises access control to protect computers from exploitation. 'Intrusion prevention' technology is considered by some to be an extension of intrusion detection (IDS) technology, but it is actually another form of access control, like an application layer firewall"

ITU - The International Telecommunication Union is an international organization established to standardize and regulate international radio and telecommunications

MAC - In computer networking a Media Access Control address (MAC address) is a unique identifier attached to most forms of networking equipment

NAT - "In computer networking, the process of network address translation (NAT, also known as network masquerading or IP-masquerading) involves re-writing the source and/or destination addresses of IP packets as they pass through a router or firewall."

PGP - Pretty Good Privacy (PGP) is a computer program which provides cryptographic privacy and authentication

PGPfone - Pretty Good Privacy Phone (PGPfone) is a secure voice telephony system developed by Philip Zimmermann in 1995

PKI - "In cryptography, a public key infrastructure (PKI) is an arrangement that provides for trusted third party vetting of, and vouching for, user identities"

PSTN - "The public switched telephone network (PSTN) is the concentration of the world's public circuit-switched telephone networks, in much the same way that the Internet is the concentration of the world's public IP-based packet-switched network"

QoS - "In the fields of packet-switched networks and computer networking, the traffic engineering term Quality of Service (QoS) refers to the probability of the telecommunication network meeting a given traffic contract, or in many cases is used informally to refer to the probability of a packet succeeding in passing between two points in the network within its desired latency period."

RFC - "In internetworking and engineering, Request for Comments (RFC) documents are a series of memoranda encompassing new research, innovations, and methodologies applicable to Internet technologies."

RTCP - RTP Control Protocol (RTCP) is a sister protocol of the Real-time Transport Protocol (RTP)

RTP - The Real-time Transport Protocol (or RTP) defines a standardized packet format for delivering audio and video over the Internet. It was developed by the Audio-Video Transport Working Group of the IETF and first published in 1996 as RFC 1889 which was obsoleted in 2003 by RFC 3550.

SBC - "A Session Border Controller is a device used in some VoIP networks to exert control over the signaling and media streams involved in setting up, conducting, and tearing down calls"

SDP - "Session Description Protocol (SDP), is a format for describing streaming media initialization parameters. It has been published by the IETF as RFC 2327."

SHA1 - The SHA (Secure Hash Algorithm) family is a set of related cryptographic hash functions.

SIP - "The Session Initiation Protocol (SIP) is a protocol and proposed standard for handling interactive multimedia user sessions through a variety of media, including VoIP"

SPIT - Spam over Internet Telephony

SRTCP - The Secure Real-time Transport Control Protocol (or SRTCP) is the companion to SRTP and provides the same security features (encryption, message authentication and integrity, and replay protection) for RTCP traffic

SRTP - "The Secure Real-time Transport Protocol (or SRTP) defines a profile of RTP (Real-time Transport Protocol), intended to provide encryption, message authentication and integrity, and replay protection to the RTP data in both unicast and multicast applications"

TCP - The Transmission Control Protocol (TCP) is one of the core protocols of the Internet protocol suite

UDP - "The User Datagram Protocol (UDP) is one of the core protocols of the Internet protocol suite. Using UDP, programs on networked computers can send short messages sometimes known as datagrams to one another."

VoIP - "Voice Over Internet Protocol (VoIP) is the routing of voice communications over any kind of digital, IP-based network"

ZRTP - ZRTP is an extension to Real-time Transport Protocol (RTP) which describes a method of Diffie-Hellman key agreement for Secure Real-time Transport Protocol (SRTP).