
Survey of Voice over IP (VoIP)

By Nachiappan Nachiappan ([email protected]) and Fredrik Sjoqvist ([email protected])

Table of Contents

Abstract
1. Introduction to Voice over IP
   1.1 Challenges
   1.2 Applications of VoIP
   1.3 Difference between IP Telephony and PSTN
   1.4 Features of IP Telephony
   1.5 VoIP Signalling and its evolution
2. H.323
   2.1 Elements of H.323
   2.2 H.323 protocol suite
   2.3 Calling with H.323
   2.4 H.450.x services
   2.5 H.323 Interworking with PSTN
   2.6 Gateway Decomposition
   2.7 H.323 and H.248 (a.k.a. Megaco)
   2.8 Data Transport Using RTP
3 Session Initiation Protocol (SIP)
   3.1 SIP Messages and Headers
      3.1.1 Header Fields
   3.2 A SIP Call
   3.3 SIP INFO
   3.4 Real Time Streaming Protocol (RTSP)
   3.5 Session Description Protocol (SDP)
   3.6 Session Announcement Protocol (SAP)
   3.7 SIP-PSTN interworking
4 Comparison of H.323 and SIP
   4.1 Call Setup and Capability Exchange
   4.2 QoS and Call Control
   4.3 Looping Avoidance
   4.4 Reliability
   4.5 Address Resolution
   4.6 Ease of use and Extensibility
   4.7 Call control services
   4.8 Interoperability
   4.9 Interworking with PSTN
   4.10 Interworking of SIP and H.323
5 Quality of Service in VoIP
   5.1 Quality Discussion
   5.2 Key Issues
      5.2.1 Accumulation Delay (sometimes Called Algorithmic Delay)
      5.2.2 Processing Delay
      5.2.3 Network Delay
      5.2.4 Jitter
   5.4 Measures for achieving QoS
      5.4.1 cRTP
      5.4.2 Queuing
      5.4.3 RSVP
      5.4.4 Lost-Packet Compensation
      5.4.5 Echo Compensation
6. VoIP over Wireless (VoIPoW)
   6.1 Header compression for VoIP in Wireless
   6.2 VoIP in GPRS
      6.2.1 Enhancements required in GPRS for VoIP
   6.3 VoIP in UMTS
   6.4 Mobile Internet Telephony
   6.5 VoIP added mobility to GSM Users
   6.6 VoIP in Satellite Networks
7. Feature Interaction in Internet Telephony
   7.1 Advantages of Internet Telephony
   7.2 Complications of Internet Telephony
   7.3 Cooperative and Adversarial Interactions
8. AAA Usage in VoIP
9. Cisco VoIP
   9.1 Cisco VoIP Products and Developments
   9.2 Accounting and Billing for Cisco VoIP products
10. VoIP Market
11. SIP Implementations
12. Future of VoIP
Appendix A
References

ABSTRACT

In this survey paper we introduce voice over Internet Protocol (VoIP) in general, explain how it differs from traditional telephony over the public switched telephone network (PSTN), and describe its advantages. We also trace how VoIP evolved to where it is today. Two standards currently compete for IP telephony signaling: the H.323 protocol suite by the ITU and the Session Initiation Protocol (SIP) by the IETF. We describe both protocols in detail, along with the protocols they rely on to achieve a full end-to-end VoIP solution. These protocols will coexist with the PSTN and use the existing infrastructure; hence we discuss their interworking with the PSTN, which is critical for their success. We also compare the two protocols in terms of the functionality they offer and their interworking with each other. Quality of Service (QoS) is one of the important criteria that will decide the adoption and success of VoIP; we discuss the issues involved, the various delays that affect QoS, and the measures adopted to achieve the desired QoS. VoIP is also fast catching up in wireless markets. We discuss the issues of VoIP that are specific to wireless networks, as well as VoIP in wireless networks such as GPRS and UMTS. We discuss mobile Internet telephony and how VoIP can give added mobility to GSM users. Satellite networks are part of communications today; hence we also discuss the possibilities for VoIP in them. As many new features are added to Internet telephony, it will face the same problem as traditional telephony, namely feature interaction; this is also dealt with in detail. VoIP can use Authentication, Authorization and Accounting (AAA) for QoS and for user accounting. It is an emerging area that will soon become an integral part of VoIP. We took one company (Cisco) that has many VoIP products today and supports both H.323 and SIP in them, and we use its products to discuss the possibilities for accounting and billing in VoIP. We then discuss the market potential of VoIP and look at some of the SIP implementations currently available in the market. We conclude the survey with what we see as the future of VoIP.

1. Introduction to Voice over IP

Voice over IP (VoIP) is the real-time delivery of voice across networks using the Internet protocols. People are using the Internet in new and exciting ways, which means significant revenue opportunities. But the revenues generated by the Internet are minuscule compared to the revenues generated by voice. An important driving force behind voice over IP is cost savings. Circuit switching carries voice well, but it is expensive. IP is attractive for voice because of lower equipment cost, the integration of voice and data, lower bandwidth requirements and its widespread availability. IP telephony is cheap compared to the high cost of long-distance and international voice calls. New services can be enabled easily in IP telephony because the intelligence lies in the end systems and IP networks are open in nature. Many existing services such as caller ID, call forwarding and multi-line presence will also be very easy to implement and will be offered for free. For corporations, carrying voice on existing data networks is cost-efficient, and integrating voice and data applications can result in more efficient business processes. Examples are integrated voice mail and email, teleconferencing, and automated and intelligent call distribution.

1.1 Challenges

The main challenge for VoIP is that the voice quality should be as good as that of today's telephone networks. The ease of operation and the functionality should also match those of the public switched telephone network (PSTN). This is really challenging because IP was not designed to carry voice or other real-time traffic. The important requirements for quality voice are low delay, minimal jitter, low (or no) packet loss, and speech-coding techniques that maintain natural speech while requiring low bandwidth. The reliability and scalability of a VoIP system are also very important.

1.2 Applications of VoIP

Today VoIP is used in the following applications:

• Toll bypass: reduce the cost of long-distance calls by routing them through IP-based networks.
• Call center integration: call centers that use one link to talk to customers and another to retrieve data can combine them into one, saving costs. Customers can also find information on a web site and make a call directly from there.
• Unified messaging: allows users to receive various forms of messages (voice, email, fax etc.) at a single access point.
• IP videoconferencing: reduces costs for businesses.
• Corporate intranets: integration of voice and data networks in corporations using IP-based PBX solutions, reducing infrastructure and administration costs.
• Hosted PBX solutions for small office/home office (SOHO).

1.3 Difference between IP Telephony and PSTN

IP telephony relies on the "end-to-end" paradigm for the delivery of services. Signaling is done end-to-end; hence the call state is also end-to-end, as is the instantiation of many telephony features. This leads to tremendous flexibility and extensibility. IP telephony separates call setup from resource reservation. Hence it breaks the all-or-nothing call completion assumption of the current telephone system. PSTN addresses are overloaded with multiple functions such as endpoint identification, service indication, indication of who pays for the call, and carrier selection. IP telephony addresses are used solely for endpoint identification and basic service indication. Payment and carrier selection are handled by protocols such as RSVP and RTSP.

The phone system employs different signaling protocols between the user and the network (UNI) than between network elements (NNI), because of which certain features are not available to end users. This distinction does not exist in the Internet, at the level of either data transport or signaling.

The open, end-to-end nature of the Internet means that completely different service vendors can provide the various components of telephony services. This separation of functionality also simplifies the number portability problem.

1.4 Features of IP Telephony

• Adjustable quality: end systems can control the amount of compression based on network bandwidth.
• Security: signaling messages and media can be encrypted.
• User identification: talker indication in both multicast and bridged configurations, and more detailed information if the caller desires.
• User interface: the graphical user interface offered by Internet telephony can be readily customized and can offer richer indications of features, such as process and progress indications.
• Integration of computer and telephony.
• Feature ubiquity, because services are defined largely by end systems.
• Multimedia, as multiplexing is natural for packet networks.

The carrier benefits are: silence suppression and compression; shared facilities among voice, data and signaling networks; simpler development and deployment of advanced telephony services; and separation of voice and control flow.

1.5 VoIP Signalling and its evolution

Signaling enables the various network components to communicate with each other to set up and tear down calls. The signaling infrastructure of IP telephony must

• provide the functionality required to set up, manage and tear down calls and connections
• be scalable to support a large number of endpoints
• support network management for policy control, accounting, billing etc.
• provide mechanisms to set up the requested QoS
• be extensible, to make the addition of new features easier
• be interoperable among different vendors, versions and signaling protocols

An IP telephony signaling protocol also has to accomplish the following functions [13]:

Name translation and user location, to determine the IP address of the host to exchange media with. The translation can also be based on many other things, such as the time of day, the caller, or the status of the callee.

Feature negotiation, which allows end systems to agree on what media to exchange and on their respective parameters such as encoding, and feature changes, which alter the media composition during the call.

Call participant management, which lets users invite others into an existing call, terminate existing connections, or hold and transfer calls.
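The name translation function above can be illustrated with a small sketch. It is only an illustration under stated assumptions: the `Route` type, the `LOCATIONS` table, the user name and the addresses are all hypothetical, and a real signaling protocol would obtain this information from a location service rather than a hard-coded table.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Route:
    start_hour: int  # first hour (inclusive) at which this route applies
    end_hour: int    # hour (exclusive) at which this route stops applying
    host: str        # host to exchange media with during that window


# Hypothetical location table: where "alice" can be reached by time of day.
LOCATIONS = {
    "alice": [
        Route(9, 17, "192.0.2.10"),   # office phone during working hours
        Route(17, 24, "192.0.2.20"),  # home phone in the evening
    ],
}


def locate(user: str, hour: int) -> Optional[str]:
    """Translate a user name into a host address, based on the time of day."""
    for route in LOCATIONS.get(user, []):
        if route.start_hour <= hour < route.end_hour:
            return route.host
    return None  # user unknown or not reachable at this hour
```

The same lookup could equally be keyed on the caller or on the status of the callee; time of day is used here only because it is the simplest of the criteria mentioned.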

The VoIP industry has gone through three stages [2] in terms of signaling protocol evolution: precommercial (pre-1995), PC-centric (1995-1998), and carrier grade (1998 on).

The precommercial stage produced the Real-time Transport Protocol (RTP), and the Multiparty Multimedia Session Control (MMUSIC) working group of the Internet Engineering Task Force (IETF) designed various protocols, including the Session Initiation Protocol (SIP) for session setup and teardown. During this stage the focus was on audio and video conferencing over the Internet.

The PC-centric stage started when commercial VoIP software products allowed users to place calls over the Internet between multimedia PCs. But these products had their own signaling protocols, and hence there was no interoperability among them. To address this problem the International Telecommunication Union (ITU) started working on VoIP signaling protocols and released H.323 v1, referred to as a standard for real-time videoconferencing over LANs with non-guaranteed quality of service. Interworking with the PSTN was one of the main focuses, and many PC client software vendors built H.323-compliant products. These products enabled phone calls to be made across the PSTN and the Internet. One widely popular product was NetMeeting from Microsoft.

The carrier-grade stage started when IP telephony service providers realized the limitations of H.323, which assumed that a gateway handles signaling conversion, call control and media transcoding in one box; this did not scale well for large deployments. H.323 also had no provision for SS7 connectivity, which hindered seamless integration with the PSTN. In 1998 the concept of the decomposed gateway was introduced, consisting of a Media Gateway Controller (MGC) that handles call control and a Media Gateway (MG) that handles media transformation. The ITU and IETF defined the media gateway control standard H.248, or Megaco, in June 2000.

2. H.323

The International Telecommunication Union (ITU) ratified the first version of H.323 in May 1996 [3]. The H.323 standard was initially targeted at multimedia conferencing over LANs that do not provide guaranteed QoS. It has since been updated three times, the latest being version four (v4), and has evolved into a protocol for generic packet-based multimedia communication. The standard defines how voice, data, and video traffic are transported over IP-based networks, and it also incorporates the T.120 data-conferencing standard. Figure 1 gives an overview of the protocols included in the H.323 standard and labels their different functions.

2.1 Elements of H.323

A typical H.323 network is composed of a number of zones interconnected via a WAN. Each zone consists of a single gatekeeper (GK) and a number of H.323 terminals (TEs), H.323 gateways (GWs) and multipoint control units (MCUs). A zone can span a number of LANs or just a single LAN. Each component of the architecture is defined as follows:

Terminal (TE): The TE is an endpoint in the H.323 call, i.e. the creator or receiver of voice. The TE can communicate with another H.323 TE, a GW or an MCU. The TE must have a system control unit, media transmission, an audio codec and a packet-based network interface. A video codec and user data applications are optional capabilities. The H.323 TE can set up a call with another TE directly or via a GK.

Gatekeeper (GK): This element is an optional function that provides address translation and control services to the endpoints (i.e. the TEs, GWs and MCUs). The endpoints may also receive other services from the GK, such as bandwidth management and locating gateways.

Gateway (GW): The GW combines the characteristics of a Switched Circuit Network (SCN) endpoint and the H.323 endpoint in order for the two systems to co-exist. The GW is the interface of the VoIP implementation to the rest of the world and enables the VoIP implementation to communicate with other communication solutions. The GW is not needed if the implementation is not intended to communicate with the outside world.

Multipoint Control Unit (MCU): The MCU provides support for conferences between three or more endpoints. It transmits the capability set to all the endpoints and can revise capabilities during the transmission. The main task of the MCU is to receive audio and video streams and distribute them to all the participants in a conference. The MCU may be incorporated into a TE, GW, or GK, or it can be kept separate.

2.2 H.323 protocol suite

H.323 is implemented with a set of standards that together provide the H.323 services. Figure 1 shows how these protocols relate to each other up to version 2 of the standard.

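[Figure 1 diagram: protocol stack with panels labelled Control, Data, Audio, Video and A/V Control; protocols shown are Q.931, H.245, RAS (GK), T.120, G.7XX, H.26X and RTCP; transport layers are TCP and UDP.]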

Figure 1: Protocols relationship in H.323v1/v2 (Source: [2] VoIP signalling, H.323 and beyond)

Registration, Admission and Status (RAS): A transaction-oriented protocol used by an endpoint to contact a GK. An endpoint uses RAS to discover the gatekeeper, register/unregister with the GK, request call admission and bandwidth, and clear a call. The GK can also use RAS to request information about the endpoint or to communicate with other GKs. Since gatekeepers are optional in the standard, RAS is only used when gatekeepers are present.

Q.931: The signalling protocol between two terminals, which is also present in the PSTN. However, the H.323 version is a variation of the PSTN version and uses only a subset of its messages. The adoption of Q.931 in H.323 was motivated by interoperability with existing phone networks, for example H.320 and H.324.

H.245: This is a connection control protocol. It lets the endpoints negotiate media processing capabilities such as codecs, and it is widely used in the H-series multimedia standards. In H.323 it is used to exchange terminal capabilities, determine the master-slave relationship and open/close the logical channels between the two endpoints.

The Real-time Transport Protocol (RTP) used by the media is discussed in a later part of this section.

2.3 Calling with H.323

The stages of an H.323 call are shown in Table 1.

Phase | Protocol | Intended functions
1. Call admission | RAS | Request permission from GK to make/receive a call. At the end of this phase the calling endpoint receives the Q.931 transport address of the called endpoint.
2. Call setup | Q.931 | Set up call between two endpoints. At the end of this phase, the calling endpoint receives the H.245 transport address of the called endpoint.
3. Endpoint capability negotiation and logical channel setup | H.245 | Negotiate capabilities between two endpoints. Determine master-slave relationship. Open logical channel between two endpoints. At the end of this phase, both endpoints know the RTP/RTCP addresses of each other.
4. Stable call | RTP | Two parties in conversation.
5. Channel closing | H.245 | Close the logical channels.
6. Call teardown | Q.931 | Tear down the call.
7. Call disengage | RAS | Release the resources for this call.

Table 1: The phases of a H.323 call. (Source: [2] VoIP signalling, H.323 and beyond)

When no GK is involved, the first and last phases are omitted. H.323 v2 defines a fast connect procedure that combines the Q.931 and H.245 stages. H.323 v4 defines H.245 negotiation in parallel with fast connect, so that if fast connect fails the endpoints can connect more quickly using H.245 (instead of starting the process all over again).
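The phase sequence of Table 1, and the rule that the RAS phases are dropped when no gatekeeper is involved, can be written down as a minimal sketch. The names `PHASES` and `call_phases` are our own illustrative choices and are not part of H.323; fast connect is not modelled.

```python
# Phases of an H.323 call in order, as (phase name, protocol) pairs (Table 1).
PHASES = [
    ("Call admission", "RAS"),
    ("Call setup", "Q.931"),
    ("Endpoint capability negotiation and logical channel setup", "H.245"),
    ("Stable call", "RTP"),
    ("Channel closing", "H.245"),
    ("Call teardown", "Q.931"),
    ("Call disengage", "RAS"),
]


def call_phases(gatekeeper_present: bool) -> list:
    """Return the phases a call goes through, with or without a gatekeeper."""
    if gatekeeper_present:
        return list(PHASES)
    # Without a GK there is nobody to ask for admission or to release
    # resources with, so the first and last (RAS) phases are omitted.
    return [phase for phase in PHASES if phase[1] != "RAS"]


if __name__ == "__main__":
    for name, protocol in call_phases(gatekeeper_present=False):
        print(f"{protocol:6} {name}")
```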

As figure 2 shows, the different protocols in H.323 use TCP or UDP. The media stream uses UDP, which does not guarantee delivery; the motivation for this choice is discussed in the QoS section. The control and setup functions of H.323 up to v2 use TCP, which guarantees delivery; from v3 onwards there is an option to use UDP as well, to minimise the call-setup time. The call procedure in relation to the TCP and UDP connections is shown in figure 2.

Figure 2: H.323 (v2) Call Sequence. (Source: http://www.iec.org/tutorials/int_tele/)

Two call control modes are possible with H.323: direct call and GK-routed call. As the name indicates, in a direct call all signaling messages (Q.931, H.245) and media (using RTP) are exchanged directly between the endpoints. In the GK-routed call model all signaling messages are routed through the GK cloud. With the latter approach service providers can control the network (through admission and bandwidth control) and exercise accounting and billing functions.

2.4 H.450.x services

H.323 defines services similar to the features available in the PSTN through the H.450.x series [4,5,6,7]. They are:

• H.450.1 for generic functional protocol and procedures
• H.450.2 for call transfer
• H.450.3 for call diversion (including call forwarding and deflection)
• H.450.4 for call hold
• H.450.5 for call park and pickup
• H.450.6 for message waiting indication
• H.450.7 for call waiting
• H.450.8 for name identification
• H.450.9 for call completion
• H.450.10 for call offer
• H.450.11 for call intrusion
• H.450.12 for additional common information network services

The alternative to H.450-based services is to implement them in a proprietary manner in the GK. H.246 Annex D defines ways to integrate Intelligent Network (IN) services in the PSTN with the GK. An HTTP-based control channel was also defined in H.323 v4, through which a service provider can display web pages with H.323 call-related content to the user. Annex K of H.323 provides ways to create new services using mechanisms similar to third-party call control. Annex L of H.323 addresses the issue of having simple endpoints with the intelligence residing in network elements (such as feature servers).

2.5 H.323 Interworking with PSTN

The H.323 GW connects the PSTN with the Internet. The gateway provides the following functionality:

• PSTN interfaces: a PSTN signaling interface that terminates signaling protocols such as ISDN Q.931, and a PSTN media interface that terminates media streams such as pulse code modulated (PCM) voice streams.
• VoIP interfaces: a VoIP signaling interface that terminates H.323, and a packet media interface that handles RTP.
• Signaling conversion: translation between ISDN Q.931 signaling and H.323 signaling for call control.
• Media transformation: translation between 64 kbps PCM and RTP streams of various speeds.
• Connection management: coordinates between signaling flows and media transformations, and maintains (creates/modifies/deletes) the association between PSTN and Internet flows during the lifetime of the call.

2.6 Gateway Decomposition

Some of the limitations of the GW in H.323 were:

• Scalability: GWs could support only a few thousand lines, compared to the tens of thousands of lines supported by telephone switches.
• SS7 connectivity: until 1998 GWs lacked this connectivity and hence could not provide the rich set of services enabled by SS7; as a result, a subscriber had to dial a phone number to connect to the GW and then dial the destination number.
• Availability: when a GW goes down, all of its active calls disappear. There was no mechanism for failover.

The monolithic packaging of signaling and media transformation into a single box limited the number of lines a GW could handle, because the capacity for handling multiple lines depends on CPU processing power and memory. Signaling is less computationally intensive, while media transformation is more computationally intensive and also runs for the entire duration of a call. Hence, to make GWs more scalable, it was decided to separate signaling and media transformation in the GW.

The GW is decomposed into three functional components namely signaling gateway, media gateway and media gateway controller.

Signaling gateway (SG) provides the signaling mediation function between IP and PSTN domains.

Media gateway (MG) provides media mapping and/or transcoding functions between RTP media and PCM encoded voice. MG also performs voice compression, network echo cancellation, silence suppression, comfort noise generation, encryption, fax conversion and analog modem conversion. It also performs conversion between tones on PSTN side and appropriate signals on packet network side and also plays announcements, performs voice recognition.

Media gateway controller (MGC): The MGC sits between the MG, SG and GK. It provides the call processing function for the MG and maintains the necessary call state information. The MGC manages the network-level resources available for calls. It receives PSTN signaling information from the SG and IP signaling from the GK. MGCP was introduced as a control protocol between the MG and MGC, but the IETF and ITU eventually created a common standard called Megaco/H.248.

Decomposing the GW overcomes the deficiencies of the monolithic GW. Using one MGC to control multiple MGs in effect creates a virtual GW that can handle more lines, which addresses the scalability issue. Connecting the SG function to the SS7 network provides SS7 connectivity and makes one-stage dialing possible. Letting multiple MGCs control a single MG can increase availability: if one MGC fails, another MGC can take over using failover procedures.

2.7 H.323 and H.248 (a.k.a. Megaco)

The Megaco/H.248 standard [9] provides a way to address some of the H.323 issues such as scalability, availability and integration with SS7. H.248 addresses the gateway decomposition issue discussed earlier. H.323 and H.248 will coexist, with H.323 being used by terminals to communicate with each other and with the network, and H.248 being used by GKs to control large gateways. But H.248 also provides an alternative thin-client approach to H.323/VoIP systems, which are based on thick-client functionality [10]. The H.248 architecture assumes that the intelligence is in the network and that the equipment has limited functionality, which reduces cost. In this way new services can be introduced easily in the network without the need to upgrade all the equipment. Megaco provides a flexible framework for master/slave control of a broad range of IP telephones and similar devices. It lends itself well to the definition of user interface elements such as function keys, indicators and displays. This makes it possible to define very simple devices, reducing complexity and cost and improving interoperability. H.248 also does not allow terminals to make calls directly to other terminals, which allows providers to control QoS and charge appropriately. Several cable and DSL modems have limited computing resources, and hence their VoIP products have adopted H.248. Whether the thick (i.e. intelligent) client or the thin-client approach will succeed is an open question, but it is conceivable that both models will coexist.

2.8 Data Transport Using RTP

The Real-time Transport Protocol (RTP) [11] provides services such as sequencing and loss detection, intra-media synchronization, payload identification and frame indication, which are required by real-time voice traffic. It has two components: RTP itself and the RTP Control Protocol (RTCP). Apart from providing the above services, RTP is also multicast-friendly and media independent, and includes support for mixers and translators. RTP media streams can be encrypted. RTP is generally used in conjunction with UDP. A random 32-bit synchronization source identifier (SSRC) distinguishes users within a multicast group. This makes it possible to distinguish streams coming from the same translator or mixer and to associate receiver reports with sources. The contributing SSRC (CSRC) list identifies all the SSRCs that contributed to a packet. RTP supports media-dependent framing to assist in the reconstruction and playout process. The payload type identifies the media encoding used in the packet. The sequence number increments by one from each packet to the next and is used to detect losses and restore packet order.
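The header fields described above (payload type, sequence number, SSRC and CSRC list) can be read from a packet with a few lines of Python. This is a minimal sketch of the fixed RTP header layout only; the function name `parse_rtp_header` is our own, and header extensions, padding removal and payload handling are deliberately left out.

```python
import struct


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the 12-byte fixed RTP header and the CSRC list that follows it."""
    if len(packet) < 12:
        raise ValueError("an RTP packet carries at least a 12-byte fixed header")
    # Network byte order: two flag bytes, 16-bit sequence number,
    # 32-bit timestamp, 32-bit SSRC.
    b0, b1, seq, timestamp, ssrc = struct.unpack("!BBHII", packet[:12])
    csrc_count = b0 & 0x0F              # low 4 bits of the first byte
    end = 12 + 4 * csrc_count           # each CSRC entry is 32 bits
    if len(packet) < end:
        raise ValueError("packet too short for the announced CSRC list")
    csrc_list = list(struct.unpack("!%dI" % csrc_count, packet[12:end]))
    return {
        "version": b0 >> 6,             # top 2 bits
        "padding": bool(b0 & 0x20),
        "extension": bool(b0 & 0x10),
        "marker": bool(b1 & 0x80),
        "payload_type": b1 & 0x7F,      # identifies the media encoding
        "sequence_number": seq,         # used to detect loss and reorder
        "timestamp": timestamp,
        "ssrc": ssrc,                   # synchronization source identifier
        "csrc_list": csrc_list,         # contributing sources (from a mixer)
    }
```

A receiver would typically compare consecutive `sequence_number` values (modulo 2^16) to detect lost or reordered packets.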

Media senders and receivers periodically send RTCP packets to the same multicast group that is used to distribute RTP packets (but on different ports). Each RTCP packet usually contains a sender report (SR) or receiver report (RR) followed by source descriptions (SDES). SRs are generated by users who are sending media; they describe the amount of data sent so far and correlate the RTP sampling timestamp with absolute time to allow synchronization between different media. Session participants who are receiving media send RRs. An RR contains one block for each RTP source in the group, which describes the instantaneous and cumulative loss rate and the jitter from that source. It also indicates the last timestamp and the delay since receiving a sender report, allowing sources to estimate their distance to sinks. SDES packets are used for session control; they contain the CNAME (canonical name), a globally unique identifier similar in format to an email address. It is used for resolving conflicts in the SSRC value and for associating different media streams generated by the same user. The SDES packets also identify the participant through name, email and phone number. Thus, using these different packets, RTCP allows receivers to provide feedback on QoS. It also offers loose session control.

3 Session Initiation Protocol (SIP)

The Session Initiation Protocol (SIP) [12] comes from the Internet Engineering Task Force (IETF) as an alternative to H.323. It is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants. The sessions include Internet multimedia conferences, Internet telephone calls and multimedia distribution. SIP is designed to be independent of the lower-layer transport protocol, with the restriction that a whole SIP message has to be delivered either in full or not at all. It is a simple client-server protocol in which the client sends requests to the server, which processes them. SIP is a text-based protocol and is very similar to the Hypertext Transfer Protocol (HTTP). SIP reuses many of the HTTP header fields, such as entity headers and authentication headers, which allows easier integration with web servers. SIP keeps the signaling as simple as possible; hence calls can be established faster (rapid call setup). SIP allows useful information to be included that helps in making intelligent decisions about call handling. SIP can be easily extended with additional capabilities.

SIP supports five facets of establishing and terminating multimedia communications. They are User location, User capabilities, User availability, Call setup, and Call handling.

SIP is designed as part of the overall IETF multimedia data and control architecture [13] and can be used with other IETF protocols such as RSVP for reserving network resources, the Real-time Transport Protocol (RTP) for transporting real-time data and providing QoS feedback (RTCP), the Real Time Streaming Protocol (RTSP) for controlling the delivery of streaming media, the Session Announcement Protocol (SAP) for advertising multimedia sessions via multicast, and the Session Description Protocol (SDP) for describing multimedia sessions. However, SIP does not depend on any of these protocols for its functionality or operation.

Because a SIP call participant can both generate and receive requests, SIP-enabled end systems contain both a protocol client and a server; the server part is generally known as the user agent server. SIP also supports the following network servers:

• proxy servers, which forward the request to the next-hop server(s)
• redirect servers, which inform the client of the address to contact next
• registrars, with which users register themselves. Registrars are generally co-located with proxy or redirect servers and might offer location services.

As such, the protocol makes no distinction between the servers discussed above. The difference lies in their functionality: for example, the user agent server can accept or reject a request, while proxy and redirect servers cannot.

Network addresses in SIP can be Internet addresses, GSTN addresses, OSI addresses or private numbering plans. SIP identifiers are email-like identifiers of the form user@domain, user@host, user@ip-address or phone-no@gateway. SIP uses these addresses as part of SIP URLs such as sip:[email protected]. SIP can use Domain Name System (DNS) mail exchange (MX) records or the Lightweight Directory Access Protocol (LDAP) as a means of delivering SIP invitations if the user's email address itself is published as the SIP address.

SIP only handles the communication between caller and callee, the end point addressing and user location while Session Description Protocol (SDP) serves to exchange media capabilities (media and their attributes a participant is capable of receiving) of the session. SDP expresses lists of capabilities for audio/video and indicates where the media is to be sent. It allows scheduling media sessions flexibly including future, repeated, timed/non-timed sessions.
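As an illustration of how SDP lists media capabilities and indicates where the media is to be sent, the following sketch builds a minimal audio-only session description and reads the media destination back out of it. The helper names `build_sdp` and `media_destination`, the address 192.0.2.10 and the port 49170 are assumptions made for the example; only the line types (`v=`, `o=`, `s=`, `c=`, `t=`, `m=`) come from SDP itself, and the parser handles only IPv4 audio.

```python
def build_sdp(ip, audio_port, payload_types):
    """Build a minimal audio-only SDP body (one RTP/AVP media line)."""
    formats = " ".join(str(pt) for pt in payload_types)
    lines = [
        "v=0",                              # protocol version
        f"o=- 0 0 IN IP4 {ip}",             # origin: user, session id/version, address
        "s=-",                              # session name (unused here)
        f"c=IN IP4 {ip}",                   # connection data: where to send media
        "t=0 0",                            # timing: unbounded session
        f"m=audio {audio_port} RTP/AVP {formats}",  # media, port, offered encodings
    ]
    return "\r\n".join(lines) + "\r\n"


def media_destination(sdp):
    """Return (ip, port) that the audio stream should be sent to."""
    ip, port = None, None
    for line in sdp.splitlines():
        if line.startswith("c=IN IP4 "):
            ip = line.split()[2]
        elif line.startswith("m=audio "):
            port = int(line.split()[1])
    return ip, port


if __name__ == "__main__":
    # Offer payload types 0 and 8 (PCMU and PCMA in the RTP audio/video profile).
    print(build_sdp("192.0.2.10", 49170, [0, 8]))
```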

3.1 SIP Messages and Headers

• INVITE invites a user to a call and establishes a new connection
• BYE terminates a connection between two users in a call
• STATUS informs the client about the progress of signaling for the action it has requested
• OPTIONS solicits information about capabilities
• ACK is used for reliable message exchanges
• REGISTER is used to register a user's location with a SIP server
• CANCEL terminates a search for a user

As in HTTP, additional methods can be introduced; if a method requested by a client is not supported, the server signals an error. It also lists the methods it supports using the Public and Allow response headers. The OPTIONS request also returns the list of available methods.

SIP headers are largely self-describing, and the data structures fall into the parameter-value category. SIP was designed for character-set independence and, with its ability to indicate the languages of enclosed content, it is suited for cross-national use.

3.1.1 Header Fields

Calls in SIP are uniquely identified by a call identifier (the "Call-ID" header field in SIP messages) that is created by the caller and used by all participants of the call. The "From" header field identifies the entity that is requesting the connection (which may also be a proxy server acting on behalf of other users) and the "To" header field identifies the receiver. The media destination (which includes the IP address and port) is where the media has to be sent for the particular recipient.
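As a rough illustration of how these header fields appear in a message, the sketch below assembles a minimal INVITE request as text and reads the headers back. The addresses (alice@example.com, bob@example.org), the host name, the Call-ID format and the helper names build_invite and parse_headers are all hypothetical, chosen only for illustration. A real request carries further headers (for example Via) that are omitted here.

```python
import uuid


def build_invite(caller: str, callee: str, host: str) -> str:
    """Assemble a minimal, illustrative SIP INVITE request as text."""
    # The Call-ID is created by the caller and reused by every participant.
    call_id = f"{uuid.uuid4().hex}@{host}"
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_headers(message: str) -> dict:
    """Split the header part of a SIP message into a name -> value dict."""
    head = message.split("\r\n\r\n", 1)[0]
    headers = {}
    for line in head.split("\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return headers


request = build_invite("alice@example.com", "bob@example.org", "client.example.com")
headers = parse_headers(request)
print(headers["From"], "->", headers["To"], "| call", headers["Call-ID"])
```

The text-based, name-value layout is what makes the headers "largely self-describing": a receiver can split on the first colon without knowing every header in advance.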

SIP provides a number of elements for constructing various services. Two principal headers used are "Also" and "Replaces". The "Also" header instructs the recipient to place a call to the parties listed. The "Replaces" header instructs the recipient to terminate any connections with the listed parties. With the "Require" header a client can indicate to the server that it must support certain features or capabilities. An extension has been proposed to add a "Supported" header, with which a client can tell the server which extensions it supports.

Clients and servers can add new request and response headers without explicit indication, provided the headers are not crucial to interpreting the message. The entity receiving the message can silently ignore the headers it does not understand. For the client to include headers that are vital to interpreting the request, the SIP "Require" header is used. The server must refuse the request if it does not understand one of the features enumerated. SIP uses this to ascertain whether telephony call-control functions are supported, avoiding the problem of partial implementations that have unpredictable sets of optional features.

3.2 A SIP Call

A SIP client obtains the address of the recipient in the form name@domain and then tries to translate the domain into an IP address. The translation uses one of DNS service (SRV), MX, Canonical Name (CNAME) or Address (A) records. Once the server's IP address is found, the client sends an INVITE message.

The server that receives the message may not be the user agent server (of the user the client is trying to reach), but may be a proxy or redirect server. Let us assume that a user wants to contact sip:[email protected] and that user jill is on the host beauty. If try.com is a proxy server, it will forward the request to [email protected]. If it is a redirect server, it will respond to the client, telling it to contact [email protected] directly. The proxy or redirect server determines the next-hop server (beauty.try.com) using a location server, which can be anything from an LDAP server or a database to a file or the finger command.

Once the user agent server has been contacted, it sends a response back to the client that contains a response code and message (similar to HTTP). Responses of the 100 class (1xx codes) indicate progress of the call and will be followed by a final response. 2xx responses indicate success, 3xx redirection, 4xx client failure, 5xx server failure and 6xx global failure. The server will retransmit the final response until the client confirms it with an ACK to the server. The responses can also include more detailed information. An extension has been proposed to add a 183 Session Progress status code, which enables the server to indicate to the client that a one-way media path must be opened from client to server.
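The response classes described above can be captured in a small lookup. This is only a sketch: the helper names classify_response and is_final are hypothetical, and the class labels simply restate the text.

```python
# Response classes as described in the text, keyed by the first digit.
RESPONSE_CLASSES = {
    1: "informational (call in progress)",
    2: "success",
    3: "redirection",
    4: "client failure",
    5: "server failure",
    6: "global failure",
}


def classify_response(status_code: int) -> str:
    """Map a SIP status code to its response class."""
    if not 100 <= status_code <= 699:
        raise ValueError(f"not a SIP status code: {status_code}")
    return RESPONSE_CLASSES[status_code // 100]


def is_final(status_code: int) -> bool:
    """1xx responses are provisional; every other class ends the transaction."""
    classify_response(status_code)  # validates the range
    return status_code >= 200


for code in (100, 183, 200, 302, 404, 503, 600):
    print(code, classify_response(code), "final" if is_final(code) else "provisional")
```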

One example of a SIP proxy-serviced call is shown in Figure 3, in which a user invites [email protected] and the proxy receives the call. The proxy uses a location server to find out that the user henning is at the host play and then forwards the message to play. The user accepts the call, and an acknowledgement is then sent from the caller. As noted in the figure, all calls are routed through the proxy.

Figure 3: SIP Call Setup using Proxy (Source http://www.cs.columbia.edu/sip/overview.html)

When there are multiple proxies between client and server, a Via header traces the progress of the invitation, allows the response to find its way back, and helps to detect loops. A proxy server can also forward the invitation to multiple servers at once (a so-called forking proxy) in the hope of contacting the user at one of the locations, or can forward the invitation to multicast groups.
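The role of the Via header can be sketched with a plain list of hop names. This is a deliberate simplification: real Via headers also carry protocol, transport and parameter information, and the proxy names and the functions forward and response_route below are hypothetical.

```python
def forward(via: list, proxy: str) -> list:
    """Return the Via list after `proxy` forwards a request, or raise on a loop."""
    if proxy in via:
        # The request has already passed through this proxy once.
        raise RuntimeError(f"loop detected: {proxy} already handled this request")
    return [proxy] + via  # newest hop goes first


def response_route(via: list) -> list:
    """Responses retrace the recorded hops, newest hop first."""
    return list(via)


via = []
for hop in ["proxy1.example.com", "proxy2.example.com"]:
    via = forward(via, hop)

print("response travels via:", response_route(via))

try:
    forward(via, "proxy1.example.com")
except RuntimeError as err:
    print(err)
```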

SIP supports the creation of multi-party conferences using network-level multicast, one or more bridges, or a mesh of unicast connections. Listing a group address in the session description is enough to set up a multicast conference, and the caller need not even be a member of the multicast group to issue an invitation. Bridges may take over branches of a mesh through the Replaces header, and participants need not even be aware of the bridge. In the multi-unicast case, the client maintains a point-to-point connection with each participant, and full meshes are easily built using the Also header.

3.3 SIP INFO

SIP INFO [14] is a new method introduced to enable the transfer of information generated as a result of events in the middle of a call. Examples of its use include the transfer of Dual-Tone Multi-Frequency (DTMF) digits, the transfer of account-balance information for pre-paid solutions, and the transfer of mid-call signaling information generated by another network (such as the PSTN). Through the INFO method, application-layer information can be transferred in the middle of the call, allowing the SIP servers to act on it in a timely fashion. Any application that wishes to use the INFO method must also include a specification of the message body to be used. INFO provides a powerful and flexible tool within SIP to support new services.

3.4 Real Time Streaming Protocol (RTSP)

Voicemail is a natural companion to IP telephony, and the protocol that gives the user control over the voicemail server is the Real Time Streaming Protocol (RTSP) [15]. RTSP allows a client to instruct a media server to record and play back multimedia sessions, including functions such as seek, fast forward, rewind, and pause. RTSP is also a textual protocol and integrates easily with SIP. A user can use SIP to invite a media server or voicemail server to a multimedia session, and then use RTSP to control operation during the session.

3.5 Session Description Protocol (SDP)

SDP [16] is a text-based protocol used for advertising multimedia conferences and for communicating the conference addresses and tool-specific information necessary for participation. It is a general-purpose protocol for describing real-time multimedia sessions. The Session Announcement Protocol (SAP) can be used to announce a multimedia session described using SDP. Only one session announcement is permitted in a single packet if SAP is used. Alternatively, session descriptions can be conveyed over the WWW or email using MIME content of type application/sdp. SDP includes media information (type, format, transport protocol, address and port, etc.) and timing information (start, stop, repetition). SDP may also include additional pointers in the form of Universal Resource Identifiers (URIs) for more information. It supports categorization, internationalization of free-text fields, and public or private sessions.
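To make the media and timing information concrete, the sketch below writes a minimal audio-only session description and reads the media destination back out of it. The user name, session id, session name, address (taken from a documentation address range) and port are hypothetical values, and the helper names are illustrative only. Payload type 0 is the static RTP/AVP number for G.711 mu-law audio.

```python
def build_sdp(user: str, session_id: int, address: str, audio_port: int,
              payload_type: int = 0) -> str:
    """Build a minimal audio-only session description (illustrative values)."""
    lines = [
        "v=0",                                                   # protocol version
        f"o={user} {session_id} {session_id} IN IP4 {address}",  # origin
        "s=Example call",                                        # session name
        f"c=IN IP4 {address}",                                   # where media is to be sent
        "t=0 0",                                                 # timing: unbounded session
        f"m=audio {audio_port} RTP/AVP {payload_type}",          # media, port, transport, format
    ]
    return "\r\n".join(lines) + "\r\n"


def media_destination(sdp: str):
    """Extract the (address, port) pair that media should be sent to."""
    address = port = None
    for line in sdp.splitlines():
        kind, _, value = line.partition("=")
        if kind == "c":
            address = value.split()[2]
        elif kind == "m":
            port = int(value.split()[1])
    return address, port


description = build_sdp("alice", 2890844526, "192.0.2.10", 49170)
print(description)
print(media_destination(description))
```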

3.6 Session Announcement Protocol (SAP)

SAP [17] is used for advertising multimedia conferences and multimedia sessions by multicasting the session descriptions (defined by SDP) to a well-known multicast address and port. The announcement has the same scope as the session it is announcing. There are announcers and listeners. Announced sessions can be modified or deleted (explicitly, or by implicit or explicit timeout). SAP also supports authentication.

3.7 SIP-PSTN interworking

The PSTN/Internet Interworking (PINT) service and the interfaces for such interworking have been defined by the IETF. SIP and SDP have also been extended to support interworking with the PSTN [18]. A PINT gateway is a host that accepts PINT service requests from clients who wish to invoke a service within a telephone network. The SIP extensions add multipart MIME payloads, support for "Warning" and "Require" headers, and new methods such as SUBSCRIBE, NOTIFY and UNSUBSCRIBE. Also introduced are a format for PINT URLs in a request and telephone network parameters within PINT URLs.

4 Comparison of H.323 and SIP

We have two competing protocols from two different worlds (telecom and Internet), so it is inevitable that they will be compared to see how they differ and which one is better. In this section we compare H.323 and SIP [20] in similar scenarios in terms of call setup, quality of service, reliability, flexibility, interoperability, interworking with the PSTN, and so on.

The components of SIP and H.323 are comparable. The SIP user agent server is equivalent to the H.323 terminal, and SIP network servers are equivalent to the H.323 gatekeeper. The SIP protocol itself is equivalent to RAS and the Q.931-like protocol in H.323. SDP is equivalent to H.245. The media flows use RTP in both H.323 and SIP.

4.1 Call Setup and Capability Exchange:

The SIP call setup procedure is similar to that of H.323v3 and can use either UDP or TCP (generally UDP, to minimize call setup time to 1.5-2.5 round trips). H.323v1 and v2 used only TCP, so call setup must first establish a TCP connection, which increases the setup time. H.323v1 can take about 6-7 round-trip times, including setup of the Q.931 and H.245 TCP connections. From v2 onwards the delay is reduced to 3 round trips by including H.245 logical channel information in the SETUP and CONNECT messages themselves (a.k.a. Fast Connect), but only G.711-based voice communication can be established between the two parties in this way. If they wish to establish other media channels, they have to perform H.245 negotiation after the G.711 channel is established. From H.323v4, H.245 negotiations can happen in parallel with Fast Connect.
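The effect of these round-trip counts on setup time is simple arithmetic, sketched below. The round-trip counts are the ones quoted in the text; the 80 ms network round-trip time is a hypothetical figure chosen only for illustration, and processing time at the endpoints is ignored.

```python
# (min, max) round trips needed for call setup, as quoted in the text.
SETUP_ROUND_TRIPS = {
    "SIP over UDP": (1.5, 2.5),
    "H.323 v1": (6.0, 7.0),
    "H.323 v2+ with Fast Connect": (3.0, 3.0),
}


def setup_delay_ms(round_trips: float, rtt_ms: float) -> float:
    """Network contribution to call setup time: round trips times the RTT."""
    return round_trips * rtt_ms


RTT_MS = 80.0  # hypothetical round-trip time between the two parties

for protocol, (low, high) in SETUP_ROUND_TRIPS.items():
    print(f"{protocol}: {setup_delay_ms(low, RTT_MS):.0f}"
          f" to {setup_delay_ms(high, RTT_MS):.0f} ms")
```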

SDP, which SIP uses for capability exchange, has limited expressiveness; hence SIP does not have the full negotiation capabilities of H.245. For example, SDP does not support asymmetric capabilities (receive-only or transmit-only) or simultaneous capabilities of audio and video encoding. Refer to Table 2 for a comparison of the messages exchanged in the various stages of a call.

Functions                SIP                               H.323

Registration             REGISTER, ACK                     RAS: RRQ, RCF, RRJ, GRQ, GCF, GRJ

Admission                INVITE                            RAS: ARQ, ACF, ARJ

Call Initiation          INVITE                            H.225 Setup

Capability Exchange      SDP, OPTIONS, NOT SUPPORTED       H.245 Open Logical Channel (can be
                                                           embedded in H.225 for Fast Start).
                                                           H.245 Terminal Capability Set and
                                                           Ack can be used for slow-start.

Resource Allocation      No specific signaling             Some consider ARQ, GRQ a crude form
                                                           of resource reservation at
                                                           admission. For per-session
                                                           allocation there is no specific
                                                           signaling.

Status                   Any response/acknowledgement      Alerting, Progress, Call
                         messages (1xx-5xx), e.g. O.K.,    Proceeding, Connect
                         Ringing, Progress, ACK

Teardown                 BYE, Ack                          Release Complete

Reliability of Messages  Has timers and session            Relies on TCP, SCTP or Annex E.
                         methods/messages to achieve       H.323 over UDP alone is not
                         reliability. SIP is reliable      reliable. H.323 has timers as
                         over UDP. SIP can also bundle     well.
                         requests and responses.

Table 2: functions and comparison of messages in SIP and H.323v4 (Source:[21] A Comparison of H.323v4 and SIP by Nortel Networks).

4.2 QoS and Call Control

Gatekeepers in H.323 provide control and management functions such as address translation, admission control, bandwidth control and zone management. Gatekeepers may also offer optional functions such as call control signaling, call authorization, bandwidth management and call management. SIP does not supply management or control functions by itself but relies on other protocols. Admission control decides whether to allow a call based on whether the network has sufficient resources. To do this the protocol must handle bandwidth management, call management and bandwidth control. H.323 supports this, but SIP does not.

Neither H.323 nor SIP supports resource reservation by itself; both recommend using IETF-defined mechanisms such as RSVP or DiffServ. H.323v3 can offer some differentiated services based on QoS parameter negotiation (bit rate, delay, jitter). A terminal may request a “Guaranteed Service”, a “Controlled Service” or an “Unspecified Service”.

4.3 Looping Avoidance

To prevent loops in forwarding, H.323 version 3 defines a PathValue field that indicates the maximum number of gatekeepers a signaling message can traverse before being discarded (like the TTL in IP); previous versions of H.323 did not have this functionality. PathValue reduces the rate of loop occurrence, but SIP provides a highly efficient loop detection algorithm, similar to the one used by BGP (Border Gateway Protocol), based on the Via header field.

4.4 Reliability

H.323v3 provides better reliability than SIP by providing redundant gatekeepers and endpoints. There can be alternate gatekeepers, which are used if the primary gatekeeper fails. Similarly, an endpoint might have a backup or alternate transport address.

4.5 Address Resolution

Address resolution in H.323 uses aliases (E.164 or H.323ID) and a mapping mechanism provided by gatekeepers. H.323v4 also defines a URL of the form h323:user@host. SIP uses email-like addresses called SIP URLs. SIP servers obtain the callee's possible locations from location servers and use external mechanisms such as DNS or LDAP to locate endpoints in other administrative domains.

4.6 Ease of use and Extensibility

H.323 is a complex protocol that uses many other H-series protocols. Many services require interaction between these protocols, which increases the complexity. SIP and SDP are less complicated, which simplifies programming and maintenance. Customization of H.323 is also harder, especially because of backward compatibility with other versions, while text-based SIP can be customized easily. H.323 defines NonStandardParam in ASN.1 as an extension mechanism to support additional features, which limits the extensions. SIP has a flexible hierarchical namespace of feature names and hierarchically organized numerical error codes. New features can be registered with the Internet Assigned Numbers Authority (IANA) or can be derived from the feature owner's Internet domain name.

4.7 Call control services

H.323 defines many supplementary services (the H.450.x series). SIP can support most of them very easily because of its easy extensibility, but some of them might be harder or impossible to support. CGI can be used for developing SIP-based service creation environments (RFC 3050); using these, new services in SIP can be created and deployed rapidly. SIP defines a CGI for itself, which is a little different from the HTTP-based CGI.

4.8 Interoperability

H.323 is fully backward compatible across different versions, while in SIP a newer version might discard some old features and thus lose some compatibility with other versions. To achieve interoperability among different vendors, both protocols have their own interoperability events. The International Multimedia Teleconferencing Consortium defines a complete IP telephony interoperability protocol for H.323, while the SIP community also conducts interoperability testing (SIPit, earlier known as bake-offs) among vendors.

4.9 Interworking with PSTN

H.323 uses a subset of Q.931, which makes it easier to interoperate with the PSTN, which also uses Q.931 for the User-to-Network Interface (UNI) and SS7 with ISUP for the Network-to-Network Interface (NNI). Since H.323 supports only a subset of Q.931 and there is no established standard for relaying SS7/ISUP messages, H.323 can translate only a portion of the SS7 messages. H.323 version 4 defines a tunneling mechanism so that H.323 can act as a transparent tunnel for non-H.323 signaling protocols; hence SS7/ISUP may be tunneled without translation. SIP-PSTN interaction is done by signaling translation; again, there is no one-to-one mapping, which makes it difficult to address all scenarios. Another Internet draft proposes tunneling ISUP or QSIG messages through the IP network by carrying them in the SIP message body in hex format. Other Internet drafts describe the integration between SIP and IN services.

4.10 Interworking of SIP and H.323

There is work in progress in the IETF [22] on interworking between H.323 and SIP themselves. A logical entity known as the SIP-H.323 Interworking Function (SIP-H.323 IWF) is described, which would allow interworking between these two major VoIP signaling protocols. The SIP-H.323 IWF should register with the H.323 gatekeeper and the SIP servers if they are present. It may also register the SIP entities with the H.323 gatekeeper and vice versa. The IWF should support all addressing schemes of both H.323 (alias addresses) and SIP (URLs). The IWF should try to map H.245 and SDP to the maximum extent and shall conform to the call signaling procedures on each side independently of the other side. The presence of the IWF should be transparent to both sides, and it maintains the call sequence accordingly.

5 Quality of Service in VoIP

The goal of a voice communication system is to convey the voices of the participants within some predetermined measures of quality [23]. The quality of the system is determined not only by the quality of the voice but also by whether the transmission time is short enough for interactive communication to take place. In this chapter we survey the ways in which quality of service (QoS) is achieved in a VoIP implementation. The chapter is divided into three sections: quality discussion, key issues, and measures for achieving QoS.

5.1 Quality Discussion

When implementing a VoIP system, decisions have to be made about what speech quality the new network will be capable of delivering. The quality cannot be noticeably worse than that of the PSTN, or the users will have a hard time adapting to the new system. Assuming that the implementation uses VoIP hardware without quality defects, the next area of concern is the wide area network itself, that is, the data pipe that will carry the speech packets. On a congested WAN, burst packet loss, latency and jitter all work to degrade the quality of VoIP calls. Different VoIP implementations suffer differently from these effects, but all of them contribute to lower-quality calls in some sense.

All kinds of quality loss due to slow or unpredictable IP traffic patterns can be compensated for by an increase in bandwidth. This approach often brings extra costs, however, and there is a risk that the VoIP solution ends up costing more than conventional, circuit-switched PSTN calls. Piping voice and computer data over the same WAN brings some nice single-network benefits (and savings), but most would argue that properly deployed VoIP should also be cheaper than long-distance PSTN.

Most manufacturers of VoIP solutions assume a generally high level of QoS and base the settings and performance requirements of the system on it as a static entry. However, IP QoS often varies with time-of-day traffic patterns and sometimes takes major hits during very unpredictable data traffic events. On congested IP networks, routers will discard packets if their packet inflow exceeds their outflow for a given data route, leading to a big loss of packets when the network gets too busy. If there were some way to measure when things were starting to degrade, VoIP networks could attempt to dynamically improve the quality of current and subsequent calls. Very few VoIP products today have a way of dynamically monitoring QoS, let alone intelligently adapting to changing real-world QoS conditions.

The end users' perception of quality is ultimately the measure of what requirements VoIP systems must meet. Quality is therefore a subjective measure, and testing quality involves the opinions of many users. Four qualitative and quantitative measures that are used to determine the quality requirements of a VoIP solution are:

• Service availability (99.999 %)
• Call setup time (less than two seconds for a local call)
• Voice delay (less than 150 ms one way)
• Voice quality (minimal echo and disturbing sounds)

5.2 Key Issues

The QoS issues of specific interest for VoIP are all part of the general QoS issues in packet networks. Their significance may be somewhat different, though, since the metrics for quality in interactive communication differ from those for general data traffic. This section lists and describes the key issues concerning VoIP QoS.

Delay causes problems in several ways. One is echo, which is caused by reflections of the speaker's voice from the far-end telephone equipment back into the speaker's ear [36]. If the round-trip delay is less than 50 milliseconds, this is not considered to be a problem, since the speaker cannot hear any difference between the echo and what he or she is saying. However, when the delay is greater than 50 milliseconds (almost always the case in VoIP), the echo can be distinctly heard by the speaker. As echo is perceived as a significant quality problem, voice-over-packet systems must address the need for echo control and implement some means of echo cancellation.

Another delay problem is talker overlap, in which one talker steps on the other talker's speech. This problem occurs when the one-way delay becomes greater than 250 milliseconds. The end-to-end delay constraint is therefore the major limitation, and reducing delay is the main requirement for a packet network intended for VoIP use.
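The two thresholds just given can be turned into a small check. This is a sketch under one stated assumption, namely that the path is symmetric so that the round-trip delay is twice the one-way delay; the function name delay_problems is hypothetical.

```python
ECHO_ROUND_TRIP_MS = 50          # echo becomes audible above this round-trip delay
TALKER_OVERLAP_ONE_WAY_MS = 250  # talker overlap appears above this one-way delay


def delay_problems(one_way_ms: float) -> list:
    """List the delay-related problems expected for a given one-way delay.

    Assumes a symmetric path, so round-trip delay = 2 * one-way delay.
    """
    problems = []
    if 2 * one_way_ms > ECHO_ROUND_TRIP_MS:
        problems.append("audible echo")
    if one_way_ms > TALKER_OVERLAP_ONE_WAY_MS:
        problems.append("talker overlap")
    return problems


for delay in (20, 100, 300):
    print(f"{delay} ms one way:", delay_problems(delay) or "no delay problems")
```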

To understand the quality issues, the delay has to be dissected into its components, which are explained below.

5.2.1 Accumulation Delay (sometimes Called Algorithmic Delay)

Before a voice coder can process a frame of voice samples, the samples have to be collected. The size of the frame varies with the type of voice coder used, from a single sample time (125 microseconds) to many milliseconds. A representative list of standard voice coders and their frame times is shown in Table 3.

Codec                               Bandwidth   Bandwidth Consumed   Sample
                                    Consumed    with cRTP            Latency
G.729 with one 10-ms sample/frame   40 kbps     9.6 kbps             15 ms
G.729 with four 10-ms sample/frame  16 kbps     8.4 kbps             45 ms
G.729 with two 10-ms sample/frame   24 kbps     11.2 kbps            25 ms
G.711 with one 10-ms sample/frame   112 kbps    81.6 kbps            10 ms
G.711 with two 10-ms sample/frame   96 kbps     80.8 kbps            20 ms

Table 3: Standard voice coders. Source: [8] Voice over IP Fundamentals, Cisco Press

5.2.2 Processing Delay

Encoding the samples and collecting them into a packet for transmission is another source of delay in a packet network. The encoding delay is a function of both the processor execution time and the type of algorithm used. Furthermore, multiple voice-coder frames are commonly collected in a single packet to reduce the packet network overhead. A single packet may contain three frames of G.729 code words, equivalent to 30 milliseconds of speech, which creates a delay while this data is processed and collected.

5.2.3 Network Delay

The physical medium and the protocols used to transmit the voice data are another source of delay. Network delay also includes the delay caused by the buffers used to remove packet jitter on the receive side (see Jitter below). Network delay is a function of the capacity of the links in the network and of the processing that occurs as the packets transit the network. The jitter buffers add delay, which can be a significant part of the overall delay, as packet-delay variations can be as high as 70 to 100 milliseconds in some frame-relay and IP networks. Compensating for jitter in the network therefore creates a trade-off between total delay and delay variation.

5.2.4 Jitter

Jitter is the name for the fact that the delay of individual packets may vary in a packet network; it is variable inter-packet timing caused by the network a packet traverses. To compensate for it, the receiver, or an intermediate node, has to buffer packets and pass them on at constant intervals. In other words, the buffer collects packets and holds them long enough to allow the slowest packets to arrive in time to be played in the correct sequence. All this buffering adds delay.

As stated above, minimizing delay and removing jitter are two conflicting goals, and the trade-off between them depends on the specific characteristics of the VoIP implementation. To keep systems flexible, implementers have devised various schemes that adapt the jitter buffer size to the time-varying requirements of network jitter removal. Such adaptation aims at minimizing the size and delay of the jitter buffer while at the same time preventing buffer underflow caused by jitter.

An approach to minimize the jitter buffer is to measure the variation of packet level in the jitter buffer over a period of time and incrementally adapt the buffer size to match the calculated jitter. This approach works best with networks that provide a consistent jitter performance over time, such as ATM networks. Another approach is to count the number of packets that arrive late and create a ratio of these packets to the number of packets that are successfully processed. This ratio is then used to adjust the jitter buffer to target a predetermined, allowable late-packet ratio. This approach works best with the networks with highly variable packet-interarrival intervals, such as IP networks.
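The second approach, adapting the buffer to a target late-packet ratio, can be sketched as follows. All numeric values (the 1 % target, the 10 ms step and the 20 to 200 ms bounds) are hypothetical, as are the function names; a real implementation would choose them from the codec frame size and the delay budget.

```python
def late_ratio(late: int, processed: int) -> float:
    """Ratio of late packets to successfully processed packets."""
    if processed == 0:
        return float("inf") if late else 0.0
    return late / processed


def adapt_buffer_ms(buffer_ms: int, late: int, processed: int,
                    target_ratio: float = 0.01, step_ms: int = 10,
                    min_ms: int = 20, max_ms: int = 200) -> int:
    """Return the jitter buffer depth to use for the next measurement interval."""
    ratio = late_ratio(late, processed)
    if ratio > target_ratio:
        buffer_ms += step_ms   # too many late packets: hold packets longer
    elif ratio < target_ratio / 2:
        buffer_ms -= step_ms   # comfortably below target: win back some delay
    return max(min_ms, min(max_ms, buffer_ms))


depth = 60
for late, processed in [(5, 95), (3, 97), (0, 100), (0, 100)]:
    depth = adapt_buffer_ms(depth, late, processed)
    print(f"late={late} processed={processed} -> buffer {depth} ms")
```

Shrinking only when the ratio falls well below the target (here, half of it) is a design choice that keeps the buffer from oscillating around the threshold.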

5.4 Measures for achieving QoS

In short, measures to make a VoIP solution meet its QoS requirements can be taken in two places: on the sender side and on the receiver side. We start with the measures that can be taken on the sender side, in which we include the measures that can be taken during setup.

5.4.1 cRTP

Table 3 indicated that there is a way to compensate for the large header overhead of the IP packets in a VoIP solution [8]. cRTP compresses the 40-byte IP/UDP/RTP header to 2 to 4 bytes most of the time. The header is compressed to 2 bytes when UDP checksums are not used, and to 4 bytes when they are used. The effect of the compression is clear in Table 3, and the advantages include both lower transmission delay and better bandwidth utilization. Figure 4 shows the effect of the header compression.

[Figure 4 diagram: before RTP header compression, a 40-byte IP/UDP/RTP header precedes a payload of 20 to 160 bytes; after RTP header compression, the header is 2 to 4 bytes and the payload is unchanged.]

Figure 4: cRTP header compression. (Source: [8] VoIP Fundamentals, Cisco Press).

cRTP and TCP header compression have some techniques in common, namely that both exploit the fact that about half of the bytes in the headers stay the same from packet to packet. The big compression gain in cRTP comes from the fact that the remaining fields do not vary unpredictably over time: they change by a constant amount from one packet to the next, so the receiver can rebuild each header by adding the known increment to the values it received previously.
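The bandwidth gain from header compression follows directly from the header and payload sizes, as the sketch below shows. It assumes that only IP/UDP/RTP header overhead is counted (link-layer framing is ignored), that G.729 produces 10 bytes per 10 ms of speech (an 8 kbit/s codec), and that the compressed header is 2 bytes (no UDP checksum). The function name voice_bandwidth_kbps is hypothetical. Under these assumptions the results match the one-sample and four-sample G.729 rows of Table 3.

```python
FULL_HEADER_BYTES = 40    # IP + UDP + RTP
CRTP_HEADER_BYTES = 2     # compressed header, UDP checksum not used
G729_BYTES_PER_10MS = 10  # assumption: 8 kbit/s codec


def voice_bandwidth_kbps(payload_bytes: int, packet_interval_ms: int,
                         header_bytes: int) -> float:
    """Bandwidth of one voice stream: bits per packet divided by packet interval."""
    return (payload_bytes + header_bytes) * 8 / packet_interval_ms


for samples in (1, 4):
    payload = samples * G729_BYTES_PER_10MS
    interval_ms = samples * 10
    plain = voice_bandwidth_kbps(payload, interval_ms, FULL_HEADER_BYTES)
    crtp = voice_bandwidth_kbps(payload, interval_ms, CRTP_HEADER_BYTES)
    print(f"G.729, {samples} x 10 ms per packet: "
          f"{plain:.1f} kbps plain, {crtp:.1f} kbps with cRTP")
```

Packing more samples into each packet also lowers the bandwidth, but at the cost of the extra accumulation delay discussed earlier.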

5.4.2 Queuing

The way packets are passed through a network affects QoS, in particular delay. Packets are queued in different parts of the network, and the way this queuing is done affects the delay. Letting voice packets pass through even while other traffic is queuing up may lessen their delay, and may be justified by the fact that delay matters more for some kinds of packets than for others.

Basic queuing can be described as FIFO (first in, first out). Packets are passed through the network on a first-come, first-served basis. This causes voice packets to wait in the queue behind less important packets. An alternative is WFQ (weighted fair queuing), which uses several queues. Separate queues exist for different traffic classes, and equal bandwidth is allocated to each class. This keeps voice packets flowing even when a bandwidth-consuming application, such as FTP, is running simultaneously.
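The difference between the two disciplines can be illustrated with a toy model. The sketch below is not a full WFQ implementation: it assumes equal-sized packets that are all already waiting, and it serves one packet per class in turn, which gives each class an equal share. The packet names, class labels and function names are hypothetical.

```python
from collections import deque


def fifo_order(packets):
    """First in, first out: packets leave in arrival order."""
    return [name for _cls, name in packets]


def per_class_order(packets):
    """One queue per traffic class, served one packet per class in turn."""
    queues = {}
    for cls, name in packets:
        queues.setdefault(cls, deque()).append(name)
    order = []
    while any(queues.values()):
        for queue in queues.values():
            if queue:
                order.append(queue.popleft())
    return order


# An FTP burst arrives just ahead of a single voice packet.
arrivals = [("ftp", "f1"), ("ftp", "f2"), ("ftp", "f3"),
            ("voice", "v1"), ("ftp", "f4")]

print("FIFO:     ", fifo_order(arrivals))
print("per class:", per_class_order(arrivals))
```

With per-class queues the voice packet is sent second instead of fourth, because it no longer waits behind the whole FTP burst.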

5.4.3 RSVP

With RSVP the endpoints of a connection can signal to the network which kind of QoS an application needs. It lets endpoints request bandwidth and latency guarantees. Applications using RSVP receive feedback on whether the requested specifications can be met, and an application may then choose to use an alternative route (such as the PSTN).

One of the drawbacks of RSVP is scalability. RSVP has not yet been implemented on a large scale, and problems may arise for backbone routers that may have to reserve resources for and serve several thousand flows with different reservations. The scalability issue has pushed implementations of RSVP towards the edges of the network, while the backbone remains RSVP-free.

As stated before, the second way to achieve QoS is through measures taken on the receiver side. Below follows a list of some common receiver-side measures.

5.4.4 Lost-Packet Compensation

A lost packet over the network can cause problems, depending on the type of packet network that is being used. IP networks do not guarantee delivery and they will most often exhibit a higher degree of lost voice packets than ATM networks.

Current IP networks treat all voice frames like data. When frames have to be dropped, voice frames are dropped equally with data frames. Data frames are not subject to the strict delay requirements of voice packets, so dropped data packets can be recovered through retransmission. Voice packets, however, have a maximum end-to-end delay requirement, so in practice retransmission takes too long. VoIP implementations use a few schemes to address the problem of lost frames:

• Interpolating: Lost speech packets are compensated for by replaying the last packet received during the interval when the lost packet was supposed to be played out. The scheme fills the time between non-contiguous speech frames and works well when lost frames are infrequent. It does not work well if several packets in a row, or a burst of packets, are lost, since the same packet is then repeated for so long that the listener will notice the compensation.

• Send redundant information: Transmitting the speech information in several packets provides a means to compensate for packet loss at the expense of bandwidth utilization. This approach replicates the nth packet of voice information and sends it along with the (n+1)th packet. The advantage of this method is that the receiver can correct for the lost packet exactly. The downside is that the approach uses more bandwidth (more information is transmitted) and creates greater delay (larger transmission time).

• Hybrid approach: A combination of the two schemes can be created by using a much lower-bandwidth voice coder to provide the redundant information carried along in the (n+1)th packet. This combination reduces the problem of the extra bandwidth required but fails to solve the problem of delay.
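The first two schemes can be sketched on lists of frames in which None marks a lost packet. The frame labels, the tuple layout (frame n together with a copy of frame n-1) and the function names are hypothetical, chosen only to illustrate the idea.

```python
def conceal_by_repeat(frames):
    """Interpolating: replace each lost frame (None) with the last frame received."""
    out, last = [], None
    for frame in frames:
        if frame is None:
            frame = last   # replay the previous frame (None if nothing received yet)
        else:
            last = frame
        out.append(frame)
    return out


def recover_with_redundancy(packets):
    """Redundancy: packet n carries (frame n, copy of frame n-1); None means lost."""
    out = []
    for i, packet in enumerate(packets):
        if packet is not None:
            out.append(packet[0])
        elif i + 1 < len(packets) and packets[i + 1] is not None:
            out.append(packets[i + 1][1])  # exact copy carried by the next packet
        else:
            out.append(None)               # next packet lost as well: unrecoverable
    return out


print(conceal_by_repeat(["a", None, None, "d"]))
print(recover_with_redundancy([("a", None), None, ("c", "b")]))
print(recover_with_redundancy([("a", None), None, None, ("d", "c")]))
```

The last call shows the limit of single-packet redundancy: with two consecutive losses only the second frame can be recovered. The recovery also has to wait for the following packet, which is the extra delay mentioned above.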

5.4.5 Echo Compensation

In a telephone network, echo is caused by signal reflections generated by the hybrid circuit that converts between a four-wire circuit (a separate transmit and receive pair) and a two-wire circuit (a single transmit and receive pair). These reflections cause the speaker to hear his or her own voice in the phone. In a conventional circuit-switched telephone network the echo is present, but because the round-trip delays through the network are smaller than 50 milliseconds, the echo is masked by the normal side tone that every telephone generates.

The echo problem must be dealt with in VoIP networks, since the round-trip delay through the network is almost always greater than 50 milliseconds. Echo-cancellation techniques are therefore always used in VoIP solutions. ITU-T Recommendation G.165 defines the performance requirements for echo cancellers. Echo is generated on the telephone network side and would otherwise be transmitted into the packet network. The echo canceller compares the voice data received from the packet network with the voice data being transmitted to the packet network. A digital filter then removes the received component from the voice data that is about to be transmitted.
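One common way to build such a digital filter is a normalized least-mean-squares (NLMS) adaptive filter. The sketch below is only an illustration of that general technique: G.165 specifies performance requirements, not this algorithm, and the tap count, step size and the simulated hybrid echo path are all assumptions made for the example.

```python
import random
from typing import List


def nlms_echo_cancel(far_end: List[float], outgoing: List[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-6) -> List[float]:
    """Subtract an adaptive estimate of the far-end echo from the outgoing signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, outgoing):
        history = [x] + history[:-1]
        estimate = sum(w * h for w, h in zip(weights, history))
        err = d - estimate  # what is actually sent towards the packet network
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights towards values that shrink the error.
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        residual.append(err)
    return residual


random.seed(0)
far = [random.uniform(-1.0, 1.0) for _ in range(2000)]

# Hypothetical hybrid: the echo is a delayed, attenuated copy of the far-end signal.
echo = [
    0.5 * (far[n - 2] if n >= 2 else 0.0) + 0.25 * (far[n - 3] if n >= 3 else 0.0)
    for n in range(len(far))
]

cleaned = nlms_echo_cancel(far, echo)

# Compare echo energy before and after cancellation over the last 200 samples.
tail_echo = sum(e * e for e in echo[-200:])
tail_residual = sum(r * r for r in cleaned[-200:])
```

Here the outgoing signal contains only echo (no near-end speech), so after the filter has adapted, `tail_residual` should be a small fraction of `tail_echo`. A practical canceller also has to freeze adaptation while both parties talk at once, which this sketch does not attempt.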

6. VoIP over Wireless (VoIPoW)

VoIP is not only gaining ground on landline networks but is also attracting a lot of interest in wireless networks, especially with the wireless world's impending move to third-generation (3G) networks. Third-generation radio-access technology will provide a common IP-based service platform, which will offer mobile users real-time and interactive services [24]. The advantage of running IP all the way is service flexibility: there is no dependency between applications and the network, so anyone can develop new applications. The challenge for VoIPoW is to achieve good quality and efficient use of spectrum. Present-day cellular services have high radio performance but low service flexibility, because they are vertically integrated and optimized. Carrying IP packets over the radio interface incurs a lot of protocol overhead, but this problem can be overcome with header compression (RFC 2508).

There are voice codecs that have been developed for high spectrum efficiency in cellular networks. Since cellular radio links have high bit error rates, these codecs rely on unequal error protection (UEP) and unequal error detection (UED). The codec information must be available in the radio-access network in order to employ UEP and UED. Without this kind of mechanism, all bits must be protected according to the most sensitive one, which reduces system capacity. A cyclic redundancy code (CRC) covering the entire voice frame must also be used. As the number of faulty frames increases, voice quality decreases, so the quality of the channel must be improved to regain voice quality, which in turn reduces overall capacity.

6.1 Header compression for VoIP in Wireless

The voice packets generated by the speech coder are encapsulated in RTP, UDP and IP before being passed to the network. A speech frame carried by IP/UDP/RTP has a header size of 40 bytes, while the payload itself is 15-30 bytes depending on the codec, so the overhead is very high compared to the small speech frame size. The headers of consecutive packets that belong to the same stream are highly redundant, and hence they can be compressed. Header compression algorithms maintain a context, which is simply the uncompressed version of the last transmitted header, at both ends of the channel. The compressed header carries only the changes to the context. If a compressed header is lost (or damaged) over the channel, the context at the receiving side cannot be updated properly. Header compression schemes must therefore have mechanisms for detecting an out-of-date context and repairing it. The IETF has standardized several header compression schemes. One of them is Compressed RTP (cRTP), which reduces the 40-byte header to a minimum of 2 bytes; it was also discussed in the QoS section. cRTP sends a context update request to repair the context, so the round-trip time over the link limits the efficiency of the context-repair mechanism. Another header compression scheme, which provides a high degree of compression and is robust enough for cellular usage, is Robust Checksum-based header Compression (ROCCO). The ROCCO header contains a code that gives the decompressor enough information to repair the context locally, even when several consecutive packets have been lost or damaged. This eliminates the negative effect of long round-trip delays. Comparisons [24] have shown that ROCCO outperforms cRTP in robustness, compression ratio and capacity, and that it will make VoIP an interesting alternative to circuit-switched cellular speech service.
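The role of the shared context, and the damage done by a single lost packet, can be seen in a toy Python model. This is not the cRTP or ROCCO wire format: the header is reduced to three hypothetical fields, a "compressed" packet is just a dictionary of field differences, and no repair mechanism is included.

```python
from typing import Dict, List, Optional, Set, Tuple

Header = Dict[str, int]
Packet = Tuple[str, Header]  # ("FULL", header) or ("DELTA", field differences)


class Compressor:
    def __init__(self) -> None:
        self.context: Optional[Header] = None

    def compress(self, header: Header) -> Packet:
        if self.context is None:
            packet: Packet = ("FULL", dict(header))
        else:
            # Send only the fields that changed, as differences from the context.
            delta = {k: v - self.context[k] for k, v in header.items()
                     if v != self.context[k]}
            packet = ("DELTA", delta)
        self.context = dict(header)
        return packet


class Decompressor:
    def __init__(self) -> None:
        self.context: Optional[Header] = None

    def decompress(self, packet: Packet) -> Header:
        kind, fields = packet
        if kind == "FULL":
            self.context = dict(fields)
        else:
            if self.context is None:
                raise ValueError("no context established yet")
            for name, diff in fields.items():
                self.context[name] += diff
        return dict(self.context)


# Four consecutive headers of one hypothetical voice stream.
headers = [{"seq": i, "timestamp": 160 * i, "ssrc": 0x1234} for i in range(4)]


def run(drop: Set[int]) -> List[Header]:
    """Send all headers through the channel, dropping the packets listed in `drop`."""
    comp, decomp = Compressor(), Decompressor()
    out = []
    for i, header in enumerate(headers):
        packet = comp.compress(header)
        if i not in drop:
            out.append(decomp.decompress(packet))
    return out


lossless = run(set())
lossy = run({2})
```

Without loss, the decompressor reconstructs every header. When packet 2 is dropped, the difference carried by packet 3 is applied to a stale context, so the last reconstructed header in `lossy` has sequence number 2 instead of 3. This is the out-of-date context that cRTP repairs with a context update request and ROCCO repairs locally.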

6.2 VoIP in GPRS

Allowing VoIP applications over packet-switched mobile networks such as GPRS is a challenging task [25]. 3GPP has defined 400 ms as the upper limit for the transmission delay of data from audio applications. Perceived service quality, however, starts to drop once a delay of 150 ms is exceeded, so the target end-to-end delay should be 150 ms.

6.2.1 Enhancements required in GPRS for VoIP

A header compression mechanism of the kind discussed previously is necessary in GPRS. Header compression can reduce the combined RTP, UDP and IP headers from 40 bytes to 2-4 bytes under ideal conditions.
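The effect on per-packet overhead follows directly from the figures quoted in this section (40-byte headers, 15-30 byte payloads, 2-4 byte compressed headers). The short calculation below is only arithmetic on those numbers; the function name is our own.

```python
def header_overhead(payload_bytes: int, header_bytes: int) -> float:
    """Fraction of each packet that is header rather than speech payload."""
    return header_bytes / (header_bytes + payload_bytes)


# Payload sizes and header sizes quoted in the text.
for payload in (15, 30):
    for header in (40, 4, 2):
        share = header_overhead(payload, header)
        print(f"payload {payload:2d} B, header {header:2d} B: {share:.1%} overhead")
```

With an uncompressed 40-byte header, between about 57% and 73% of every packet is header. With a 2-byte compressed header the share falls to roughly 6-12%.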

The Temporary Block Flow (TBF), which is established between the Mobile Station (MS) and the network, takes about 100-150 ms to set up for the uplink and 60-100 ms for the downlink in the ideal case. The TBF was designed on the assumption that large application packets are transmitted in a single TBF. The standardized TBF handling does not meet the minimum end-to-end delay requirement of real-time applications. To avoid frequent TBF releases and re-establishments within a single talk spurt, the TBF release can be delayed by the inter-arrival time of the speech frames or more.

Another alternative is a semi-permanent TBF, which is established at the beginning of the call and released at the end of the call. During inactive periods the assigned resources are reduced, both to minimize wasted resources and to multiplex best-effort data into the silence periods.

A permanent TBF is equivalent to circuit-switched mode, in which a Packet Data Traffic Channel (PDTCH) is permanently allocated to an MS. A permanent TBF eliminates the resource utilization gain of a pure packet-switched approach. It can, however, be optimized so that real-time data and best-effort data can still be multiplexed. This is achieved by defining a special block format for best-effort data, which is multiplexed into the idle periods of the real-time flow. With this approach the delay can be reduced considerably, down to 80 ms under ideal conditions. When admission control is applied to the real-time flows, only 50% of the capacity might be utilized, so additional best-effort traffic should be added to increase the utilization of the GPRS network.

6.3 VoIP in UMTS

Third Generation (3G) partnership projects now provide for universal roaming, and hence 3G mobile systems are referred to as Universal Mobile Telecommunication Systems (UMTS). The first phase of UMTS, Release 1999 (R99), was a logical evolution from the 2nd generation system architecture. The second phase, called Release 2000 (R00), is a complete revolution that introduces new concepts and features. Its standardization will be completed only in late 2001 or early 2002, and commercial operation will begin in 2004 [26]. The R00 all-IP UMTS specifications replace circuit-switched transport technologies with packet-switched (e.g. IP) transport technologies. They also introduce multimedia support for the UMTS core network. In addition, they define open, standardized interfaces for the UMTS service architecture, leading to an open service architecture (OSA) that will allow third parties to develop services. This will also enhance the portability of telecommunication services between networks and terminals, which is called the virtual home environment (VHE) in 3G.

The use of VoIP, which results in end-to-end IP sessions with the higher bandwidths available in UMTS, opens up a whole new set of multimedia services for mobile end users. Delivering these services is one of the main drivers for UMTS. Using the same IP technology in both fixed and mobile networks also facilitates interworking between them and allows new services to be developed in a consistent way. One big challenge ahead for real-time VoIP service is provisioning sufficient QoS in the context of mobile networking, which includes controlling the delays introduced by handover, managing scarce radio resources and performing admission control.

Some of the new features introduced by R00 are the provisioning of multimedia services as an additional capability on top of the existing packet-switched services, a circuit-switched architecture that is independent of the bearer (i.e. packet-based network transport replaces circuit-switched transport), and a network architecture that is independent of the transport layer (ATM or IP). In the all-IP core network, all data is transported over IP, including circuit-switched voice data. R00 supports both circuit-switched voice service and IP-based multimedia services. Because of requirements such as backward compatibility and support for roaming and handover to 2G networks, there will be three types of 3G mobile terminals: circuit-switched, packet-switched (IP based) and those that support both modes. Circuit-switched voice is optimized in terms of bandwidth and quality, while packet-switched voice is flexible in terms of the services supported and allows the introduction of multimedia applications.

3GPP has decided to use SIP as the call control protocol between terminals and the mobile network [refer to Appendix A for a comparison of SIP and H.323v4, which in part influenced the decision]. A dedicated server in the network will provide interworking with other H.323 terminals. The elements of this architecture are:

• MSC server, which controls calls from circuit-switched mobile terminals and mobile-terminated calls from a PSTN/GSM/ISDN to a circuit-switched network. This server interacts with the media gateway control function (MGCF) for calls to and from the PSTN. R00 also separates the functionality of the MSC: the call control and services part is kept in the MSC server, and an IP router, the media gateway (MG), replaces the switch.
• Call state control function (CSCF), a SIP server that provides and controls multimedia services for packet-switched (IP) terminals, either mobile or fixed.
• MG at the UTRAN side, which transforms VoIP packets into UMTS radio frames.
• MG at the PSTN side, which translates calls from the PSTN into VoIP calls for transport in the UMTS core network.
• Signaling gateway (SG), which relays call signaling to and from the PSTN/UTRAN on an IP bearer (tunneling) and sends the signaling data to the MGCF.
• Home subscriber server (HSS), an extension of the home location register database with the subscriber's multimedia profile data.

The MGs defined here are controlled by the Megaco/H.248 protocol: the MGCF controls the MGs via H.248. The MGCF also translates call control signaling between ISUP (used by the PSTN) and SIP. UMTS uses GPRS for data traffic, and for voice it uses the GPRS Tunneling Protocol (GTP) on top of IP for packet-switched mobile terminals. The mobility problems are solved by the GPRS protocols.

VoIP services can be provided in the VHE in two possible ways. The first is to develop an SSP on top of a SIP server, together with a mapping between the SIP call state model and the state model of IN/CAMEL (Customized Applications for Mobile network Enhanced Logic, the mobile version of IN). This kind of SSP is called a SoftSSP. With it, VoIP can support the IN Application Protocol (INAP) or the CAMEL Application Part (CAP), which SSPs use to trigger the SCP when IN service control is needed. Alternatively, SIP itself allows new services to be defined through its powerful third-party call control mechanism, and can use the Common Gateway Interface (CGI) or the Call Processing Language (CPL) to define new services.

6.4 Mobile Internet Telephony

A Mobile Internet Telephony (MVoIP) system that integrates VoIP and mobile computing can be created as described in [27]. The handsets of this system should have multiple adapters with different coverage ranges, which are used in a complementary way. The system integrates the Internet, the cellular network (which has broad coverage but small bandwidth), the wireless local area network (which has high bandwidth but small coverage) and the PSTN. Hence the handsets should have Ethernet, wireless local area network and cellular phone network adapters. The work addresses how the adapters can be used in a complementary way to enlarge the area of mobility and how handoff delays can be reduced to enable smooth voice transmission.

6.5 VoIP added mobility to GSM Users

GSM service providers can offer mobility to subscribers by allowing them to use either GSM handsets or H.323 terminals (IP phones or PCs) to access telecommunication services using VoIP [28]. Telecommunications and Internet Protocol Harmonization over Networks (TIPHON) specifies a mediation gatekeeper that provides service control functions for the convergence of IP networks, mobile networks, fixed wireless and the PSTN. There are different ways in which IP and mobile networks can be integrated. The mediation gatekeeper can serve as a visitor location register (VLR) to support roaming. The base station controller (BSC) or base transceiver station (BTS) in the IP network provides wireless access to the IP network. In another scenario, a gateway provides the interface between IP and GSM. [28] describes a scenario in which a user moves from the GSM network into the IP network and uses his or her H.323 terminal (instead of the mobile station) to receive calls and other VoIP services. This gives users mobility and a lot of flexibility.

6.6 VoIP in Satellite Networks

Satellite systems are a part of the communications infrastructure; they have global coverage and reach remote areas. As in terrestrial networks, the integration of voice and data is therefore inevitable in satellite networks as well. An increasing portion of satellite capacity is used to carry data packets, and satellite systems are well positioned to enable the growth of VoIP services. Many service providers have announced the inclusion of satellite links in their networks, and new satellite systems that provide high-speed Internet access to business and home users are being developed as well. These systems will also offer VoIP service. COMSAT, a global satellite communications provider, has a VoIP test bed that uses a commercial VoIP solution. The test results [29] showed that satellite links provide a reliable medium for VoIP transport and that satellite propagation delay does not affect the normal operation of the GK/GW. Some of the observations made are: bandwidth usage can be improved by standard voice compression and silence suppression mechanisms; congestion should be avoided by not exceeding the link capacity (using IP QoS etc.); routers should be configured to limit UDP traffic so that it does not halt TCP traffic; and when silence suppression is used, the silence threshold should be adjusted to suit the level of background noise, so that background noise is not treated as speech.
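The last observation, adjusting the silence threshold to the background noise, can be sketched with a simple energy-based detector. This is an illustrative assumption about how such a threshold might be set, not the mechanism used in the COMSAT test bed; the 6 dB margin and the synthetic frames are hypothetical.

```python
import math
from typing import List


def frame_level_db(samples: List[float]) -> float:
    """Mean power of a frame in dB relative to a full-scale amplitude of 1.0."""
    power = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(power + 1e-12)


def silence_threshold_db(noise_frames: List[List[float]], margin_db: float = 6.0) -> float:
    """Place the threshold a fixed margin above the loudest measured noise frame."""
    return max(frame_level_db(f) for f in noise_frames) + margin_db


def is_speech(frame: List[float], threshold_db: float) -> bool:
    """Frames above the threshold are sent; frames below it are suppressed."""
    return frame_level_db(frame) > threshold_db


# Synthetic frames: background noise at about -40 dB, speech at about -10 dB.
noise = [0.01 if n % 2 == 0 else -0.01 for n in range(160)]
speech = [0.3 if n % 2 == 0 else -0.3 for n in range(160)]

adapted = silence_threshold_db([noise])   # about -34 dB
too_low = -50.0                           # a fixed threshold below the noise floor
```

With the adapted threshold the noise frame is suppressed and the speech frame is sent. With the fixed threshold of -50 dB the noise frame is classified as speech, so silence suppression saves no bandwidth.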

7. Feature Interaction in Internet Telephony

Feature interaction occurs when several features or services, operating simultaneously, interact in a way that interferes with the desired operation of some of the features. The basic feature interaction issues are the same in traditional telephony and in Internet telephony [30]. Internet telephony is different from the PSTN, and some of the differences resolve or prevent feature interaction problems. However, it also introduces some new types of interaction and makes several techniques for preventing or resolving interactions difficult or impossible to apply. Since the Internet telephony protocols have been designed from scratch, a number of difficulties present in traditional networks have been avoided. The Internet has also been developed over the past decades to support a variety of services, which can be leveraged to provide powerful new abilities for telephony. The social and commercial evolution of the Internet is different from that of the PSTN, and this carries over to Internet telephony.

7.1 Advantages of Internet Telephony

Internet telephony has many advantages in terms of feature creation and interaction. They are discussed below.

Internet telephony signaling protocols are more expressive, which eliminates many previous limitations. For example, for a call transfer function the end system can indicate to the caller explicitly to whom the call should be transferred, instead of using an elaborate sequence of switch-hook flashes and DTMF tones. Internet telephony is extensible while maintaining backward compatibility. Devices can query each other to determine the properties and parameters they support. New control protocols, for example RTSP, can be added independently of the telephony signaling protocol, whereas in analog systems only DTMF can be used. New services can be created that integrate telephony services with existing Internet protocols and services. Forwarding or transferring a call to a web page or e-mail address is not conceptually different from forwarding it to another telephone. The MIME payload description mechanism of the web and e-mail can be used to carry an arbitrary payload in the body of a signaling request. Because the signaling protocols separate the event from the description of the stream, they can also be used to invite someone to something other than a multimedia session (a game, for example).

Since Internet telephony protocols do not make a strong distinction between user-to-user communication and network-device-to-network-device communication, they can scale from a few individuals running their own end systems to giant organizations. Users can also communicate with each other directly rather than through intermediate servers, and can bypass the provider if their current needs do not require its services. Since the capabilities of end systems can be labeled in Internet telephony protocols, accidentally calling a fax machine or modem, which occurs in traditional networks, can be avoided altogether. Since the signaling protocols can use logical names for addressing, they can eliminate user-level address scarcity. This also improves portability and mobility.

In the Internet, both signaling and media can be sent through the same mechanism, so there is no need for two parallel infrastructures as in traditional networks. In the PSTN, the architecture of the network requires signaling and media to traverse the same administrative domains, while in the Internet the media can take its natural route and the signaling can travel across many servers that provide third-party services. The Internet is a distributed environment in which multiple providers interwork and compete at a fine-grained level. It enables programmability on a scale unprecedented in PSTN networks. Users can choose any service provider for their services, regardless of who provides their data connections. This will motivate service providers to offer services that distinguish them from their competitors.

Internet telephony can adapt to the network load by varying the bandwidth usage within a call. Multicasting can be done without bridges because of IP support for network-level multicast protocols. The transition between multi-party telephone calls and large-scale conferences can also be made seamlessly. Strong encryption and authentication can secure communication in the Internet environment, in contrast to the PSTN, which relies on the physical security of network cables and equipment.

7.2 Complications of Internet Telephony

Some of the complications of Internet telephony in terms of feature interaction are the following.

The Internet is distributed by nature, so the separate entities that implement features may be unaware of each other, or may even be competitors that are not inclined to cooperate in resolving feature interactions. The ease of programming and of creating new services might lead to features created by amateur feature designers who do not consider feature interaction issues thoroughly, either out of ignorance or for expediency.

Because media packets travel end-to-end, intermediate servers cannot intercept them and can no longer implement a number of features (such as pressing # for a new call with calling cards). End systems have control of call state. Although this opens up many possibilities for creating and deploying features, it complicates matters when the network has to impose control, as in 911 calls, where only the operator, and not the end systems, may hang up the call. The Internet's lack of address scarcity can complicate some common features such as call screening, because users can switch from one address to another with minimal effort to evade a block. The trust model that exists in the PSTN between the telephone company and the user breaks down in the Internet. Features such as caller-ID blocking become very difficult when users cannot trust the network not to reveal calling information to recipients.

7.3 Cooperative and Adversarial Interactions

Feature interactions can be categorized as cooperative or adversarial [30]. Cooperative interactions are those in which all parties that implement features would consider each other's actions reasonable and would avoid an interaction if possible. Even so, conflicting implementations can interact in ways that prevent the most desirable means of communication from occurring. Adversarial interactions occur when the parties involved in a call have conflicting desires and one tries to subvert another's feature. For example, request forking and call forward to voicemail, multiple expiration timers, and camp-on and call forward on busy are cooperative interactions. Outgoing call screening and call forwarding/end-to-end connectivity, and incoming call screening and polymorphic identity/anonymity, are adversarial interactions.

Many cooperative interactions can be prevented or made less likely by being explicit about the actions taken and their desired effects. Parameters can be added to the protocols (since they are extensible) to tell servers which actions are desired. Strong authentication of requests can resolve many of the problems introduced by polymorphic identities and identity forging. Administrative restrictions such as firewalls can be used to limit end-to-end connectivity. Finally, the correct operation of interacting features is ensured by testing them together and then resolving the issues found.

8. AAA Usage in VoIP

Authentication, Authorization and Accounting (AAA) can be used for interdomain SIP call setup with QoS [31]. The AAA functionality can also be used for billing users based on their accounting information. Using AAA thus serves multiple purposes, providing security, QoS and an accounting/billing infrastructure for users. Interdomain IP telephony is accomplished using clearinghouse services and a combination of proprietary and standard AAA protocols.

To provide end-to-end QoS for a phone call, authentication has to be provided between devices (e.g. SIP phones) and the service provider, between the end user and the service provider, and between service providers and the clearinghouse. Clearinghouse services provide interdomain authorization. The Open Settlement Protocol (OSP) can be used for authorization and accounting by service providers. IP Security (IPSec) is used for IP telephony gateway authorization. Admission control policies can be administered using the Common Open Policy Service (COPS). The QoS policies can either be installed in the router by the policy server (push model), or the edge router can query the policy server (pull model).

9. Cisco VoIP

Cisco is one of the major proponents of VoIP and is quick to come out with support for new protocols, so we chose to survey Cisco's VoIP products. Most current Cisco VoIP products use H.323 to establish sessions, but new ones are coming out with SIP support as well, and they use RSVP, if configured, to achieve QoS. The coder-decoder compression schemes (codecs) are enabled at both ends, and voice data is carried using RTP/UDP/IP. Call progress indications (like DTMF) are carried in RTCP using its application-defined extension mechanism. G.711 (pulse code modulation voice coding at 64 kbps) or G.729 (code excited linear prediction compression at 8 kbps) is used in the Cisco 1750 router (a small office voice gateway). To reduce delay, the determination of a packet's destination is expedited so that the packet reaches the output queue more quickly. Jitter is compensated for in Cisco voice devices by setting up a playout buffer that plays back voice smoothly. Echo cancellers can be enabled in Cisco's voice implementations with a configurable echo tail (the amount of time the canceller waits for reflected speech to be received) of 8, 16, 24 or 32 ms. Both loop-start and ground-start access signaling (which determine when a line has gone off-hook or on-hook) are supported by Cisco's voice implementations and are configurable.
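The idea of a playout buffer can be sketched in a few lines of Python. This is a generic fixed-delay buffer written for illustration, not Cisco's implementation; the 20 ms frame interval, the 40 ms buffer depth and the arrival times are assumptions.

```python
from typing import Dict, Optional


def playout_schedule(arrivals_ms: Dict[int, float], frame_ms: float = 20.0,
                     buffer_ms: float = 40.0) -> Dict[int, Optional[float]]:
    """Map each sequence number to its playout time, or None if it arrived too late."""
    first_seq = min(arrivals_ms)
    # Playout of the first frame is delayed by the buffer depth to absorb jitter.
    base = arrivals_ms[first_seq] + buffer_ms
    schedule: Dict[int, Optional[float]] = {}
    for seq, arrival in sorted(arrivals_ms.items()):
        play_at = base + (seq - first_seq) * frame_ms
        schedule[seq] = play_at if arrival <= play_at else None
    return schedule


# Hypothetical arrival times in ms: frames 1 and 2 are jittered, frame 3 is very late.
arrivals = {0: 0.0, 1: 35.0, 2: 41.0, 3: 130.0}
schedule = playout_schedule(arrivals)
```

Frames 0 to 2 are played at 40, 60 and 80 ms, so the uneven arrival spacing is hidden from the listener. Frame 3 arrives after its 100 ms slot and is treated as lost. A deeper buffer would save it at the cost of more end-to-end delay.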

9.1 Cisco VoIP Products and Developments

Cisco developed the Architecture for Voice, Video, and Integrated Data (AVVID) [32][33]. The AVVID components are all IP based, and by combining them one can create a phone PBX system. Cisco's AVVID IP phones are the 12SP+ and 30VIP, followed by the 2nd generation IP phones 7960 (with an XML-based display feature and SIP support), 7940 and 7910. Another important component is the call manager, which performs the tasks and processes associated with a traditional Private Branch Exchange (PBX). Cisco has gateways for connecting AVVID phones to the PSTN: the analog telephone adapter ATA 181 uses H.323, while the ATA 182 and ATA 186 can use either H.323 or SIP. Cisco 1750 routers carry voice traffic using VoIP and require voice interface cards (VICs). Other products include the Cisco 827 ADSL router, which supports VoIP using H.323; the uBR 924, a VoIP cable CPE device; the CVA 122, a consumer form-factor VoIP cable modem; the IAD2400 family of integrated access devices with WAN and telephony options; the Cisco 26xx and 36xx mid-sized voice gateways (supporting H.323, SIP and MGCP); the Cisco 7xxx enterprise voice gateways (with H.323v2 support); the AS5xxx voice gateways with H.323v2/SIP support; the MGX 8260 media gateway; the SC2200 SIP-SS7 gateway; and a SIP proxy server.

The SIP-based IP phones already support call hold, call transfer, 3rd party call control, the diversion header, the message waiting indicator etc. in version 2, while XML, telnet support, auto-dial, auto-answer, dynamic login/logout and credentials, the REFER method, and enhanced SUBSCRIBE and NOTIFY will be added in version 3. In future Cisco plans to add a PDA interface, MIME encoding, RSVP support, the INFO method, a web portal interface, instant messaging etc. Interactive voice response (IVR) in Cisco's voice gateways can be implemented by loading TCL-based IVR scripts or VoiceXML. To support QoS, the following functions are currently supported or under development: priority-based queuing, RSVP support, distributed header compression from 40 bytes to 2-4 bytes, and fragmentation of large packets with interleaving of voice packets over WAN links to reduce voice delay and jitter. Cisco developed the Gatekeeper Transaction Message Protocol (which runs on TCP) as a text-based protocol mirroring RAS for VoIP signaling.
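The reason for fragmenting large packets on slow WAN links is serialization delay: a voice packet queued behind a large data packet must wait until that packet has been clocked onto the wire. The arithmetic below illustrates this with assumed values (a 64 kbps link, a 1500-byte data packet and a 10 ms delay target), none of which come from the Cisco documentation.

```python
def serialization_delay_ms(size_bytes: int, link_bps: int) -> float:
    """Time needed to clock a packet of the given size onto the link."""
    return size_bytes * 8 * 1000.0 / link_bps


def max_fragment_bytes(link_bps: int, target_ms: float) -> int:
    """Largest fragment that can be sent within the target delay."""
    return int(link_bps * target_ms / 1000.0 / 8)


LINK_BPS = 64_000     # assumed WAN link speed
DATA_PACKET = 1500    # assumed size of a large data packet in bytes

blocking = serialization_delay_ms(DATA_PACKET, LINK_BPS)   # wait behind one full packet
fragment = max_fragment_bytes(LINK_BPS, target_ms=10.0)    # fragment size for a 10 ms target
```

On these assumptions a single unfragmented data packet can hold up a voice packet for 187.5 ms, more than the whole 150 ms end-to-end target mentioned earlier, while 80-byte fragments bound the wait to 10 ms per fragment.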

9.2 Accounting and Billing for Cisco VoIP products

Accounting and billing are very important for VoIP, because its commercial deployment depends on whether service providers can account for usage and bill properly. In this section we discuss the accounting and billing possibilities for Cisco's VoIP products, but the discussion applies to any VoIP product that has some kind of support infrastructure in place for gathering accounting information.

Remote Authentication Dial In User Service (RADIUS) specifies a protocol used for authentication and authorization. It was later extended to cover the delivery of accounting information [34]. The Network Access Server (NAS) operates as a client of the RADIUS accounting servers and is responsible for passing user accounting information to a designated RADIUS accounting server. The RADIUS accounting server acknowledges receipt with a response. It can also act as a proxy for other accounting servers. The information is carried in Attribute-Length-Value format and is therefore very flexible and extensible.
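The Attribute-Length-Value layout can be made concrete with a short sketch. It assumes the layout defined in the RADIUS specifications: a one-byte attribute type, a one-byte length that counts the two header bytes, and then the value. The attribute numbers used below (40 for Acct-Status-Type, 44 for Acct-Session-Id) are taken from the RADIUS accounting specification rather than from this survey, and the session identifier is a made-up value.

```python
import struct
from typing import List, Tuple


def encode_attribute(attr_type: int, value: bytes) -> bytes:
    """Encode one attribute as type (1 byte), length (1 byte), value."""
    length = len(value) + 2  # the length field covers the two header bytes too
    if not 0 <= attr_type <= 255 or length > 255:
        raise ValueError("attribute does not fit the one-byte type/length fields")
    return bytes([attr_type, length]) + value


def decode_attributes(data: bytes) -> List[Tuple[int, bytes]]:
    """Split a byte string into (type, value) pairs."""
    attributes = []
    offset = 0
    while offset < len(data):
        if offset + 2 > len(data):
            raise ValueError("truncated attribute header")
        attr_type, length = data[offset], data[offset + 1]
        if length < 2 or offset + length > len(data):
            raise ValueError("invalid attribute length")
        attributes.append((attr_type, data[offset + 2:offset + length]))
        offset += length
    return attributes


ACCT_STATUS_TYPE = 40   # value 2 means "Stop"
ACCT_SESSION_ID = 44

encoded = (
    encode_attribute(ACCT_STATUS_TYPE, struct.pack("!I", 2))
    + encode_attribute(ACCT_SESSION_ID, b"call-0001")   # hypothetical session id
)
decoded = decode_attributes(encoded)
```

Because every attribute announces its own length, a server can skip attributes it does not understand, which is what makes the format easy to extend.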

Cisco voice-enabled routers (i.e. VoIP gateways) provide accounting information for VoIP calls through an enhancement to the standard Cisco RADIUS accounting functionality. With it, call detail records (CDRs) can be created for calls. The accounting uses standard RADIUS attributes where possible, and the attributes that cannot be mapped to RADIUS are packed into the Acct-Session-Id attribute field. The contents of such CDRs depend on the media type of the call. Data items are collected for each connection in the gateway, and there is a separate connection for the incoming portion and the outgoing portion of the call. There can be a separate start record (which can optionally be turned off) and stop record for each connection, both carrying the same connection ID. The Network Time Protocol (NTP) must be configured in the routers to generate records with accurate times. Duplicate records may appear in RADIUS when the NAS does not receive a response from the RADIUS server within a time window. A duplicate CDR is identical to the original record in every respect, except that its Acct-Delay-Time has been incremented (the first value is 0). Cisco also has the 10200 softswitch, which generates call detail information for IP telephony and supports SIP, MGCP and full SS7 interconnect capability with an Operational Support System (OSS) interface. An OSP client has been implemented in the VoIP gateways for intercarrier billing. In future, OSP clients will be implemented in GKs and SIP proxies. An alternative approach is to extract the billing and operational measurements from the Virtual Switch Controller (VSC), a component of the Cisco Open Packet Telephony architecture. Another possibility is to use SNMP: the gateway's call history MIBs can be accessed regularly to collect the call information. This usage information can be used for billing too.
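Since a duplicate CDR differs from the original only in its Acct-Delay-Time, a billing system can filter duplicates by comparing all other fields. The sketch below shows one way to do this. Only the Acct-Delay-Time and Acct-Session-Id attribute names come from the text above; the record layout, the other field names and the sample values are hypothetical.

```python
from typing import Dict, List, Tuple, Union

Record = Dict[str, Union[str, int]]


def deduplicate_cdrs(records: List[Record]) -> List[Record]:
    """Keep one record per call leg, preferring the lowest Acct-Delay-Time."""
    best: Dict[Tuple, Record] = {}
    for record in records:
        # Two records are duplicates if they agree on everything but the delay.
        key = tuple(sorted(
            (name, value) for name, value in record.items()
            if name != "Acct-Delay-Time"
        ))
        current = best.get(key)
        if current is None or record["Acct-Delay-Time"] < current["Acct-Delay-Time"]:
            best[key] = record
    return list(best.values())


# Hypothetical stop records: the second is a retransmission of the first.
cdrs: List[Record] = [
    {"Acct-Session-Id": "conn-1", "leg": "incoming", "duration": 62, "Acct-Delay-Time": 0},
    {"Acct-Session-Id": "conn-1", "leg": "incoming", "duration": 62, "Acct-Delay-Time": 5},
    {"Acct-Session-Id": "conn-1", "leg": "outgoing", "duration": 60, "Acct-Delay-Time": 0},
]
unique = deduplicate_cdrs(cdrs)
```

The result contains one record for the incoming leg and one for the outgoing leg, so the call is not billed twice for the retransmitted record.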

Another recent development worth noting is IPDR (IP Detail Record). The IPDR.org initiative has been taken up by many mediation platform vendors, billing vendors and equipment providers to define IP usage records in the IPDR format. It defines the essential elements of data exchange between network elements, operation support systems and business support systems. It has defined XML document type definitions (DTDs) for the exchange of usage information and has also defined the profile and parameters for VoIP. Although it is still at a developing stage, its adoption by major players might make it a standard for the transfer of usage information. Through IPDR's standard mechanism, mediation vendors can collect usage information and aggregate and correlate it before passing it on to the billing application. IPDR has held interoperability demonstrations among its vendors at various conferences, and recently it has also demonstrated accounting and billing for VoIP (by DynamicSoft).

10. VoIP Market

The VoIP market has grown by leaps and bounds in the past few years and will continue to do so in the coming years. We look at some industry analysts' predictions for VoIP.

Gartner believes that through 2003 to 2005, most enterprises will begin the migration from circuit-based phone systems to IP telephony. Cisco and other IP providers such as 3Com constantly improve IP telephony. Other companies, such as Avaya, Nortel Networks and Siemens, are working on both the circuit-switched and IP-based systems. Gartner also believes that the move to IP telephony is a key decision in the evolution of an enterprise's network. Enterprises should begin to plan now for the transition, but they also must be certain why they are changing systems and how much it will cost. As IP voice systems become more entrenched, feature development on circuit-switched alternatives will decline — a good reason why enterprises will need an IP solution. The IP technology market is good and will get stronger. Gartner believes enterprises should continue to monitor IP telephony upgrades and improvements but should move when the time is right for the enterprise, not the IP vendor.

Almost a quarter of Internet users worldwide (166 million) will be regular users of PC-to-phone IP telephony by January 2006, says a report from Ovum, an independent analyst and consulting company. Most analysts believe that most long-distance calls will be terminated on an IP backbone within five to 10 years. Frost & Sullivan (F&S) says that the VoIP industry is entering an exciting new phase. The F&S report says that global wholesale and retail VoIP traffic topped the 6 billion and 15 billion minute marks, respectively, during 2000, and predicts that VoIP will account for 75 percent of the world's voice telecommunications traffic by 2007. Dialpad, PhoneFree and Net2Phone are among the companies offering Internet telephony. Microsoft offers free online calling through Net2Phone to users of its MSN Messenger. Yahoo will have to pay Net2Phone for its Net telephony technology, a reversal of the original agreement, which had Net2Phone paying Yahoo for exclusivity.

Some recent news items show that VoIP is developing into a major force and has come out of its infancy. New government regulations have permitted China Netcom to lower long-distance voice-over-IP (VoIP) call charges by up to 50 percent. In India, Internet telephony will be allowed from April 1, 2002, two years before the stipulated time. Players within India and abroad, waiting in the wings to start operations, have been in an upbeat mood following the proceedings at the iLocus.com show on IP telephony, the first of its kind in India. Recently, Nortel announced a 10-year, $1.4 billion outsourcing deal with Cable and Wireless (C&W). Nortel will plan, design, implement and operate a backbone network for C&W in Europe and North America for VoIP traffic. It will migrate C&W's voice services to the new backbone over three years.

11. SIP Implementations

Different vendors have adopted SIP, and their products are coming out at a furious pace. Besides Cisco, the following SIP implementations are available today [35].

Columbia University is a pioneer in driving IETF work on various SIP-related RFCs and Internet Drafts. It has implemented a SIP library and proxy, redirect and registration servers with user location programs. It supports CPL (Call Processing Language) for both client and CGI on both server and client. It has also developed a signaling gateway between SIP and H.323.

3Com offers a SIP Telephone System, which includes a Signaling Server, a SIP Support System, and a SIP Phone. It is a highly reliable, highly scalable, managed telephony system that supports 1,000 users and beyond.

DynamicSoft is another company with SIP implementations for user agent, proxy server [redirect server, registration server], and location server. It supports CGI and CPL in addition to servlets.

Hotsip, located in Sweden and Finland, develops presence solutions (interactive applications for real-time communication) based on SIP, for fixed and mobile networks.

Ubiquity's Helmsman SIP Network Server has SIP Proxy, Redirect, Location and Registration server functions, and its server supports CPL. Ubiquity has also developed Desktop for SIP, a lightweight client application.

Indigo Software is developing a Presence and Instant Messaging system based entirely on SIP. It is developing both a presence client and server, and both instant messaging client and server based on the recent IETF internet drafts "SIP Extensions for Presence" (draft-rosenberg-impp-presence-00.txt) and "SIP Extensions for Instant Messaging" (draft-rosenberg-impp-im-00.txt).

Hughes Software Systems has developed a User Agent Framework, SIP stacks and a framework for rapid application development of SIP Registrars, Proxies, Redirect servers and UAs. It also has a signaling gateway between H.323 and SIP.

Indigo Software's ensemble encompasses a SIP User Agent and SIP Proxy, Registrar and Location Servers, each with full CPL support on both the client and the server side.

Hearme has a SIP-based SoftPHONE, Alcatel has a user agent with a GUI, and Telogy has a cable modem reference implementation.

T-S Software's Zeus is a SIP stack that supports a PSTN interface along with all SIP servers. MediaTrix's Audiotrix Phone Adapter II is a gateway to the PSTN. Komodo also has a PSTN interface. Neura has a PSTN-to-IP gateway and a softswitch that uses SIP for call set-up with user agents. Some of the public SIP servers, given in SIP URL format, are sip:hotfoon.com, sip:sipfx.com, sip:sipaccount.wcom.com, sip:sipcenter.com, sip:www-db.research.bell-labs.com, sip:zdots.com and sip:[email protected].
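The SIP URL format used for the public servers above can be illustrated with a small Python sketch. It is a simplified parser written only for illustration: it handles the `sip:[user@]host[:port]` core, discards URL parameters and header fields, and does not handle IPv6 literals or passwords. The `SipUrl` type, the function name and the address `alice@example.com` are hypothetical.

```python
from typing import NamedTuple, Optional


class SipUrl(NamedTuple):
    user: Optional[str]
    host: str
    port: Optional[int]


def parse_sip_url(url: str) -> SipUrl:
    """Split a SIP URL into user, host and port (simplified)."""
    scheme, sep, rest = url.partition(":")
    if not sep or scheme.lower() != "sip":
        raise ValueError(f"not a SIP URL: {url!r}")
    # Drop header fields ("?...") and URL parameters (";...").
    rest = rest.split("?", 1)[0].split(";", 1)[0]
    # The user part is optional; everything after the last "@" is host[:port].
    user, at, hostport = rest.rpartition("@")
    host, colon, port = hostport.partition(":")
    if not host:
        raise ValueError(f"missing host in SIP URL: {url!r}")
    return SipUrl(user if at else None, host, int(port) if colon else None)
```

For example, `parse_sip_url("sip:hotfoon.com")` yields a host with no user and no port, while a hypothetical `sip:alice@example.com:5060;transport=udp` yields user `alice`, host `example.com` and port 5060.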

Agilent Technologies has a distributed VoIP VQT (voice quality testing) mechanism that gives service providers the ability to perform VoIP testing anytime, anywhere.

12. Future of VoIP

In the past few years, voice over IP has become a real alternative to traditional telephony. Although there are two competing signaling protocols, H.323 from the ITU and SIP from the IETF, they will coexist for now. H.323 has many deployments today, but SIP is becoming very popular and has become the choice for new products being developed; the adoption of SIP by UMTS is a shot in the arm for it. The many protocols defined by the IETF, together with the work in progress, have more or less completed the portfolio of Internet telephony. Minor enhancements, such as feature-interaction-related extensions, integration with IN and presence, and mobility extensions, are still being defined. But SIP is an extensible protocol, and existing implementations need not change much to adapt to the new functionality.

One of the main areas of concern has been QoS. Much of it has been addressed by developments in the IETF, which VoIP can use. However, we think QoS will be implemented only in private networks and most probably not in the current open public Internet. Hence VoIP will remain a best-effort service on the public Internet. Another area of development is accounting and billing; without it, any commercial deployment of VoIP will be unsuccessful. Scalability and reliability are also still being addressed, and the Megaco protocol removes some of that concern.

Voice over IP will become the main standard with 3rd generation wireless networks. Even in existing wireless networks it can be an alternative. Over time, all new networks will be built for both voice and data, with VoIP used for voice. With a lot of integration work going on between different networks using VoIP, user mobility and the flexibility to use any device of choice will be quite common. It will take the reachability of users to a new level, minimizing costs and maximizing benefits.

In brief, we see a world of VoIP exploding in the next few years that will definitely improve the experience of communication for users.

Appendix A

A table comparing SIP and H.323 functionality for Mobile UMTS Networks (Source:[21] A Comparison of H.323v4 and SIP by Nortel Networks)

| Criteria | H.323 | SIP | Choice/Reason |
|---|---|---|---|
| Complexity | Very complex | Simple | SIP – TTM / reduced complexity of development |
| Message Set | Complex, many messages for similar functionality | Logically numbered responses for extension, smaller set of messages for same functionality | SIP – TTM and extensibility |
| Debugging | Have to alter tools on each extension. | Simple. Tool developed once | SIP – TTM / reduced complexity of development |
| Re-use of code | H.323 and H.32x | SIP and Web | SIP – more modular |
| Service and Protocol Interactions | H.323 and H.32x | SIP and Web – more modular | SIP – more modular |
| Methods for implementing services | Can support all | Can support all | Equivalent. |
| Distributed Call Signaling | Can support | Can support | SIP – TTM / reduced complexity |
| Extensibility | Extensible | More extensibility | SIP – more options for extension |
| Version Compatibility | YES | YES – the Require, Supported and Proxy-Require headers provide more flexibility than H.323 | SIP – more flexibility to support multiple variants co-existing. |
| Feature Evolution | Same as above | Same as above | Same as above |
| Operators in charge of own services | Less ability – more complex ASN.1 | Higher ability – text formats and extension headers. | SIP – Operators will be less dependent on vendors to add new services. |
| Modularity | Umbrella standard – designed for limited feature set. | Modular, designed around other web technologies and can do GSTN services too. | SIP – built for web. H.323 originally derived from circuit world. |
| Codecs | Equivalent | Equivalent | Equivalent |
| 3rd party CC | Facility redirect | Also header | Equivalent |
| Scalability | Installed base designed for reliable transport | Designed for it | Equivalent (1) |
| Wide Area Support | YES | YES | Equivalent |
| Large Number of Calls | YES | YES | Equivalent |
| Call States | Can do both | Can do both | Equivalent |
| Elements that must maintain states | Clients, MC, MGCF – CSCF optional | UA, MC, MGCF, CSCF optional | Equivalent |
| Msg processing | More processor overhead, smaller messages | Less processor overhead, larger messages | Comparable – bandwidth vs. component complexity decision |
| Conferencing | All modes – H.224 floor control | All modes – GCCP, SCCP or even H.224 floor control | Comparable – No RFC exists saying which to use for SIP. |
| DCS | Would have to be altered more than SIP | A closer original design fit | SIP – TTM |
| Resources | No opinion | No opinion | Comparable |
| Air-link bandwidth | Smaller messages | Larger messages | H.323 – smaller messages |
| CPU | More processing | Less processing | SIP – less processing |
| QOS/RRM Interactions | Same issues | Same issues | Equivalent |
| Services | High TTM | Low TTM | SIP – long term lower TTM and complexity |
| Supported Services | H.323 more explicitly defined | SIP defined in whitepapers/drafts | Equivalent but H.323 better standardization |
| Delay Times | Equivalent – still issues with use of UDP and reliability, which are related | Equivalent | Equivalent |
| Billing | Needs work – consortia defined, embedded in protocol | Needs work – consortia defined – separate protocol | Comparable |
| GSTN services | YES | YES | SIP – TTM / less code |
| Capabilities Exchange | Better for media – worse for signaling extensibility | Worse for media – better for signaling extensibility | SIP – signaling is more of an issue. |
| Personal Mobility | Added nomadicity later (v3) – location based services still ongoing | Designed for nomadicity – location based services still ongoing | Comparable |
| Legacy interoperability | H.246 | Draft status | H.323 |
| IP telephony interoperability | Monolithic / OS bundled client | DCSGROUP / MGCP / SDP | SIP |
| Security | H.235 added later. Worse for firewall traversal using UDP. | Designed for it originally. Better for firewall traversal. | Comparable |
| OA&M Procedures | Equivalent – consortia defined | Equivalent – consortia defined | Equivalent |
| Loop Back | Available | Invite with SDP loopback media value and no alerting option | Comparable – both have MIBs defined by consortia. |
| Fault Detection | See above | See above | See above |

(1) There are still issues with reliability for H.323 over UDP as there is no session layer acknowledgement to the connect defined. Installed base designed to be more stateful with reliable underlying transport.
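The version compatibility entry in the table above refers to SIP's Require, Supported and Proxy-Require headers. As a rough illustration of why these headers let variants coexist, the following Python sketch shows how a server might check the option tags listed in a Require header and answer 420 (Bad Extension) with the tags it does not understand. The function name and the tag `example-ext` are hypothetical, and real servers do considerably more than this check.

```python
from typing import Optional, Set, Tuple


def check_require(require_value: str,
                  supported: Set[str]) -> Optional[Tuple[int, str, str]]:
    """Return (status, reason, Unsupported header value) if any required
    option tag is not understood; return None to continue processing."""
    # A Require header carries a comma-separated list of option tags.
    tags = [tag.strip() for tag in require_value.split(",") if tag.strip()]
    unsupported = [tag for tag in tags if tag not in supported]
    if unsupported:
        # The rejected tags are echoed back so the client can retry without them.
        return 420, "Bad Extension", ", ".join(unsupported)
    return None
```

A client that receives the 420 response can drop the listed extensions and resend the request, so older and newer implementations can still complete a call.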

References

1. Daniel Collins, “Carrier Grade Voice over IP”, McGraw-Hill, 2001, chapter 1
2. Hong Liu and Petros Mouchtaris, “Voice over IP Signaling: H.323 and Beyond”, IEEE Communications Magazine, October 2000
3. ITU-T Rec. H.323, “Packet Based Multimedia Communication Systems”, Feb. 1998
4. http://www.packetizer.com/iptel/h323
5. http://www.openh323.org/
6. http://www.itn.int/itudoc/itu-t/rec/hs/s_h323.htm
7. http://www.csdmag.com/main/2000/11/0011feat1.htm
8. Jonathan Davidson, “VoIP Fundamentals”, Cisco Press, 2000
9. N. Greene et al., “Media Gateway Control Protocol Architecture and Requirements”, RFC 2805, IETF, April 2000
10. Tony Rybczynski, “Clients thick, Clients Thin”, Enterprise Solutions, Communications Solutions Magazine, Volume 1, July 2000
11. H. Schulzrinne, S. Casner, R. Frederick and V. Jacobson, “RTP: A Transport Protocol for Real-Time Applications”, RFC 1889, IETF, Jan. 1996
12. M. Handley, H. Schulzrinne, E. Schooler and J. Rosenberg, “SIP: Session Initiation Protocol”, RFC 2543, IETF, Mar. 1999
13. Henning Schulzrinne and Jonathan Rosenberg, “Internet Telephony: Architecture and Protocols an IETF Perspective”, Computer Networks and ISDN Systems, vol. 31, pp. 237-255, Feb. 1999
14. S. Donovan, “The SIP INFO Method”, RFC 2976, IETF, Oct. 2000
15. H. Schulzrinne, R. Lanphier and A. Rao, “Real Time Streaming Protocol (RTSP)”, RFC 2326, IETF, Apr. 1998
16. V. Jacobson and M. Handley, “SDP: Session Description Protocol”, RFC 2327, IETF, Apr. 1998
17. M. Handley, C. Perkins and E. Whelan, “SAP: Session Announcement Protocol”, RFC 2974, IETF, Oct. 2000
18. S. Petrack and L. Conroy, “The PINT Service Protocol: Extensions to SIP and SDP for IP Access to Telephone Call Services”, RFC 2848, IETF, June 2000
19. http://www.cs.columbia.edu/sip/papers.html
20. Ismail Dalgic and Hanlin Fang, “Comparison of H.323 and SIP for IP Telephony Signaling”, Proc. of Photonics East, Boston, Massachusetts, September 20-22, 1999
21. Nortel Networks, “A Comparison of H.323v4 and SIP”, 3GPP S2, Tokyo, Japan, S2-000505
22. Agarwal et al., “SIP-H.323 Interworking Requirements”, IETF Internet Draft, Apr. 2001
23. Bo Li et al., “QoS-Enabled Voice Support in the Next Generation Internet: Issues, Existing Approaches and Challenges”, IEEE Communications Magazine, April 2000
24. Krister Svanbro, Jonas Wiorek and Brigitta Olin, “Voice-over-IP-over-wireless”, 11th IEEE International Symposium on Personal, Indoor and Mobile Radio Communications (PIMRC 2000), Vol. 1
25. Andreas Schieder and Tobias Ley, “Enhanced voice over IP Support in GPRS and EGPRS”, IEEE Wireless Communications and Networking Conference (WCNC 2000), Volume 2, 2000, pp. 803-808
26. Lieve Bos and Suresh Leroy, “Toward an All-IP-Based UMTS System Architecture”, IEEE Network, Jan/Feb 2001
27. Chyi-Nan Chen et al., “The study of Mobile Internet Telephony”, Proceedings of the International Symposium on Multimedia Software Engineering, 2000
28. Herman C.H. Rao, Yi-Bing Lin, Sheng-Lin, “iGSM: VoIP Service for Mobile Networks”, IEEE Communications Magazine, April 2000
29. Thuan Nguyen et al., “Voice over IP Service and Performance in Satellite Networks”, IEEE Communications Magazine, March 2001
30. Jonathan Lennox and Henning Schulzrinne, “Feature Interaction in Internet Telephony”, Proceedings of the Sixth International Workshop on Feature Interactions in Telecommunications and Software Systems, May 2000
31. H. Sinreich et al., “AAA Usage for IP Telephony with QoS”, IETF Internet Draft, May 2001
32. “Cisco – Ecosystems developer conference 2001 – Voice sessions”
33. Cisco Voice Over IP Implementation and Products, http://www.cisco.com/pcgi-bin/Support/PSP/psp_view.pl?p=Internetworking:VoX:VoIP
34. C. Rigney, “RADIUS Accounting”, RFC 2139, IETF, Apr. 1997
35. “SIP Implementations”, http://www.cs.columbia.edu/sip/implementations.html
36. http://www.iec.org/tutorials/vfoip/index.html
37. http://www.iec.org/tutorials/int_tele/