
Voice over IP: Sounding good on the Internet

Princy Mehta and Sanjay Udani

IEEE Potentials, October/November 2001. 0278-6648/01/$10.00 © 2001 IEEE

The bulk of information conveyed over public networks is voice, and circuit-switched networks are employed to carry it. While circuit switching provides adequate voice quality, it can be highly inefficient. In contrast, the Internet's packet-switched networks are much more efficient but ill suited for voice without judicious implementation. Voice over Internet Protocol (VoIP) aims to provide the efficiency of a packet-switched network while rivaling the voice quality of a circuit-switched network.

Because voice applications are real time, they are intolerant of lengthy delays, packet losses, out-of-order packets and jitter. All these problems gravely degrade the quality of the voice transmitted to the recipient. Unfortunately, wireless networks exacerbate the problems that are already prevalent in their wireline counterparts: a higher frequency of dropped packets, larger latency and more jitter.

VoIP can be implemented in several ways. A Public Switched Telephone Network (PSTN)-based telephone can communicate with a VoIP application, and vice versa. Two such telephones can also communicate with each other where part of the call is routed over the Internet instead of solely over a dedicated circuit. Finally, two VoIP applications can communicate directly without accessing the PSTN.

Components of VoIP

The Public Switched Telephone Network (PSTN) is the collection of all the switching and networking equipment that belongs to the carriers involved in providing telephone service. In this context, the PSTN is primarily the wired telephone network and its access points to wireless networks, such as cellular. The overall technology requirements of an Internet Protocol (IP) telephony solution can be split into four categories: signaling, encoding, transport and gateway control.

The purpose of the signaling protocol is to create and manage connections between endpoints, as well as the calls themselves. Next, when the conversation commences, the analog signal produced by the [...] needs to be encoded in a digital format suitable for transmission across an IP network. The IP network itself must then ensure that the real-time conversation is transported across the available media in a manner that produces acceptable voice quality. Finally, a gateway may need to convert the IP telephony traffic to another format, either for interoperation with a different IP-based multimedia scheme or because the call is being placed onto the PSTN.

Figure 1 displays the processing that must occur between the user's voice input and output. The diagram illustrates the steps necessary to achieve packetized voice, from data processing by the digital signal processor (DSP) to transmitting packets over IP. Numerous packet-handling processes are involved; hence, a nontrivial amount of latency (time delay) is present, which affects perceived voice quality.

[Fig. 1 VoIP processing and handling: input/output; digital signal processing (DSP: frames, coding); packet handling (buffering and packetization, jitter buffer); TCP/IP protocol stack; network interface device; PHY layer]

SS7

Once a user dials a telephone number (or clicks a name hyperlinked to a telephone number), signaling is required to determine the status of the called party (available or busy) and to establish the call. Signaling System 7 (SS7) is the set of protocols (standards for signaling) used for call setup, teardown, and maintenance in the PSTN. It is the signaling system currently used in North America to establish and terminate telephone calls. SS7 is implemented as a packet-switched network and typically uses dedicated links, nodes and facilities. In general, it is a non-associated, common channel, out-of-band signaling network that allows switches to communicate during a call. SS7 signals may traverse real or virtual circuits on links that also carry voice traffic.

However, the industry is moving toward a converged network infrastructure to provide a more efficient and effective way of handling increased call volumes as well as to deliver new and enhanced services. The integration of SS7 and IP will provide significant benefits. Figure 2 depicts a type of VoIP network utilizing an SS7-to-IP gateway. SS7 provides the call control on either side of the traditional PSTN, while H.323/Session Initiation Protocol (SIP) provides call control in the IP network. (Neither H.323 nor SIP alone has a complete set of IP telephony protocols.) The media gateway provides circuit-to-voice conversion.

[Fig. 2 SS7-based VoIP network: call originator, PSTN, SS7, SS7 gateways, VoIP gateways, H.323/SIP gatekeeper, IP network, call terminator]

H.323

H.323, ratified by the International Telecommunication Union-Telecommunication (ITU-T), is a set of protocols for voice, video, and data conferencing over packet-based networks, such as the Internet. The H.323 protocol stack is designed to operate above the transport layer of the underlying network. Therefore, H.323 can be used on top of any packet-based network transport, for instance TCP/IP, to provide real-time multimedia communication.

H.323 specifies protocols, including Q.931, H.225, H.245, and ASN.1, for real-time point-to-point audio communication between two terminals on a packet-based network that does not provide a guaranteed quality of service (QoS). The scope of H.323, however, is much broader and encompasses networking multipoint conferencing among terminals that support not only audio but also video and data communications.

In a general H.323 implementation, three logical entities are required: gateways, gatekeepers and multipoint control units (MCUs). Terminals, gateways, and MCUs are collectively known as endpoints. It is possible to establish an H.323-enabled network with just terminals, which are H.323 clients. Yet for more than two endpoints, an MCU is required. It can be combined with a terminal, gateway or gatekeeper.

SIP

Session Initiation Protocol (SIP), defined by the Internet Engineering Task Force (IETF), is a signaling protocol for telephone calls over IP. Unlike H.323, however, SIP was designed specifically for the Internet. It exploits the manageability of IP and makes developing a telephony application relatively simple. SIP is an application-layer control (signaling) protocol for creating, modifying and terminating sessions with one or more participants.

SIP can be employed to initiate sessions and to invite members to sessions that have been advertised by other means, such as via multicast protocols. The signaling protocol transparently supports name mapping and redirection services. This allows the implementation of intelligent network telephony subscriber services. These facilities also enable personal mobility: the ability of end users to originate and receive calls and access subscribed telecommunication services on any terminal in any location. This mobility can be augmented via wireless VoIP.

SIP supports five facets of establishing and terminating multimedia communications:
• User location: determination of the end system to be used for communication;
• User capabilities: determination of the media and media parameters to be used;
• User availability: determination of the called party's willingness to engage in communications;
• Call setup: "ringing," establishing call parameters at both called and calling party;
• Call handling: including transfer and termination of calls.

SIP can also initiate multiparty calls using a multipoint control unit (MCU) or a fully meshed interconnection instead of multicast. Gateways that connect PSTN parties can also use SIP to set up calls between them. The protocol is designed as part of the overall IETF multimedia data control architecture. It incorporates many protocols, for example Resource Reservation Protocol (RSVP) and Real-time Transport Protocol (RTP), for proper functionality and operation.

H.323 vs. SIP

H.323 and SIP are competing for dominance of IP telephony signaling. Currently, there is no clear-cut winner. However, the standards appear to be evolving such that the best features of each are being implemented in the other's protocol. For instance, the evolution of H.323 from versions 1 through 4 has focused on decreasing call setup delay from several round trips to be on par with SIP's 1.5 round trips. This reduces its signaling overhead. Obviously, this convergence is highly desirable. (Both support the majority of required end-user functions about equally well, such as call setup, teardown, call holding, call transfer, call forwarding, call waiting and conferencing.)

Voice coders

An efficient voice encoding and decoding mechanism is vital for using packet-switched technology. The purpose of a voice coder (vocoder), also referred to as a codec (coding/decoding), is to take the analog signal (human [...]) and transform and compress it into digital data. A number of factors must be taken into account, including bandwidth usage, silence compression, intellectual property, look-ahead and frame size, resilience to loss, layered coding, and fixed-point vs. floating-point digital signal processors (DSPs).

The bit-rate of available narrowband vocoders ranges from 1.2 to 64 kbps, with an inevitable effect on the quality of the restituted voice. There is ordinarily, but not always, a trade-off between voice quality and bandwidth used. Today's most efficient vocoders deliver quasi-toll quality at a bandwidth as low as 5 kbps. Toll quality is recognized as the standard of a long-distance PSTN call. As newer and more sophisticated algorithms are developed, this bit-rate will decrease. This will permit more samples to be squeezed more efficiently while minimally sacrificing quality, if at all.

The algorithmic delay introduced by a coding/decoding sequence is the frame length plus the look-ahead size. A vocoder with a small frame length has a shorter delay than one with a longer frame length, but it introduces a larger overhead. Most implementations choose to send multiple frames per packet. Thus, the real frame length to take into account is the sum of all frames stacked in a single IP packet. The smaller the frame size, the more frames in an IP packet; thereby, there is minimal influence on latency.

ITU-T specs

The International Telecommunication Union-Telecommunication (ITU-T) has a rigorous process for approving vocoders. Before a vocoder is chosen, the ITU evaluates its mean opinion score (MOS) and often requires toll quality or better. To determine the MOS, trained evaluators rate the overall quality of speech samples and assign a subjective score. Popular ITU-approved vocoders are summarized in Table 1; the MOS scale ranges from 1 (bad) to 5 (excellent).

Table 1 Summary of ITU vocoders

Voice coder          Bit-rate    Frame length   Expected MOS
G.711 (PCM)          64 kbps     1 ms           4.1
G.723.1 (MP-MLQ)     6.3 kbps    30 ms          3.9
G.723.1 (ACELP)      5.3 kbps    30 ms          3.65
G.726 (ADPCM)        32 kbps     0.125 ms       3.85
G.729A (CS-ACELP)    8 kbps      10 ms          3.7

Future voice coders

One recently established vocoder is the Mixed Excitation Linear Predictive (MELP) vocoder, which utilizes a minuscule 2.4 kbps. Another high quality speech vocoder is being developed based on the Multi-Band Excitation (MBE) model, operating at both 2.4 kbps and 1.2 kbps. The trend in industry appears to be toward developing vocoders that utilize less bandwidth than their predecessors do.

Since the early 1990s, the ITU has forged ahead from the 64 kbps G.711 to the more recent G.723.1 specification that consumes merely one-twelfth of that bandwidth. This bandwidth savings commonly comes at the cost of lower quality and reduced robustness in hostile network environments. Given the inevitable increase in the average user's bandwidth over time, perhaps this effort would be better directed at improving quality first, then addressing bandwidth.

Transport

Once signaling and encoding occur, Real-time Transport Protocol (RTP) and Real-Time Control Protocol (RTCP) are utilized to move the voice packets. Media streams are packetized according to a predefined format. RTP provides delivery monitoring of its payload types through sequencing and time stamping. RTCP offers insight into the performance and behavior of the media stream, such as voice stream jitter. RTP and RTCP are intended to be independent of the signaling protocol, encoding schemes, and network layers implemented.

RTP

Real-time Transport Protocol (RTP) provides end-to-end delivery services for data with real-time characteristics. Those services include payload type identification, sequence numbering, time stamping and delivery monitoring. Applications typically run RTP on top of User Datagram Protocol (UDP) to make use of its multiplexing and checksum services. In fact, both protocols contribute parts of the transport protocol functionality; however, RTP may be used with other suitable network-layer or transport-layer protocols.

RTP does not intrinsically provide any mechanism to ensure timely delivery or provide other Quality of Service guarantees. Instead, RTP relies on lower-layer services to provide them. A signaling protocol also must set up the connection and negotiate the media format that will be used. RTP does not guarantee delivery or prevent out-of-order delivery, nor does it assume that the network can reliably deliver packets in sequence.

RTCP

Real-Time Control Protocol (RTCP) is based on the periodic transmission of control packets to all participants in the session. It uses the same distribution mechanism as the data packets. The underlying protocol must provide multiplexing of the data and control packets. RTCP performs the following functions:
• Provide feedback on the quality of the data distribution (primary function);
• Carry a persistent transport-layer identifier for an RTP source, the canonical name;
• Control the rate in order for RTP to scale up to a large number of participants; and
• Convey minimal session control information.

Gateway control

Gateways are responsible for converting packet-based audio formats into protocols understandable by PSTN systems. The aforementioned signaling protocols provide more services than are necessary, such as service creation and user authentication, which are irrelevant for gateways. Vendors have gravitated towards simplified Device Control Protocols rather than all-encompassing signaling protocols.

The IETF standard Media Gateway Control Protocol (MGCP) is a merger between the Internet Protocol Device Control and the Simple Gateway Control Protocol. The Megaco protocol (H.248), which is still evolving, is MGCP's progeny. It contains all of MGCP's functionality, plus superior controls over analog telephone lines and the ability to transport multiple commands in a single packet.

Media gateways will be the junctions that provide paths between switched and packet networks for voice. When media gateways are initially set up for communication, a vocoder approach normally is used. Megaco-related standards will enable support of existing and new applications of telephone service over hybrid telephone networks containing an assortment of technologies.

Wireless networks

An emerging trend for implementing VoIP (and, in general, for connecting computing devices) is in wireless networks. A wireless local area network (WLAN) is a data transmission system designed to provide location-independent network access between computing devices by using radio waves rather than a cable infrastructure. WLANs give users wireless access to the full resources and services of the LAN across a building or campus environment.

For voice applications, wireless networks aggravate the problems already prevalent in wireline networks: a higher frequency of dropped packets, larger latency and more jitter. Furthermore, there are additional security issues: it is relatively easy for an unauthorized device to surreptitiously eavesdrop on a conversation. Finally, interference between different wireless technologies must be considered when they are operating on the same frequency band.

QoS

The basic routing philosophy on the Internet is "best-effort." This attitude serves most users acceptably, but it is not adequate for the time-sensitive, continuous stream transmission required for VoIP.

Quality of Service (QoS) refers to the ability of a network to provide better, more predictable service to selected network traffic over various underlying technologies, including IP-routed networks. QoS features are implemented in network routers by:
• Supporting dedicated bandwidth;
• Improving loss characteristics;
• Avoiding and managing network congestion;
• Shaping network traffic; and
• Setting traffic priorities across the network.

Voice applications have different characteristics and requirements from those of traditional data applications. Because they are innately real-time, voice applications tolerate minimal delay in delivery of their packets. Additionally, they are intolerant of packet loss, out-of-order packets, and jitter. To effectively transport voice traffic over IP, mechanisms are required that ensure reliable conveyance of packets with low and controlled latency.

Another approach utilizes Resource Reservation Protocol (RSVP), a relatively new protocol developed to enable the Internet to support QoS. Using RSVP, a VoIP application can reserve resources along a route from source to destination. RSVP-enabled routers will then schedule and prioritize packets to fulfill the QoS. RSVP is part of the Internet Integrated Service (IIS) model, ensuring best-effort service, real-time service, and controlled link sharing.

While QoS is an extension to IPv4 (the current version of IP), IPv6 (the successor of IPv4) will inherently support QoS. However, IPv6 also has a much larger packet header, so it is possible that while QoS will alleviate much of the jitter and congestion voice packets presently suffer, it could come at the cost of increased latency. IPv6 headers occupy 40 bytes, compared with 20-byte IPv4 headers, thus doubling the overhead. This may pose trouble for vocoders that only succeed with diminutive packets. Nevertheless, this larger packet overhead can be partially offset if IPv6 provides for efficient compression schemes for the header.

Packet loss

UDP cannot provide a guarantee that packets will be delivered at all, much less in order. Packets will be dropped under peak loads and during periods of congestion. Due to the time sensitivity of voice transmissions, the normal TCP-based retransmission schemes are not appropriate. Approaches used to compensate for packet loss include interpolation of speech by replaying the last packet and sending redundant information. Packet losses greater than 10 percent are generally intolerable, unless the encoding scheme provides extraordinary robustness.

Jitter

Inasmuch as IP networks cannot guarantee the delivery time of data packets (or their order), the data will arrive at very inconsistent rates. The variation in inter-packet arrival rate is jitter, which is introduced by variable transmission delays over the network. Removing jitter to allow an even stream requires collecting packets and storing them long enough to permit the slowest packets to arrive in time to be played in the correct sequence. The jitter buffer is used to remove the packet delay variation that each packet encounters transiting the network. Each jitter buffer adds to the overall delay.

Latency

Latency is the time delay incurred in speech by the Internet Protocol (IP) telephony system. One-way latency is the amount of time measured from the moment the speaker utters a sound until the listener hears it. Round trip latency is the sum of the two one-way latency figures that compose the user's call. The lower the latency, the more natural interactive conversation becomes; accordingly, the additional delay incurred by the VoIP system is less noticeable. In PSTN calls, the round trip latency of calls originating and terminating within the continental United States is under 150 ms.

In a VoIP implementation used to reduce costs, studies suggest that users will tolerate one-way latency of up to 200 ms. The 1996 ITU Recommendation G.114 gives the following limits for one-way end-to-end transmission time:
• Under 150 ms: acceptable for most user applications;
• 150 to 400 ms: acceptable provided administrators know of the transmission time impact on the quality of user applications; and
• Over 400 ms: unacceptable for general network planning purposes.

Two difficulties that result from a high end-to-end delay in a voice network are echo and talker overlap. Echo, wherein the speaker's voice is reflected back, becomes a problem when the round-trip delay is more than 50 ms. Since echo is perceived as a significant quality obstacle, the VoIP system must address the need for echo control by implementing echo cancellation. Talker overlap, the problem of one caller stepping on the other talker's speech, is made worse when the one-way delay is greater than 250 ms. The end-to-end delay budget, therefore, is the major constraint and driving requirement for reducing latency through a packet network.
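
The delay thresholds quoted in the Latency section can be collected into a small check. The Python sketch below is illustrative only: the function names and the sample delay budget are assumptions rather than anything given in the article, and the round-trip delay is assumed to be twice the one-way delay.

```python
def classify_g114(one_way_ms):
    """Map a one-way delay (ms) onto the three G.114 bands quoted above."""
    if one_way_ms < 150:
        return "acceptable for most user applications"
    if one_way_ms <= 400:
        return "acceptable if the impact on quality is understood"
    return "unacceptable for general network planning"


def delay_warnings(one_way_ms):
    """Flag echo and talker overlap using the thresholds in the text.

    Assumption: the path is symmetric, so round trip = 2 * one-way.
    """
    warnings = []
    if 2 * one_way_ms > 50:       # echo matters above 50 ms round trip
        warnings.append("echo cancellation needed")
    if one_way_ms > 250:          # talker overlap worsens above 250 ms one-way
        warnings.append("talker overlap likely")
    return warnings


# Hypothetical one-way delay budget in milliseconds (illustrative values only).
budget = {"vocoder": 37.5, "packetization": 30.0, "jitter_buffer": 60.0, "network": 70.0}
total = sum(budget.values())
print(total, classify_g114(total), delay_warnings(total))
```

Summing the contributions first and classifying the total mirrors the idea of an end-to-end delay budget: any single stage can look harmless while the sum crosses a threshold.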

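As a rough illustration of how header size and frames per packet interact with vocoder bit-rate, the sketch below estimates the on-the-wire bit-rate of a packetized voice stream. The 20-byte IPv4 and 40-byte IPv6 header sizes come from the QoS discussion, and the 8 kbps, 10 ms G.729A figures come from Table 1. The 8-byte UDP header, the 12-byte fixed RTP header, the function name, and the choice of two frames per packet are assumptions made for the example. Link-layer framing and header compression are ignored.

```python
import math

IP_HEADER_BYTES = {"ipv4": 20, "ipv6": 40}  # sizes quoted in the article
UDP_HEADER_BYTES = 8    # assumption: standard UDP header
RTP_HEADER_BYTES = 12   # assumption: fixed RTP header, no CSRC list or extension


def packet_stats(bit_rate_bps, frame_ms, frames_per_packet, ip_version="ipv4"):
    """Estimate packet size, packetization time and wire bit-rate for a vocoder."""
    # Bytes produced by the vocoder for one frame, rounded up to whole bytes.
    frame_bytes = math.ceil(bit_rate_bps * frame_ms / 1000 / 8)
    payload_bytes = frame_bytes * frames_per_packet
    header_bytes = IP_HEADER_BYTES[ip_version] + UDP_HEADER_BYTES + RTP_HEADER_BYTES
    # Speech time carried by one packet: the "real frame length" in the text.
    packet_ms = frame_ms * frames_per_packet
    wire_bps = (payload_bytes + header_bytes) * 8 * 1000 / packet_ms
    return {
        "payload_bytes": payload_bytes,
        "header_bytes": header_bytes,
        "packet_ms": packet_ms,
        "wire_bps": wire_bps,
    }


# G.729A (8 kbps, 10 ms frames), two frames per packet, over IPv4 and IPv6.
for version in ("ipv4", "ipv6"):
    print(version, packet_stats(8000, 10, 2, version))
```

Under these assumptions the headers outweigh the 20-byte voice payload, which is why stacking more frames per packet lowers the overhead at the price of a longer packetization delay.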
Bit-rate vs. voice quality

As previously mentioned, many developers have focused on designing vocoders that consume progressively lower bandwidth. Moreover, many algorithms were created for carrying voice over a reliable circuit-switched connection rather than over the packet-based network the Internet utilizes. This effort might be misdirected. Most applications of VoIP rely on connectivity to the Internet, where the vast majority of users have a 28.8 kbps or higher connection. Nonetheless, developers are still pursuing ultra-low bandwidth vocoders instead of improving the quality of low bandwidth vocoders already in existence. Perhaps this effort is intended to allow users to concurrently enjoy other bandwidth-consuming applications, such as browsing the World Wide Web.

Some developers are alternatively constructing higher quality vocoders that consume more bandwidth. They are amenable to trading off bandwidth to achieve this quality. It is also critical for the vocoder to tolerate the mishandled, dropped and out-of-order packets intrinsic to the User Datagram Protocol (UDP). Of equal importance, one-way latency should be confined to one-quarter of one second. Finally, the vocoder should maintain an optimally sized buffer to restrain jitter, echo, and talker overlap.

Since users may not endure inferior performance, the focus should be on high quality instead of ultra-low bit-rate. [...] users dialing up via analog [...] to connect to the Internet; nevertheless, a higher quality vocoder could be preferable to a low quality vocoder. In a corporate or broadband environment, even 64 kbps is just [...] in the line when the average user is allotted hundreds, if not thousands, of kilobits per second.

Another possibility is developing higher bandwidth vocoders to allow something that the traditional telephone system can never do: transport high fidelity stereo audio. A potential application would be allowing users to call another VoIP application to listen to high quality, compressed music, for instance in MP3 format, consuming a mere 128 kbps. Of course, there are other issues involved, such as the server's ability to provide music at this fidelity while being able to scale.

Summary

It remains to be seen when VoIP can emerge from a specialized application to mainstream voice communication. While VoIP technology may have progressed admirably, as gauged by protocol and vocoder maturity, it still has plenty of room for improvement, as indicated by the following drawbacks:
• Erratic quality of voice transmissions;
• Unreliability of IP networks;
• Standards battles;
• Encroaching/competing wireless technologies; and
• Confusing human usability factors.

Reliability cannot be overemphasized. The PSTN operates with at least 99.999 percent specified availability and is available even during power outages. This cannot be said of modern VoIP applications; consequently, VoIP's reliability must improve in the near future for it to gain wide acceptance and let users sound good on the Internet.

About the authors

Princy Mehta earned his MS degree in [...] and Networking Engineering from the University of Pennsylvania and his BS degree in Computer Engineering from Rutgers University. He is currently employed with Lockheed Martin Naval Electronics & Surveillance Systems-Surface Systems as a Member of Engineering Staff. A graduate of the company's Engineering Leadership Development Program, his professional endeavors include Voice over IP and network systems security, in which he is SANS GIAC certified. Prior to his recent responsibilities, Princy programmed in C and shell script, administered a variety of Unix systems, and developed embedded DSP applications for multiprocessors.

Sanjay Udani received his PhD, MSE and BSE/BS Economics degrees from the University of Pennsylvania. He is currently a Distinguished Member of Technical Staff in Verizon's Technology organization in Arlington, VA. He is an adjunct faculty member at the University of Pennsylvania, teaching a graduate telecommunications course. Prior to joining Verizon he was involved in VLSI and ASIC design at Intel, and dabbled with large-scale virtual environment network design as well as power management for mobile computing while in graduate school.

Note: The full (unabridged) version of this paper is posted at http://www.cis.upenn.edu/~techreports/ and is listed as Technical Report MS-CIS-01-31.
