Voice Over IP Trates the Necessary Steps to Achieve Packetized Voice, from Data Processing Hey by the Digital Signal Processor (DSP) to Baybee
Total Page:16
File Type:pdf, Size:1020Kb
multimedia scheme or because the call is being placed onto the PSTN. Figure 1 displays processing that must occur between the user’s voice input and output. The diagram illus- Voice over IP trates the necessary steps to achieve packetized voice, from data processing Hey by the digital signal processor (DSP) to baybee... what’s transmitting packets over IP. There are happening...? numerous packet-handling processes that must be encountered; hence, a nontrivial amount of latency (time delay) is present, which affects per- ceived voice quality. SS7 Sounding good Once a user dials a telephone num- ber (or clicks a name hyperlinked to a telephone number), signaling is on the Internet required to determine the status of the called party—available or busy—and to establish the call. Signaling System 7 (SS7) is the set of protocols (stan- Princy Mehta and Sanjay Udani dards for signaling) used for call setup, teardown, and maintenance in the Public Switched Telephone Network © EXPERT GALLERY/J. NISHINAKA (PSTN). It is currently the one being used in North America to establish and he bulk of information Internet instead of solely over a dedi- terminate telephone calls. SS7 is conveyed over public cated circuit. Finally, two VoIP appli- implemented as a packet-switched net- telecommunication net- cations can communicate directly with- work and typically uses dedicated T works is voice. To do out accessing the PSTN. links, nodes and facilities. In general, it this, circuit-switched is a non-associated, common channel networks are employed. Components of VoIP out-of-band signaling network allow- While circuit switching The Public Switched Telephone ing switches to communicate during a provides adequate voice quality, it can Network (PSTN) is the collection of all call. SS7 signals may traverse real or be highly inefficient. In contrast, the the switching and networking equip- virtual circuits on links that also carry Internet’s packet-switched networks are ment that belongs to the carriers that are voice traffic. much more efficient but ill suited for involved in providing telephone service. However, the industry is moving voice without judicious implementation. In this context, the PSTN is primarily toward a converged network infrastruc- Voice over Internet Protocol (VoIP) the wired telephone network and its ture to provide a more efficient and wants to provide the efficiency of a access points to wireless networks, such effective way of handling increased packet-switched network while rivaling as cellular. The overall technology call volumes as well as deliver new the voice quality of a circuit-switched requirements of an Internet Protocol and enhanced services. The integration network. (IP) telephony solution can be split into of SS7 and IP will provide significant Because voice applications are real four categories: signaling, encoding, benefits. Figure 2 depicts a type of time, they are intolerant of lengthy transport and gateway control. VoIP network utilizing an SS7-to-IP delays, packet losses, out-of-order The purpose of the signaling proto- gateway. SS7 provides the call control packets and jitter. All these problems col is to create and manage connections on either side of the traditional PSTN, gravely degrade the quality of the voice between endpoints, as well as the calls while H.323/Session Initiation Protocol transmitted to the recipient. themselves. Next, when the conversa- (SIP) provides call control in the IP Unfortunately, wireless networks exac- tion commences, the analog signal pro- network. (Neither H.323 nor SIP alone erbate the problems that are intrinsical- duced by the human voice needs to be has a complete set of IP telephony pro- ly prevalent in their wire line counter- encoded in a digital format suitable for tocols.) The media gateway provides parts: a higher frequency of dropped transmission across an IP network. The circuit-to-voice conversion. packets, larger latency and more jitter. IP network itself must then ensure that VoIP can be implemented in several the real-time conversation is transport- H.323 ways. A Public Switched Telephone ed across the available media in a man- H.323, ratified by the International Network (PSTN)-based telephone can ner that produces acceptable voice qual- Telecommunication Union- communicate with a VoIP application, ity. Finally, it may be necessary for the Telecommunication (ITU-T), is a set of and vice versa. These telephones can IP telephony system to be converted by protocols for voice, video, and data also communicate with each other a gateway to another format-either for conferencing over packet-based net- where part of the call is routed over the interoperation with a different IP-based works, such as the Internet. The H.323 36 0278-6648/01/$10.00 © 2001 IEEE IEEE POTENTIALS protocol stack is designed to operate location. This mobility can be augment- For instance, the evolution of H.323 above the transport layer of the under- ed via wireless VoIP. from versions 1 through 4 has focused lying network. Therefore, H.323 can SIP supports five facets of establish- on decreasing call setup delay from sev- be used on top of any packet-based ing and terminating multimedia com- eral round trips to be on par with SIP’s network transport, for instance TCP/IP, munications: 1.5 round trips. This reduces its signal- to provide real-time multimedia com- • User location: determination of the ing overhead. Obviously, this conver- munication. end system to be used for communication; gence is highly desirable. (Both support H.323 specifies protocols, including • User capabilities: determination the majority of required end-user func- Q.931, H.225, H.245, and ASN.1, for of the media and media parameters to tions comparatively equally, such as call real-time point-to-point audio commu- be used; setup, teardown, call holding, call trans- nication between two terminals on a • User availability: determining the fer, call forwarding, call waiting and packet-based network that does not called party’s willingness to engage in conferencing.) provide a guaranteed quality of service communications; (QoS). The scope of H.323, however, is • Call setup: “ringing,” establishing Voice coders much broader and encompasses net- call parameters at both called and call- An efficient voice encoding and working multipoint conferencing ing party; decoding mechanism is vital for using among terminals that support not only • Call handling: including transfer the packet-switched technology. The audio but also video and data commu- and termination of calls. purpose of a voice coder (vocoder)-also nications. SIP can also initiate multiparty calls referred to as a codec (coding/decod- In a general H.323 imple- ing)-is to use the analog sig- mentation, three logical enti- nal (human speech) and ties are required: gateways, transform and compress it gatekeepers and multipoint into digital data. A number control units (MCUs). of factors must be taken Terminals, gateways, and Input/Output Digital Signal Packet into account including MCUs are collectively Processing Handling bandwidth usage, silence known as endpoints. It is DSP compression, intellectual Frames possible to establish an Coding property, look-ahead and H.323-enabled network with frame size, resilience to just terminals, which are Buffering and Jitter Buffer loss, layered coding, and H.323 clients. Yet for more Packetization fixed-point vs. floating- than two endpoints, a MCU point digital signal proces- is required. It can be com- TCP/IP sor (DSPs). bined with a terminal, gate- Protocol Stack The bit-rate of available way or gatekeeper. Fig. 1 VoIP processing narrowband vocoders and handling ranges from 1.2 to 64 kbps, SIP Network Interface with an inevitable effect on Session Initiation Protocol, Device the quality of the restituted SIP, defined by the Internet PHY voice. There is ordinarily, Engineering Task Force Layer but not always, a trade-off (IETF), is a signaling protocol between voice quality and for telephone calls over IP. bandwidth used. Using Unlike H.323, however, SIP was using a multipoint control unitMCU or a today’s most efficient vocoder allows designed specifically for the Internet. It fully-meshed interconnection instead of a quasi-toll quality bandwidth usage to be exploits the manageability of IP and multicast. Gateways that connect PSTN as low as 5 kbps. Toll quality is recog- makes developing a telephony applica- parties can also use SIP to set up calls nized as the standard of a long-distance tion relatively simple. SIP is an applica- between them. The protocol is designed PSTN call. As newer and more sophisti- tion-layer control (signaling) protocol for as part of the overall Internet Engineering cated algorithms are developed, this creating, modifying and terminating ses- Task Force (IETF) multimedia data con- bit-rate will decrease. This will permit sions with one or more participants. trol architecture. It incorporates many more samples to be squeezed more effi- SIP can be employed to initiate ses- protocols, for example Resource ciently while minimally sacrificing sions and invite members to sessions Reservation Protocol (RSVP) and Real- quality, if at all. that have been advertised by other Time Transport Protocol (RTP), for prop- The algorithmic delay introduced by means, such as via multicast protocols. er functionality and operation. a coding/decoding sequence is the The signaling protocol transparently frame length plus the look-ahead size. A supports name mapping and redirection H.323 vs. SIP vocoder with a small frame length has a services. This allows the implementa- H.323 and SIP are competing to shorter delay than one with a longer tion of intelligent network telephony obtain dominance of IP telephony sig- frame length, but it introduces a larger subscriber services. These facilities also naling. Currently, there is no clear-cut overhead. Most implementations choose enable personal mobility-the ability of winner. However, the standards appear to send multiple frames per packet.