
Understanding Latency in IP Telephony

By Alan Percy, Senior Sales Engineer

Brooktrout Technology, Inc.
410 First Avenue, Needham, MA 02494
Phone: (781) 449-4100  Fax: (781) 449-9009
Internet: www.brooktrout.com

Abstract

With increasing interest in implementing and deploying IP Telephony applications, there is a growing need to understand the causes and effects of latency in the deployed system. This paper addresses that need by reviewing the effect of latency on human conversations, analyzing the system components that incur latency, and describing methods of managing latency to maintain sufficient quality of service.

Introduction

When building and deploying an IP Telephony solution, many different technical attributes affect the quality of the final system. These attributes include the choice of voice coding (vocoder) algorithm, the system latency, link dependability, and others. Assuming the solution uses an industry-standard vocoder, latency becomes the most important attribute that designers control, and the one with the greatest effect on quality of service.

Latency is the time delay incurred in speech by the IP Telephony system. It is typically measured in milliseconds, from the moment the speaker utters a word until the listener actually hears it. This is termed “mouth-to-ear” or “one-way” latency, the delay that users experience when using the system. The round-trip latency is the sum of the two one-way latency figures that make up a telephone call. In the traditional Public Switched Telephone Network, the round-trip latency for domestic calls is virtually always under 150 milliseconds. At these levels, the latency is not noticeable to most people. Many international calls (especially calls carried via satellite) have round-trip latency figures that can exceed 1 second, which can be very annoying for users.

What is the effect of latency?

A telephone conversation between two people depends on the timing of the speech more than most people realize. Most conversations include little utterances by the listener that serve as acknowledgements back to the speaker, confirming that the listener is actively engaged in the conversation. Listen to yourself carefully the next time you are on the phone with someone. Notice the small utterances that you naturally make even though the other party is doing most of the talking. If you remove these utterances, the speaker will stop to wait for your feedback. If you delay them, they reach the speaker at the wrong time, resulting in confusion and an interruption to the flow of conversation. Try delaying or stopping these utterances and notice what happens to your telephone conversation.

What is considered an acceptable amount of latency?

As with most human factors considerations, everyone has his or her own opinion on this issue, but based on feedback Brooktrout has received from early adopters of IP Telephony systems, there is a definite maximum latency that users will tolerate. The exact amount is hard to define because users balance the degradation caused by added latency against the perceived value of the system. Wireless telephone services are a prime example: users accept reduced connection quality in exchange for the added value of high mobility.

Assuming that an IP Telephony system is used primarily in a cost-reduction or “toll bypass” application, Brooktrout has developed the following chart, showing the relationship between user-perceived link quality and the amount of one-way latency. Applications with higher perceived value will likely tolerate greater latency.

Figure 1: Quality Perception vs. Latency (perceived link quality, rated Excellent, Good, Poor, or Unacceptable, versus one-way latency in milliseconds from 0 to 450)

As you can see, user perception of link quality deteriorates as the one-way latency exceeds 150 milliseconds. If the one-way latency exceeds 450 milliseconds, holding a conversation is very difficult and the latency becomes very annoying. Given a choice, most callers would choose a telephone line with less than 200 milliseconds of latency.

This gives you a target figure for your IP telephony system. Keep the one-way latency under 200 milliseconds. Even if a caller can get great sound quality and much lower cost with your solution, that caller will typically go elsewhere if the latency is excessive.
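As a rough aid, the quality bands in Figure 1 and the 200 millisecond target can be written as a small lookup. This is a sketch, not a Brooktrout tool: the band edges of 150, 300, and 450 milliseconds are read from the figure's axis and are an assumption, and the function names are hypothetical.

```python
# Sketch only: band edges are assumed from the axis of Figure 1.
TARGET_ONE_WAY_MS = 200  # one-way latency target recommended in the text


def perceived_quality(one_way_ms: float) -> str:
    """Return the (assumed) Figure 1 quality band for a one-way latency."""
    if one_way_ms <= 150:
        return "Excellent"
    if one_way_ms <= 300:
        return "Good"
    if one_way_ms <= 450:
        return "Poor"
    return "Unacceptable"


def round_trip_ms(a_to_b_ms: float, b_to_a_ms: float) -> float:
    """Round-trip latency is the sum of the two one-way latencies."""
    return a_to_b_ms + b_to_a_ms


for latency in (120, 180, 350, 500):
    verdict = "meets" if latency < TARGET_ONE_WAY_MS else "misses"
    print(f"{latency} ms: {perceived_quality(latency)}, {verdict} the 200 ms target")
```

Note that under these assumed bands a link between 200 and 300 milliseconds still rates “Good” yet misses the recommended target.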

What are the causes of latency?

Generally, an IP Telephony system is constructed using gateways that interconnect existing telephone equipment over a wide-area network (WAN). This typical deployment is shown in Figure 2, with the two end-point telephones connected to the WAN via gateways and routers.

Even if a system integrates the gateway functionality into either the telephones or the router equipment, the same basic processing must take place. For all practical purposes, you can envision that every call using IP telephony requires two gateways; only the location of the gateways changes.

Figure 2: Typical IP Telephony System (telephone, gateway, and router at each end, connected across the WAN; gateway-incurred latency at each end and network-incurred latency in between)

Latency in an IP Telephony system is introduced by two primary sources. Some of the latency is incurred in the IP Telephony gateways at either end, and the remainder is incurred by the IP network that connects the two gateways.

Since latency is cumulative, any latency introduced by a component in an IP Telephony system will directly affect the total latency experienced by the user.

Gateway-Incurred Latency

Let us take a look inside a gateway and examine the origin of the latency introduced by the gateway. A high-level block diagram of the processing within a gateway is shown in Figure 3. The block diagram shows the high-level functions that occur in both gateway systems. The interface to the end-point telephone system is on the left side and the interface to the network is on the right side.

Following the path of a voice conversation from one telephone to the other, each functional block contributes to the gateway-incurred latency. Each of these functions and its latency contribution is described in more detail in the sections that follow.

Figure 3: Gateway Processing (telephone-side network interface: T1, E1, PRI, loop-start; digital signal processing: PCM frames and coding; packet handling: buffering, packetization, and the TCP/IP protocol stack; IP network interface: Ethernet, T1)

Network Interface Latency

The network interface in a gateway includes any hardware or software that connects the gateway to the telephone system or network. The typical network interface frames and converts the network-side digitized audio PCM data streams onto the internal PCM bus for transport to the DSP. Very little latency is induced in this stage, with typical maximums well below 1 millisecond.

Digital Signal Processing Latency

The digital signal processing that occurs in an IP Telephony gateway is one of the gateway's more complex functions. It is typically performed by dedicated digital signal processor (DSP) hardware and associated software algorithms that compress or decompress the speech, detect tones, detect silence, generate tones, generate comfort noise, and cancel echo. This entire collection of processing is called voice coding or “vocoding”.

Figure 4: DSP Voice Compression Subsystem

Framing Latency

To perform vocoding most efficiently, DSP implementations process entire frames (or batches) of data at one time. This allows the DSP to use special instructions that provide the high efficiency needed for high-density IP Telephony applications.

Figure 5: Framing Process

The side effect of processing data in frames is that none of the data can be processed until the frame is completely full. Since digitized audio arrives from the telephone network at a fixed rate, typically 8,000 samples per second, the frame size directly determines the framing latency. A 100-sample frame takes 12.5 milliseconds to fill, while a 1000-sample frame takes 125 milliseconds. Choosing a frame size is a compromise: the larger the frame, the greater the DSP efficiency, but also the greater the latency.
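The framing delay is simply the frame length in samples divided by the sample rate. A minimal sketch, assuming the 8,000 samples per second rate given above (the function names are illustrative):

```python
SAMPLE_RATE_HZ = 8000  # PCM samples per second from the telephone network


def frame_fill_ms(samples_per_frame: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Milliseconds needed to collect one complete frame."""
    return samples_per_frame * 1000 / sample_rate_hz


def samples_for_duration(frame_ms: float, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a frame of the given duration."""
    return round(frame_ms * sample_rate_hz / 1000)


print(frame_fill_ms(100))        # 12.5
print(frame_fill_ms(1000))       # 125.0
print(samples_for_duration(30))  # 240, e.g. one 30 ms G.723.1 frame
```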

Fortunately (or unfortunately, depending on your point of view), you don’t need to make this decision. Each of the standard voice coding methods uses a standard frame size, so the maximum latency incurred by the framing process depends directly on the choice of vocoder.

Voice Coder   Bandwidth     Frame Duration   Frame Size
              (bits/sec)    (milliseconds)   (bytes)
G.711*        64000         15               120
G.723.1       5300-6300     30               24
G.729a        8000          10               10
SX7300        7300          15               14
SX9600        9600          15               18

Table 1: Voice Coder Frame Sizes

* G.711 is technically not a vocoder, but we list it here for comparison. G.711 data streams allow greater flexibility in frame sizes; the figures listed here are just one example.
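The frame sizes in Table 1 follow from the bit rate and frame duration: the bits per frame are the bit rate multiplied by the frame duration, rounded up to whole bytes. The sketch below recomputes the table under that assumption; for G.723.1 it uses the upper 6300 bits/sec rate, since the table lists a range.

```python
import math


def frame_bytes(bitrate_bps: int, frame_ms: int) -> int:
    """Bytes of coded voice per frame, rounded up to a whole byte."""
    # Divide by 1000 to convert ms to seconds and by 8 to convert bits to bytes.
    return math.ceil(bitrate_bps * frame_ms / 8000)


# (coder, bit rate in bits/sec, frame duration in ms) from Table 1
CODERS = [
    ("G.711", 64000, 15),
    ("G.723.1", 6300, 30),  # upper end of the 5300-6300 range
    ("G.729a", 8000, 10),
    ("SX7300", 7300, 15),
    ("SX9600", 9600, 15),
]

for name, rate, duration in CODERS:
    print(f"{name}: {frame_bytes(rate, duration)} bytes per {duration} ms frame")
```

At the lower 5300 bits/sec rate the same formula gives 20-byte G.723.1 frames.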

Processing Time

After an entire frame has been collected, the DSP algorithms must run on the new frame. The time required to complete the processing varies considerably, but it can never exceed the frame collection time. (If it did, the DSP would never finish processing one frame before the next frame arrived.)

Since most high-density IP Telephony gateways process multiple voice channels on each DSP, calculating the latency induced by coding or decoding the speech is rather complex. In this situation, each DSP processes frames from different channels one after the other, in sequence. This means that the first channel is completed much earlier than the later channels. In a fully utilized DSP subsystem, the processing for the last channel would finish just as the next frame for the first channel becomes available.

As a result, you can’t use the number of milliseconds the DSP spends vocoding any single channel to calculate the latency added by processing. Instead, the processing latency is typically specified as one frame duration in milliseconds. This means the total latency from framing and processing can be no more than twice the frame duration.
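To see why the processing allowance is one full frame, consider a hypothetical DSP that codes several channels back to back in each frame period. The sketch assumes every channel's frame becomes ready at the same instant and takes the same fixed processing time; real DSP schedules vary, and the function names are illustrative only.

```python
def completion_offsets_ms(channels: int, per_channel_ms: float, frame_ms: float) -> list[float]:
    """Time after the frame completes at which each channel's coded frame is ready.

    Channels are processed one after another on one DSP. The whole schedule
    must fit inside one frame period, or the DSP falls behind.
    """
    if channels * per_channel_ms > frame_ms:
        raise ValueError("DSP overloaded: processing exceeds the frame period")
    return [(i + 1) * per_channel_ms for i in range(channels)]


def worst_case_framing_plus_processing_ms(frame_ms: float) -> float:
    # Framing waits one frame; processing is budgeted at up to one more frame.
    return 2 * frame_ms


offsets = completion_offsets_ms(channels=6, per_channel_ms=5.0, frame_ms=30.0)
print(offsets)                                      # [5.0, 10.0, 15.0, 20.0, 25.0, 30.0]
print(worst_case_framing_plus_processing_ms(30.0))  # 60.0
```

The last channel finishes a full frame period after its data was collected, which is why the processing allowance is set to one frame rather than to the per-channel DSP time.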

Packet Handling Latency

Between the DSP processing and the hand-off of data to the WAN, a number of packet handling processes take place that affect the system latency.

Buffering

After voice coding, many IP Telephony systems further buffer the resulting compressed voice data frames before passing them to the network software. This additional buffering is often done to reduce the number of times the DSP needs to communicate with the gateway's main CPU. In other situations, it is done to make the output of different coding algorithms fit into one common frame duration (not length).

For example, if a system is running G.723.1 on one channel and G.729a on another, the frame durations differ (see Table 1). By design, the system may collect three G.729a frames into one buffer for every G.723.1 frame. This allows it to transfer one buffer every 30 milliseconds, regardless of the coding algorithm.
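The buffering arithmetic in this example can be sketched as follows. The helper names are hypothetical; the second function makes explicit a point implied above: the first G.729a frame in each buffer must wait for two more frames, so this alignment adds up to 20 milliseconds for G.729a channels.

```python
def frames_per_buffer(frame_ms: int, buffer_ms: int) -> int:
    """Number of coder frames that fill one buffer of a common duration."""
    if buffer_ms % frame_ms:
        raise ValueError("buffer duration must be a whole number of frames")
    return buffer_ms // frame_ms


def worst_case_buffer_wait_ms(frame_ms: int, buffer_ms: int) -> int:
    """Extra wait for the first frame in a buffer, which sits until the last frame is coded."""
    return buffer_ms - frame_ms


COMMON_BUFFER_MS = 30  # one buffer transferred every 30 ms, as in the example
for coder, frame_ms in (("G.723.1", 30), ("G.729a", 10)):
    print(coder, frames_per_buffer(frame_ms, COMMON_BUFFER_MS),
          worst_case_buffer_wait_ms(frame_ms, COMMON_BUFFER_MS))
# G.723.1 1 0
# G.729a 3 20
```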

Packetization

As the coded voice is prepared for transport over the WAN, it must be assembled into packets. This is typically done by the TCP/IP protocol stack, using UDP (User Datagram Protocol) and RTP (Real-time Transport Protocol). Choosing these protocols supports timely delivery of the voice data and eliminates the overhead of transmission acknowledgements and retries.

Looking inside a typical IP telephony data packet, each packet starts with IP, UDP, and RTP headers that total 40 bytes. These headers contain the source and destination IP addresses, the UDP port numbers, the packet sequence number, and other protocol information needed to properly transport the data.
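To illustrate the 12-byte RTP portion of these headers, the sketch below packs a minimal RTP header with Python's standard struct module, following the fixed header layout of RFC 3550 (version 2, no padding, extension, or contributing sources). The sequence number, timestamp, and SSRC values are arbitrary; payload type 4 is the static RTP assignment for G.723 in RFC 3551. The 20-byte IP and 8-byte UDP sizes assume an IPv4 header without options.

```python
import struct


def rtp_header(seq: int, timestamp: int, ssrc: int, payload_type: int, marker: bool = False) -> bytes:
    """Pack a minimal 12-byte RTP fixed header in network byte order."""
    byte0 = 2 << 6  # version 2; padding, extension, and CSRC count all zero
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


IP_HEADER_BYTES = 20   # IPv4 header without options
UDP_HEADER_BYTES = 8   # carries the source and destination port numbers

# Timestamp 240 = one 30 ms frame at the 8 kHz RTP clock used for G.723.
hdr = rtp_header(seq=1, timestamp=240, ssrc=0x1234ABCD, payload_type=4)
print(len(hdr))                                       # 12
print(IP_HEADER_BYTES + UDP_HEADER_BYTES + len(hdr))  # 40
```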

After the headers, one or more frames of coded voice data follow. Whether to pack more than one frame into a single packet is an important consideration for every IP Telephony system. If a system were using the G.723.1 coder (which produces 24-byte frames every 30 milliseconds), each packet would carry 40 bytes of header and 24 bytes of data. That would make the header 167% of the voice data payload!

Figure 6: Anatomy of an IP Telephony Packet (IP header, 20 bytes; UDP header, 8 bytes; RTP header, 12 bytes; 40 bytes of headers in total, followed by the voice data)

The most common way to reduce the inefficiency of the IP packet overhead is to put more than one coded voice frame in each IP packet. If two frames are passed per packet, the overhead drops to 83%, but with the side effect of adding yet another frame period of latency. This trade-off is another compromise that needs to be considered when deploying an IP Telephony system.
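The trade-off can be tabulated with a few lines of arithmetic. This sketch assumes G.723.1 frames and the 40-byte header described above, and counts the added delay as one frame period for each frame beyond the first, matching the packetization entry in Table 2. The function name is illustrative.

```python
HEADER_BYTES = 40  # IP (20) + UDP (8) + RTP (12)


def packet_tradeoff(frames_per_packet: int, frame_bytes: int = 24, frame_ms: int = 30) -> tuple[float, int]:
    """Return (header overhead as % of payload, added packetization delay in ms).

    Defaults are the G.723.1 figures from Table 1. Each frame beyond the
    first adds one more frame period of waiting before the packet is sent.
    """
    payload = frames_per_packet * frame_bytes
    overhead_pct = 100 * HEADER_BYTES / payload
    added_delay_ms = (frames_per_packet - 1) * frame_ms
    return overhead_pct, added_delay_ms


for n in (1, 2, 3):
    pct, delay = packet_tradeoff(n)
    print(f"{n} frame(s) per packet: {pct:.0f}% overhead, +{delay} ms")
# 1 frame(s) per packet: 167% overhead, +0 ms
# 2 frame(s) per packet: 83% overhead, +30 ms
# 3 frame(s) per packet: 56% overhead, +60 ms
```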

An interesting trick that reduces the overhead without increasing the latency is to let voice frames from other channels “piggyback” in the same packet. When voice from another channel in the originating gateway is going to the same destination gateway, the data can be combined into a single packet. This trick is not supported by the standard H.323 protocol, but it can be used by proprietary solutions to improve efficiency.
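The savings from piggybacking can be estimated by counting bytes per frame interval. Because piggybacking is proprietary, the packet format here is an assumption: one shared 40-byte header plus, optionally, a hypothetical per-channel sub-header (`subheader_bytes`) that such a format might need to tell channels apart.

```python
HEADER_BYTES = 40  # IP + UDP + RTP headers


def bytes_per_interval(channels: int, frame_bytes: int, piggyback: bool, subheader_bytes: int = 0) -> int:
    """Bytes sent per frame interval for `channels` calls to one destination gateway.

    subheader_bytes is a hypothetical per-channel tag in a proprietary
    multiplexed packet; 0 gives the idealized case.
    """
    if piggyback:
        return HEADER_BYTES + channels * (frame_bytes + subheader_bytes)
    return channels * (HEADER_BYTES + frame_bytes)


separate = bytes_per_interval(6, 24, piggyback=False)  # 384
combined = bytes_per_interval(6, 24, piggyback=True)   # 184
print(f"saving: {100 * (1 - combined / separate):.0f}%")  # saving: 52%
```

The saving grows with the number of channels sharing a destination and shrinks as any per-channel sub-header grows.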

Jitter Buffer Latency

Because IP networks cannot guarantee the delivery time of data packets (or their order, for that matter), the data arrives at an inconsistent rate. This variability in the arrival rate is called “jitter”. During the voice decoding process (data traveling from the network to a telephone), every system needs to compensate for jitter in the data arriving from the network. To do so, most systems buffer at least one packet of data from the network before passing it to the DSP.

Having these “jitter buffers” can significantly reduce the occurrence of data starvation and ensure the timing is correct when sending data to the DSP. Without jitter buffers, there is a very good chance that gaps in the data would be heard in the resulting speech.

The side effect of jitter buffers is (you guessed it) more latency. The larger the jitter buffers, the more tolerant the system is of jitter in the data from the network, but the additional buffering causes more latency.
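The trade-off can be made concrete with a simple fixed-delay jitter buffer: playout of each packet is scheduled a fixed buffer delay after the first packet arrives, and any packet that arrives after its slot is late and would be heard as a gap. The arrival times below are invented for illustration, and real gateways may use adaptive buffers rather than this fixed scheme.

```python
def late_packets(arrivals_ms: list[float], period_ms: float, buffer_ms: float) -> int:
    """Count packets that miss their playout time in a fixed-delay jitter buffer.

    arrivals_ms[i] is the arrival time of the packet with sequence number i.
    Packet i is played at arrivals_ms[0] + buffer_ms + i * period_ms.
    """
    start = arrivals_ms[0] + buffer_ms
    return sum(1 for i, t in enumerate(arrivals_ms) if t > start + i * period_ms)


# Hypothetical example: one packet sent every 30 ms with variable network delay.
PERIOD_MS = 30
sent = [i * PERIOD_MS for i in range(6)]
network_delay = [40, 55, 40, 85, 45, 60]  # packet 3 arrives after packet 4
arrivals = [s + d for s, d in zip(sent, network_delay)]

for buffer_ms in (0, 30, 60):
    late = late_packets(arrivals, PERIOD_MS, buffer_ms)
    print(f"{buffer_ms} ms of buffering: {late} late packet(s)")
# 0 ms of buffering: 4 late packet(s)
# 30 ms of buffering: 1 late packet(s)
# 60 ms of buffering: 0 late packet(s)
```

Each step that removes gaps adds the same amount of delay to every packet, which is exactly the trade-off described above.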

Network-Incurred Latency

Now that the gateway has the voice data compressed and packetized, the data is passed to the Wide Area Network for transport to the far-end gateway. Passing data over the WAN introduces yet another set of potential latency additions that affect the total latency.

Media Access Latency

Each point where data is passed to or from physical media adds a media access delay to the total latency. Since many different physical media are used to interconnect gateways, routers, and other networking equipment, these delays need to be considered.

If a connection to the WAN uses a low-speed serial link such as RS-232 or a dial-up modem, the time needed to transfer each packet adds far more latency than it would on higher-speed media.

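Example:

If a WAN link were using a 28,800 bits per second serial connection, each byte transferred would require about 0.35 milliseconds (assuming asynchronous framing of 10 bits per byte: 8 data bits plus a start bit and a stop bit). This yields a total transfer time of about 35 milliseconds for a 100-byte packet.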

If, instead of an RS-232 connection, the WAN interface were using a 1.54 megabits per second dedicated T-1 connection, the transmission latency for the same 100-byte packet would drop to about 0.5 milliseconds.

If, instead of an RS-232 or T-1 connection, the WAN interface were using a 100 megabits per second Ethernet connection, the transmission latency for the same 100-byte packet would drop even further, to 0.008 milliseconds. Although the transfer time for Ethernet is very fast, remember that Ethernet is a Carrier Sense Multiple Access (CSMA) medium, which means that every computer on the same network segment shares the same physical medium. Any collisions or congestion result in increased latency.
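The three transfer times above come from one formula: bits on the wire divided by the link rate. The sketch reproduces them, assuming 10 bits per byte for the asynchronous serial link and the nominal 1.544 Mbps T-1 rate; it ignores queuing, propagation delay, and link-layer framing overhead.

```python
def transfer_ms(packet_bytes: int, link_bps: float, bits_per_byte: int = 8) -> float:
    """Time to clock a packet onto a link, ignoring queuing and propagation."""
    return packet_bytes * bits_per_byte * 1000 / link_bps


PACKET_BYTES = 100  # packet size used in the example
print(f"{transfer_ms(PACKET_BYTES, 28_800, bits_per_byte=10):.1f} ms")  # 34.7 ms, async serial
print(f"{transfer_ms(PACKET_BYTES, 1_544_000):.2f} ms")                 # 0.52 ms, T-1
print(f"{transfer_ms(PACKET_BYTES, 100_000_000):.3f} ms")               # 0.008 ms, 100 Mbps Ethernet
```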

Routing Latency

Since IP is a routed protocol, each packet simply carries a source and destination IP address. This simple design requires that routers examine each packet and, depending on the destination address, forward it along the proper route. The queuing logic used by most routers was designed before the concept of IP Telephony existed and therefore has certain weaknesses with respect to the real-time nature of IP Telephony. Many existing routers use best-effort routing, which is far from ideal for latency-sensitive voice traffic.

The key missing piece is a priority attribute. Without it, the router delays all data during congestion, regardless of the application.
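The effect of a priority attribute can be sketched by comparing a plain first-in, first-out queue with strict priority queuing, one common way a router can favor voice. This is a simplified model with an invented backlog and fixed per-packet transmit time, not a description of any particular router's scheduler.

```python
VOICE, DATA = 0, 1  # lower value is sent first under strict priority


def average_voice_wait_ms(backlog: list[int], per_packet_ms: float, prioritize: bool) -> float:
    """Average queuing delay of voice packets in a congested router output queue.

    All packets are assumed queued at t=0 and sent one at a time, each taking
    per_packet_ms. prioritize=False models plain FIFO; prioritize=True models
    strict priority, with voice sent ahead of all data.
    """
    order = sorted(backlog) if prioritize else list(backlog)
    waits = [slot * per_packet_ms for slot, kind in enumerate(order) if kind == VOICE]
    return sum(waits) / len(waits)


backlog = [DATA, DATA, VOICE, DATA, DATA, VOICE]
print(average_voice_wait_ms(backlog, 1.0, prioritize=False))  # 3.5
print(average_voice_wait_ms(backlog, 1.0, prioritize=True))   # 0.5
```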

The Resource Reservation Protocol (RSVP) has been defined by the IETF as a means of reserving and managing resources within routers and gateways. RSVP allows a gateway-to-gateway connection to establish a guaranteed bandwidth commitment on the intermediate network equipment, which would dramatically reduce the variability in packet delivery and improve the quality of service. Since RSVP is a relatively recent development, the vast majority of equipment deployed in the public network cannot yet support it. At present, RSVP can only be used in closed systems where the network administrator controls the equipment from end to end. Future developments by the WAN carriers may change this situation.

Firewalls and Proxy Servers

Many networks use firewalls or proxy servers to provide security between the corporate LAN and the Internet. Since both of these security devices must examine every incoming and outgoing IP packet, they can incur a sizeable amount of latency, so they are almost always avoided in IP Telephony applications.

Packet filter features built into routers are typically simple in design compared to stand-alone firewalls or proxy servers, and can therefore offer some network security without significant added latency. Stand-alone firewalls or proxy servers must receive, decode, examine, validate, encode, and send every packet, and each of those steps adds latency. Ask your router vendor about packet filter capabilities, and specifically inquire about the latency the filtering adds.

Proxy servers provide even greater network security, but at an even greater cost in network latency. It is not uncommon for a busy proxy server to incur over 500 milliseconds of latency. This is not a problem for the web-browsing applications for which proxy servers were designed, but it is clearly unacceptable for real-time media.

How can I manage latency?

Managing the latency in a deployed IP Telephony system is key to the success of the resulting service. Some key steps that can be taken to reduce and manage the latency are:

· Know the sources of latency in your system (do a latency budget). A latency budget helps you set a target and identify areas that can be improved. Without a complete understanding of the components that contribute to the total latency, you won’t have a clear picture of where latency can be trimmed.
· Use routing equipment that supports prioritization of selected ports or provides RSVP to guarantee a certain level of service for voice packets. Carefully selecting and managing your routing equipment is key to the success of your deployed system.
· Ensure that your network has sufficient bandwidth to avoid congestion. An important tool in managing bandwidth and congestion is the selection of proper vocoders. Use IP telephony platforms that allow dynamic switching of vocoders on a call-by-call basis or even within a call. This allows the network to respond to available bandwidth conditions in real time.
· Stay away from equipment and media that you do not control (such as the public Internet). The ability to set priorities, meter link throughput, and adjust those priorities is required to keep latency to a minimum in an IP Telephony system.
· If you use a network carrier, ask for a guaranteed route. This will eliminate many time-of-day variables in the system.
· Reduce packet overhead. If feasible, use piggybacking in your design to send multiple channels of voice data to the same destination. Efficient use of piggybacking can reduce total network traffic by over 50%, leaving more room for growth.

A Sample Latency Budget

Just as with your personal finances, create a latency budget with a target figure in mind. The example below calculates the one-way latency.

Source                   Latency (milliseconds)
Network Interface        1     (1.54 Mbps T-1)
Framing                  30    (G.723.1)
Processing Time          10    (worst case)
Buffering                0     (no additional buffering)
Packetization            30    (two frames per packet)
Media Access Delay       10    (five 2 ms hops)
Routing                  50    (router dependent)
Jitter Buffering         30    (one buffer)
Total One-Way Latency    161

Table 2: Example Latency Budget

Use the latency budget to identify critical bottlenecks and areas where latency can be reduced.
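A latency budget is easy to keep as data so that it can be re-checked as components change. The sketch below totals the Table 2 entries, compares the result with the 200 millisecond one-way target, and lists the largest contributors, which are natural places to look for savings. The variable names are illustrative.

```python
TARGET_MS = 200  # one-way target from the text

# Entries from Table 2, in milliseconds.
budget = {
    "Network Interface": 1,
    "Framing": 30,
    "Processing Time": 10,
    "Buffering": 0,
    "Packetization": 30,
    "Media Access Delay": 10,
    "Routing": 50,
    "Jitter Buffering": 30,
}

total = sum(budget.values())
print(f"Total one-way latency: {total} ms (headroom {TARGET_MS - total} ms)")
# Total one-way latency: 161 ms (headroom 39 ms)

# The three largest contributors, largest first.
for source, ms in sorted(budget.items(), key=lambda kv: kv[1], reverse=True)[:3]:
    print(f"  {source}: {ms} ms ({100 * ms / total:.0f}% of total)")
```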

About Brooktrout Technology

As you can see, building an effective, high-quality IP Telephony solution requires that each component in the system be carefully selected to avoid unnecessary and irritating latency in voice calls. Brooktrout Technology’s TR2001 helps ensure that your final solution minimizes latency by closely coupling some of the vocoding and network access portions of an IP Telephony solution.

Some of the advantages of the Brooktrout TR2001 are:

· Industry-standard vocoding algorithms optimized to minimize processing latency
· Efficient schemes that reduce the cumulative latency effect of buffering
· Efficient driver for reduced host-processor load
· Real-time embedded TCP/IP protocol stack
· Single PCI slot solution that includes all network and telephony interfaces
· A power-efficient board that runs cooler than other solutions
· Support for dynamic vocoder selection
· APIs for precise control of telephony, vocoders, and the H.323 protocol stack

In addition to providing high-performance telephony platforms, Brooktrout also works closely with development partners to help them optimize and refine their systems.
