Understanding Latency in IP Telephony

By Alan Percy, Senior Sales Engineer
Brooktrout Technology, Inc.
410 First Avenue
Needham, MA 02494
Phone: (781) 449-4100
Fax: (781) 449-9009
Internet: www.brooktrout.com
email: [email protected]

Abstract
With increasing interest in implementing and deploying IP Telephony applications, there is a rising need to understand the cause and effect of latency in the deployed system. This paper addresses that need by reviewing the effect of latency on human conversations, analyzing the system components that incur the latency, and describing methods of managing the latency to maintain sufficient quality of service.

Introduction
When building and deploying an IP Telephony solution, many different technical attributes affect the quality of the final system. These attributes include the choice of voice coding (vocoder) algorithm, the system latency, link dependability, and others. Assuming the IP Telephony solution uses an industry-standard vocoder, latency becomes the most important attribute that designers control, and the one with the greatest effect on quality of service.

Latency is the time delay incurred in speech by the IP Telephony system. It is typically measured in milliseconds from the moment the speaker utters a word until the listener actually hears it. This is termed the “mouth-to-ear” latency or the “one-way” latency that users experience when using the system. The round-trip latency is the sum of the two one-way latency figures that make up a telephone call.

In the traditional Public Switched Telephone Network, the round-trip latency for domestic calls is virtually always under 150 milliseconds. At these levels, the latency is not noticeable to most people. Many international calls (especially calls carried via satellite) have round-trip latency figures that can exceed 1 second, which can be very annoying for users.

What is the effect of latency?
A telephone conversation between two people depends on the timing of the speech more than most people realize. Most conversations include little utterances by the listener that serve as acknowledgements back to the speaker, confirming that the listener is actively engaged in the conversation. Listen to yourself carefully the next time you are on the phone with someone. Notice the small utterances that you naturally make even though the other party is doing most of the talking. If you remove these utterances, the speaker will stop to wait for your feedback. If you delay the utterances, they will reach the speaker at the wrong time, resulting in confusion and an interruption to the flow of conversation. Try delaying or stopping these utterances and notice what happens to your telephone conversation.

What is considered an acceptable amount of latency?
As with most human factors considerations, everyone has his or her own opinion on this issue, but based on feedback Brooktrout has received from early adopters of IP Telephony systems, there is a definite maximum latency that users will tolerate. The exact amount is hard to define because users balance the degradation of added latency against the perceived value added by the system. Wireless telephone services are prime examples of reduced connection quality being accepted when balanced against the added value of high mobility.

Assuming that an IP Telephony system is primarily used in a cost-reduction or “toll bypass” application, Brooktrout has developed the following chart, showing the relationship between user-perceived link quality and the amount of one-way latency. Other applications with higher perceived value will surely accommodate greater latency figures.

[Figure 1: Quality Perception vs. Latency. Perceived link quality (Excellent, Good, Poor, Unacceptable) plotted against one-way latency in milliseconds (axis marks at 0, 150, 300, 450).]

As you can see, the user perception of the link quality deteriorates as the one-way latency exceeds 150 milliseconds. If the one-way latency exceeds 450 milliseconds, holding a conversation is very difficult and the latency becomes very annoying. If given a choice, most callers would choose to use a telephone line with less than 200 milliseconds of latency. This gives you a target figure for your IP Telephony system: keep the one-way latency under 200 milliseconds. Even if a caller can get great sound quality and much lower cost with your solution, that caller will typically go elsewhere if the latency is excessive.

What are the causes of latency?
Generally, an IP Telephony system is constructed using gateways to connect existing telephone equipment over a wide area network (WAN). This typical deployment is shown in Figure 2, which shows the two end-point telephones connected to a WAN via gateways and routers.

Even if a system integrates the gateway functionality into either the telephones or the router equipment, the same basic processing must take place. For all practical purposes, you can envision that every call using IP Telephony requires two gateways; only the location of the gateways changes.

[Figure 2: Typical IP Telephony System. Two telephones, each connected through a gateway and a router to the WAN, with the gateway-incurred and network-incurred latency marked.]

Latency in an IP Telephony system is introduced by two primary sources. Some of the latency is incurred in the IP Telephony gateways at either end, and the remainder is incurred by the IP network that connects the two gateways. Since latency is cumulative, any latency introduced by a component in an IP Telephony system directly affects the total latency experienced by the user.

Gateway-Incurred Latency
Let us take a look inside a gateway and examine the origin of the latency introduced by the gateway.
A high-level block diagram of the processing within a gateway is shown in Figure 3. The block diagram shows the high-level functions that occur in both gateway systems. The interface to the end-point telephone system is on the left side and the interface to the network is on the right side. Following the path of a voice conversation from one telephone to another, each of the functional blocks has an effect on the gateway-incurred latency. Each of these functions and its latency contribution is described in more detail in the sections that follow.

[Figure 3: Gateway Processing. Blocks from the telephone side to the IP side: Network Interface (T1, E1, PRI, Loop-Start), Digital Signal Processing (DSP coding of PCM frames), Packet Handling (buffering and packetization, jitter buffer), and Network Interface (TCP/IP protocol stack, Ethernet).]

Network Interface Latency
The network interface in a gateway includes any hardware or software that connects the gateway to the telephone system or network. The typical network interface frames the network-side digitized audio PCM data streams and converts them onto the internal PCM bus for transport to the DSP. There is typically very little latency induced in this process, with typical maximums well below 1 millisecond.

Digital Signal Processing Latency
The digital signal processing that occurs in an IP Telephony gateway is one of the more complex functions of the gateway. This functionality is typically achieved through dedicated digital signal processor (DSP) hardware and associated software algorithms that compress or decompress the speech, detect tones, detect silence, generate tones, generate comfort noise, and cancel echo. This entire collection of processing is called voice coding or “vocoding”.

[Figure 4: DSP Voice Compression Subsystem]

Framing Latency
To perform vocoding most efficiently, DSP implementations depend on processing entire frames (or batches) of data at one time.
This allows the DSP to use special instructions that result in the high efficiencies needed for high-density IP Telephony applications.

[Figure 5: Framing Process. Incoming samples are collected until the frame is full, then the next frame begins.]

The side effect of processing data in frames is that none of the data can be processed until the frame is completely full. Since digitized audio typically arrives from the telephone network at a fixed rate of 8,000 samples per second, the size of the frame used to process the data directly affects the amount of latency. A 100-sample frame would take 12.5 milliseconds to fill, while a 1000-sample frame would take 125 milliseconds to fill. Deciding on the frame size is a compromise: the larger the frame, the greater the DSP efficiency, but with that comes greater latency.

Fortunately (or unfortunately, depending on your point of view), you don’t need to make this decision. Each of the standard voice coding methods uses a standard frame size. The maximum latency incurred by the framing process is directly dependent on the selection of vocoder.

Voice Coder   Bandwidth (bits/sec)   Frame Duration (ms)   Frame Size (bytes)
G.711*        64000                  15                    120
G.723.1       5300-6300              30                    24
G.729a        8000                   10                    10
SX7300        7300                   15                    14
SX9600        9600                   15                    18

Table 1: Voice Coder Frame Sizes

* While G.711 is technically not a vocoder, we list it here for comparative purposes. G.711 data streams have greater flexibility when specifying frame sizes; the figures listed here are just one example.

Processing Time
After an entire frame has been collected, the DSP algorithms must be run on the newly created frame. The time required to complete the processing varies considerably, but it must never exceed the frame collection time. (If it did, the DSP would not finish processing one frame before the next frame arrived.)
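The framing arithmetic and the processing-time bound described above can be sketched in a few lines. This is an illustrative calculation only: the function names are hypothetical, and rounding the coded frame up to whole bytes is an assumption chosen because it reproduces the figures in Table 1.

```python
import math

SAMPLE_RATE_HZ = 8000  # fixed telephone-network sample rate cited in the text


def frame_fill_ms(samples_per_frame, sample_rate_hz=SAMPLE_RATE_HZ):
    """Time needed to collect one full frame, in milliseconds."""
    return samples_per_frame * 1000.0 / sample_rate_hz


def coded_frame_bytes(bits_per_second, frame_ms):
    """Coded frame size in whole bytes (rounding up is an assumption)."""
    bits = bits_per_second * frame_ms / 1000.0
    return math.ceil(bits / 8.0)


def processing_keeps_up(processing_ms, frame_ms):
    """True when one frame is processed within one frame collection time."""
    return processing_ms <= frame_ms


if __name__ == "__main__":
    print(frame_fill_ms(100))            # 12.5, as stated in the text
    print(frame_fill_ms(1000))           # 125.0, as stated in the text
    print(coded_frame_bytes(8000, 10))   # 10, the G.729a row of Table 1
    print(processing_keeps_up(12.0, 10))  # False: the DSP would fall behind
```

The same two inputs (bit rate and frame duration) give every frame size in Table 1, which is why the choice of vocoder fixes the framing latency.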
Since most high-density IP Telephony gateway systems process multiple channels of voice on each DSP, calculating the latency induced by coding or decoding the speech is rather complex. In this situation, each DSP processes some number of frames from different channels, one after the other, in sequence.
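One simplified way to picture this sequential processing is sketched below. It is not the paper's own calculation: it assumes that frames from all channels become ready at the same instant, that every frame costs the same processing time, and that there is no other overhead. The function names are hypothetical.

```python
def completion_times_ms(per_frame_ms, channels):
    """Finish time of each channel's frame when frames are processed in turn.

    Assumes all frames are ready at time zero and cost the same to process.
    """
    return [(i + 1) * per_frame_ms for i in range(channels)]


def max_channels(per_frame_ms, frame_ms):
    """Largest channel count whose frames all finish within one frame time."""
    return int(frame_ms // per_frame_ms)


if __name__ == "__main__":
    # Four channels at a hypothetical 2.5 ms of processing per frame.
    print(completion_times_ms(2.5, 4))  # [2.5, 5.0, 7.5, 10.0]
    # With 30 ms frames, this simple model allows at most 12 such channels.
    print(max_channels(2.5, 30))        # 12
```

Under these assumptions, the channel processed last sees the largest processing delay, and the channel count is bounded by the requirement that all frames finish before the next batch arrives.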
