Voice Coding in 3G Networks

Tommi Koistinen
Signal Processing Systems
Nokia Networks
[email protected]

Abstract

The 3G networks will introduce several new additions to the basic speech service. The adaptive wideband speech codec will enhance the naturalness of speech, and transcoder free operation will remove unnecessary encodings that would otherwise degrade the speech quality. Speech processing on the network side in the 3GPP reference architecture model is focused around two network elements, namely the Media Gateway (MG) and the Media Resource Functions (MRF) unit. However, as speech applications use the network more or less in a transparent end-to-end mode, the characteristics and speech enhancement capabilities of the mobile terminals will ultimately determine the perceived overall speech quality.

1 Introduction

Voice compression techniques have been utilized in digital telecommunication networks for decades (the G.711 standard [1] dates back to 1972). The G.711 standard presents a coding technique that operates at a rate of 64 kbit/s and is widely used in all digital switched telephone networks. But where does the exact rate of 64 kbit/s come from?

The most essential frequency range for the human speech production system (that is, the glottis and the vocal tract) and for the auditory system happens to be 300-3400 Hz. According to the sampling theorem, to reproduce the original signal after sampling we must use a sampling rate that is at least double the highest frequency of the desired band. If the sampling rate is lower, the reproduced signal will be distorted by image frequencies of the original signal. Speech in telecommunication networks is commonly sampled at 8 kHz to obey this law.

The number of bits per sample used to quantize the analog signal is a compromise between the quantisation noise that is introduced and the quality of the original signal. If the input signal is already band limited to 300-3400 Hz, there is no point in using a 24-bit converter. Commonly, 13 bits per sample is seen as a practical value for restricted voice band quantisation. Uniform quantisation, however, is not the most efficient quantisation method.

The main idea behind the G.711 standard is to use a logarithmic quantizer, which results in the same signal-to-noise ratio (SNR) with only 8 bits per sample compared to the original 13 bits per sample. This is achieved by allocating more quantisation steps to the lower amplitude levels, which are in fact the most important to perceived overall speech quality. The drawback is that the logarithmic scale results in a reduced SNR for high-powered input signals, but fortunately this effect is insignificant with speech signals.

As a result we can multiply 8000 samples per second (from the sampling theorem) by 8 bits per sample (from the logarithmic quantisation) to get the final bit stream of 64 kbit/s.

The compression ratio of the G.711 standard can be seen to be 1.625:1 (13:8). Compression is usually worthwhile: transferring more telephone calls with less transmission equipment means money for the operator, and this has led to the development of several more advanced compression techniques.

Speech coding techniques in general can be divided into waveform coders (e.g. G.711, G.726, G.722) and analysis-by-synthesis coders (e.g. G.723, G.729, GSM FR). Waveform coders operate in the time domain and are based on a sample-by-sample approach that utilizes the correlation between speech samples. Analysis-by-synthesis coders try to imitate the human speech production system with a simplified model of a source (glottis) and a filter (vocal tract) that shapes the output speech spectrum on a frame basis (typically a frame size of 10-30 ms is used). A short introduction to the details of both basic techniques (and their intermediate versions, hybrids) is presented in [2] on pages 270-287.

The waveform coders are mainly used to compress speech on transmission links, for example on PCM trunks between two switching centers. The compression ratios range from 2:1 to 4:1, and quite high speech quality can be maintained.

The analysis-by-synthesis coders were mainly introduced together with digital mobile networks (the GSM Full Rate codec [3] dates back to 1988). As the frequency band in the radio interface between a mobile terminal and a base station is restricted (and regulated), compression techniques are a meaningful way to save money in that interface. A typical full rate channel (16 kbps) utilizes a compression ratio of 4:1. A half rate channel (8 kbps) has half that rate and operates at a compression ratio of 8:1. Lossy compression always has some effect on speech quality, and more compression usually means less quality. The G.711 standard is a common reference point for “real” speech codecs, and e.g. the GSM Enhanced Full Rate codec [4] almost reaches the quality of G.711.

The frame based handling that is natural to analysis-by-synthesis coders is also in line with the characteristics of packet based transmission techniques (IP, ATM), which are becoming quite common not only in core (backbone) networks but also as building blocks of radio access networks.

This article will discuss voice coding and user plane issues particularly in 3G networks. The first chapter presented the basic reasons and means for speech coding in general. The second chapter will review the basic 3G network architecture models. The most important 3G network elements that provide speech related processing are discussed in chapters four and five. The sixth chapter will discuss the issues related to tandeming of speech codecs, and finally the seventh chapter will conclude the presentation.

2 Network Architectures

This chapter will present the basic 3G network evolution according to the 3GPP (Third Generation Partnership Project [5]) reference architectures.

3GPP has scheduled its work into releases R99, R4, R5 and so on. In the following, the basic reference architecture model of each release is briefly described, with emphasis on the voice coding and user plane issues.

Release 99

The basic architecture of an R99 compatible network is shown in Figure 1. The IP packet data from UTRAN (Universal Terrestrial Radio Access Network, that is, basically base stations and Radio Network Controllers (RNC)) goes through the Iu-PS interface to the 3G SGSN. Voice data goes through the Iu-CS interface to the 3G Mobile Switching Center (MSC), which converts the Adaptive Multirate (AMR) coded speech to G.711 format and vice versa for the PSTN network. The circuit switched speech is transferred in packet mode (ATM/AAL2) from UTRAN (from the Radio Network Controller) to the 3G MSC, but codec level packet mode speech does not yet originate from the terminal.

[Figure 1 diagram: MT, UTRAN, SGSN, GGSN, HLR and 3G MSC with transcoder; Iu-PS and Iu-CS interfaces; multimedia IP networks and PSTN/legacy networks]

Figure 1. 3GPP Reference Architecture of Release 99.

Release R4

The next step, taken with Release 4 (formerly known as Release 2000), is to separate the signaling and the user data in the Iu-CS interface. The signaling now goes to the MSC Server, and the transcoder is separated out as a standalone media gateway. Figure 2 presents the R4 architecture with a clear separation into a packet side and a circuit switched side. The media gateway in the PSTN interface converts the AMR coded speech to G.711. Speech goes in packet mode from UTRAN to the PSTN interface.

[Figure 2 diagram: MT, UTRAN, SGSN, GGSN, HSS, CSCF, two MGWs and two MSC Servers; Iu-PS, Iu-CS user data and Iu-CS control interfaces; multimedia IP networks and PSTN/legacy networks]

Figure 2. 3GPP Reference Architecture of Release 4.

The final architecture model, also called the All-IP network [6], moves speech as well to full end-to-end packet mode. The IP packets generated in a mobile terminal go as such either to another IP terminal or from the GGSN to the MGW. The architecture is presented in Figure 3. A new network entity is also introduced, namely the Multimedia Resource Functions (MRF) unit, which mainly implements conferencing services for IP based calls.

[Figure 3 diagram: MT, UTRAN, SGSN, GGSN, HSS, CSCF, MRF and MGW; Iu-PS interface; multimedia IP networks and PSTN/legacy networks]

Figure 3. All-IP reference architecture.

Of course, the different phases of 3GPP releases may coexist at the same time depending on operators’ needs.

[...] and announcement and conferencing services. Control mechanisms for these functionalities have usually been proprietary. In 3G networks, all of these functions must be offered by the Media Gateway, which is controlled by the Media Gateway Controller (MGC) with the standard H.248 control protocol [7].

An example (and quite full) set of functions that a Media Gateway could implement is:

• support for several interfaces (A-interface for 2G and Iu-interface for 3G) and for several transmission protocols (ATM, IP, TDM)
• support for several codecs, including the Adaptive Multirate (AMR) codec and forthcoming wideband codecs
• electric and acoustic echo cancellation
• announcement services
• DTMF and call progress tone generation and detection
• support for fax/modem/data protocols
• support for Tandem Free Operation (TFO) and Transcoder Free Operation (TrFO)
• bad frame handling
• IP protocol handling (RTP/RTCP, encryption, QoS support)

Some functions, especially the conferencing service and possible speech enhancement services, are basically intended to be provided by the Multimedia Resource Functions (MRF) unit, but they may optionally be added to the Media Gateway's responsibilities.

A lot of digital signal processing (DSP) power is required to provide the Media Gateway’s functions. Typically, one DSP chip may process 4-16 channels, and on one processor card there might be 8-32 DSPs, which totals 32-512 channels per card.
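The channel figures quoted for the Media Gateway can be checked with simple arithmetic. The following is a minimal sketch, not taken from the paper: the function names and the target of 10,000 simultaneous calls are hypothetical and chosen only for illustration, and the model assumes that every DSP on a card carries the same number of channels.

```python
import math


def channels_per_card(channels_per_dsp: int, dsps_per_card: int) -> int:
    """Speech channels one processor card handles if all DSPs are equally loaded."""
    return channels_per_dsp * dsps_per_card


def cards_needed(calls: int, channels_per_dsp: int, dsps_per_card: int) -> int:
    """Smallest number of cards that can carry `calls` simultaneous channels."""
    capacity = channels_per_card(channels_per_dsp, dsps_per_card)
    return math.ceil(calls / capacity)


# Ranges quoted in the text: 4-16 channels per DSP, 8-32 DSPs per card.
LOW_CAPACITY = channels_per_card(4, 8)      # 32 channels per card
HIGH_CAPACITY = channels_per_card(16, 32)   # 512 channels per card

# Hypothetical dimensioning target, not a figure from the paper.
TARGET_CALLS = 10_000

print(f"Capacity per card: {LOW_CAPACITY}-{HIGH_CAPACITY} channels")
print(f"Cards for {TARGET_CALLS} calls: "
      f"{cards_needed(TARGET_CALLS, 16, 32)}-{cards_needed(TARGET_CALLS, 4, 8)}")
```

The spread between the two ends of the range is a factor of 16, so the number of channels a DSP can serve for a given codec mix largely decides how much hardware a gateway of a given size needs.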
