ADVANCES IN SOURCE-CONTROLLED VARIABLE BIT RATE WIDEBAND SPEECH CODING

Milan Jelinek†, Redwan Salami‡, Sassan Ahmadi*, Bruno Bessette†, Philippe Gournay†, Claude Laflamme†, Roch Lefebvre†

†University of Sherbrooke, Sherbrooke, Quebec, Canada
‡VoiceAge Corporation, Montreal, Quebec, Canada
*Nokia Inc., San Diego, CA, USA

ABSTRACT

This paper presents novel techniques for source-controlled variable-rate wideband speech coding. These techniques have been used in the variable-rate multimode wideband (VMR-WB) speech codec recently selected by 3GPP2 for wideband (WB) speech telephony, streaming, and multimedia messaging services in the cdma2000 third-generation wireless system. The coding algorithm contains several innovations that enable very good performance at average bit rates as low as 4.0 kbit/s in typical conversational operating conditions. These innovations include an efficient noise suppression algorithm, a signal classification and rate selection algorithm that enables high-quality operation at low average bit rates, efficient post-processing techniques tailored for wideband signals, and novel frame erasure concealment techniques, including supplementary information for reconstructing lost onsets and improving decoder convergence. Further, the coder uses efficient coding types optimized for different classes of speech signal: a generic coding type based on AMR-WB for transients and onsets, a voiced coding type optimized for stable voiced signals that uses a novel signal modification procedure and yields good wideband quality at 6.2 kbit/s, unvoiced coding types optimized for unvoiced segments, and efficient comfort noise generation coding. The article describes in detail some of the codec's novel features.

1. INTRODUCTION

The variable-bit-rate (VBR) speech coding concept is crucial for optimal operation of CDMA systems and for interference control. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection mechanism determines the bit rate suitable for encoding each speech frame based on the characteristics of the speech signal (e.g. voiced, unvoiced, transient, background noise). The goal is to attain the best speech quality at a given average bit rate (ABR). The codec can operate in different modes by tuning the rate selection algorithm to obtain different ABRs, with codec performance improving as the ABR increases. The mode of operation is imposed by the system depending on network capacity and the desired quality of service. This gives the codec a mechanism for trading off speech quality against system capacity. In CDMA systems (e.g. cdmaOne and cdma2000), four bit rates are used, referred to as full-rate (FR), half-rate (HR), quarter-rate (QR), and eighth-rate (ER). These systems support two rate sets, referred to as Rate-Set I and Rate-Set II. In Rate-Set II, a variable-rate codec with a rate selection mechanism operates at source-coding bit rates of 13.3 (FR), 6.2 (HR), 2.7 (QR), and 1.0 (ER) kbit/s, corresponding to channel bit rates of 14.4, 7.2, 3.6, and 1.8 kbit/s inclusive of error protection bits.

In January 2002, a process started in 3GPP2 for standardizing a variable-rate multimode codec for wideband speech services compliant with cdma2000 Rate-Set II requirements [1]. The specifications required three modes of operation: a Premium mode (mode 0) with an ABR equivalent to that of QCELP13 (IS-733) [13], an Economy mode (mode 2) with an ABR equivalent to that of EVRC (IS-127) [12], and a Standard mode (mode 1) with an ABR at the average of the EVRC and QCELP13 ABRs. The quality requirements were such that the Premium mode is at least equivalent to AMR-WB [2,3,4] at 14.85 kbit/s, the Standard mode to AMR-WB at 12.65 kbit/s, and the Economy mode to AMR-WB at 8.85 kbit/s. An optional mode interoperable with AMR-WB at 12.65 kbit/s was recommended [5]. Following a competitive selection process including five candidate codecs in early 2003, the VMR-WB codec from Nokia/VoiceAge was selected as a clear winner.

This paper describes the VMR-WB codec and details some of its novel features. The codec has the three modes of operation described above, plus an additional mode that is interoperable with AMR-WB. The paper is organized as follows. Section 2 describes the VMR-WB codec design and structure. Section 3 gives the details of the source- and channel-controlled operation of the codec. The different coding types are summarized in Section 4. Section 5 describes the VMR-WB decoder. Finally, Section 6 summarizes the codec performance.

2. OVERVIEW OF VMR-WB CODEC DESIGN AND STRUCTURE

2.1 VMR-WB Features

VMR-WB is a variable-rate multimode speech codec designed for encoding and decoding wideband speech (50-7000 Hz) sampled at 16 kHz using 20 ms frames. The operation of VMR-WB is controlled by instantaneous speech signal characteristics (i.e., source-controlled) and by the traffic condition of the network (i.e., network-controlled mode switching).

VMR-WB is based on the 3GPP/AMR-WB (ITU-T/G.722.2) codec [2,3,4], recently standardized by 3GPP and ITU-T for 3G high-quality multimedia applications, and is interoperable with this codec in one of its operational modes. VMR-WB is fully compliant with cdma2000 Rate-Set II (Radio Configuration 4 (reverse link)/5 (forward link)) [1].

Depending on the traffic conditions, one of four operational modes is selected. The transition from one mode to another is seamless and memoryless (i.e. there is no transition period to achieve the ABR of the new mode), and there is no need to transmit the mode information to the decoder. Modes 0, 1, and 2 are specific to CDMA systems (i.e., TIA/EIA/IS-95, cdma2000), with mode 0 providing the highest quality and mode 2 the lowest ABR. Mode 3 is interoperable with AMR-WB at 12.65 kbit/s. Its ABR is slightly higher than the ABR of mode 0. The active speech ABRs are approximately 12.8, 10.5, 8.1 and 13.3 kbit/s for modes 0, 1, 2 and 3 respectively, which translates into 5.7, 4.8, 3.8 and 6.1 kbit/s for typical conversational speech with about 40% speech activity. Additionally, the system can force maximum and minimum rates.

While its design has focused on WB input/WB output signals, VMR-WB also accepts narrowband (NB) input signals sampled at 8 kHz, and it can synthesize NB speech. The codec look-ahead depends on the sampling frequency of the input and the output: it is 13.75 ms for WB input, WB output and 15.0625 ms for NB input, NB output.

Finally, VMR-WB is equipped with an integrated novel noise reduction (NR) algorithm with an adjustable maximum allowed reduction, and with a sophisticated frame error concealment technique, making it very robust to background noise and channel impairments.

2.2 VMR-WB Encoder Overview

The flow diagram of the VMR-WB encoder is shown in Figure 1.

The input signal is preprocessed as in AMR-WB (down-sampling, high-pass filtering, and pre-emphasis). Spectral analysis is then performed twice per frame on the preprocessed signal for use in noise reduction and voice activity detection (VAD). The signal energy is computed for each critical band [6]. The spectral analysis is done using a sine window. The sine window (or square root of a Hanning window) has been chosen because it is well suited for overlap-add methods and because the VMR-WB noise reduction is based on a spectral subtraction technique using overlap-add with 50% overlap. The position of the sine window is not centred but offset to take advantage of the whole available look-ahead.

Linear Prediction (LP) analysis and open-loop (OL) pitch analysis are performed on a frame basis on the denoised signal. The LP filter parameters are estimated as in the AMR-WB specifications. A new OL pitch tracker is used in VMR-WB to improve the smoothness of the pitch contour by considering adjacent values. Three open-loop pitch lags are jointly estimated every frame: two for every half-frame and one for the look-ahead.

Background noise estimation is then updated during inactive speech frames to be used in the VAD and the NR modules of the next frame. The parameters used for the noise update decision are: pitch stability, signal non-stationarity, [...] some other logic would be needed to resume the noise adaptation.

Then signal classification is performed on a frame basis to detect unvoiced frames, voiced frames and frames with low perceptual significance. Finally, the frame is divided into 4 subframes and the signal is encoded on a subframe basis according to the selected coding type to find the adaptive codebook and fixed codebook indices and gains. Supplementary information is estimated and sent in the FR coding type to enhance the codec robustness to frame erasures (FER).

[Figure 1: Flow diagram of the VMR-WB encoder. Stages, in order: speech input; preprocessing; spectral analysis, VAD and noise reduction; LP analysis and open-loop pitch analysis; noise estimation; rate selection and signal modification; subframe loop with coding-type branches (FR and Voiced HR, Generic HR, Unvoiced HR, Unvoiced QR, CNG encoding); FER protection; output bitstream.]

3. SOURCE AND CHANNEL CONTROLLED OPERATION

The rate determination is done implicitly during the selection of a particular encoding technique to encode the current frame. The rate selection depends on the mode of operation and the class of input speech.

Figure 2 shows a simplified high-level description of the signal classification procedure.