
Chapter 8: Speech Coding
School of Information Science and Engineering, SDU

• The performance of speech coders determines the quality of the recovered speech and the capacity of the system.
• In mobile communication systems, bandwidth is a precious commodity, and service providers continually face the challenge of accommodating more users within a limited allocated bandwidth.
• The lower the bit rate at which a coder can deliver toll-quality speech, the more speech channels can be fitted into a given bandwidth. For this reason, manufacturers and service providers are continually searching for speech coders that provide toll-quality speech at lower bit rates.

8.1 Introduction

• The goal of all speech coding systems is to transmit speech with the highest possible quality using the least possible channel capacity, while keeping implementation complexity and communication delay within required limits.
• In general, there is a positive correlation between coder bit-rate efficiency and the algorithmic complexity required to achieve it. A balance must be struck between these conflicting factors.

Two categories of coders (based on the means by which they achieve compression):
• Waveform coders
• Vocoders

(1) Waveform coders: reproduce the time waveform of the speech signal as closely as possible.
• Source independent.
• Code a variety of signals equally well.
• Robust over a wide range of speech characteristics and in noisy environments.
• Minimal complexity.
• Achieve only moderate economy in transmission bit rate.

Examples:
1. Pulse code modulation (PCM)
2. Differential pulse code modulation (DPCM)
3. Adaptive differential pulse code modulation (ADPCM)
4. Delta modulation (DM)
5. Continuously variable slope delta modulation (CVSDM)
6. Adaptive predictive coding (APC)
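As a minimal sketch of the simplest waveform coder in the list above, the following Python code applies uniform PCM to a test tone and measures how closely the decoded waveform follows the original. The function names (`pcm_quantize`, `snr_db`), the mid-rise quantizer layout, and the sine-wave test signal are illustrative assumptions, not part of the lecture material.

```python
import math

def pcm_quantize(samples, n_bits, x_max=1.0):
    """Uniform PCM: map each sample to the centre of one of 2**n_bits cells."""
    levels = 2 ** n_bits
    step = 2.0 * x_max / levels
    decoded = []
    for x in samples:
        # Index of the quantization cell, clamped to the valid range.
        idx = int(math.floor((x + x_max) / step))
        idx = min(max(idx, 0), levels - 1)
        # The decoder reconstructs the sample at the centre of that cell.
        decoded.append(-x_max + (idx + 0.5) * step)
    return decoded

def snr_db(original, decoded):
    """Ratio of signal energy to reconstruction-error energy, in dB."""
    signal = sum(x * x for x in original)
    noise = sum((x - y) ** 2 for x, y in zip(original, decoded))
    return 10.0 * math.log10(signal / noise)

# Assumed test signal: a full-scale sine tone (not real speech).
tone = [math.sin(2.0 * math.pi * 0.01234 * n) for n in range(8000)]

# Waveform fidelity for a few word lengths.
snr_by_bits = {b: snr_db(tone, pcm_quantize(tone, b)) for b in (4, 8, 12)}
for bits, snr in sorted(snr_by_bits.items()):
    print(f"{bits:2d} bits/sample -> SNR {snr:.1f} dB")
```

The coder makes no assumption about how the signal was produced, which is what makes it source independent; the price is that fidelity is bought directly with bits per sample.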

8.2 Characteristics of Speech Signals

• Speech waveforms have a number of useful properties that can be exploited when designing efficient coders:
  • Nonuniform probability distribution of speech amplitude
  • Nonzero autocorrelation between successive speech samples
  • Nonflat nature of the speech spectra
  • Existence of voiced and unvoiced segments in speech
  • Quasiperiodicity of voiced speech signals
• The most basic property of speech waveforms is that they are bandlimited, so they can be time-discretized (sampled) at a finite rate and reconstructed completely from their samples.

1) Probability Density Function (pdf)

• Characteristics of the speech signal pdf:
  • Very high probability of near-zero amplitudes
  • Significant probability of very high amplitudes
  • Monotonically decreasing function of amplitude between these extremes
  The exact distribution depends on the input bandwidth and recording conditions. Nonuniform quantizers, including vector quantizers, attempt to match the distribution of quantization levels to the pdf of the input speech signal.
• An approximation to the long-term pdf of telephone-quality speech signals is the two-sided exponential (Laplacian) function:

  $$ p(x) = \frac{1}{\sqrt{2}\,\sigma_x} \exp\left( -\frac{\sqrt{2}\,|x|}{\sigma_x} \right) $$

  where $\sigma_x$ is the standard deviation of the speech amplitude.
• There is a distinct peak at zero, due to frequent pauses and low-level speech segments.
• Short-time pdfs of speech segments are also single-peaked functions and are usually approximated by a Gaussian distribution.

2) Autocorrelation Function (ACF)

• There is considerable correlation between adjacent samples of a speech segment, so each sample can be predicted easily from its predecessors. All differential and predictive coding schemes are based on this property.
• Definition, for a segment of N samples x(n):

  $$ C(k) = \frac{1}{N} \sum_{n=0}^{N-|k|-1} x(n)\, x(n+|k|) $$

  The ACF is usually normalized by the variance of the speech signal, so that C(0) = 1 and its values lie between -1 and 1.
• The ACF gives a quantitative measure of the closeness between samples.
• Typical signals have an adjacent-sample correlation, C(1), as high as 0.85 to 0.9.

3) Power Spectral Density Function (PSD)

• The PSD of speech is nonflat. High-frequency components contribute very little to the total speech energy.
• This property can be used to obtain significant compression in the frequency domain.
• Coding speech separately in different frequency bands can lead to significant coding gain. Although the high-frequency components carry little energy, they are very important carriers of speech information and hence need to be adequately represented.
• A qualitative measure of the theoretical maximum coding gain that can be obtained by exploiting the nonflat characteristics of the PSD is given by the spectral flatness measure (SFM).
• The SFM is defined as the ratio of the arithmetic mean to the geometric mean of samples of the PSD taken at uniform intervals in frequency.

8.3 Quantization Techniques

(1) Uniform Quantization

• Quantization is the process of mapping a continuous range of signal amplitudes into a finite set of discrete amplitudes.
• The operation is irreversible.
• Quantization introduces distortion and determines to a great extent the overall distortion of the coder.
• One of the most frequently used measures of distortion is the mean square error (MSE).
• The distortion introduced by a quantizer is often modeled as additive quantization noise.
• The performance of a quantizer is measured as the output signal-to-quantization-noise ratio (SQNR).
• The SQNR of a PCM encoder with n bits per sample:

  $$ (\mathrm{SQNR})_{\mathrm{dB}} = 6.02\,n + \alpha $$

  where α = 4.77 for the peak SQNR and α = 0 for the average SQNR. With each additional bit, the output SQNR improves by about 6 dB.

(2) Nonuniform Quantization

• Distribute the quantization levels in accordance with the pdf of the input waveform.
• Mean square distortion, for an input with pdf p(x) and quantizer output $f_Q(x)$:

  $$ D = E\left[ (x - f_Q(x))^2 \right] = \int_{-\infty}^{\infty} \left[ x - f_Q(x) \right]^2 p(x)\, dx $$

• To design an optimal nonuniform quantizer, we need to determine the quantization levels that minimize the distortion for a signal with a given pdf.
• The Lloyd-Max algorithm determines the optimum quantization levels by iteratively adjusting them in a manner that minimizes the mean square distortion.
• A simple and robust implementation is the logarithmic quantizer.
• Different companding techniques:
  • μ-law (U.S.)
  • A-law (Europe)

(3) Adaptive Quantization

• There is a distinction between the long-term and short-term pdfs of speech waveforms.
  This distinction arises from the nonstationarity of speech; the dynamic range is usually 40 dB or more.
• A time-varying quantization technique is therefore useful: it varies the step size in accordance with the input signal power.

(4) Vector Quantization

• Shannon's rate-distortion theorem: there exists a mapping from a source waveform to output code words such that, for a given distortion D, R(D) bits per sample are sufficient to reconstruct the waveform with an average distortion arbitrarily close to D.
• R(D) is called the rate-distortion function. It represents a fundamental limit on the achievable rate for a given distortion.
• The actual rate R cannot fall below R(D).
• Shannon predicted that better performance can be achieved by coding many samples at a time instead of one sample at a time.
• Vector quantization (VQ) is a delayed-decision coding technique that maps a group of input samples (typically a speech frame), called a vector, to a code book index.
• A code book is set up consisting of a finite set of vectors covering the entire anticipated range of values.
• In each quantizing interval, the code book is searched and the index of the entry that best matches the input signal frame is selected.
• VQ can yield better performance even when the samples are independent of one another.
• The number of samples in a block (vector) is called the dimension L of the vector quantizer.
• The rate R of the vector quantizer is defined as

  $$ R = \frac{\log_2 n}{L} \ \text{bits/sample} $$

  where n is the size of the VQ code book. R may take fractional values.
• Quantization vectors are used instead of quantization levels.
• Distortion is measured as the squared Euclidean distance between the quantization vector and the input vector.
• VQ is most efficient at very low bit rates (R = 0.5 bits/sample or less).
• VQ is computationally intensive, however, and is not often used to code speech signals directly.
• It is usually used to quantize speech analysis parameters, such as:
  • Linear prediction coefficients
  • Spectral coefficients
  • Filter bank energies, etc.
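The code book search and the rate definition above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the four-entry, two-dimensional code book is made up for the example (a real code book would be trained on speech data), and the function names are hypothetical.

```python
import math

def squared_distance(u, v):
    """Squared Euclidean distance, the distortion measure used by the search."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def vq_encode(vector, codebook):
    """Full search: return the index of the code book entry closest to `vector`."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

def vq_decode(index, codebook):
    """The decoder simply looks the quantization vector up by its index."""
    return codebook[index]

def vq_rate(codebook_size, dimension):
    """Rate in bits per sample: R = log2(n) / L."""
    return math.log2(codebook_size) / dimension

# Hypothetical code book with n = 4 entries of dimension L = 2.
codebook = [(0.0, 0.0), (1.0, 1.0), (-1.0, -1.0), (1.0, -1.0)]

frame = (0.9, 0.7)                      # input vector (two samples)
index = vq_encode(frame, codebook)      # only this index is transmitted
reconstructed = vq_decode(index, codebook)
rate = vq_rate(len(codebook), 2)        # log2(4) / 2 = 1.0 bit per sample
```

The full search compares the input against every entry, which is where the computational cost of VQ comes from. Keeping the same four entries but stretching the vectors to L = 8 samples would give `vq_rate(4, 8) == 0.25`, an example of a fractional rate.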

8.4 Adaptive Differential Pulse Code Modulation (ADPCM)

• A more efficient coding scheme than PCM.
• Exploits the redundancies present in the speech signal between adjacent samples.
• The difference between adjacent samples is transmitted.
• Allows speech to be encoded at a bit rate of 32 kbps. The CCITT standard G.721 ADPCM algorithm for 32 kbps speech coding is used in cordless telephone systems such as CT2 and DECT.
• Signal prediction techniques are used.

8.5 Frequency Domain Coding of Speech

• The speech signal is divided into a set of frequency components, which are quantized and encoded separately.
• Different frequency bands can be preferentially encoded according to some perceptual criterion for each band.
• The quantization noise can be contained within each band and prevented from creating harmonic distortion outside the band.

Advantage: the number of bits used to encode each frequency component can be dynamically varied and shared among the different bands.

(1) Sub-band Coding

• The human ear does not detect quantization distortion equally well at all frequencies.
• It is therefore possible to achieve a substantial improvement in quality by coding the signal in narrower bands.
• In a sub-band coder, speech is typically divided into four or eight sub-bands by a bank of filters, and each sub-band is sampled at a bandpass Nyquist rate and encoded with a different accuracy in accordance with a perceptual criterion.

Ways of band-splitting:

• Divide the entire speech band into unequal sub-bands that contribute equally to the articulation index. The method suggested by Crochiere:

  Sub-band number    Frequency range
  1                  200-700 Hz
  2                  700-1310 Hz
  3                  1310-2020 Hz
  4                  2020-3200 Hz

• Divide the band into equal sub-bands and assign to each sub-band a number of bits proportional to its perceptual significance. Octave band splitting is often employed instead of equal splitting.
  As the human ear has an exponentially decreasing sensitivity to frequency, this kind of splitting is more in tune with the perception process.

Method for processing the sub-band signals: make a low-pass translation of the sub-band signal to zero frequency by a modulation process equivalent to single-sideband modulation.
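The sub-band idea of this section, splitting the signal, coding each band with a different accuracy, and recombining, can be sketched with a two-band coder. The sketch below is a simplification under stated assumptions: it uses the two-tap Haar filter pair rather than the four- or eight-band filter banks described above, scales each band's quantizer to that band's peak, and picks the bit allocation (6 bits for the low band, 2 for the high band) purely for illustration.

```python
import math

SQRT2 = math.sqrt(2.0)

def analyze(x):
    """Split x (even length) into a low and a high band, each at half the rate."""
    low = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return low, high

def synthesize(low, high):
    """Recombine the two bands; exact inverse of analyze() without quantization."""
    x = []
    for l, h in zip(low, high):
        x.append((l + h) / SQRT2)
        x.append((l - h) / SQRT2)
    return x

def quantize(band, n_bits):
    """Uniform quantizer whose range is scaled to the band's own peak value."""
    peak = max(abs(v) for v in band)
    if peak == 0.0:
        return list(band)
    levels = 2 ** n_bits
    step = 2.0 * peak / levels
    out = []
    for v in band:
        idx = min(max(int(math.floor((v + peak) / step)), 0), levels - 1)
        out.append(-peak + (idx + 0.5) * step)
    return out

# Assumed test signal: a slowly varying tone, so most energy is in the low band.
x = [math.sin(2.0 * math.pi * 0.02 * n) for n in range(64)]
low, high = analyze(x)

# More bits for the band that carries more of the signal, fewer for the other.
decoded = synthesize(quantize(low, 6), quantize(high, 2))
max_error = max(abs(a - b) for a, b in zip(x, decoded))
```

Because each band runs at half the input rate, 6 and 2 bits per band sample average out to 4 bits per input sample. A perceptually motivated coder would choose the allocation from the ear's sensitivity in each band rather than from signal energy alone.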