Vocoders
1 The Channel Vocoder (analyzer): The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear-phase FIR filters are used. The output of each filter is rectified and lowpass filtered. The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract. In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.
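The rectify-and-smooth step of one analyzer channel can be sketched as follows. This is only an illustration: the moving average stands in for the lowpass filter (an assumption, since the slides do not give the filter design), the 500 Hz tone stands in for one bandpass output, and the names `rectify` and `moving_average` are hypothetical.

```python
import math

def rectify(x):
    """Full-wave rectifier: magnitude of each sample."""
    return [abs(v) for v in x]

def moving_average(x, length):
    """Crude lowpass filter: mean of the most recent `length` samples."""
    out = []
    acc = 0.0
    for n, v in enumerate(x):
        acc += v
        if n >= length:
            acc -= x[n - length]          # drop the sample leaving the window
        out.append(acc / min(n + 1, length))
    return out

# A 500 Hz tone of amplitude 2 sampled at 8 kHz stands in for one bandpass output.
fs = 8000
band = [2.0 * math.sin(2 * math.pi * 500 * n / fs) for n in range(800)]
# A 160-sample (20 ms) window spans a whole number of tone periods.
envelope = moving_average(rectify(band), 160)
```

Once the window is full, `envelope` settles to a nearly constant value proportional to the tone amplitude, which is the slowly varying spectral magnitude that the channel vocoder codes.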
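2 The Channel Vocoder (analyzer block diagram):
[Block diagram: the input speech S(n) feeds a bank of parallel channels, each consisting of Bandpass Filter → Rectifier → Lowpass Filter → A/D Converter. The channel outputs, together with the outputs of the voicing detector and the pitch detector, enter the Encoder, whose output goes to the channel.]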
3 The Channel Vocoder (synthesizer):
• 16-20 linear-phase FIR filters covering 0-4 kHz
• Each filter has a bandwidth of 100-300 Hz
• 20-ms frames, i.e. the spectral magnitudes change at a 50 Hz rate
• LPF bandwidth: 20-25 Hz
• Sampling rate of the filter outputs: 50 Hz
4 The Channel Vocoder (synthesizer):
Bit rate:
• 1 bit for the voicing detector
• 6 bits for the pitch period
• 16 channels, each coded with 3-4 bits, updated 50 times per second
• The total bit rate is then 2400-3200 bps
• Further reduction to 1200 bps can be achieved by exploiting frequency correlations of the spectral magnitudes
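The quoted range can be checked with a little arithmetic. In the sketch below the function name is hypothetical, and the last line assumes that the voicing and pitch bits are also sent once per frame, which the slide does not state.

```python
def spectral_bit_rate(channels, bits_per_channel, frames_per_second):
    """Bits per second spent on the channel magnitudes."""
    return channels * bits_per_channel * frames_per_second

low = spectral_bit_rate(16, 3, 50)    # 2400 bps
high = spectral_bit_rate(16, 4, 50)   # 3200 bps
# Assumption: voicing (1 bit) and pitch (6 bits) are also sent every frame.
side_info = (1 + 6) * 50              # 350 bps
```

The 2400-3200 bps figure matches the spectral magnitudes alone; under the stated assumption the voicing and pitch information would add a further 350 bps.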
5 The Channel Vocoder (synthesizer):
At the receiver the signal samples are passed through D/A converters.
The outputs of the D/As are multiplied by the voiced or unvoiced signal sources.
The resulting signals are passed through bandpass filters.
The outputs of the bandpass filters are summed to form the synthesized speech signal.
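6 The Channel Vocoder (synthesizer block diagram):
[Block diagram: the Decoder receives the bit stream from the channel and drives one D/A Converter per channel. A Switch controlled by the voicing information selects either the pulse generator (set by the pitch period) or the random noise generator as the excitation. Each D/A output is multiplied by the excitation and passed through a Bandpass Filter; the filter outputs are summed (∑) to give the output speech.]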
7 The Phase Vocoder : The phase vocoder is similar to the channel vocoder.
However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
By coding and transmitting the phase derivative, this vocoder preserves the phase information.
8 The Phase Vocoder (analyzer block diagram, kth channel)
[Block diagram: the input S(n) is multiplied by $\cos\omega_k n$ and by $\sin\omega_k n$. Each product is lowpass filtered, giving $a_k(n)$ and $b_k(n)$, and each is also passed through a differentiator. From these signals the short-term magnitude and the short-term phase derivative are computed; each is decimated and sent to the Encoder and on to the channel.]
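9 The Phase Vocoder (synthesizer block diagram, kth channel)
[Block diagram: the Decoder recovers the decimated short-term amplitude and the decimated short-term phase derivative from the channel. Both are interpolated; the phase derivative is then integrated and its cosine and sine are formed. These are combined with the interpolated amplitude and with $\cos\omega_k n$ and $\sin\omega_k n$, and the products are summed (∑) to form the kth channel output.]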
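The integrator in the synthesizer rebuilds the channel phase from the transmitted phase derivative. A minimal discrete-time sketch, using a first difference as the differentiator and a running sum as the integrator (both are assumptions, and the function names are hypothetical), shows that the phase track is recovered up to its unknown starting value:

```python
def phase_derivative(phase):
    """First difference of a phase track (discrete-time differentiator)."""
    return [phase[n] - phase[n - 1] for n in range(1, len(phase))]

def integrate(derivative, start=0.0):
    """Running sum (discrete-time integrator) starting from `start`."""
    out = [start]
    for d in derivative:
        out.append(out[-1] + d)
    return out

phase = [0.5, 0.75, 1.25, 1.0, 2.0]
rebuilt = integrate(phase_derivative(phase))
# rebuilt equals phase minus its first sample: [0.0, 0.25, 0.75, 0.5, 1.5]
```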
10 The Phase Vocoder :
• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of the spectral magnitude and phase derivative: 50-60 samples per second
• The spectral magnitude is coded using PCM or DPCM
• The phase derivative is coded linearly using 2-3 bits
• The resulting bit rate is 7200 bps
11 The Formant Vocoder :
The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.
It is this information plus the pitch period that is encoded and transmitted to the receiver.
12 The Formant Vocoder :
[Figure: example of formants. (a) The spectrogram of the utterance “day one”, showing the pitch and the harmonic structure of speech. (b) A zoomed spectrogram of the fundamental and the second harmonic.]
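13 The Formant Vocoder (analyzer block diagram):
[Block diagram: the input speech feeds three formant-estimation branches that output (F1, B1), (F2, B2) and (F3, B3), and a pitch and V/U branch that outputs the V/U decision and F0. All parameters enter the final block, labeled "Decoder" in the slide.]
Fk: the frequency of the kth formant
Bk: the bandwidth of the kth formant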
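14 The Formant Vocoder (synthesizer block diagram):
[Block diagram: the V/U decision and F0 control the excitation signal, which drives three formant filters set by (F1, B1), (F2, B2) and (F3, B3); their outputs are summed (∑) to form the synthesized speech.]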
The objective of LP analysis is to estimate parameters of an all-pole model for the vocal tract.
Several methods have been devised for generating the excitation sequence for speech synthesis.
Various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.
16 LPC 10 :
This method is called LPC-10 because 10 coefficients are typically employed.
LPC-10 partitions the speech into 180-sample frames.
Pitch and voicing decisions are determined by using the AMDF and zero-crossing measures.
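The framing and the zero-crossing measure can be sketched as below. The function names are hypothetical, the frames are taken as non-overlapping (an assumption), and the square wave is a toy input. A high zero-crossing count is commonly read as a hint that a frame is unvoiced.

```python
def split_frames(samples, frame_length=180):
    """Split a signal into consecutive non-overlapping full frames."""
    count = len(samples) // frame_length
    return [samples[i * frame_length:(i + 1) * frame_length] for i in range(count)]

def zero_crossings(frame):
    """Count sign changes between neighbouring samples."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))

# Toy input: a square wave that changes sign every 10 samples.
signal = [1 if (n // 10) % 2 == 0 else -1 for n in range(400)]
frames = split_frames(signal)
counts = [zero_crossings(f) for f in frames]   # [17, 17]
```

The 40 samples left over after the second frame are simply dropped here; a real coder would keep them for the next frame.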
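17 A General Discrete-Time Model For Speech Production
[Block diagram: for voiced speech, a DT impulse generator driven by the pitch period feeds the glottal filter G(z) and is scaled by the voiced gain; for unvoiced speech, an uncorrelated noise generator is scaled by the unvoiced gain. A voiced/unvoiced (V/U) switch selects the volume velocity U(n), which passes through the vocal tract filter H(z) and the filter R(z) (labeled "LP Filter" in the slide) to produce the speech signal s(n).]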
18-20 Linear Prediction: Determining the Prediction Order
$$PG = 10\log_{10}\frac{\sum_{n=m-M+1}^{m} s^2[n]}{\sum_{n=m-M+1}^{m} e^2[n]}$$
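The prediction gain is the ratio of signal energy to prediction-error energy over the same window, expressed in dB. A small sketch, with a hypothetical function name and made-up sample values:

```python
import math

def prediction_gain_db(signal, error):
    """10*log10 of signal energy over prediction-error energy."""
    signal_energy = sum(v * v for v in signal)
    error_energy = sum(v * v for v in error)
    return 10.0 * math.log10(signal_energy / error_energy)

s = [1.0, 2.0, 3.0, 4.0]
e = [0.1, 0.2, 0.3, 0.4]   # error ten times smaller than the signal
gain = prediction_gain_db(s, e)
```

With an error ten times smaller in amplitude than the signal, the energy ratio is 100 and the gain is about 20 dB. A higher prediction order is worthwhile only while it keeps raising this figure.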
21 Linear Prediction: Example
[Figure panels: M=4, M=10]
22 Linear Prediction: Example
[Figure panels: M=2, M=10, M=54]
23 Linear Prediction: The Idea of Long-Term Linear Prediction
[Figure panels: M=10, M=50]
24 Linear Prediction: Long-Term Linear Prediction
25 LPC10 Vocoder: General Characteristics
26 LPC10 Vocoder: Encoder
[Block diagram; surviving labels: PCM, LPC, Bit Encoder]
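27 Pitch Period Detection
Autocorrelation:
$$R[l,m] = \sum_{n=m-N+1}^{m} s[n]\,s[n-l]$$
Magnitude difference function (MDF):
$$MDF[l,m] = \sum_{n=m-N+1}^{m} \bigl|\,s[n] - s[n-l]\,\bigr|$$
YMC:
$$s[n] = b\,s[n-N] + e[n], \qquad m-N+1 \le n \le m$$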
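A pitch estimate based on the magnitude difference function can be sketched as follows: the MDF is evaluated for a set of candidate lags over the window ending at sample m, and the lag with the smallest value is taken as the pitch period. The function names, the lag range 20-160 and the sawtooth test signal are assumptions made for illustration.

```python
def mdf(s, lag, m, n_window):
    """Magnitude difference function over the window ending at index m."""
    start = m - n_window + 1
    return sum(abs(s[n] - s[n - lag]) for n in range(start, m + 1))

def estimate_pitch(s, m, n_window, lags):
    """Return the first lag with the smallest MDF value."""
    return min(lags, key=lambda lag: mdf(s, lag, m, n_window))

# Sawtooth with a 50-sample period as a toy voiced signal.
s = [n % 50 for n in range(400)]
period = estimate_pitch(s, m=399, n_window=180, lags=range(20, 161))
```

Multiples of the true period also give a zero MDF for this perfectly periodic signal, so the sketch keeps the first (shortest) minimizing lag.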
28 LPC10 Vocoder
MDF
T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154
29 LPC10 Vocoder: Encoder
[Block diagram; surviving labels: LPC, RC]
30 LPC10 Vocoder: Speech Synthesis
Original signal
Encoder section:
• Determine whether the frame is voiced or unvoiced
• Determine the pitch period, only for voiced frames
• Compute the signal gain
[Block diagram; surviving labels: V/U, impulse train with period equal to the pitch period, random noise, G, synthesized speech]
31 LPC10 Vocoder: Limitations
AR
32 Residual Excited LP Vocoder :
Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as done in the case of DPCM.
In one method, the LPC model and excitation parameters are first estimated from a frame of speech.
33 Residual Excited LP Vocoder :
The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error.
The residual error is quantized, coded, and transmitted to the receiver.
At the receiver the signal is synthesized by adding the residual error to the signal generated from the model.
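The residual scheme above can be sketched with a toy frame. The model output values and the uniform quantizer with step 0.05 are assumptions chosen for illustration; the point is only that the receiver's reconstruction error is bounded by the quantization error of the residual.

```python
def quantize(x, step):
    """Uniform quantizer: round each sample to the nearest multiple of step."""
    return [step * round(v / step) for v in x]

original = [0.9, -0.4, 0.35, 1.2]
model_output = [1.0, -0.5, 0.5, 1.0]          # hypothetical LP-model synthesis

# Transmitter: residual = original - model output, then quantize.
residual = [o - y for o, y in zip(original, model_output)]
sent = quantize(residual, step=0.05)

# Receiver: add the decoded residual to its own model output.
rebuilt = [y + r for y, r in zip(model_output, sent)]
max_error = max(abs(o - r) for o, r in zip(original, rebuilt))
```

A finer quantizer step lowers `max_error` at the cost of more bits for the residual, which is the quality versus bit rate trade-off of RELP.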
34 Residual Excited LP Vocoder :
The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate.
In the synthesizer, the decoded residual is rectified and spectrum flattened (using a highpass filter); the lowpass and highpass signals are then summed, and the resulting residual error signal is used to excite the LPC model.
The RELP vocoder provides communication-quality speech at about 9600 bps.
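35 RELP Analyzer (type 1):
[Block diagram: S(n) enters a buffer and window, giving the frame f(n; m). stLP analysis produces the LP parameters $\{\hat{a}(i;m)\}$, and the excitation parameters are the gain estimate $\hat{\Theta}_0$, the V/U decision and the pitch estimate $\hat{P}$. These drive an LP synthesis model whose output is subtracted from f(n; m) at the summing node (∑) to form the residual error e(n; m). The Encoder sends the LP parameters, the excitation parameters and the residual error to the channel.]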
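36 RELP Analyzer (type 2):
[Block diagram: S(n) enters a buffer and window, giving f(n; m). stLP analysis produces the LP parameters $\{\hat{a}(i;m)\}$, which set the inverse filter $\hat{A}(z;m)$. The inverse filter output is the prediction residual, which passes through Lowpass Filter → Decimator → DFT → Encoder and on to the channel.]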
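37 Synthesizer for a RELP vocoder
[Block diagram: the Decoder output from the channel enters a buffer and controller. The residual is interpolated, then rectified and highpass filtered; the summing node (∑) combines the interpolated residual with the highpass branch to form the LP excitation. The LP model parameter updates set the LP synthesizer, which is driven by this excitation.]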
38 Multipulse LPC Vocoder
RELP needs to regenerate the high-frequency components at the decoder, which gives only a crude approximation of the high frequencies.
Multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal-system filter.
39 Multipulse LPC Vocoder
The information concerning the excitation sequence includes:
• the locations of the pulses
• an overall scale factor corresponding to the largest pulse amplitude
• the pulse amplitudes relative to the overall scale factor
The scale factor is logarithmically quantized into 6 bits.
The amplitudes are linearly quantized into 4 bits.
The pulse locations are encoded using a differential coding scheme.
The excitation parameters are updated every 5 msec.
The LPC vocal-tract parameters and the pitch period are updated every 20 msec.
The bit rate is 9600 bps.
40 Analysis-by-synthesis coder
A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter.
The synthetic speech is compared with the original speech.
The residual error signal is weighted perceptually by the filter
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
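Reading the weighting filter as the ratio of the inverse filter to its bandwidth-expanded version, the denominator is obtained by replacing z with z/c, which scales the i-th coefficient by c to the power i. The sketch below uses a hypothetical function name, a toy second-order filter, and c = 0.8 as an illustrative value only.

```python
def scale_coefficients(a, c):
    """Coefficients of A(z/c) given those of A(z) = a[0] + a[1] z^-1 + ..."""
    return [coef * c ** i for i, coef in enumerate(a)]

a_hat = [1.0, -0.9, 0.5]                        # toy second-order inverse filter
numerator = a_hat                               # A(z)
denominator = scale_coefficients(a_hat, 0.8)    # A(z/c) with c = 0.8
```

With c below 1 the roots of the denominator move toward the origin, so the weighting de-emphasizes the error near the formant peaks, where it is masked by the speech itself.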
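41 Obtaining the multipulse excitation: (Analysis by synthesis method)
[Block diagram: the input speech s(n) enters a buffer and LP analysis, which yields the frame f(n; m) and the pitch estimate $\hat{P}$. The multipulse excitation generator drives the pitch synthesis filter $\Theta_p(z)$ followed by the LP synthesis filter, producing $\hat{f}(n;m)$. The difference between f(n; m) and $\hat{f}(n;m)$ passes through the perceptual weighting filter W(z); the weighted error feeds the error minimization block, which adjusts the multipulse excitation generator.]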
42 Code Excited LP :
CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.
The bit rate of the CELP is 4800 bps.
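43 CELP (analysis-by-synthesis coder) :
[Block diagram: the speech samples enter a buffer and LP analysis, which supplies the LP parameters and the side information. A sequence from the Gaussian excitation codebook is scaled by the gain and passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter. The synthetic output is subtracted from the speech at the summing node (∑), the difference is filtered by the perceptual weighting filter W(z), and its energy is computed (square and sum). The index of the excitation sequence is selected from this energy.]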
44 Analysis-by-synthesis coder
This weighted error is squared and summed over a subframe block to give the error energy.
By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.
45 Analysis-by-synthesis coder
The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy for the block of samples.
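The exhaustive search with a per-codeword optimal gain can be sketched as below. The sketch assumes that each codeword has already been passed through the synthesis and weighting filters and that the target is the weighted speech block; the function names and the three-entry codebook are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def search_codebook(target, filtered_codebook):
    """Return (index, gain, error_energy) of the best scaled codeword."""
    best = None
    for index, y in enumerate(filtered_codebook):
        energy = dot(y, y)
        if energy == 0.0:
            continue                      # an all-zero codeword cannot help
        gain = dot(target, y) / energy    # minimizes sum((target - gain*y)^2)
        error = dot(target, target) - gain * dot(target, y)
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best

codebook = [[1.0, 0.0, -1.0, 0.0], [0.5, 1.0, 0.5, -1.0], [0.0, 1.0, 1.0, 1.0]]
target = [1.25, 2.5, 1.25, -2.5]          # 2.5 times codeword 1
index, gain, error = search_codebook(target, codebook)
```

For each codeword the gain has a closed form, so the search only needs two inner products per entry; here it returns index 1 with gain 2.5 and zero error energy.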
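46 CELP (synthesizer) :
[Block diagram: the decoder output from the channel enters a buffer and controller, which selects a sequence from the Gaussian excitation codebook and supplies the LP parameter, gain and pitch estimate updates. The excitation passes through the pitch synthesis filter and then the LP synthesis filter.]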
47 CELP synthesizer
The synthesizer is a cascade of two all-pole filters with coefficients that are updated periodically.
The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech.
This filter has the form
$$\Theta_p(z) = \frac{1}{1 - b\,z^{-p}}$$
48 CELP
The parameters of this filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 msec.
The second filter is a short-delay all-pole (vocal-tract) filter; it has 10-12 coefficients that are determined every 10-20 msec.
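Reading the long-delay pitch filter as the one-tap all-pole filter $1/(1 - b z^{-p})$, its difference equation is y[n] = x[n] + b y[n - p]. A minimal sketch with a hypothetical function name and illustrative values b = 0.5 and p = 3:

```python
def pitch_synthesis_filter(x, b, p):
    """All-pole long-delay filter: y[n] = x[n] + b * y[n - p]."""
    y = []
    for n, v in enumerate(x):
        feedback = b * y[n - p] if n >= p else 0.0
        y.append(v + feedback)
    return y

impulse = [1.0] + [0.0] * 8
response = pitch_synthesis_filter(impulse, b=0.5, p=3)
# response repeats every p samples with decaying amplitude:
# [1.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.25, 0.0, 0.0]
```

A single input pulse is thus turned into a train of pulses spaced p samples apart, which is how the filter imposes the pitch periodicity on the codebook excitation.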
49 Example:
The sampling frequency is 8 kHz.
The subframe block duration for the pitch estimation and the excitation sequence selection is 5 msec.
This gives 40 samples per 5-msec subframe, so the excitation sequence consists of 40 samples.
50 Example:
A codebook of 1024 sequences gives good-quality speech.
For this codebook size, 10 bits are required to send the codebook index.
Hence the excitation is sent at 10 bits per 40 samples, i.e. 1/4 bit per sample (a reduction by a factor of 4 relative to 1 bit per sample).
The transmission of the pitch predictor parameters and the spectral predictor parameters brings the bit rate to about 4800 bps.
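The numbers in this example follow from simple arithmetic, sketched here with hypothetical variable names:

```python
sampling_rate = 8000                 # Hz
subframe_ms = 5
codebook_size = 1024

samples_per_subframe = sampling_rate * subframe_ms // 1000      # 40
index_bits = codebook_size.bit_length() - 1                     # 10
bits_per_sample = index_bits / samples_per_subframe             # 0.25
index_bit_rate = index_bits * 1000 // subframe_ms               # 2000 bps
```

The codebook index alone therefore costs 2000 bps; the remainder of the roughly 4800 bps goes to the pitch predictor, gain and spectral predictor parameters.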
51 Low-delay CELP coder
CELP has been used to achieve toll-quality speech at 16000 bps with low delay.
Other types of vocoders also produce high-quality speech at 16000 bps, but they buffer 10-20 msec of speech samples.
52 Low-delay CELP coder
Their one-way delay is therefore of the order of 20-40 msec.
With a modification of CELP, it is possible to reduce the one-way delay to about 2 msec.
Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.
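53 Low-delay CELP coder
[Block diagram: the input speech s(n) enters a buffer and window, giving f(n; m). A vector from the excitation vector quantizer codebook is scaled by the gain and passed through the high-order LP synthesis filter to produce $\hat{f}(n;m)$. The difference formed at the summing node (∑) is filtered by the perceptual weighting filter W(z) and fed to the error minimization block, which selects the codebook vector. Predictor adaptation and gain adaptation blocks update the synthesis filter and the gain.]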
54 Low-delay CELP coder
The pitch predictor used in the conventional forward-adaptive coder is eliminated.
In order to compensate for the loss in pitch information, the LPC predictor order is increased significantly, to an order of 50.
55 Low-delay CELP coder
LPC coefficients are updated more frequently, every 2.5 ms
5-sample excitation vector corresponds to an excitation block duration of 0.625 msec at 8-kHz sampling rate
56 Low-delay CELP coder
The logarithm of the excitation gain is adapted every subframe excitation block by employing a 10th-order adaptive linear predictor in the logarithmic scale
The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation signal blocks
57 Low-delay CELP coder
The perceptual weighting filter is also 10th order and is updated once every four blocks by employing an LPC analysis on frames of the input speech signal of duration 2.5 msec.
The excitation codebook in the low-delay CELP is also modified compared to conventional CELP.
A 10-bit excitation codebook is employed.
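The figures quoted for low-delay CELP are mutually consistent, as the arithmetic below shows. The variable names are hypothetical, and the last line assumes that the codebook index is the only information transmitted, which fits a backward-adaptive design but is not stated explicitly on the slides.

```python
sampling_rate = 8000          # Hz
vector_length = 5             # samples per excitation vector
index_bits = 10               # 10-bit excitation codebook

block_ms = 1000 * vector_length / sampling_rate          # 0.625 ms
update_ms = 4 * block_ms                                 # 2.5 ms (four blocks)
bit_rate = index_bits * sampling_rate // vector_length   # 16000 bps
```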
58 Vector Sum Excited LP : The VSELP coder and decoder differ from CELP basically in the method by which the excitation sequence is formed.
In the next block diagram of the VSELP, there are three excitation sources
One excitation is obtained from the pitch period state
The other two excitation sources are obtained from two codebooks
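59 VSELP Decoder :
[Block diagram: three excitation sources, the long-term filter state, Codebook 1 and Codebook 2, are each scaled by a gain (indexed 0, 1 and 2) and summed (∑). The sum passes through the pitch synthesis filter, the spectral envelope (LP) synthesis filter and the spectral post filter to produce the synthetic speech.]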
60 VSELP Decoder
The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 ms.
The coefficients are updated in each 5-ms frame by interpolation.
The excitation parameters are also updated every 5 ms.
61 VSELP Decoder
There are 128 codewords in each of the two codebooks.
The codewords are constructed from two sets of seven basis codewords, one set per codebook, by forming linear combinations of the seven basis codewords.
The long-term filter state is also a codebook with 128 codeword sequences.
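One way to obtain 128 codewords from seven basis codewords is to use every combination of +1 and -1 weights, since 2^7 = 128. The sketch below makes that assumption explicit (the slides say only "linear combinations"); the function name and the toy unit-vector basis are hypothetical.

```python
def vselp_codebook(basis):
    """All sign combinations of the basis vectors (2**len(basis) codewords)."""
    m = len(basis)
    length = len(basis[0])
    codebook = []
    for index in range(2 ** m):
        # Bit k of the index selects the sign of basis vector k.
        signs = [1 if (index >> k) & 1 else -1 for k in range(m)]
        codebook.append([sum(s * v[n] for s, v in zip(signs, basis))
                         for n in range(length)])
    return codebook

# Seven toy basis vectors of length 7 (unit vectors, purely illustrative).
basis = [[1 if n == k else 0 for n in range(7)] for k in range(7)]
codebook = vselp_codebook(basis)
```

With this construction, complementing all seven index bits negates the codeword, so only the seven basis vectors need to be stored and filtered rather than all 128 codewords.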
62 VSELP Decoder
In each 5-msec frame, the codewords from this codebook are filtered through the speech system filter $\hat{\Theta}(z)$ and correlated with the input speech sequence.
The filtered codeword is used to update the history, and the lag is transmitted to the decoder.
63 VSELP Decoder
Thus the update occurs by appending the best-filtered codeword to the history codebook.
The oldest sample in the history array is discarded.
The result is that the long-term state becomes an adaptive codebook.
64 VSELP Decoder
The three excitation sequences are selected sequentially from each of the three codebooks.
Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error.
Once the codewords have been selected, the three gain parameters are optimized.
65 VSELP Decoder
Joint gain optimization is accomplished sequentially by orthogonalizing each weighted codeword vector prior to the codebook search.
These parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-ms frame.
66 Vector Sum Excited LP :
The bit rate of the VSELP is about 8000 bps.
Bit allocations for 8000-bps VSELP:

Parameters                                       Bits/5-ms frame   Bits/20 ms
10 LPC coefficients                              -                 38
Average speech energy                            -                 5
Excitation codewords from two VSELP codebooks    14                56
Gain parameters                                  8                 32
Lag of pitch filter                              7                 28
Total                                            29                159
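The totals in the bit allocation, and the resulting bit rate, can be verified directly. The dictionary below copies the table entries; the variable names are hypothetical.

```python
# (bits per 5-ms frame or None, bits per 20 ms), copied from the table
allocation = {
    "10 LPC coefficients": (None, 38),
    "Average speech energy": (None, 5),
    "Excitation codewords": (14, 56),
    "Gain parameters": (8, 32),
    "Lag of pitch filter": (7, 28),
}

bits_per_subframe = sum(b5 for b5, _ in allocation.values() if b5 is not None)  # 29
bits_per_frame = sum(b20 for _, b20 in allocation.values())                     # 159
bit_rate = bits_per_frame * 1000 // 20                                          # 7950 bps
```

The 159 bits per 20 ms give 7950 bps, which is the "about 8000 bps" quoted for VSELP.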
67 VSELP Decoder
Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter; this post filter is a pole-zero filter of the form
$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
68 DEMO
[Audio demo table: rows are speech codecs; columns are Male Speaker, Female Speaker and Music]
Original Speech/Music (16-bit, sampled at 8 kHz)
FS-1015 (LPC-10e, 2.4 kb/s)
FS-1016 (CELP, 4.8 kb/s)
IS-54 (VSELP, 7.95 kb/s)
G.721 (ADPCM, 32 kb/s)
69 Standard Voice Algorithms
G.711
The most widely used digital representation of voice signals is G.711, or PCM (Pulse Code Modulation).
This codec represents a 4 kHz band-limited voice signal sampled at 8 kHz using 8 bits per sample with A-law or μ-law coding.
G.726
The G.726 codec takes a 64 kbps A-law or μ-law PCM signal and encodes it at one of four bit-rate options, ranging from 2 bits per sample to 5 bits per sample.
The algorithm is based on Adaptive Differential Pulse Code Modulation (ADPCM) with a one-sample backward prediction scheme.
70 G.728
The G.728 algorithm compresses PCM voice signals to a bit rate of 16 kbps.
It is based on a strong backward prediction scheme and is considered one of the most complex voice algorithms produced by the ITU standards organization.
G.729
For compression of voice signals at 8 kbps, the G.729 algorithm offers toll quality with a built-in algorithmic delay of less than 15 msec.
Additional features described in the G.729 Annex provide VAD and Comfort Noise Generation functionality to enhance the quality and reduce the overall bit rate.
G.723.1
The most widely used algorithm for band-limited channels, such as VoIP and video conferencing, is G.723.1.
The algorithm has two operating bit rates, 6.3 kbps and 5.3 kbps.
Although its delay is not as low as that of the other ITU standards, its quality is near toll quality for the given low bit rates, making it very efficient in bit usage.
71 GSM-AMR
The latest GSM standard is the multi-rate Adaptive Code Excited Linear Prediction codec, which provides compression in the range of 4.75 to 12.2 kbps.
In total the codec provides 12 bit rates that cover the half-rate to full-rate channel capacity.
GSM-FR
The first digital codec used in a mobile environment is the GSM Full Rate vocoder.
The codec compresses 13-bit PCM samples to a rate of 13 kbps.
The algorithm is based on a very simple Regular Pulse Excited - Linear Prediction Coding technique.
GSM-HR
To increase capacity, the GSM committee decided on a lower bit rate of 5.6 kbps for the voice channel.
The algorithm is based on Vector Sum Excited Linear Prediction (VSELP) and is computationally as complex as other low bit rate algorithms.