
Vocoders

1 The Channel Vocoder (analyzer):

• The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz.
• Typically, 16-20 linear-phase FIR filters are used.
• The output of each filter is rectified and lowpass filtered.
• The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract.
• In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the analysis.

2 The Channel Vocoder (analyzer block diagram):

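[Block diagram: the input speech S(n) feeds a bank of parallel channels, each consisting of a bandpass filter, a rectifier, a lowpass filter and an A/D converter. The channel outputs, together with the outputs of the voicing detector and the pitch detector, go to the encoder and on to the channel.]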

3 The Channel Vocoder:

• 16-20 linear-phase FIR filters
• Covering 0-4 kHz
• Each having a bandwidth of 100-300 Hz
• 20-ms frames, i.e. the spectral magnitudes are updated at 50 Hz
• LPF bandwidth: 20-25 Hz
• Sampling rate of the filter outputs: 50 Hz

4 The Channel Vocoder (bit rate):

• 1 bit for the voicing detector
• 6 bits for the pitch period
• For 16 channels, each coded with 3-4 bits and updated 50 times per second, the total bit rate is 2400-3200 bps
• Further reductions to 1200 bps can be achieved by exploiting frequency correlations of the spectrum magnitudes
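The quoted range can be checked with a few lines of arithmetic. The sketch below is only a sanity check, not part of any standard: it shows that 2400-3200 bps corresponds to the 16 spectral magnitudes alone, and that sending the voicing and pitch bits once per 20-ms frame (an assumption, since the slide does not state their update rate) would add 350 bps.

```python
# Per-frame bit budget for the channel vocoder figures quoted above.
FRAMES_PER_SECOND = 50   # 20-ms frames
CHANNELS = 16
VOICING_BITS = 1
PITCH_BITS = 6


def channel_vocoder_bit_rate(bits_per_channel, include_side_info=False):
    """Return bits per second for the given per-channel resolution."""
    bits_per_frame = CHANNELS * bits_per_channel
    if include_side_info:
        # Assumption: voicing and pitch are also sent once per frame.
        bits_per_frame += VOICING_BITS + PITCH_BITS
    return bits_per_frame * FRAMES_PER_SECOND


for bits in (3, 4):
    print(bits,
          channel_vocoder_bit_rate(bits),
          channel_vocoder_bit_rate(bits, include_side_info=True))
```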

5 The Channel Vocoder (synthesizer):

• At the receiver, the signal samples are passed through D/A converters.
• The outputs of the D/As are multiplied by the voiced or unvoiced signal sources.
• The resulting signals are passed through bandpass filters.
• The outputs of the bandpass filters are summed to form the synthesized speech signal.

6 The Channel Vocoder (synthesizer block diagram):

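[Block diagram: the decoder, fed from the channel, drives one D/A converter per channel. The voicing information sets a switch that selects either the pulse generator (controlled by the pitch period) or the random noise generator. Each D/A output is multiplied by the selected source and passed through a bandpass filter; the filter outputs are summed (∑) to give the output speech.]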

7 The Phase Vocoder:

• The phase vocoder is similar to the channel vocoder.
• However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter.
• By coding and transmitting the phase derivative, this vocoder destroys the phase information.
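One way to read the last point: if only the derivative of the phase is sent, the receiver can integrate it but cannot know the starting value, so the absolute phase is lost. The sketch below illustrates this with a hypothetical phase track and a first difference standing in for the derivative; the names and numbers are illustrative assumptions, not part of the vocoder description.

```python
def phase_derivative(phases):
    """First difference of a phase track (discrete stand-in for the derivative)."""
    return [b - a for a, b in zip(phases, phases[1:])]


def integrate(derivative, start=0.0):
    """Rebuild a phase track by cumulative summation from an assumed start."""
    track = [start]
    for d in derivative:
        track.append(track[-1] + d)
    return track


# Hypothetical phase track for one channel: constant offset plus linear growth.
original = [0.7 + 0.1 * n for n in range(6)]

# The receiver integrates the derivative but does not know the offset 0.7.
rebuilt = integrate(phase_derivative(original))

# The two tracks differ by the same constant everywhere.
offsets = [o - r for o, r in zip(original, rebuilt)]
print(offsets)
```

The shape of the phase track survives, while the constant offset does not.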

8 The Phase Vocoder (analyzer block diagram, kth channel)

[Block diagram, kth channel: the input S(n) is multiplied by cos ω_k n and by sin ω_k n, and each product is lowpass filtered to give a_k(n) and b_k(n). These signals and their differentiated versions are used to compute the short-term magnitude and the short-term phase derivative, which are decimated and passed to the encoder and on to the channel.]

9 The Phase Vocoder (synthesizer block diagram, kth channel)

[Block diagram, kth channel; surviving labels: decoder (from channel), decimated short-term amplitude, decimated short-term phase derivative, interpolators, integrator, cos and sin blocks, modulators cos ω_k n and sin ω_k n, summer (∑).]

10 The Phase Vocoder:

• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of spectrum magnitude and phase derivative: 50-60 samples per second
• Spectral magnitude is coded using PCM or DPCM
• Phase derivative is coded linearly using 2-3 bits
• The resulting bit rate is 7200 bps

11 The Formant Vocoder:

• The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech.
• It is this information, plus the pitch period, that is encoded and transmitted to the receiver.

12 The Formant Vocoder:

• Example of formants:
  - (a) The spectrogram of the utterance "day one", showing the pitch and the structure of speech.
  - (b) A zoomed spectrogram of the fundamental and the second harmonic.

[Figure: spectrogram panels (a) and (b)]

13 The Formant Vocoder (analyzer block diagram):

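[Block diagram; surviving labels: input speech; three formant branches F1, F2 and F3 with outputs F1, B1, F2, B2, F3 and B3; a "Pitch and V/U" branch with outputs V/U and F0; Decoder.]

Fk: the frequency of the kth formant
Bk: the bandwidth of the kth formant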

14 The Formant Vocoder (synthesizer block diagram):

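[Block diagram; surviving labels: excitation signal controlled by V/U and F0; three formant branches F1, F2 and F3 controlled by (F1, B1), (F2, B2) and (F3, B3); summer (∑).]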

15 Linear Predictive Coding (LPC):

• The objective of LP analysis is to estimate the parameters of an all-pole model of the vocal tract.
• Several methods have been devised for generating the excitation sequence for speech synthesis.
• Various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.

16 LPC-10:

• This method is called LPC-10 because 10 coefficients are typically employed.
• LPC-10 partitions the speech into 180-sample frames.
• Pitch and voicing decisions are determined using the AMDF and zero-crossing measures.

17 A General Discrete-Time Model For Speech Production

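[Block diagram: for voiced speech, a DT impulse generator controlled by the pitch period drives the glottal filter G(z), followed by the voiced gain; for unvoiced speech, an uncorrelated noise generator is followed by the unvoiced gain. A V/U switch selects the volume velocity U(n), which passes through the vocal tract filter H(z) and the filter R(z) to give the speech signal s(n).]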

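18 Linear Prediction: Determining the Prediction Order

20 Linear Prediction: Determining the Prediction Order

Prediction gain over the N-sample frame ending at time m, where s[n] is the speech signal and e[n] is the prediction error:

$$PG = 10\log_{10}\left(\frac{\sum_{n=m-N+1}^{m} s^{2}[n]}{\sum_{n=m-N+1}^{m} e^{2}[n]}\right)$$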
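As a small illustration of the prediction gain, the sketch below computes the ratio of signal energy to prediction-error energy over one frame, in dB. The first-order predictor and the sinusoidal test frame are hypothetical choices made only to have something to measure; a real coder would use a higher-order predictor fitted to the speech.

```python
import math


def prediction_gain_db(s, e):
    """PG = 10*log10(sum of s^2 / sum of e^2) over one frame."""
    signal_energy = sum(x * x for x in s)
    error_energy = sum(x * x for x in e)
    return 10.0 * math.log10(signal_energy / error_energy)


def first_order_error(s, a):
    """Prediction error e[n] = s[n] - a*s[n-1], with s[-1] taken as 0."""
    prev = 0.0
    e = []
    for x in s:
        e.append(x - a * prev)
        prev = x
    return e


# Hypothetical frame: a slowly varying sinusoid, well predicted from its past.
frame = [math.sin(2 * math.pi * 0.01 * n) for n in range(180)]

no_prediction = prediction_gain_db(frame, first_order_error(frame, 0.0))
with_prediction = prediction_gain_db(frame, first_order_error(frame, 0.99))
print(no_prediction, with_prediction)
```

With no prediction the error equals the signal and the gain is 0 dB; a predictor that removes most of the signal energy gives a clearly positive gain.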

21 Linear Prediction: Example

[Figure: panels for prediction orders M = 4 and M = 10]

22 Linear Prediction: Example

[Figure: panels for prediction orders M = 2, M = 10 and M = 54]

23 Linear Prediction: The Idea of Long-Term Linear Prediction

[Figure: panels for M = 10 and M = 50]

24 Linear Prediction: Long-Term Linear Prediction

25 LPC10 Vocoder: General Specifications

LPC10

26 LPC10 Vocoder: Encoder

[Block diagram; surviving labels: PCM, LPC, Bit Encoder]

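27 Pitch Period Detection

Autocorrelation:

$$R[l,m] = \sum_{n=m-N+1}^{m} s[n]\,s[n-l]$$

Magnitude difference function (MDF):

$$\mathrm{MDF}[l,m] = \sum_{n=m-N+1}^{m} \bigl|\,s[n] - s[n-l]\,\bigr|$$

YMC:

$$s[n] = b\,s[n-N] + e[n], \qquad n = m-N+1, \dots, m$$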
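A minimal sketch of pitch detection with the magnitude difference function: the MDF is evaluated for each candidate lag, and the lag with the smallest value is taken as the pitch period. The test signal, the frame length of 180 samples and the lag range are assumptions for illustration only; practical detectors add voicing checks and guard against picking multiples of the true period.

```python
import math


def mdf(s, lag, m, n_frame):
    """MDF[l, m]: sum of |s[n] - s[n - l]| for n = m - N + 1 .. m."""
    return sum(abs(s[n] - s[n - lag]) for n in range(m - n_frame + 1, m + 1))


def estimate_pitch(s, m, n_frame, lags):
    """Return the candidate lag with the smallest MDF value."""
    return min(lags, key=lambda lag: mdf(s, lag, m, n_frame))


# Hypothetical voiced-like test signal with a period of 57 samples.
period = 57
signal = [math.sin(2 * math.pi * n / period)
          + 0.5 * math.sin(4 * math.pi * n / period)
          for n in range(400)]

m, n_frame = 399, 180             # frame ends at sample 399, N = 180
candidate_lags = range(20, 100)   # assumed search range, illustration only
estimated = estimate_pitch(signal, m, n_frame, candidate_lags)
print(estimated)
```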

28 LPC10 Vocoder

MDF

T = 20, 21, …, 39, 40, 42, …, 78, 80, 84, …, 156

29 LPC10 Vocoder: Encoder

[Block diagram; surviving labels: LPC, RC]

30 LPC10 Vocoder: Speech Synthesis

Encoder section (operating on the original signal):
• Determine whether the frame is voiced or unvoiced
• Determine the pitch period (voiced frames only)
• Compute the signal gain

[Block diagram: a V/U switch selects between an impulse train whose period equals the pitch period and random noise; gain G; the output is the synthesized speech.]

31 LPC10 Vocoder: Limitations

[Figure residue; surviving label: AR]

32 Residual Excited LP Vocoder:

• Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as done in the case of DPCM.
• In one method, the LPC model and excitation parameters are estimated from a frame of speech.

33 Residual Excited LP Vocoder:

• The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error.
• The residual error is quantized, coded, and transmitted to the receiver.
• At the receiver, the signal is synthesized by adding the residual error to the signal generated from the model.

34 Residual Excited LP Vocoder:

• The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate.
• In the synthesizer, the residual is rectified and spectrum flattened (using an HPF); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model.
• The RELP vocoder provides communication-quality speech at about 9600 bps.

35 RELP Analyzer (type 1):

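[Block diagram: S(n) passes through a buffer and window to give the frame f(n;m). Short-term LP analysis produces the LP parameters {â(i;m)} and the excitation parameters (gain estimate Θ̂0, V/U decision, pitch estimate P̂), which drive an LP synthesis model. The model output is subtracted from f(n;m) at the summer (∑) to form the residual error e(n;m). The encoder sends the parameters and the residual to the channel.]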

36 RELP Analyzer (type 2):

[Block diagram; surviving labels: S(n), buffer and window, frame f(n;m), short-term LP analysis giving the LP parameters {â(i;m)}, inverse filter Â(z;m) producing the prediction residual, lowpass filter, decimator, DFT, encoder, to channel.]

37 Synthesizer for a RELP vocoder

[Block diagram; surviving labels: decoder (from channel), buffer and controller, residual, interpolator, rectifier, highpass filter, summer (∑), excitation, LP model parameter updates, LP synthesizer.]

38 Multipulse LPC Vocoder

• RELP needs to regenerate the high-frequency components at the decoder, which gives only a crude approximation of the high frequencies.
• The multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal system filter.

39 Multipulse LPC Vocoder

• The information concerning the excitation sequence includes:
  - the locations of the pulses
  - an overall scale factor corresponding to the largest pulse amplitude
  - the pulse amplitudes relative to the overall scale factor
• The scale factor is logarithmically quantized into 6 bits.
• The amplitudes are linearly quantized into 4 bits.
• The pulse locations are encoded using a differential coding scheme.
• The excitation parameters are updated every 5 msec.
• The LPC vocal-tract parameters and the pitch period are updated every 20 msec.
• The bit rate is 9600 bps.

40 Analysis-by-synthesis coder

• A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter.
• The synthetic speech is compared with the original speech.
• The residual error signal is weighted perceptually by the filter

$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$
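A weighting filter of the ratio form Â(z)/Â(z/c) can be built directly from the predictor coefficients, because replacing z by z/c simply scales the ith coefficient by c^i. The sketch below assumes the sign convention Â(z) = 1 − Σ a(i) z^(−i) and a constant c between 0 and 1; both the convention and the example coefficients are assumptions, not values taken from the slides.

```python
def bandwidth_expand(a, c):
    """Coefficients of A(z/c), given A(z) = 1 - sum_i a[i-1] * z^-i."""
    return [a_i * c ** (i + 1) for i, a_i in enumerate(a)]


def weighting_filter(x, a, c):
    """Apply W(z) = A(z) / A(z/c) to the sequence x (zero initial state)."""
    a_c = bandwidth_expand(a, c)
    y = []
    for n in range(len(x)):
        # Numerator A(z): FIR part acting on past inputs.
        acc = x[n] - sum(a[i] * x[n - 1 - i]
                         for i in range(len(a)) if n - 1 - i >= 0)
        # Denominator A(z/c): recursive part acting on past outputs.
        acc += sum(a_c[i] * y[n - 1 - i]
                   for i in range(len(a)) if n - 1 - i >= 0)
        y.append(acc)
    return y


a = [1.2, -0.5]   # hypothetical 2nd-order predictor coefficients
expanded = bandwidth_expand(a, 0.8)
identity = weighting_filter([1.0, 0.0, 0.0, 0.0], a, 1.0)  # c = 1: W(z) = 1
print(expanded, identity)
```

Setting c = 1 makes numerator and denominator cancel, which is a convenient check that the filter is wired correctly.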

41 Obtaining the multipulse excitation: (Analysis by synthesis method)

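[Block diagram: the input speech s(n) is buffered, and LP analysis yields the LP parameters and the pitch estimate P̂. The multipulse excitation generator drives the pitch synthesis filter Θ_p(z) and the LP synthesis filter to produce f̂(n;m), which is subtracted from the frame f(n;m) at the summer (∑). The error is passed through the perceptual weighting filter W(z), and the error minimization block adjusts the excitation generator.]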

42 Code Excited LP:

• CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences.
• The bit rate of the CELP is 4800 bps.

43 CELP (analysis-by-synthesis coder) :

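[Block diagram: the speech samples are buffered and analyzed (LP analysis) to give the LP parameters as side information. A sequence from the Gaussian excitation codebook is scaled by the gain and passed through the pitch synthesis filter and the spectral envelope (LP) synthesis filter. The result is subtracted from the speech at the summer (∑), the error is passed through the perceptual weighting filter W(z), and its energy is computed (square and sum) to select the index of the excitation sequence.]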

44 Analysis-by-synthesis coder

• This weighted error is squared and summed over a subframe block to give the error energy.
• By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.

45 Analysis-by-synthesis coder

• The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing the error energy for the block of samples.
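The search described above can be sketched as follows: each codeword is passed through the synthesis filter, the gain that minimizes the squared error is computed in closed form, and the codeword with the smallest resulting error energy wins. This is a simplified illustration, not the coder itself: the one-pole `synthesize` function is a hypothetical stand-in for the pitch and LP synthesis filters, the perceptual weighting is omitted, and the codebook size of 64 is chosen only to keep the example small.

```python
import random


def synthesize(excitation, a1=0.9):
    """Hypothetical stand-in for the synthesis filters: y[n] = x[n] + a1*y[n-1]."""
    out, prev = [], 0.0
    for x in excitation:
        prev = x + a1 * prev
        out.append(prev)
    return out


def search_codebook(target, codebook):
    """Exhaustive search: return (index, gain, error_energy) of the best codeword."""
    best = None
    for index, codeword in enumerate(codebook):
        y = synthesize(codeword)
        yy = sum(v * v for v in y)
        ty = sum(t * v for t, v in zip(target, y))
        gain = ty / yy   # closed-form minimizer of sum (t - gain*y)^2
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]

# Build the target from a known codeword so the search has a checkable answer.
target = [2.5 * v for v in synthesize(codebook[17])]
index, gain, error = search_codebook(target, codebook)
print(index, round(gain, 3), error)
```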

46 CELP (synthesizer) :

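[Block diagram: the decoder output from the channel goes to a buffer and controller, which selects a sequence from the Gaussian excitation codebook. The sequence passes through the pitch synthesis filter and the LP synthesis filter, both of which receive the LP parameter, gain and pitch estimate updates.]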

47 CELP synthesizer

• Cascade of two all-pole filters with coefficients that are updated periodically
• The first filter is a long-delay pitch filter used to generate the pitch periodicity in voiced speech
• This filter has the form

$$\Theta_p(z) = \frac{1}{1 - b\,z^{-p}}$$
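To see how a long-delay pitch filter produces periodicity, the sketch below assumes the all-pole form 1/(1 − b z^(−p)), which corresponds to the recursion y[n] = x[n] + b·y[n−p]. The values of b and p are illustrative assumptions.

```python
def pitch_synthesis_filter(x, b, p):
    """y[n] = x[n] + b * y[n - p], i.e. the all-pole filter 1 / (1 - b z^-p)."""
    y = []
    for n, sample in enumerate(x):
        feedback = b * y[n - p] if n >= p else 0.0
        y.append(sample + feedback)
    return y


# A single pulse in, a decaying pulse train with period p out.
impulse = [1.0] + [0.0] * 199
response = pitch_synthesis_filter(impulse, b=0.5, p=50)
peaks = [response[k] for k in (0, 50, 100, 150)]
print(peaks)
```

A single input pulse is repeated every p samples with amplitude scaled by b each time, which is the periodic structure the filter adds to the excitation.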

48 CELP

• The parameters of the pitch filter can be determined by minimizing the prediction error energy, after pitch estimation, over a frame duration of 5 msec.
• The second filter is a short-delay all-pole (vocal-tract) filter with 10-12 coefficients that are determined every 10-20 msec.

49 Example:

• The sampling frequency is 8 kHz.
• Pitch estimation and selection of the excitation sequence are performed on subframe blocks of 5 msec.
• We have 40 samples per 5-msec subframe.
• The excitation sequence consists of 40 samples.

50 Example:

• A codebook of 1024 sequences gives good-quality speech.
• For such a codebook size, we require 10 bits to send the codebook index.
• Hence the excitation is coded with 10 bits per 40 samples, i.e. 1/4 bit per sample.
• The transmission of the pitch predictor parameters and the spectral predictor brings the bit rate to about 4800 bps.
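The numbers in this example can be reproduced with a short calculation. The sketch below only restates the arithmetic; it shows that the codebook index alone accounts for 2000 bps, so the remainder of the roughly 4800 bps goes to the other transmitted parameters.

```python
SAMPLING_RATE_HZ = 8000
SUBFRAME_MS = 5
CODEBOOK_SIZE = 1024

samples_per_subframe = SAMPLING_RATE_HZ * SUBFRAME_MS // 1000
index_bits = (CODEBOOK_SIZE - 1).bit_length()   # bits to address every codeword
bits_per_sample = index_bits / samples_per_subframe
subframes_per_second = 1000 // SUBFRAME_MS
index_bit_rate = index_bits * subframes_per_second  # codebook index only

print(samples_per_subframe, index_bits, bits_per_sample, index_bit_rate)
```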

51 Low-delay CELP coder

• CELP has been used to achieve toll-quality speech at 16000 bps with low delay.
• Although other types of vocoders produce high-quality speech at 16000 bps, these vocoders buffer 10-20 msec of speech samples.

52 Low-delay CELP coder

• For coders that buffer 10-20 msec of speech, the one-way delay is of the order of 20-40 msec.
• With a modification of CELP, it is possible to reduce the one-way delay to about 2 msec.
• Low-delay CELP is achieved by using a backward-adaptive predictor with a gain parameter and an excitation vector size as small as 5 samples.

53 Low-delay CELP coder

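[Block diagram: the input speech s(n) is buffered and windowed to give f(n;m). A vector from the excitation vector quantizer codebook is scaled by the gain and passed through the high-order LP synthesis filter to give f̂(n;m), which is subtracted from f(n;m) at the summer (∑). The error passes through the perceptual weighting filter W(z) to the error minimization block. Gain adaptation and predictor adaptation blocks update the gain and the synthesis filter.]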

54 Low-delay CELP coder

• The pitch predictor used in the conventional forward-adaptive coder is eliminated.
• To compensate for the loss of pitch information, the LPC predictor order is increased significantly, to an order of 50.

55 Low-delay CELP coder

• The LPC coefficients are updated more frequently, every 2.5 msec.
• A 5-sample excitation vector corresponds to an excitation block duration of 0.625 msec at an 8-kHz sampling rate.

56 Low-delay CELP coder

• The logarithm of the excitation gain is adapted every excitation block by a 10th-order adaptive linear predictor operating in the logarithmic domain.
• The coefficients of the logarithmic-gain predictor are updated every four blocks by performing an LPC analysis of previously quantized excitation signal blocks.

57 Low-delay CELP coder

• The perceptual weighting filter is also 10th order and is updated once every four blocks by an LPC analysis on frames of the input speech signal of duration 2.5 msec.
• The excitation codebook in the low-delay CELP is also modified compared to conventional CELP.
• A 10-bit excitation codebook is employed.

58 Vector Sum Excited LP:

• The VSELP coder and decoder differ from the CELP coder mainly in the method by which the excitation sequence is formed.
• In the VSELP block diagram that follows, there are three excitation sources.
• One excitation is obtained from the pitch period state.
• The other two excitation sources are obtained from two codebooks.

59 VSELP Decoder :

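[Block diagram: three excitation sources (the long-term filter state, codebook 1 and codebook 2) are scaled by their gains and summed (∑). The sum drives the pitch synthesis filter and the spectral envelope (LP) synthesis filter, followed by the spectral post filter, to give the synthetic speech.]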

60 VSELP Decoder

• The LPC synthesis filter is implemented as a 10-pole filter, and its coefficients are coded and transmitted every 20 msec.
• The coefficients are updated in each 5-msec frame by interpolation.
• The excitation parameters are also updated every 5 msec.

61 VSELP Decoder

• There are 128 codewords in each of the two codebooks.
• The codewords are constructed from two sets of seven basis codewords (one set per codebook) by forming linear combinations of the seven basis codewords.
• The long-term filter state is also a codebook with 128 codeword sequences.

62 VSELP Decoder

• In each 5-msec frame, the codewords from this codebook are filtered through the speech system filter $\hat{\Theta}(z)$ and correlated with the input speech sequence.
• The filtered codeword is used to update the history, and the lag is transmitted to the decoder.

63 VSELP Decoder

• Thus the update occurs by appending the best filtered codeword to the history codebook.
• The oldest sample in the history array is discarded.
• The result is that the long-term state becomes an adaptive codebook.

64 VSELP Decoder

• The three excitation sequences are selected sequentially from each of the three codebooks.
• Each codebook search attempts to find the codeword that minimizes the total energy of the perceptually weighted error.
• Once the codewords have been selected, the three gain parameters are optimized.

65 VSELP Decoder

• Joint gain optimization is accomplished sequentially by orthogonalizing each weighted codeword vector prior to the codebook search.
• The gain parameters are vector quantized to one of 256 eight-bit vectors and transmitted in every 5-msec frame.

66 Vector Sum Excited LP:

• The bit rate of the VSELP is about 8000 bps.
• Bit allocations for 8000-bps VSELP:

Parameters                                        Bits/5-ms frame   Bits/20 ms
10 LPC coefficients                               -                 38
Average speech energy                             -                 5
Excitation codewords from two VSELP codebooks     14                56
Gain parameters                                   8                 32
Lag of pitch filter                               7                 28
Total                                             29                159
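The totals in the allocation can be checked directly. The sketch below sums the per-frame figures and converts 159 bits per 20 ms into a bit rate; it restates the arithmetic of the allocation and nothing more.

```python
# (bits per 5-ms frame, bits per 20 ms); None = sent once per 20 ms only.
allocation = {
    "10 LPC coefficients": (None, 38),
    "Average speech energy": (None, 5),
    "Excitation codewords (two codebooks)": (14, 56),
    "Gain parameters": (8, 32),
    "Lag of pitch filter": (7, 28),
}

bits_per_5ms = sum(sub for sub, _ in allocation.values() if sub is not None)
bits_per_20ms = sum(frame for _, frame in allocation.values())
bit_rate = bits_per_20ms * 1000 // 20   # 50 blocks of 20 ms per second

print(bits_per_5ms, bits_per_20ms, bit_rate)
```

The result, 7950 bps, is the exact figure behind the "about 8000 bps" quoted for VSELP.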

67 VSELP Decoder

• Finally, an adaptive spectral post filter is employed in VSELP following the LPC synthesis filter; this post filter is a pole-zero filter of the form

$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$

68 DEMO

Speech codecs demonstrated, each with a male speaker and a female speaker:
• Original speech/music (16-bit, sampled at 8 kHz)
• FS-1015 (LPC-10e, 2.4 kb/s)
• FS-1016 (CELP, 4.8 kb/s)
• IS-54 (VSELP, 7.95 kb/s)
• G.721 (ADPCM, 32 kb/s)

69 Standard Voice Algorithms

• G.711
  - The most widely used digital representation of voice signals is G.711, or PCM (Pulse Code Modulation).
  - This codec represents a 4 kHz band-limited voice signal sampled at 8 kHz using 8 bits per sample with A-law or μ-law coding.
• G.726
  - The G.726 codec takes a 64 kbps A-law or μ-law PCM signal and encodes it at one of four bit rate options, ranging from 2 bits per sample to 5 bits per sample.
  - The algorithm is based on Adaptive Differential Pulse Code Modulation (ADPCM) with a one-sample backward prediction scheme.
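The bit rates implied by these descriptions follow from the 8 kHz sampling rate. The short sketch below only does the multiplication; the function name is illustrative.

```python
SAMPLING_RATE_HZ = 8000


def bit_rate_kbps(bits_per_sample):
    """Bit rate of a sample-by-sample coder running at 8 kHz."""
    return SAMPLING_RATE_HZ * bits_per_sample / 1000


g711 = bit_rate_kbps(8)                                  # 8 bits per sample
g726 = [bit_rate_kbps(bits) for bits in (2, 3, 4, 5)]    # four rate options
print(g711, g726)
```

So G.711 runs at 64 kbps, and the four G.726 options correspond to 16, 24, 32 and 40 kbps.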

70

• G.728
  - The G.728 algorithm compresses PCM-coded voice signals to a bit rate of 16 kbps.
  - This algorithm is based on a strong backward prediction scheme and is considered by far one of the most complex voice algorithms produced by the ITU standards organization.
• G.729
  - For compression of voice signals at 8 kbps, the G.729 algorithm offers toll quality with built-in algorithmic delays of less than 15 msec.
  - Additional features described in the G.729 Annex provide VAD and Comfort Noise Generation functionality to enhance the quality and reduce the overall bit rate.
• G.723.1
  - The most widely used algorithm for band-limited channels, such as VoIP and video conferencing, is G.723.1.
  - The algorithm has two operating bit rates, 6.3 kbps and 5.3 kbps.
  - Although the delay is not as low as that of the other ITU standards, its quality is near toll quality for the given low bit rates, making it very efficient in bit usage.

71

• GSM-AMR
  - The latest GSM standard is the Adaptive Multi-Rate (AMR) codec, based on Algebraic Code Excited Linear Prediction (ACELP), which provides compression in the range of 4.75 to 12.2 kbps.
  - In total the codec provides 8 bit rates that cover the half-rate to full-rate channel capacity.
• GSM-FR
  - The first digital codec used in a mobile environment is the GSM Full Rate vocoder.
  - The codec compresses 13-bit PCM samples to a rate of 13 kbps.
  - The algorithm is based on a very simple Regular Pulse Excited Linear Prediction Coding technique.
• GSM-HR
  - To increase capacity, the GSM committee decided on a lower bit rate of 5.6 kbps for the voice channel.
  - The algorithm is based on Vector Sum Excited Linear Prediction (VSELP) and is computationally as complex as other low bit rate algorithms.
