Speech Processing (Vocoders)

The Channel Vocoder (analyzer)
The channel vocoder employs a bank of bandpass filters, each having a bandwidth between 100 Hz and 300 Hz. Typically, 16-20 linear-phase FIR filters are used. The output of each filter is rectified and lowpass filtered. The bandwidth of the lowpass filter is selected to match the time variations in the characteristics of the vocal tract. In addition to the measurement of the spectral magnitudes, a voicing detector and a pitch estimator are included in the speech analysis.

The Channel Vocoder (analyzer block diagram)
[Block diagram: the input S(n) feeds a bank of parallel channels, each consisting of bandpass filter → rectifier → lowpass filter → A/D converter. The channel outputs, together with the outputs of a voicing detector and a pitch detector, go to the encoder and then to the channel.]

The Channel Vocoder (analyzer parameters)
• 16-20 linear-phase FIR filters covering 0-4 kHz
• Each filter has a bandwidth of 100-300 Hz
• 20-ms frames, i.e. the spectral magnitudes change at a rate of 50 Hz
• LPF bandwidth: 20-25 Hz
• Sampling rate of the filter outputs: 50 Hz

The Channel Vocoder (bit rate)
• 1 bit for the voicing decision
• 6 bits for the pitch period
• 16 channels, each coded with 3-4 bits, updated 50 times per second
• The total bit rate is then about 2400-3200 bps (16 × 3-4 bits × 50 frames/s for the spectral magnitudes; the 7 voicing and pitch bits would add 350 bps if they are also sent 50 times per second)
• Further reduction to about 1200 bps can be achieved by exploiting frequency correlations of the spectral magnitudes

The Channel Vocoder (synthesizer)
At the receiver the signal samples are passed through D/A converters. The outputs of the D/As are multiplied by the voiced or unvoiced signal sources. The resulting signals are passed through bandpass filters. The outputs of the bandpass filters are summed to form the synthesized speech signal.

The Channel Vocoder (synthesizer block diagram)
[Block diagram: from the channel, the decoder feeds one D/A converter per channel; each D/A output is multiplied by the excitation and passed through a bandpass filter, and the filter outputs are summed (∑) to give the output speech. A switch driven by the voicing information selects the excitation: a pulse generator controlled by the pitch period, or a random noise generator.]

The Phase Vocoder
The phase vocoder is similar to the channel vocoder.
However, instead of estimating the pitch, the phase vocoder estimates the phase derivative at the output of each filter. By coding and transmitting the phase derivative, this vocoder destroys the absolute phase information: the receiver can recover the phase by integration only up to a constant.

The Phase Vocoder (analyzer block diagram, kth channel)
[Block diagram: S(n) is multiplied by cos ω_k n and by sin ω_k n; each product is lowpass filtered to give a_k(n) and b_k(n), and each of these is also passed through a differentiator. A block computes the short-term magnitude and the short-term phase derivative from these signals; both are decimated and sent to the encoder and on to the channel.]

The Phase Vocoder (synthesizer block diagram, kth channel)
[Block diagram: from the channel, the decoder outputs the decimated short-term amplitude and the decimated short-term phase derivative. Both are interpolated, and the phase derivative is integrated to give the phase. The cosine and sine of the phase are combined with the amplitude and with cos ω_k n and sin ω_k n, and the results are summed (∑).]

The Phase Vocoder (parameters)
• LPF bandwidth: 50 Hz
• Demodulation separation: 100 Hz
• Number of filters: 25-30
• Sampling rate of the spectral magnitude and phase derivative: 50-60 samples per second
• The spectral magnitude is coded using PCM or DPCM
• The phase derivative is coded linearly using 2-3 bits
• The resulting bit rate is 7200 bps

The Formant Vocoder
The formant vocoder can be viewed as a type of channel vocoder that estimates the first three or four formants in a segment of speech. It is this information plus the pitch period that is encoded and transmitted to the receiver.

The Formant Vocoder (example of formants)
[Figure: (a) the spectrogram of the utterance "day one", showing the pitch and the harmonic structure of speech; (b) a zoomed spectrogram of the fundamental and the second harmonic.]

The Formant Vocoder (analyzer block diagram)
[Block diagram: the input speech feeds three formant estimators F1, F2 and F3, each delivering a formant frequency and bandwidth (F1, B1), (F2, B2), (F3, B3), and a pitch and V/U block delivering the V/U decision and F0. All outputs go to the block labeled "Decoder" on the slide.]
F_k: the frequency of the kth formant
B_k: the bandwidth of the kth formant

The Formant Vocoder (synthesizer block diagram)
[Block diagram: an excitation signal, controlled by V/U and F0, drives three formant filters F1, F2 and F3 set by (F1, B1), (F2, B2), (F3, B3); their outputs are summed (∑).]

Linear Predictive Coding
The objective of LP analysis is to estimate the parameters of an all-pole model for the vocal tract.
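As an illustration of what estimating an all-pole model can look like in practice, here is a minimal sketch of the autocorrelation method solved with the Levinson-Durbin recursion. The slides do not say which estimation method they have in mind, so the choice of method, the function names, and the 2-pole test signal are all assumptions made for this example.

```python
def autocorrelation(frame, max_lag):
    """r[k] = sum_n frame[n] * frame[n + k] for k = 0..max_lag."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]


def levinson_durbin(r, order):
    """Solve the LP normal equations for predictor coefficients a[0..order-1].

    The predictor is s_hat[n] = sum_i a[i] * s[n - 1 - i]. The second return
    value is the remaining prediction-error energy.
    """
    a = [0.0] * order
    err = r[0]
    for m in range(order):
        # Reflection coefficient that extends the order-m predictor to m + 1.
        k = (r[m + 1] - sum(a[i] * r[m - i] for i in range(m))) / err
        prev = a[:]
        a[m] = k
        for i in range(m):
            a[i] = prev[i] - k * prev[m - 1 - i]
        err *= 1.0 - k * k
    return a, err


def all_pole_impulse_response(a, length):
    """Impulse response of 1 / (1 - sum_i a[i] z^-(i+1)), used as test input."""
    h = []
    for n in range(length):
        past = sum(a[i] * h[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        h.append(past + (1.0 if n == 0 else 0.0))
    return h


if __name__ == "__main__":
    true_a = [1.2, -0.52]  # hypothetical stable 2-pole model
    frame = all_pole_impulse_response(true_a, 400)
    est_a, err = levinson_durbin(autocorrelation(frame, 2), 2)
    print(est_a, err)  # est_a should be close to true_a
```

On real speech the frame would normally be windowed first and the order set to about 10, as in LPC-10; the impulse response is used here only because the recovered coefficients can be checked against known values.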
Several methods have been devised for generating the excitation sequence for speech synthesis. Various LPC-type speech analysis and synthesis methods differ primarily in the type of excitation signal generated for speech synthesis.

LPC-10
This method is called LPC-10 because 10 coefficients are typically employed. LPC-10 partitions the speech into 180-sample frames. Pitch and voicing decisions are determined by using the AMDF and zero-crossing measures.

A General Discrete-Time Model for Speech Production
[Block diagram: in the voiced branch, an impulse generator with pitch period DT drives the glottal filter G(z), followed by a gain. In the unvoiced branch, an uncorrelated noise generator is followed by a gain. A voiced/unvoiced (V/U) switch selects one branch as the volume velocity U(n), which passes through the vocal tract filter H(z) and the filter R(z) to give the speech signal s(n).]

Linear Prediction: Determining the Prediction Order
The prediction gain is defined as

$$PG[m] = 10 \log_{10} \left( \frac{\sum_{n=m-M+1}^{m} s^2[n]}{\sum_{n=m-M+1}^{m} e^2[n]} \right)$$

Linear Prediction: Example
[Figure: results for M = 4 and M = 10.]

Linear Prediction: Example
[Figure: results for M = 2, M = 10 and M = 54.]

Linear Prediction: The Idea of Long-Term Linear Prediction
[Figure: results for M = 10 and M = 50.]

Linear Prediction: Long-Term Linear Prediction

LPC10 Vocoder: General Specifications of LPC10

LPC10 Vocoder: Encoder
[Block diagram: PCM input → LPC analysis → bit encoder.]

Pitch Period Detection
Autocorrelation:

$$R[l, m] = \sum_{n=m-N+1}^{m} s[n]\, s[n-l]$$

Magnitude difference function (MDF):

$$MDF[l, m] = \sum_{n=m-N+1}^{m} \left| s[n] - s[n-l] \right|$$

YMC:

$$s[n] = b\, s[n-N] + e[n], \qquad n = m-N+1, \ldots, m$$

LPC10 Vocoder: MDF
Candidate pitch periods: T = 20, 21, …, 39, 40, 42, …, 80, 84, …, 154

LPC10 Vocoder: Encoder
[Block diagram: LPC → RC.]

LPC10 Vocoder: Speech Synthesis
Encoder section (operating on the original signal):
• Determine whether the frame is voiced or unvoiced
• Determine the pitch period, for the voiced case only
• Compute the signal gain
[Block diagram: a V/U switch selects either an impulse train with period equal to the pitch period or random noise; the selected excitation is scaled by the gain G to produce the synthesized speech.]

LPC10 Vocoder: Limitations
AR

Residual Excited LP Vocoder
Speech quality can be improved at the expense of a higher bit rate by computing and transmitting a residual error, as done in the case of DPCM. In one method, the LPC model and excitation parameters are estimated from a frame of speech.
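To make the residual error concrete, the following sketch computes a prediction residual by inverse filtering a frame with its LP coefficients, evaluates the prediction gain PG = 10 log10(Σ s² / Σ e²) used in the linear-prediction slides, and shows that an all-pole synthesis filter driven by the unquantized residual returns the frame. It is only an illustration under assumptions: it uses the inverse-filter residual (as in the type 2 analyzer) rather than the model-synthesis difference, the coefficient 0.9 and the sinusoidal test frame are hypothetical, and no quantization is applied.

```python
import math


def prediction_residual(s, a):
    """e[n] = s[n] - sum_i a[i] * s[n - 1 - i], with zeros before the frame."""
    e = []
    for n in range(len(s)):
        pred = sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        e.append(s[n] - pred)
    return e


def prediction_gain_db(s, e):
    """PG = 10 log10(sum s^2 / sum e^2), both sums taken over the frame."""
    return 10.0 * math.log10(sum(x * x for x in s) / sum(x * x for x in e))


def synthesize(e, a):
    """All-pole synthesis: s[n] = e[n] + sum_i a[i] * s[n - 1 - i]."""
    s = []
    for n in range(len(e)):
        pred = sum(a[i] * s[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        s.append(e[n] + pred)
    return s


if __name__ == "__main__":
    a = [0.9]                                         # hypothetical 1st-order predictor
    frame = [math.sin(0.3 * n) for n in range(200)]   # stand-in for a speech frame
    residual = prediction_residual(frame, a)
    rebuilt = synthesize(residual, a)
    print(prediction_gain_db(frame, residual))
    print(max(abs(x - y) for x, y in zip(frame, rebuilt)))
```

A positive prediction gain means the residual carries less energy than the speech, which is why transmitting the residual can cost fewer bits than transmitting the waveform; a real RELP coder would additionally lowpass filter, decimate and quantize the residual.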
Residual Excited LP Vocoder
The speech is synthesized at the transmitter and subtracted from the original speech signal to form the residual error. The residual error is quantized, coded, and transmitted to the receiver. At the receiver the signal is synthesized by adding the residual error to the signal generated from the model.

Residual Excited LP Vocoder
The residual signal is lowpass filtered at 1000 Hz in the analyzer to reduce the bit rate. In the synthesizer, it is rectified and spectrum flattened (using a HPF); the lowpass and highpass signals are summed, and the resulting residual error signal is used to excite the LPC model. The RELP vocoder provides communication-quality speech at about 9600 bps.

RELP Analyzer (type 1)
[Block diagram: S(n) → buffer and window → frame f(n; m). Short-term LP (stLP) analysis of the frame gives the LP parameters {â(i; m)} and the excitation parameters: the gain estimate Θ̂_0, the V/U decision, and the pitch estimate P̂. These drive an LP synthesis model whose output is subtracted from f(n; m) at ∑ to form the residual error e(n; m). The LP parameters, the excitation parameters and the residual error go to the encoder and then to the channel.]

RELP Analyzer (type 2)
[Block diagram: S(n) → buffer and window → f(n; m) → inverse filter Â(z; m) → prediction residual → lowpass filter → decimator → DFT → encoder → to channel. stLP analysis of f(n; m) supplies the LP parameters {â(i; m)} to the inverse filter and to the encoder.]

Synthesizer for a RELP Vocoder
[Block diagram: from the channel → decoder → buffer and controller. The decoded residual is interpolated, then rectified and highpass filtered; the highpass branch and the interpolated residual are summed (∑) to form the excitation of the LP synthesizer, which receives the LP model parameter updates from the buffer and controller.]

Multipulse LPC Vocoder
RELP needs to regenerate the high-frequency components at the decoder, which gives only a crude approximation of the high frequencies. Multipulse LPC is a time-domain analysis-by-synthesis method that results in a better excitation signal for the LPC vocal-system filter.

Multipulse LPC Vocoder
The information concerning the excitation sequence includes:
• the locations of the pulses
• an overall scale factor corresponding to the largest pulse amplitude
• the pulse amplitudes relative to the overall scale factor
The scale factor is logarithmically quantized into 6 bits. The amplitudes are linearly quantized into 4 bits.
The pulse locations are encoded using a differential coding scheme. The excitation parameters are updated every 5 msec. The LPC vocal-tract parameters and the pitch period are updated every 20 msec. The bit rate is 9600 bps.

Analysis-by-synthesis coder
A stored sequence from a Gaussian excitation codebook is scaled and used to excite the cascade of a pitch synthesis filter and the LPC synthesis filter. The synthetic speech is compared with the original speech. The residual error signal is weighted perceptually by a filter

$$W(z) = \frac{\hat{\Theta}(z/c)}{\hat{\Theta}(z)} = \frac{\hat{A}(z)}{\hat{A}(z/c)}$$

Obtaining the multipulse excitation (analysis-by-synthesis method)
[Block diagram: input speech s(n) → buffer and LP analysis → frame f(n; m) and pitch estimate P̂. A multipulse excitation generator drives the pitch synthesis filter Θ_p(z) followed by the LP synthesis filter to give f̂(n; m). The difference f(n; m) − f̂(n; m) formed at ∑ passes through the perceptual weighting filter W(z), and an error-minimization block uses the weighted error to adjust the excitation generator.]

Code Excited LP
CELP is an analysis-by-synthesis method in which the excitation sequence is selected from a codebook of zero-mean Gaussian sequences. The bit rate of CELP is 4800 bps.

CELP (analysis-by-synthesis coder)
[Block diagram: speech samples → buffer and LP analysis → side information (LP parameters). An entry from the Gaussian excitation codebook is scaled by the gain and drives the pitch synthesis filter followed by the spectral envelope (LP) synthesis filter. The output is subtracted from the speech at ∑, passed through the perceptual weighting filter W(z), and its energy is computed (square and sum) to select the index of the excitation sequence.]

Analysis-by-synthesis coder
This weighted error is squared and summed over a subframe block to give the error energy. By performing an exhaustive search through the codebook, we find the excitation sequence that minimizes the error energy.

Analysis-by-synthesis coder
The gain factor for scaling the excitation sequence is determined for each codeword in the codebook by minimizing
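The exhaustive codebook search can be sketched as follows. This is a simplified illustration, not the coder described in the slides: the pitch synthesis filter and the perceptual weighting filter W(z) are left out, the codebook size (64) and subframe length (40) are hypothetical, and because the text breaks off before stating what the gain minimizes, the sketch assumes the usual least-squares gain that minimizes the error energy for each codeword.

```python
import random


def synthesize(excitation, a):
    """All-pole LP synthesis: y[n] = x[n] + sum_i a[i] * y[n - 1 - i]."""
    y = []
    for n in range(len(excitation)):
        past = sum(a[i] * y[n - 1 - i] for i in range(len(a)) if n - 1 - i >= 0)
        y.append(excitation[n] + past)
    return y


def dot(x, y):
    return sum(u * v for u, v in zip(x, y))


def make_gaussian_codebook(size, length, seed=0):
    """Codebook of zero-mean, unit-variance Gaussian sequences."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


def search_codebook(target, codebook, a):
    """Return (index, gain, error_energy) of the best scaled codeword.

    Assumption: for each codeword the gain is the least-squares value
    g = <t, y> / <y, y>, which leaves error energy <t, t> - g * <t, y>.
    """
    best = None
    target_energy = dot(target, target)
    for index, codeword in enumerate(codebook):
        y = synthesize(codeword, a)
        energy = dot(y, y)
        if energy == 0.0:
            continue  # an all-zero output cannot match any target
        cross = dot(target, y)
        gain = cross / energy
        error = target_energy - gain * cross
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


if __name__ == "__main__":
    a = [0.9]                                   # hypothetical LP coefficients
    codebook = make_gaussian_codebook(64, 40, seed=1)
    # Build a target that is exactly codeword 7, scaled by 2.5 and synthesized.
    target = [2.5 * v for v in synthesize(codebook[7], a)]
    print(search_codebook(target, codebook, a))
```

Only the winning index and its quantized gain would need to be transmitted, together with the LP side information, which is how the bit rate stays low. The cost is one synthesis-filter pass per codeword per subframe.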
