
Analog-to-Digital Conversion (Sampling and Quantization)

Digital Speech Processing - Lecture 15

Speech Coding Methods Based on Speech Waveform Representations and Speech Models: Uniform and Non-Uniform Coding Methods

• the class of "waveform coders" can be represented in this manner

Information Rate

• waveform coder information rate, I_w, of the digital representation of the signal x_a(t), is defined as:
    I_w = B · F_S = B / T
• where B is the number of bits used to represent each sample and F_S = 1/T is the number of samples/second

Speech Information Rates

• Production level:
  – 10-15 phonemes/second for continuous speech
  – 32-64 phonemes per language => 6 bits/phoneme
  – Information Rate = 60-90 bps at the source
• Waveform level:
  – speech bandwidth from 4-10 kHz => sampling rate from 8-20 kHz
  – need 12-16 bits of quantization for high quality digital coding
  – Information Rate = 96-320 Kbps => more than 3 orders of magnitude difference in Information Rates between the production and waveform levels
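The gap between the two information rates is simple arithmetic to check; the specific values below (12 phonemes/s, 16-bit samples at 8 kHz) are illustrative picks from the ranges quoted above:

```python
# Illustrative values chosen from the ranges on this slide.
phoneme_rate = 12        # phonemes/second (10-15 typical)
bits_per_phoneme = 6     # 32-64 phonemes per language -> 6 bits/phoneme
production_bps = phoneme_rate * bits_per_phoneme

B, Fs = 16, 8000         # bits/sample and samples/second
waveform_bps = B * Fs    # I_w = B * F_S

print(production_bps)                  # 72 bps at the source
print(waveform_bps)                    # 128000 bps for the waveform
print(waveform_bps // production_bps)  # 1777 -> more than 3 orders of magnitude
```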

Speech Coder Comparisons

• Second class of digital systems:
  o analysis/synthesis systems
  o model-based systems
  o hybrid coders
  o vocoder (voice coder) systems
• Detailed waveform properties generally not preserved
  o coder estimates parameters of a model for speech production
  o coder tries to preserve intelligibility and quality of reproduction from the digital representation

Speech Analysis/Synthesis Systems

• speech parameters (the chosen representation) are encoded for transmission or storage
• analysis and encoding gives a data parameter vector
• the data parameter vector is computed at a sampling rate much lower than the signal sampling rate
• denote the "frame rate" of the analysis as F_fr
• total information rate for model-based coders is:
    I_m = B_c · F_fr
• where B_c is the total number of bits required to represent the parameter vector

Speech Coder Comparisons

• waveform coders characterized by:
  – high bit rates (24 Kbps - 1 Mbps)
  – low complexity
  – low flexibility
• analysis/synthesis systems characterized by:
  – low bit rates (10 Kbps - 600 bps)
  – high complexity
  – great flexibility (e.g., time expansion/compression)

Introduction to Waveform Coding

• sampling:  x(n) = x_a(nT)
• X(e^{jΩT}) = (1/T) Σ_{k=-∞}^{∞} X_a(jΩ + j2πk/T)
• with ω = ΩT:
    X(e^{jω}) = (1/T) Σ_{k=-∞}^{∞} X_a(jω/T + j2πk/T)
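The shifted-copies sum above implies that two analog frequencies differing by a multiple of F_S = 1/T produce identical sample sequences. A minimal numpy check (the tone frequencies and sample count are arbitrary picks):

```python
import numpy as np

Fs = 8000.0                # sampling rate in samples/s, so T = 1/Fs
T = 1.0 / Fs
n = np.arange(64)

f0 = 1000.0                # in-band tone (Hz)
f1 = f0 + Fs               # differs by exactly Fs -> the k = 1 shifted copy

x0 = np.cos(2 * np.pi * f0 * n * T)
x1 = np.cos(2 * np.pi * f1 * n * T)

# the two sample sequences are indistinguishable -> aliasing
print(bool(np.allclose(x0, x1)))   # True
```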

Introduction to Waveform Coding

• T = 1/F_S
• to perfectly recover x_a(t) (or equivalently a lowpass filtered version of it) from the set of digital samples (as yet unquantized) we require that F_S = 1/T > twice the highest frequency in the input signal
• this implies that x_a(t) must first be lowpass filtered, since speech is not inherently lowpass
  – for telephone bandwidth, the frequency range of interest is 200-3200 Hz (filtering range) => F_S = 6400 Hz or 8000 Hz
  – for wideband speech, the frequency range of interest is 100-7000 Hz (filtering range) => F_S = 16000 Hz

Sampling Speech Sounds

• notice high frequency components of vowels and fricatives (up to 10 kHz) => need F_S > 20 kHz without lowpass filtering
• need only about 4 kHz to estimate formant frequencies
• need only about 3.2 kHz for telephone speech coding

Telephone Channel Response

• it is clear that 4 kHz bandwidth is sufficient for most applications using telephone speech because of inherent channel band limitations from the transmission path

Statistical Model of Speech

• assume x_a(t) is a sample function of a continuous-time random process
• then x(n) (derived from x_a(t) by sampling) is also a sample sequence of a discrete-time random process
• x_a(t) has first order probability density, p(x), with autocorrelation and power spectrum of the form:
    φ_a(τ) = E[x_a(t) x_a(t + τ)]
    Φ_a(Ω) = ∫_{-∞}^{∞} φ_a(τ) e^{-jΩτ} dτ

Statistical Model of Speech

• x(n) has autocorrelation and power spectrum of the form:
    φ(m) = E[x(n) x(n + m)] = E[x_a(nT) x_a(nT + mT)] = φ_a(mT)
    Φ(e^{jΩT}) = Σ_{m=-∞}^{∞} φ(m) e^{-jΩTm} = (1/T) Σ_{k=-∞}^{∞} Φ_a(Ω + 2πk/T)
• since φ(m) is a sampled version of φ_a(τ), its transform is an infinite sum of the power spectrum, shifted by 2πk/T => an aliased version of the power spectrum of the original analog signal

Speech Probability Density Function

• the probability density function for x(n) is the same as for x_a(t) since x(n) = x_a(nT) => the mean and variance are the same for both x(n) and x_a(t)
• need to estimate the probability density and power spectrum from speech waveforms
  – probability density estimated from a long-term histogram of amplitudes
  – a good approximation is a gamma distribution, of the form:
        p(x) = [ √3 / (8π σ_x |x|) ]^{1/2} exp( -√3 |x| / (2σ_x) ),   p(0) = ∞
  – a simpler approximation is the Laplacian density, of the form:
        p(x) = (1 / (√2 σ_x)) exp( -√2 |x| / σ_x ),   p(0) = 1 / (√2 σ_x)
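The Laplacian model above is easy to sanity-check numerically: the density should integrate to 1, have variance σ_x², and assign far higher likelihood to small amplitudes than large ones. A sketch (the grid limits and test points are arbitrary choices):

```python
import numpy as np

def laplacian_pdf(x, sigma):
    # p(x) = 1/(sqrt(2)*sigma) * exp(-sqrt(2)*|x| / sigma)
    return np.exp(-np.sqrt(2.0) * np.abs(x) / sigma) / (np.sqrt(2.0) * sigma)

sigma = 1.0
x = np.linspace(-20.0, 20.0, 400001)   # dense grid; tails beyond +/-20 negligible
dx = x[1] - x[0]
p = laplacian_pdf(x, sigma)

area = np.sum(p) * dx                  # ~1: a valid density
var = np.sum(x**2 * p) * dx            # ~sigma^2
print(round(area, 3), round(var, 3))   # 1.0 1.0

# small amplitudes are far more likely than large ones
print(laplacian_pdf(0.01, sigma) / laplacian_pdf(4.0, sigma) > 100.0)  # True
```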

Measured Speech Densities

• distribution normalized so the mean is 0 and the variance is 1 (x̄ = 0, σ_x = 1)
• gamma density more closely approximates the measured distribution for speech than the Laplacian
• Laplacian is still a good model and is used in analytical studies
• small amplitudes are much more likely than large amplitudes, by a 100:1 ratio

Long-Time Autocorrelation and Power Spectrum

• analog signal:
    φ_a(τ) = E{x_a(t) x_a(t + τ)}  ⇔  Φ_a(Ω) = ∫_{-∞}^{∞} φ_a(τ) e^{-jΩτ} dτ
• discrete-time signal:
    φ(m) = E{x(n) x(n + m)} = E{x_a(nT) x_a(nT + mT)} = φ_a(mT)
      ⇔  Φ(e^{jΩT}) = Σ_{m=-∞}^{∞} φ(m) e^{-jΩTm} = (1/T) Σ_{k=-∞}^{∞} Φ_a(Ω + 2πk/T)
• estimated correlation and power spectrum (Estimates ≠ Expectations):
    φ̂(m) = (1/L) Σ_{n=0}^{L-1-|m|} x(n) x(n + m),   0 ≤ |m| ≤ L - 1,   L a large integer
    Φ̂(e^{jΩT}) = Σ_{m=-M}^{M} w(m) φ̂(m) e^{-jΩTm}
• tapering with m is due to the finite windows
• need a window on the estimated correlation for smoothing, because of the discontinuity at ±M
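The estimator φ̂(m) above can be written directly. Here it is applied to white noise (a stand-in signal, since no speech data is at hand), for which the normalized correlation ρ̂[m] should be near zero for m > 0:

```python
import numpy as np

def autocorr_est(x, max_lag):
    # phi_hat(m) = (1/L) * sum_{n=0}^{L-1-m} x(n) x(n+m)
    L = len(x)
    return np.array([np.dot(x[:L - m], x[m:]) / L for m in range(max_lag + 1)])

rng = np.random.default_rng(0)
x = rng.standard_normal(100000)          # white-noise stand-in for speech
phi = autocorr_est(x, 5)
rho = phi / phi[0]                       # normalized autocorrelation, rho[0] = 1

print(rho[0])                            # 1.0
print(bool(np.all(np.abs(rho[1:]) < 0.02)))  # near zero for m > 0: True
```

For actual speech, ρ̂[1] would instead be close to 1, reflecting the high adjacent-sample correlation discussed on the next slide.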

Speech Autocorrelation and Power Spectrum

• normalized autocorrelation estimate:
    ρ̂[m] = φ̂[m] / φ̂[0] = [ Σ_{n=0}^{L-1-m} x(n) x(n + m) ] / [ Σ_{n=0}^{L-1} x²(n) ],   0 ≤ m ≤ L - 1
  with -1 ≤ ρ̂[m] ≤ 1, where L is the window length
• 8 kHz sampled speech for several speakers
• high correlation between adjacent samples
• lowpass filtered speech is more highly correlated than bandpass filtered speech

Speech Power Spectrum

• can estimate the long-term autocorrelation and power spectrum using time-series analysis methods
• power spectrum estimate derived from one minute of speech, computed from a set of bandpass filters
• peaks at 250-500 Hz (region of maximal spectral information)
• spectrum falls off at about 8-10 dB/octave

Alternative Power Spectrum Estimate

• estimate the long-term correlation, φ̂(m), using sampled speech, then weight and transform, giving:
    Φ̂(e^{j2πk/N}) = Σ_{m=-M}^{M} w(m) φ̂(m) e^{-j2πkm/N}
• this lets us use φ̂(m) to get a power spectrum estimate Φ̂(e^{j2πk/N}) via the weighting window, w(m)
• contrast linear versus logarithmic scale for power spectrum plots

Estimating the Power Spectrum via the Method of Averaged Periodograms

• the periodogram is defined as:
    P(e^{jω}) = (1/(LU)) | Σ_{n=0}^{L-1} x(n) w(n) e^{-jωn} |²
• where w(n) is an L-point window (e.g., a Hamming window), and
    U = (1/L) Σ_{n=0}^{L-1} (w(n))²
• U is a normalizing constant that compensates for the window tapering
• use the DFT to compute the periodogram as:
    P(e^{j(2π/N)k}) = (1/(LU)) | Σ_{n=0}^{L-1} x(n) w(n) e^{-j(2π/N)kn} |²,   0 ≤ k ≤ N - 1
• N is the size of the DFT

88,000-Point FFT - Female Speaker

• single spectral estimate using all signal samples

Averaging (Short) Periodograms

• the variability of spectral estimates can be reduced by averaging short periodograms, computed over a long segment of speech
• using an L-point window, a short periodogram is the same as the STFT of the weighted speech interval, namely:
    X_{rR}[k] = X_{rR}(e^{j2πk/N}) = Σ_{m=rR}^{rR+L-1} x[m] w[rR - m] e^{-j2πkm/N},   0 ≤ k ≤ N - 1
• where N is the DFT transform size (number of frequency estimates)
• consider using N_S samples of speech (where N_S is large); the averaged periodogram is defined as:
    Φ(e^{j2πk/N}) = (1/(KLU)) Σ_{r=0}^{K-1} | X_{rR}(e^{j2πk/N}) |²,   0 ≤ k ≤ N - 1
• where K is the number of windowed segments in the N_S samples, L is the window length, and U is the window normalization factor
• use R = L/2 (shift the window by half the window duration between adjacent periodogram estimates)
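The averaged periodogram above (Hamming window, normalization U, shift R = L/2) can be sketched in a few lines. Applied to unit-variance white noise it should return a flat spectrum at level 1; the window length and DFT size below are arbitrary picks:

```python
import numpy as np

def averaged_periodogram(x, L=256, N=256):
    # Hamming window, U = (1/L)*sum w^2, half-window shift R = L/2
    w = np.hamming(L)
    U = np.sum(w ** 2) / L
    R = L // 2
    K = (len(x) - L) // R + 1          # number of windowed segments
    acc = np.zeros(N)
    for r in range(K):
        seg = x[r * R: r * R + L] * w
        acc += np.abs(np.fft.fft(seg, N)) ** 2
    return acc / (K * L * U)           # average of K normalized periodograms

rng = np.random.default_rng(1)
x = rng.standard_normal(64000)         # unit-variance white noise
P = averaged_periodogram(x)

# flat spectrum at the noise variance, within estimation error
print(abs(np.mean(P) - 1.0) < 0.05)    # True
```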

Long Time Average Spectrum

• long time average spectra for a female speaker and a male speaker

Instantaneous Quantization

• separating the processes of sampling and quantization
• assume x(n) is obtained by sampling a bandlimited signal at a rate at or above the Nyquist rate
• assume x(n) is known to infinite precision in amplitude
• need to quantize x(n) in some suitable manner

Quantization and Coding

• coding is a two-stage process:
  1. quantization process:  x(n) → x̂(n)
  2. encoding process:  x̂(n) → c(n)
• where Δ is the (assumed fixed) quantization step size
• decoding is a single-stage process:
  1. decoding process:  c'(n) → x̂'(n)   (assume Δ' = Δ)
• if c'(n) = c(n) (no errors in transmission), then x̂'(n) = x̂(n)
• x̂'(n) ≠ x(n)  ⇒  coding and quantization lose information

B-bit Quantization

• use B-bit binary numbers to represent the quantized samples => 2^B quantization levels
• Information Rate of the coder: I = B · F_S bits/second
  – B = 16, F_S = 8 kHz => I = 128 Kbps
  – B = 8, F_S = 8 kHz => I = 64 Kbps
  – B = 4, F_S = 8 kHz => I = 32 Kbps
• the goal of waveform coding is to get the highest quality at a fixed value of I (Kbps), or equivalently to get the lowest value of I for a fixed quality
• since F_S is fixed, we need the most efficient quantization methods to minimize I

Quantization Basics

• assume |x(n)| ≤ X_max (possibly ∞)
• for a Laplacian density (where X_max = ∞), it can be shown that 0.35% of the samples fall outside the range -4σ_x ≤ x(n) ≤ 4σ_x => large quantization errors for 0.35% of the samples
• can safely assume that X_max is proportional to σ_x

Quantization Process

• quantization => dividing the amplitude range into a finite set of ranges, and assigning the same bin to all samples in a given range
• choice of quantization ranges and levels made so that the signal can easily be processed digitally

Uniform Quantization

• 3-bit quantizer => 8 levels:
    x_0 = 0 < x(n) ≤ x_1   ⇒  x̂_1   (code 100)
    x_1 < x(n) ≤ x_2       ⇒  x̂_2   (code 101)
    x_2 < x(n) ≤ x_3       ⇒  x̂_3   (code 110)
    x_3 < x(n) < ∞         ⇒  x̂_4   (code 111)
    x_{-1} < x(n) ≤ x_0 = 0  ⇒  x̂_{-1}  (code 011)
    x_{-2} < x(n) ≤ x_{-1}   ⇒  x̂_{-2}  (code 010)
    x_{-3} < x(n) ≤ x_{-2}   ⇒  x̂_{-3}  (code 001)
    -∞ < x(n) ≤ x_{-3}       ⇒  x̂_{-4}  (code 000)
• uniform step size:  x_i - x_{i-1} = Δ
• the codewords themselves are arbitrary!

Mid-Riser and Mid-Tread Quantizers

• mid-riser:
  – origin (x = 0) in the middle of a rising part of the staircase
  – same number of positive and negative levels
  – symmetrical around the origin
• mid-tread:
  – origin (x = 0) in the middle of a quantization level
  – one more negative level than positive
  – one quantization level of 0 (where a lot of activity occurs)
• code words have direct numerical significance (sign-magnitude representation for mid-riser, two's complement for mid-tread)
• for the mid-riser quantizer:
    x̂(n) = sign[c(n)] · ( c(n) Δ + Δ/2 )
  where sign[c(n)] = +1 if the first bit of c(n) = 0, and -1 if the first bit of c(n) = 1
• for the mid-tread quantizer, code words are the 3-bit two's complement representation, giving:
    x̂(n) = c(n) Δ

A-to-D and D-to-A Conversion

• quantization error:  e[n] = x̂[n] - x[n]

Quantization of a Sine Wave

• unquantized sinewave; 3-bit quantized waveform; 3-bit quantization error; 8-bit quantization error

Quantization of a Complex Signal

• x[n] = sin(0.1n) + 0.3 cos(0.3n)
• (a) unquantized signal x[n]; (b) 3-bit quantized x̂[n]; (c) 3-bit quantization error, e[n]; (d) 8-bit quantization error, e[n]

Uniform Quantizers

• uniform quantizers are characterized by:
  – number of levels: 2^B (B bits)
  – quantization step size: Δ
• if |x(n)| ≤ X_max and x(n) has a symmetric density, then
    Δ = 2 X_max / 2^B
• if we let
    x̂(n) = x(n) + e(n)
  with x(n) the unquantized speech sample and e(n) the quantization error (noise), then
    -Δ/2 ≤ e(n) ≤ Δ/2
  (except for the last quantization level, where the input can exceed X_max and thus the error can exceed Δ/2)

Quantization Noise Model

1. quantization noise is a zero-mean, stationary white noise process:
    E[e(n) e(n + m)] = σ_e²  for m = 0,  and 0 otherwise
2. quantization noise is uncorrelated with the input signal:
    E[x(n) e(n + m)] = 0  for all m
3. the distribution of quantization errors is uniform over each quantization interval:
    p_e(e) = 1/Δ  for -Δ/2 ≤ e ≤ Δ/2,  and 0 otherwise
  ⇒  ē = 0,  σ_e² = Δ²/12
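The three modeling assumptions can be spot-checked numerically with a rounding (mid-tread) quantizer applied to a full-range test signal; the uniform test input below is an assumption of this sketch, not of the model:

```python
import numpy as np

rng = np.random.default_rng(2)
Xmax, B = 1.0, 8
delta = 2.0 * Xmax / (2 ** B)

x = rng.uniform(-Xmax, Xmax, 200000)       # full-range test signal
xhat = delta * np.round(x / delta)         # mid-tread (rounding) quantizer
e = xhat - x

# 1. zero mean;  3. variance Delta^2/12 (uniform error distribution)
print(abs(np.mean(e)) < 1e-4)                                       # True
print(abs(np.var(e) - delta**2 / 12) / (delta**2 / 12) < 0.02)      # True
# 2. essentially uncorrelated with the input
print(abs(np.corrcoef(x, e)[0, 1]) < 0.01)                          # True
```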

Typical Amplitude Distributions

• the Laplacian distribution is often used as a model for speech signals
• how good is the model of the previous slide?

Quantization Examples

• speech signal; 3-bit quantization error; 8-bit quantization error (scaled)
• a 3-bit quantized signal has only 8 different values

3-Bit Speech Quantization

• input to quantizer; 3-bit quantized waveform

3-Bit Speech Quantization Error

• input to quantizer; 3-bit quantization error

5-Bit Speech Quantization

• input to quantizer; 5-bit quantized waveform

5-Bit Speech Quantization Error

• input to quantizer; 5-bit quantization error

Spectra of Quantization Noise

• quantization noise spectra (≈ 12 dB scale shown)

Histograms of Quantization Noise

• 3-bit quantization histogram; 8-bit quantization histogram

Sound Demo

• "original" speech sampled at 16 kHz, 16 bits/sample
• quantized to 10 bits/sample
• quantization error (×32) for 10 bits/sample
• quantized to 5 bits/sample
• quantization error for 5 bits/sample

Correlation of Quantization Error

• 3-bit quantization; 8-bit quantization
• the white-noise assumptions look good at 8-bit quantization, not as good at 3-bit quantization (however, only about 6 dB variation in power spectrum level)

SNR for Quantization

• can determine the SNR for quantized speech as:
    SNR = E(x²(n)) / E(e²(n)) = σ_x² / σ_e² = Σ_n x²(n) / Σ_n e²(n)
• assume p_e(e) = 1/Δ for -Δ/2 ≤ e ≤ Δ/2, and 0 otherwise (uniform distribution)
• with the uniform quantizer step size Δ = 2 X_max / 2^B, the noise variance is:
    σ_e² = Δ²/12 = X_max² / (3 · 2^{2B})
• so that:
    SNR = 3 · 2^{2B} / (X_max/σ_x)²
    SNR(dB) = 6B + 4.77 - 20 log10(X_max/σ_x)
• the term X_max/σ_x tells how big a signal can be accurately represented
• if we choose X_max = 4σ_x, then SNR(dB) = 6B - 7.2:
    B = 16:  SNR = 88.8 dB
    B = 8:   SNR = 40.8 dB
    B = 3:   SNR = 10.8 dB
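The 6-dB-per-bit behavior, and the matching 6 dB lost per halving of signal level, can be verified empirically. The sketch below uses a full-range uniform test input (an assumption of the sketch) so that clipping does not contaminate the measurement:

```python
import numpy as np

rng = np.random.default_rng(3)
Xmax = 1.0
x = rng.uniform(-Xmax, Xmax, 500000)        # full-range input, no overload

def measured_snr_db(x, B, Xmax):
    delta = 2.0 * Xmax / (2 ** B)           # uniform quantizer step size
    idx = np.clip(np.floor(x / delta), -(2 ** (B - 1)), 2 ** (B - 1) - 1)
    xhat = (idx + 0.5) * delta
    e = xhat - x
    return 10.0 * np.log10(np.sum(x ** 2) / np.sum(e ** 2))

# adding one bit buys ~6 dB
print(round(measured_snr_db(x, 9, Xmax) - measured_snr_db(x, 8, Xmax), 1))
# halving the signal level, with the quantizer range fixed, costs ~6 dB
print(round(measured_snr_db(x, 8, Xmax) - measured_snr_db(x / 2, 8, Xmax), 1))
```

Both differences come out near 6.02 dB (= 20 log10 2), as the formula predicts.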

8 Variation of SNR with Signal Level Clipping Statistics

Overload SNR improves 6 dB/bit, from but it decreases 6 dB clipping for each halving of the input signal amplitude

12 dB

49 50

Review: Linear Noise Model

• assume speech is a stationary random signal:  E{(x[n])²} = σ_x²
• the error is uncorrelated with the input:  E{x[n] e[n]} = E{x[n]} E{e[n]} = 0
• the error is uniformly distributed over the interval -Δ/2 < e[n] ≤ Δ/2

Review of Quantization Assumptions

1. the input signal fluctuates in a complicated manner, so a statistical model is valid
2. the quantization step size is small enough to remove any signal-correlated patterns in the quantization error
3. the range of the quantizer matches the peak-to-peak range of the signal, utilizing the full quantizer range with essentially no clipping

Uniform Quantizer SNR Issues

• SNR = 6B - 7.2 (assuming X_max = 4σ_x)
• to get an SNR of at least 30 dB, need at least B ≥ 6 bits
• this assumption is weak across speakers and different transmission environments, since σ_x varies so much (on the order of 40 dB) across sounds, speakers, and input conditions
• SNR(dB) predictions can be off by significant amounts if the full quantizer range is not used, e.g., for unvoiced segments => need more than 6 bits for real communication systems, more like 11-12 bits
• need a quantizing system where the SNR is independent of the signal level => constant percentage error rather than constant variance error => need non-uniform quantization

Instantaneous Companding

• to get constant percentage error (rather than constant variance error), need logarithmically spaced quantization levels
• quantize the logarithm of the input signal rather than the input signal itself

μ-Law Companding

• structure:  x[n] → μ-law compander → y[n] → quantizer → ŷ[n]
• y(n) = ln |x(n)|,  with  x(n) = exp[y(n)] · sign[x(n)]
• where sign[x(n)] = +1 if x(n) ≥ 0, and -1 if x(n) < 0
• the quantized log magnitude is:
    ŷ(n) = Q[ log |x(n)| ] = log |x(n)| + ε(n)     (new error signal)
• assume that ε(n) is independent of log |x(n)|; the inverse is:
    x̂(n) = exp[ŷ(n)] · sign[x(n)] = |x(n)| · sign[x(n)] · exp[ε(n)] = x(n) · exp[ε(n)]
• assume ε(n) is small; then exp[ε(n)] ≈ 1 + ε(n) + ..., so:
    x̂(n) = x(n) [1 + ε(n)] = x(n) + ε(n) x(n) = x(n) + f(n)
• since we assume x(n) and ε(n) are independent:
    σ_f² = σ_x² · σ_ε²
• therefore:
    SNR = σ_x² / σ_f² = 1 / σ_ε²
• the SNR is independent of σ_x; it depends only on the quantizer step size

Pseudo-Logarithmic Compression

• unfortunately, true logarithmic compression is not practical, since the dynamic range (ratio between the largest and smallest values) is infinite => would need an infinite number of quantization levels
• need an approximation to logarithmic compression => μ-law / A-law compression

μ-Law Compression

• the μ-law compression characteristic is:
    y(n) = F[x(n)] = X_max · [ log(1 + μ |x(n)| / X_max) / log(1 + μ) ] · sign[x(n)]
• when x(n) = 0  ⇒  y(n) = 0
• when μ = 0  ⇒  y(n) = x(n)  (linear, no compression)
• when μ is large, and for large |x(n)|:
    |y(n)| ≈ (X_max / log μ) · log( μ |x(n)| / X_max )

μ-Law Approximation to log

• μ-law encoding gives a good approximation to constant percentage error (|y(n)| ∝ log |x(n)|)

Histogram for μ-Law Companding

• speech waveform; output of the μ-law compander

SNR for the μ-Law Quantizer

• SNR(dB) = 6B + 4.77 - 20 log10[ ln(1 + μ) ] - 10 log10[ 1 + (X_max/(μσ_x))² + √2 · X_max/(μσ_x) ]
• the 6B dependence ⇒ good
• much less dependence on X_max/σ_x ⇒ good
• for large μ, the SNR is less sensitive to changes in X_max/σ_x ⇒ good
• the μ-law quantizer has been used in wireline telephony for more than 40 years

μ-Law Companding

• the μ-law compressed signal utilizes almost the full dynamic range (±1) much more effectively than the original speech
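Evaluating the SNR expression above numerically shows how weak the dependence on X_max/σ_x is for large μ; B = 7 and μ = 100 are illustrative choices:

```python
import numpy as np

def mu_law_snr_db(B, mu, ratio):
    # ratio = Xmax / sigma_x
    return (6.02 * B + 4.77
            - 20.0 * np.log10(np.log(1.0 + mu))
            - 10.0 * np.log10(1.0 + (ratio / mu) ** 2
                              + np.sqrt(2.0) * ratio / mu))

snrs = [mu_law_snr_db(7, 100.0, r) for r in (8.0, 16.0, 30.0)]
print([round(s, 1) for s in snrs])   # roughly 32-33 dB across the whole range
print(max(snrs) - min(snrs) < 2.0)   # varies by less than 2 dB: True
```

A uniform quantizer over the same 4x swing in X_max/σ_x would lose about 12 dB.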

Comparison of Linear and μ-Law Quantizers

• dashed lines: linear (uniform) quantizers with 6, 7, 8 and 12 bit quantizers
• solid lines: μ-law (μ = 100) quantizers with 6, 7 and 8 bit quantizers
• these plots show that X_max characterizes the quantizer (it specifies the 'overload' amplitude), and σ_x characterizes the signal (it specifies the signal amplitude), with the ratio (X_max/σ_x) showing how well the signal is matched to the quantizer

μ-Law Quantization Error

• (a) input speech signal
• (b) histogram of μ = 255 compressed speech
• (c) histogram of 8-bit companded quantization error

Analysis of μ-Law Performance

• dashed lines: linear (uniform) quantizers with 6, 7, 8 and 13 bit quantizers; solid lines: μ-law quantizers with 6, 7 and 8 bit quantizers
• the curves show that μ-law quantization can maintain roughly the same SNR over a wide range of X_max/σ_x, for reasonably large values of μ
  – for μ = 100, SNR stays within 2 dB for 8 < X_max/σ_x < 30
  – for μ = 500, SNR stays within 2 dB for 8 < X_max/σ_x < 150
  – the loss in SNR in going from μ = 100 to μ = 500 is about 2.6 dB => a rather small sacrifice for much greater dynamic range
  – B = 7 gives SNR = 34 dB for μ = 100 => this is 7-bit μ-law PCM (μ = 255), the standard for toll quality speech coding in the PSTN => would need about 11 bits to achieve this dynamic range and SNR using a linear quantizer

CCITT G.711 Standard

• the μ-law characteristic is approximated by 15 linear segments, with uniform quantization within a segment
  – uses a mid-tread quantizer; +0 and -0 are defined
  – decision and reconstruction levels defined to be integers
• the A-law characteristic is approximated by 13 linear segments
  – uses a mid-riser quantizer

Summary of Uniform and μ-Law PCM

• quantization of sample values is unavoidable in DSP applications and in digital transmission and storage of speech
• we can analyze quantization error using a random noise model
• the more bits in the number representation, the lower the noise; the 'fundamental theorem' of uniform quantization is that "the signal-to-noise ratio increases 6 dB with each added bit in the quantizer"; however, if the signal level decreases while the quantizer step size stays the same, it is equivalent to throwing away one bit for each halving of the input signal amplitude
• μ-law compression can maintain a constant SNR over a wide dynamic range, thereby reducing the dependency on the signal level remaining constant

Quantization for MMSE

• if we know the size of the signal (i.e., we know the signal variance, σ_x²), then we can design a quantizer to minimize the mean-squared quantization error
  – the basic idea is to quantize the most probable samples with low error and the least probable with higher error
  – this would maximize the SNR
  – the general quantizer is defined by a set of M reconstruction levels and a set of M decision levels defining the quantization "slots"
  – this problem was first studied by J. Max, "Quantizing for Minimum Distortion," IRE Trans. Info. Theory, March 1960

Quantization for Optimum SNR

• the goal is to match the quantizer to the actual signal density to achieve optimum SNR
  – μ-law tries to achieve constant SNR over a wide range of signal variances => some sacrifice in SNR performance relative to a quantizer whose step size is matched to the signal variance
  – if σ_x is known, you can choose the quantizer levels to minimize the quantization error variance and maximize the SNR

Quantizer Levels for Maximum SNR

• the variance of the quantization noise is:
    σ_e² = E[e²(n)] = E[ (x̂(n) - x(n))² ]
• with x̂(n) = Q[x(n)], assume M quantization levels:
    [ x̂_{-M/2}, ..., x̂_{-1}, x̂_1, ..., x̂_{M/2} ]
• associate quantization levels with signal intervals as:
    x̂_j = quantization level for the interval [x_{j-1}, x_j]
• assuming p(x) = p(-x), so that the optimum quantizer is antisymmetric:
    x̂_{-i} = -x̂_i  and  x_{-i} = -x_i
• it then makes sense to define the boundary points:
    x_0 = 0 (central boundary point),  x_{±M/2} = ±∞ (for large amplitudes)

Optimum Quantization Levels

• by definition, e = x̂ - x; the error variance σ_e² = ∫ e² p_e(e) de can therefore be written as:
    σ_e² = Σ_{i=-M/2+1}^{M/2} ∫_{x_{i-1}}^{x_i} (x̂_i - x)² p_x(x) dx
• for symmetric, zero-mean distributions this reduces to:
    σ_e² = 2 Σ_{i=1}^{M/2} ∫_{x_{i-1}}^{x_i} (x̂_i - x)² p_x(x) dx
• the goal is to minimize σ_e² through the choices of {x_i} and {x̂_i}

Solution for Optimum Levels

• to solve for the optimum values of {x̂_i} and {x_i}, we differentiate σ_e² with respect to the parameters, set the derivatives to 0, and solve numerically:
    (1)  ∫_{x_{i-1}}^{x_i} (x̂_i - x) p_x(x) dx = 0,   i = 1, 2, ..., M/2
    (2)  x_i = (x̂_i + x̂_{i+1}) / 2,   i = 1, 2, ..., M/2 - 1
    (3)  boundary conditions:  x_0 = 0,  x_{M/2} = ∞
• the optimum boundary points lie halfway between the M/2 quantizer levels
• the optimum location of quantization level x̂_i is at the centroid of the probability density over the interval x_{i-1} to x_i
• solve the above set of equations (1, 2, 3) iteratively for {x_i}, {x̂_i}
• can also constrain the quantizer to be uniform and solve for the value of Δ that maximizes the SNR

Optimum Quantizers for the Laplace Density

• assumes σ_x = 1
• M. D. Paez and T. H. Glisson, "Minimum Mean-Squared Error Quantization in Speech," IEEE Trans. Comm., April 1972
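The iteration of equations (1)-(2) is straightforward to implement on a dense grid. This sketch designs the positive half of a 2-bit quantizer for the unit-variance Laplacian; the grid extent, initialization, and iteration count are ad hoc choices, not part of the lecture:

```python
import numpy as np

def lloyd_max_laplace(B, iters=200):
    # Iterate conditions (1)-(2) for a zero-mean, unit-variance Laplacian,
    # using the positive half only (the quantizer is antisymmetric).
    x = np.linspace(0.0, 12.0, 120001)           # dense grid; tail beyond 12 negligible
    p = np.exp(-np.sqrt(2.0) * x)                # Laplacian shape (norm. cancels)
    M2 = 2 ** (B - 1)                            # M/2 levels on the positive side
    levels = np.linspace(0.1, 3.0, M2)           # arbitrary initial guess
    for _ in range(iters):
        # (2): boundaries halfway between adjacent levels, x_0 = 0, top = inf
        bounds = np.concatenate(([0.0], (levels[:-1] + levels[1:]) / 2, [np.inf]))
        new = []
        for i in range(M2):
            m = (x >= bounds[i]) & (x < min(bounds[i + 1], x[-1]))
            # (1): each level at the centroid of the density over its slot
            new.append(np.sum(x[m] * p[m]) / np.sum(p[m]))
        levels = np.array(new)
    return levels

levels = lloyd_max_laplace(B=2)   # positive levels of a 4-level quantizer
print(levels)                     # roughly 0.42 and 1.83 for unit variance
```

The converged values agree with the tabulated Laplacian optimum (levels near ±0.42 and ±1.83, boundary near 1.13), and the outer level is indeed much farther out, since the density decays there.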

Optimum Quantizer for a 3-bit Laplace Density; Uniform Case

• quantization levels get further apart as the probability density decreases

Performance of Optimum Quantizers

• step size decreases roughly exponentially with increasing number of bits

Summary

• examined a statistical model of speech, showing probability densities, autocorrelations, and power spectra
• studied the instantaneous uniform quantization model and derived the SNR as a function of the number of bits in the quantizer and the ratio of signal peak to signal variance
• studied a companded model of speech that approximated logarithmic compression, and showed that the resulting SNR was a weaker function of the ratio of signal peak to signal variance
• examined a model for deriving quantization levels that were optimum in the sense of matching the quantizer to the actual signal density, thereby achieving optimum SNR for a given number of bits in the quantizer
