
Speech Coding Techniques (I)

• Introduction to Quantization
  • Scalar quantization
    • Uniform quantization
    • Nonuniform quantization
• Waveform-based coding
  • Pulse Coded Modulation (PCM)
  • Differential PCM (DPCM)
  • Adaptive DPCM (ADPCM)
• Model-based coding
  • Channel vocoder
  • Analysis-by-Synthesis techniques
  • Harmonic vocoder

Origin of Speech Coding

"Watson, if I can get a mechanism which will make a current of electricity vary its intensity as the air varies in density when sound is passing through it, I can telegraph any sound, even the sound of speech."

– A. G. Bell, 1875 (analog communication)

Entropy formula of a discrete source: H(X) = −∑_{i=1}^{N} pi log2(pi)  (bits/sample, or bps)

– C. E. Shannon, 1948 (digital communication)

Digitization of Speech Signals

(Block diagram: continuous-time speech signal xc(t) → Sampler → discrete sequence of speech samples x(n) = xc(nT) → Quantizer)

Sampling

• Sampling Theorem: when sampling a signal (e.g., converting an analog signal to digital), the sampling frequency must be greater than twice the bandwidth of the input signal in order to reconstruct the original perfectly from the sampled version.

Sampling frequency: >8K samples/second (human speech is roughly bandlimited to 4 kHz)

Quantization  In Physics  To limit the possible values of a magnitude or quantity to a discrete set of values by quantum mechanical rules  In speech coding  To limit the possible values of a speech sample or prediction residue to a discrete set of values by information theoretic rules (tradeoff between Rate and Distortion)

Quantization Examples

 Examples  Continuous to discrete  a quarter of milk, two gallons of gas, normal temperature is 98.6F, my height is 5 foot 9 inches  Discrete to discrete  Round your tax return to integers  The mileage of my car is about 60K.  Play with bits  Precision is finite: the more precise, the more bits you need (to resolve the uncertainty)  Keep a card in secret and ask your partner to guess. He/she can only ask Yes/No questions: is it bigger than 7? Is it less than 4? ...  However, not every bit has the same impact  How much did you pay for your car? (two thousands vs. $2016.78)

Scalar vs. Vector Quantization
• Scalar: for a given sequence of speech samples, we will process (quantize) each sample independently
  • Input: N samples → output: N codewords
• Vector: we will process (quantize) a block of speech samples at a time
  • Input: N samples → output: N/d codewords (block size is d)
• SQ is a special case of VQ (d = 1)

Scalar Quantization

In SQ, quantizing N samples is not fundamentally different from quantizing one sample (since they are processed independently)

(Diagram: original value x → f → quantization index s ∈ S → f⁻¹ → quantized value x̂; the overall mapping is the quantizer Q: x → x̂)

A quantizer is defined by its codebook (the collection of codewords) and its mapping function (straightforward in the case of SQ)

Rate-Distortion Tradeoff

• Rate: how many codewords (bits) are used?
  • Example: 16-bit audio vs. 8-bit PCM speech
• Distortion: how much distortion is introduced?
  • Example: mean absolute difference (L1), mean square error (L2)

(Figure: rate-distortion curves of SQ and VQ – distortion vs. rate in bps)

Question: which quantizer is better?

Uniform Quantization


A scalar quantizer is called a uniform quantizer (UQ) if all its codewords are uniformly distributed (equally spaced).

Example (quantization stepsize ∆ = 16): codewords 8, 24, 40, ..., 248
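As an illustration, here is a minimal MATLAB sketch (my own, assuming an input range of [0, 256); not part of the slides) that realizes this UQ with stepsize 16:

Delta = 16;
x = [3 17 100 250];              % hypothetical input samples
idx = floor(x / Delta);          % quantization indices (the mapping f)
x_hat = idx * Delta + Delta/2;   % reconstruction levels: 8, 24, 104, 248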

Uniform distribution, denoted by U[−A, A]:

f(x) = 1/(2A) for x ∈ [−A, A], and f(x) = 0 otherwise

6dB/Bit Rule

(Figure: pdf of the quantization error, f(e) = 1/∆ for e ∈ [−∆/2, ∆/2])

Note: Quantization noise of UQ on uniform distribution is also uniformly distributed

For a uniform source, adding one bit/sample can reduce MSE or increase SNR by 6dB

(The derivation of this 6dB/bit rule will be given in the class)
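For reference, here is a sketch of the standard derivation (my own summary, assuming a B-bit UQ applied to U[−A, A], so that \Delta = 2A/2^B and the error is uniform on [−\Delta/2, \Delta/2]):

\sigma_e^2 = \int_{-\Delta/2}^{\Delta/2} \frac{e^2}{\Delta}\,de = \frac{\Delta^2}{12}, \qquad \sigma_x^2 = \frac{(2A)^2}{12}

\mathrm{SNR} = 10\log_{10}\frac{\sigma_x^2}{\sigma_e^2} = 10\log_{10}\left(\frac{2A}{\Delta}\right)^2 = 10\log_{10} 2^{2B} = 20B\log_{10}2 \approx 6.02\,B\ \mathrm{dB}

i.e., each additional bit raises the SNR by about 6 dB.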

Nonuniform Quantization

 Motivation  Speech signals have the characteristic that small­amplitude samples occur more frequently than large­amplitude ones  Human auditory system exhibits a logarithmic sensitivity  More sensitive at small­amplitude range (e.g., 0 might sound different from 0.1)  Less sensitive at large­amplitude range (e.g., 0.7 might not sound different much from 0.8)

(Figure: histogram of typical speech signals)

From Uniform to Non-uniform

F: nonlinear compressing function
F⁻¹: nonlinear expanding function
F and F⁻¹ together: nonlinear compander

(Diagram: x → F → y → Q → ŷ → F⁻¹ → x̂)

Example – F: y = log(x), F⁻¹: x = exp(y)

We will study nonuniform quantization by PCM example next

Speech Coding Techniques (I)

• Introduction to Quantization
  • Scalar quantization
    • Uniform quantization
    • Nonuniform quantization
• Waveform-based coding
  • Pulse Coded Modulation (PCM)
  • Differential PCM (DPCM)
  • Adaptive DPCM (ADPCM)
• Model-based coding
  • Channel vocoder
  • Analysis-by-Synthesis techniques
  • Harmonic vocoder

Pulse Code Modulation
• Basic idea: assign a smaller quantization stepsize to small-amplitude regions and a larger quantization stepsize to large-amplitude regions
• Two types of nonlinear compressing functions
  • Mu-law, adopted by North American systems
  • A-law, adopted by European telecommunication systems
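As an illustration of these two compressing functions, here is a minimal MATLAB sketch (my own, assuming the usual constants µ = 255 and A = 87.6; the formulas are the standard textbook definitions, not copied from the slides):

mu = 255; A = 87.6;
x  = linspace(-1, 1, 1001);                            % normalized input amplitudes

y_mu = sign(x) .* log(1 + mu*abs(x)) / log(1 + mu);    % mu-law compressor F
x_mu = sign(y_mu) .* ((1 + mu).^abs(y_mu) - 1) / mu;   % mu-law expander F^-1

ax = abs(x);
y_a = sign(x) .* (A*ax) / (1 + log(A));                % A-law, |x| < 1/A
big = ax >= 1/A;
y_a(big) = sign(x(big)) .* (1 + log(A*ax(big))) / (1 + log(A));

% plot(x, y_mu, x, y_a)   % would reproduce the y-vs-x compression curves below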

Mu-Law (µ-law)

Mu-Law Examples

(Figure: µ-law compression curves, output y vs. input x)

A-Law

A-Law Examples

(Figure: A-law compression curves, output y vs. input x)

Comparison

(Figure: µ-law and A-law compression curves compared, output y vs. input x)

PCM Speech

• Mu-law (A-law) companding compresses the signal to 8 bits/sample, i.e., 64 Kbits/second (without the compander, we would need 12 bits/sample)

A Look Inside WAV Format

MATLAB function [x,fs]=wavread(filename)

Change the Gear
• Strictly speaking, PCM is merely digitization of speech signals – no coding (compression) at all
• By speech coding, we refer to representing speech signals at the rate of <64 Kbps
• To understand how speech coding techniques work, I will first cover some basics of data compression

Data Compression Basics

• Discrete source
  • Information = uncertainty
  • Quantification of uncertainty
  • Source entropy
• Variable length codes
  • Motivation
  • Prefix condition
  • Huffman coding algorithm
• Data compression = source modeling

Shannon’s Picture on Communication (1948)

source → source encoder → channel encoder → channel → channel decoder → source decoder → destination

The goal of communication is to move information from here to there and from now to then

Examples of source: Human speeches, photos, text messages, computer programs … Examples of channel: storage media, telephone lines, wireless network …

Information  What do we mean by information?  “A numerical measure of the uncertainty of an experimental outcome” – Webster Dictionary  How to quantitatively measure and represent information?  Shannon proposes a probabilistic approach  How to achieve the goal of compression?  Represent different events by codewords with varying code­lengths

Information = Uncertainty

• Zero information
  • WVU lost to FSU in the Gator Bowl 2005 (past news, no uncertainty)
  • Yao Ming plays for the Houston Rockets (celebrity fact, no uncertainty)
• Little information
  • It is very cold in Chicago in winter time (not much uncertainty since it is known to most people)
  • Dozens of hurricanes form in the Atlantic ocean every year (not much uncertainty since it is pretty much predictable)
• Large information
  • Hurricane xxx is going to hit Houston (since Katrina, we all know how difficult it is to predict the trajectory of hurricanes)
  • There will be an earthquake in LA around X'mas (are you sure? an unlikely event)

Quantifying Uncertainty of an Event

Self-information

I(p) = −log2(p), where p is the probability of the event x (e.g., x can be X=H or X=T)

p = 1 → I(p) = 0: the event must happen (no uncertainty)
p → 0 → I(p) → ∞: the event is unlikely to happen (infinite amount of uncertainty)

Intuitively, I(p) measures the amount of uncertainty of event x

Discrete Source

• A discrete source is characterized by a discrete random variable X
• Examples
  • Coin flipping: P(X=H) = P(X=T) = 1/2
  • Dice tossing: P(X=k) = 1/6, k = 1, ..., 6
  • Playing-card drawing: P(X=S) = P(X=H) = P(X=D) = P(X=C) = 1/4

How to quantify the uncertainty of a discrete source?

Weighted Self-information

p      I(p)   Iw(p) = p·I(p)
0      ∞      0
1/2    1      1/2
1      0      0

As p evolves from 0 to 1, the weighted self-information Iw(p) = −p·log2(p) first increases and then decreases

Question: Which value of p maximizes Iw(p)?

Maximum of Weighted Self-information

The maximum is attained at p = 1/e, where Iw(1/e) = 1/(e·ln2) ≈ 0.53 bits

Uncertainty of a Discrete Source

• A discrete source (random variable) is a collection (set) of individual events whose probabilities sum to 1:
  X is a discrete random variable, x ∈ {1, 2, ..., N}, with pi = Prob(x = i), i = 1, 2, ..., N, and ∑_{i=1}^{N} pi = 1
• To quantify the uncertainty of a discrete source, we simply take the summation of weighted self-information over the whole set

Shannon's Source Entropy Formula

H(X) = ∑_{i=1}^{N} Iw(pi) = −∑_{i=1}^{N} pi log2(pi)  (bits/sample, or bps)

Source Entropy Examples

• Example 1: (binary Bernoulli source)

Flipping a coin with the probability of head being p (0 < p < 1); let q = 1 − p.

H(X) = −(p·log2(p) + q·log2(q))

Check the two extreme cases: as p goes to zero, H(X) goes to 0 bps → compression gains the most.

As p goes to one half, H(X) goes to 1 bps → no compression can help.

Entropy of Binary Bernoulli Source

Source Entropy Examples

• Example 2: (4-way random walk)

Prob(x=S) = 1/2, Prob(x=N) = 1/4, Prob(x=E) = Prob(x=W) = 1/8

H(X) = −(1/2·log2(1/2) + 1/4·log2(1/4) + 1/8·log2(1/8) + 1/8·log2(1/8)) = 1.75 bps
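A quick MATLAB check of this number (my own one-liner, simply evaluating the entropy formula):

p = [1/2 1/4 1/8 1/8];        % Prob(S), Prob(N), Prob(E), Prob(W)
H = -sum(p .* log2(p));       % returns 1.75 bits/symbol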

Source Entropy Examples (Con’t)

• Example 3: (source with geometric distribution)

A jar contains the same number of balls of two different colors: blue and red. Each time, a ball is randomly picked out of the jar and then put back. Consider the event that the k-th pick is the first time a red ball is seen – what is the probability of this event?

p = Prob(x=red) = 1/2, 1 − p = Prob(x=blue) = 1/2

Prob(event) = Prob(blue in the first k−1 picks) × Prob(red in the k-th pick) = (1/2)^(k−1)·(1/2) = (1/2)^k

Morse Code (1838)

Letter  Freq  Code    |  Letter  Freq  Code
A       .08   .-      |  N       .07   -.
B       .01   -...    |  O       .08   ---
C       .03   -.-.    |  P       .02   .--.
D       .04   -..     |  Q       .00   --.-
E       .12   .       |  R       .06   .-.
F       .02   ..-.    |  S       .06   ...
G       .02   --.     |  T       .09   -
H       .06   ....    |  U       .03   ..-
I       .07   ..      |  V       .01   ...-
J       .00   .---    |  W       .02   .--
K       .01   -.-     |  X       .00   -..-
L       .04   .-..    |  Y       .02   -.--
M       .02   --      |  Z       .00   --..

Average Length of Morse Codes

• Not the average of the lengths of the letters: (2+4+4+3+…)/26 = 82/26 ≈ 3.2
• We want the average a to be such that in a typical real sequence of, say, 1,000,000 letters, the number of dots and dashes is about a·1,000,000
• The weighted average: (freq of A)·(length of code for A) + (freq of B)·(length of code for B) + … = .08·2 + .01·4 + .03·4 + .04·3 + … ≈ 2.4

Question: is this the entropy of English texts?

Entropy Summary
• Self-information of an event x is defined as I(x) = −log2 p(x) (rare events → large information)
• Entropy is defined as the weighted summation of self-information over all possible events:

H(X) = −∑_{i=1}^{N} pi log2(pi)  (bits/sample, or bps)

for any discrete source with 0 ≤ pi ≤ 1 and ∑_{i=1}^{N} pi = 1

How to achieve source entropy?

discrete source X → entropy coding (driven by P(X)) → binary bit stream

Note: the above entropy coding problem is based on the simplified assumptions that the discrete source X is memoryless and that P(X) is completely known. These assumptions often do not hold for real-world data such as speech, and we will revisit them later.

Data Compression Basics

• Discrete source
  • Information = uncertainty
  • Quantification of uncertainty
  • Source entropy
• Variable length codes
  • Motivation
  • Prefix condition
  • Huffman coding algorithm
• Data compression = source modeling

Variable Length Codes (VLC)

Recall:

Self­information Ip=−log2 p

It follows from the above formula that a small­probability event contains much information and therefore worth many bits to represent it. Conversely, if some event frequently occurs, it is probably a good idea to use as few bits as possible to represent it. Such observation leads to the idea of varying the code lengths based on the events’ probabilities.

Assign a long codeword to an event with small probability Assign a short codeword to an event with large probability

4-way Random Walk Example

symbol  pk     fixed-length codeword  variable-length codeword
S       0.5    00                     0
N       0.25   01                     10
E       0.125  10                     110
W       0.125  11                     111

symbol stream:   S  S  N  W   S  E   N  N  W   S  S  S  N  E   S  S
fixed length:    00 00 01 11  00 10  01 01 11  00 00 00 01 10  00 00   → 32 bits
variable length: 0  0  10 111 0  110 10 10 111 0  0  0  10 110 0  0    → 28 bits

4 bits of savings achieved by VLC (redundancy eliminated)

Toy Example (Con’t)

• source entropy: H(X) = −∑_{k=1}^{4} pk log2(pk) = 0.5×1 + 0.25×2 + 0.125×3 + 0.125×3 = 1.75 bits/symbol
• average code length: l = Nb/Ns (bps), where Nb = total number of bits, Ns = total number of symbols

fixed-length: l = 2 bps > H(X); variable-length: l = 1.75 bps = H(X)

Problems with VLC
• When codewords have fixed lengths, the boundary of codewords is always identifiable.
• For codewords with variable lengths, the boundary could become ambiguous.

Example: with the VLC S → 0, N → 1, E → 10, W → 11, the stream S S N W S E is encoded as 00111010…, which the decoder can parse either as 0,0,1,11,0,10 (S S N W S E …) or as 0,0,11,1,0,10 (S S W N S E …).

Uniquely Decodable Codes
• To avoid ambiguity in decoding, we need to enforce certain conditions on VLCs to make them uniquely decodable
• Since ambiguity arises when some codeword becomes the prefix of another, it is natural to consider the prefix condition

Example: p ∝ pr ∝ pre ∝ pref ∝ prefi ∝ prefix (where a ∝ b means: a is a prefix of b)

Prefix condition

No codeword is allowed to be the prefix of any other codeword.

We will graphically illustrate this condition with the aid of binary codeword tree

Binary Codeword Tree

root

Level 1: 0, 1 → 2 codewords
Level 2: 00, 01, 10, 11 → 2² codewords
…
Level k: → 2^k codewords

Prefix Condition Examples

symbol x  codeword 1  codeword 2
S         0           0
N         1           10
E         10          110
W         11          111

(Binary codeword trees for codeword set 1 and codeword set 2: in set 1, the codeword 1 is the prefix of 10 and 11; in set 2, no codeword is the prefix of another)

How to satisfy prefix condition?
• Basic rule: if a node is used as a codeword, then none of its descendants can be used as codewords.

(Example: binary codeword tree with codewords 0, 10, 110, 111 – once a node is used as a codeword, its entire subtree is excluded)

VLC Summary  Rule #1: short codeword – large probability event, long codeword – small probability event  Rule #2: no codeword can be the prefix of any other codeword  Question: given P(X), how to systematically assign the codewords that always satisfy those two rules?  Answer: Huffman coding,

Entropy Coding

Huffman Codes (Huffman'1952)
• Coding procedure for an N-symbol source
  • Source reduction
    • List all probabilities in descending order
    • Merge the two symbols with the smallest probabilities into a new compound symbol
    • Repeat the above two steps N−2 times
  • Codeword assignment
    • Start from the smallest (fully reduced) source and work backward to the original source
    • Each merging point corresponds to a node in the binary codeword tree

Example-I

Step 1: Source reduction

symbol x  p(x)   reduction 1  reduction 2
S         0.5    0.5          0.5
N         0.25   0.25         0.5 (NEW)
E         0.125  0.25 (EW)
W         0.125

(compound symbols shown in parentheses)

Example-I (Con't)

Step 2: Codeword assignment

symbol x  p(x)   codeword
S         0.5    0
N         0.25   10
E         0.125  110
W         0.125  111

(Binary codeword tree: root → 0: S, 1: (NEW); (NEW) → 10: N, 11: (EW); (EW) → 110: E, 111: W)

Example-I (Con't)

The codeword assignment is not unique. In fact, at each merging point (node), we can arbitrarily assign "0" and "1" to the two branches (the average code length is the same). For example, S = 0, N = 10, E = 110, W = 111 and S = 1, N = 01, E = 001, W = 000 are both valid assignments.
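To make the procedure concrete, here is a small MATLAB sketch of Huffman codeword assignment (my own illustration, not course code; simple_huffman is a hypothetical name). Applied to p = [0.5 0.25 0.125 0.125], it returns {'0','10','110','111'}, matching Example-I up to 0/1 swaps at each node.

function codes = simple_huffman(p)
% Minimal Huffman codeword assignment sketch (assumes p is a probability
% vector summing to 1); repeatedly merges the two least probable nodes.
n = numel(p);
codes  = repmat({''}, 1, n);      % codeword strings, one per symbol
groups = num2cell(1:n);           % each node starts as its own group of symbols
prob   = p(:)';                   % working copy of node probabilities
while numel(prob) > 1
    [~, idx] = sort(prob, 'ascend');
    a = idx(1); b = idx(2);       % two least probable nodes
    for s = groups{a}             % prepend '0' to one branch ...
        codes{s} = ['0' codes{s}];
    end
    for s = groups{b}             % ... and '1' to the other
        codes{s} = ['1' codes{s}];
    end
    prob(a)   = prob(a) + prob(b);        % merge into a compound symbol
    groups{a} = [groups{a} groups{b}];
    prob(b)   = [];  groups(b) = [];
end
end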

Example-II

Step 1: Source reduction

symbol x  p(x)  reduction 1  reduction 2  reduction 3
e         0.4   0.4          0.4          0.6 (aiou)
a         0.2   0.2          0.4 (iou)    0.4
i         0.2   0.2          0.2
o         0.1   0.2 (ou)
u         0.1

(compound symbols shown in parentheses)

Example-II (Con't)

Step 2: Codeword assignment

symbol x  p(x)  codeword
e         0.4   1
a         0.2   01
i         0.2   000
o         0.1   0010
u         0.1   0011

Example-II (Con't)

(Binary codeword tree: root → 1: e, 0: (aiou); (aiou) → 01: a, 00: (iou); (iou) → 000: i, 001: (ou); (ou) → 0010: o, 0011: u)

Data Compression Basics

• Discrete source
  • Information = uncertainty
  • Quantification of uncertainty
  • Source entropy
• Variable length codes
  • Motivation
  • Prefix condition
  • Huffman coding algorithm
• Data compression = source modeling

What is Source Modeling

Without modeling: discrete source X → entropy coding (driven by P(X)) → binary bit stream

With modeling: discrete source X → modeling process → Y → entropy coding (driven by P(Y), obtained by probability estimation) → binary bit stream

Examples of Modeling Process

• Run-length coding: count the run-lengths of identical symbols (suitable for binary/graphic images)
• Dictionary-based coding: record repeating patterns in a dictionary updated on the fly (e.g., the Lempel-Ziv algorithm in WinZip)
• Transform coding: apply a linear transform to a block of symbols (e.g., the discrete cosine transform in JPEG)
• Predictive coding: apply linear prediction to the sequence of symbols (e.g., DPCM and ADPCM in wired transmission of speech)

Predictive Coding

discrete source X → linear prediction → Y → entropy coding (driven by P(Y), obtained by probability estimation) → binary bit stream

Prediction residue sequence Y usually contains less uncertainty (entropy) than the original sequence X

WHY? Because the redundancy is assimilated into the LP model

Two Extreme Cases

Tossing a fair coin: H, H, T, H, T, H, T, T, T, H, T, T, …

P(X=H) = P(X=T) = 1/2 (maximum uncertainty): no prediction can help (we have to spend 1 bit/sample)

Tossing a coin with two identical sides – Head or Tail? H, H, H, H, … or T, T, T, T, … (pure duplication)

P(X=H) = 1, P(X=T) = 0 (minimum uncertainty): prediction is always right (1 bit is enough to code all)

Differential PCM  Basic idea  Since speech signals are slowly varying, it is possible to eliminate the temporal redundancy by prediction  Instead of transmitting original speech, we code prediction residues instead, which typically have smaller energy  Linear prediction  Fixed: the same predictor is used again and again  Adaptive: predictor is adjusted on­the­fly

First-order Prediction

- Encoding: x1 x2 … xN → e1 e2 … eN
  e1 = x1; en = xn − xn−1, n = 2, …, N

- Decoding: e1 e2 … eN → x1 x2 … xN
  x1 = e1; xn = en + xn−1, n = 2, …, N

(Block diagram of the DPCM loop: the encoder subtracts the delayed sample xn−1 (delay element D) from xn to form en; the decoder adds en to xn−1 to recover xn)

Prediction Meets Quantization
• Open-loop
  • Prediction is based on unquantized samples
  • Since the decoder only has access to quantized samples, we run into a so-called drifting problem, which is really bad for compression
• Closed-loop
  • Prediction is based on quantized samples
  • Still suffers from error propagation (i.e., quantization errors of the past affect the efficiency of prediction), but no drifting is involved

Open-loop DPCM

(Block diagram: the encoder forms en = xn − xn−1 from unquantized samples and quantizes it to ên outside the prediction loop; the decoder reconstructs x̂n = ên + x̂n−1)

Notes: • Prediction is based on the past unquantized sample

• Quantization is located outside the DPCM loop

Closed-loop DPCM

(Block diagram: the encoder forms en = xn − x̂n−1, quantizes it to ēn inside the loop, and reconstructs x̂n = ēn + x̂n−1, exactly mirroring the decoder)

xn, en: unquantized samples and prediction residues
x̂n, ēn: decoded samples and quantized prediction residues

Notes: • Prediction is based on the past decoded sample

• Quantization is located inside the DPCM loop

Numerical Example

xn:  90  92  91  93  93  95  …
en:  90   2  -2   3   0   2        (en = xn − x̂n−1, with e1 = x1)
ēn:  90   3  -3   3   0   3        (quantizer: Q(x) = round(x/3)·3)
x̂n:  90  93  90  93  93  96  …     (x̂n = ēn + x̂n−1)
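A small MATLAB sketch (my own; the first-order prediction and the stepsize-3 rounding quantizer are inferred from the table above) that reproduces these numbers:

x  = [90 92 91 93 93 95];
Q  = @(v) 3*round(v/3);              % quantizer used in the example
N  = numel(x);
e  = zeros(1,N); eq = zeros(1,N); xh = zeros(1,N);
e(1) = x(1); eq(1) = Q(e(1)); xh(1) = eq(1);
for n = 2:N
    e(n)  = x(n) - xh(n-1);          % predict from the past DECODED sample
    eq(n) = Q(e(n));                 % quantize inside the loop
    xh(n) = eq(n) + xh(n-1);         % encoder mirrors the decoder
end
% e  = [90  2 -2  3  0  2], eq = [90  3 -3  3  0  3], xh = [90 93 90 93 93 96]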

Closed-loop DPCM Analysis

A (encoder): en = xn − x̂n−1
B (decoder): x̂n = ēn + x̂n−1
Subtracting: xn − x̂n = en − ēn

The distortion introduced to the prediction residue en is identical to that introduced to the original sample xn.

High-order Linear Prediction

original samples: x1 x2 … xn−1 xn xn+1 … xN

- Encoding: x1 x2 … xN → e1 e2 … eN
  initialize: e1 = x1, e2 = x2, …, ek = xk
  prediction: en = xn − ∑_{i=1}^{k} ai·xn−i, n = k+1, …, N

- Decoding: e1 e2 … eN → x1 x2 … xN
  initialize: x1 = e1, x2 = e2, …, xk = ek
  prediction: xn = en + ∑_{i=1}^{k} ai·xn−i, n = k+1, …, N

The key question is: how to select prediction coefficients?

Recall: LP Analysis of Speech

minimize MSE = ∑_{n=1}^{N} e(n)² = ∑_{n=1}^{N} [x(n) − ∑_{k=1}^{K} ak·x(n−k)]²

In matrix form:

[ Rn(0)    Rn(1)   …    Rn(K−1) ] [a1]   [Rn(1)]
[ Rn(1)    Rn(0)   ⋱       ⋮    ] [a2] = [Rn(2)]
[   ⋮        ⋱     ⋱    Rn(1)   ] [ ⋮]   [  ⋮  ]
[ Rn(K−1)  …      Rn(1) Rn(0)   ] [aK]   [Rn(K)]

Note that in fixed prediction, the auto-correlation is calculated over the whole segment of speech (NOT short-time features)
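As an illustration, a minimal MATLAB sketch (my own, with the hypothetical helper name lp_coeffs) of solving these normal equations for a fixed predictor, using the whole-segment autocorrelation as stated above:

function a = lp_coeffs(x, K)
% Estimate fixed prediction coefficients a_1..a_K by solving the
% normal equations built from the whole-segment autocorrelation of x.
N = numel(x);
R = zeros(K+1, 1);
for k = 0:K
    R(k+1) = sum(x(1:N-k) .* x(1+k:N));   % autocorrelation R(k)
end
A = toeplitz(R(1:K));                     % K-by-K matrix [R(|i-j|)]
r = R(2:K+1);                             % right-hand side [R(1) .. R(K)]
a = A \ r;                                % prediction coefficients
end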

Quantized Prediction Residues
• Further entropy coding is possible
  • Variable length codes: e.g., Huffman codes, Golomb codes
  • Arithmetic coding
• However, the current practice in speech coding assigns fixed-length codewords to quantized prediction residues
  • The assigned code lengths are already nearly optimal for achieving the first-order entropy
  • Good for robustness to channel errors
• DPCM can achieve a compression ratio of two (i.e., 32 kbps) without noticeable speech quality degradation

Adaptive DPCM

Adaptation

en = xn − ∑_{i=1}^{K} ai·xn−i, n = K+1, …, N, with the coefficients ai obtained from the same normal equations as before, now using short-time autocorrelation estimates Rn(k).

To track the slowly-varying property of speech signals, the estimated short-time auto-correlation is updated on the fly.

Prediction Gain

Gp = 10·log10(σx²/σe²) dB

(Figure: prediction gain Gp as a function of the prediction order K)

Forward and Backward Adaptation
• Forward adaptation
  • The autocorrelation is estimated from the current frame, and the quantized prediction coefficients are transmitted to the decoder as side information (we will discuss how to quantize such a vector in the discussion of CELP coding)
• Backward adaptation
  • The autocorrelation is estimated from the causal past, so no overhead needs to be transmitted; the decoder duplicates the encoder's operation

Illustration of Forward Adaptation

(Figure: frames F1 F2 F3 F4 F5)

For each frame, a set of LPCs is transmitted

Illustration of Backward Adaptation

For each frame, the LPCs are learned from the past frame

Comparison

Forward adaptive prediction            | Backward adaptive prediction
Asymmetric complexity allocation       | Symmetric complexity allocation
(encoder > decoder)                    | (encoder = decoder)
Non-negligible overhead                | No overhead
Robust to channel errors               | Sensitive to channel errors
More suitable for low-bit-rate coding  | More suitable for high-bit-rate coding

AR vs. MA

• Autoregressive (AR) model
  • Essentially an IIR filter (all poles)
• Moving average (MA) model
  • Essentially an FIR filter (all zeros)
• Autoregressive moving average (ARMA) model

ITU-T G.726 – Adaptive Differential Pulse Code Modulation (ADPCM)

(Figures: G.726 encoder and decoder block diagrams)

Waveform Coding Demo

http://www-lns.tf.uni-kiel.de/demo/demo_speech.htm

Data Compression Paradigm

discrete source X → modeling process → Y → entropy coding (driven by P(Y), obtained by probability estimation) → binary bit stream

Question: Why do we call DPCM/ADPCM waveform coding?

Answer: Because Y (prediction residues) is also a kind of waveform just like the original speech X – they have the same dimensionality

Speech Coding Techniques (I)
• Introduction to Quantization
  • Scalar quantization
    • Uniform quantization
    • Nonuniform quantization
• Waveform-based coding
  • Pulse Coded Modulation (PCM)
  • Differential PCM (DPCM)
  • Adaptive DPCM (ADPCM)
• Model-based coding
  • Channel vocoder
  • Analysis-by-Synthesis techniques
  • Harmonic vocoder

Introduction to Model-based Coding

(Diagram: signal space R^N ↔ model space R^K; analysis maps the signal {x1, …, xN} to model parameters {θ1, …, θK}, synthesis maps the parameters back to a signal, with K << N)

Toy Example

Model-based coding of sinusoid signals

x(n) = sin(2π·f·n + φ), n = 1, 2, ..., N

(Diagram: input signal {x1, …, xN} → analysis → θ1 = f, θ2 = φ → synthesis → reconstructed signal)

Question (test your understanding about entropy) If a source is produced by the following model: x(n)=sin(50n+p), n=1,2,…,100 where p is a random phase with equal probabilities of being 0,45,90,135,180, 225,270,315. What is the entropy of this source?

Building Models for Human Speech
• Waveform coders can also be viewed as a kind of model where K = N (not much compression)
• Usually, model-based speech coders target very high compression ratios (K << N)

Channel Vocoder: Encoding

(Encoder block diagram: a bank of bandpass filters, each followed by a rectifier, a lowpass filter, and an A/D converter producing 3-4 bits per channel; a voicing detector contributes 1 bit and a pitch detector 6 bits)

Frame size: 20 ms; bit rate = 2400-3200 bps

Channel Vocoder: Decoding

(Decoder block diagram: based on the voicing information, either a pulse generator driven by the pitch period or a noise generator provides the excitation; each channel is passed through a D/A converter and a bandpass filter, and the channel outputs are summed)

Analysis-by-Synthesis
• Motivation
  • The optimality of some parameters is easy to determine (e.g., pitch), but not that of others (e.g., gain parameters across different bands in the channel vocoder)
  • The interaction among parameters is difficult to analyze but important for synthesis
• What is A-by-S?
  • Do the complete analysis-and-synthesis in the encoder
  • The decoder is embedded in the encoder in order to optimize the extracted parameters

A-by-S is a Closed Loop

(Diagram: input speech x = [x1, ..., xN] → Analysis → θ = [θ1, ..., θK] → Synthesis → x̂; the error e = [e1, ..., eN] = x − x̂ is fed back, and the parameters are chosen to minimize the MSE)

Toy Example Revisited

(Diagram: input signal {x1, …, xN} → analysis → θ1 = f, θ2 = φ → synthesis → reconstructed signal)

function dist = MSE_AbyS(x, f, phi)
n = 1:length(x);
x_rec = sin(2*pi*f*n + phi);
e = x - x_rec;
dist = sum(e.*e);

MATLAB provides various tools for solving optimization problems

>help fminsearch
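For instance, one could let fminsearch drive the A-by-S loop for this toy example (a hypothetical usage sketch with made-up initial guesses, not from the slides; x is the observed frame):

f0 = 0.01; phi0 = 0;                               % assumed initial guesses
cost = @(theta) MSE_AbyS(x, theta(1), theta(2));   % wrap the A-by-S distortion
theta_hat = fminsearch(cost, [f0 phi0]);           % returns [f_est, phi_est]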

Harmonic Models

• For speech within a frame: s(n) = ∑_{j=1}^{J} Aj·cos(ωj·n + φj)
• For voiced signals
  • Phase is controlled by the pitch period
  • Pitch is often modeled by a slowly varying quadratic model
• For unvoiced signals: random phase
• Less accurate for transition signals (e.g., plosives, voice onsets, etc.)
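A toy synthesis sketch of this harmonic model in MATLAB (my own illustration with made-up amplitudes, pitch, and zero phases; not part of the slides):

fs  = 8000; f0 = 120;                     % sampling rate and assumed pitch (Hz)
N   = 160;  n = 0:N-1;                    % one 20 ms frame
J   = floor((fs/2) / f0);                 % number of harmonics below Nyquist
A   = 1 ./ (1:J);                         % made-up harmonic amplitudes A_j
phi = zeros(1, J);                        % all-zero phases for simplicity
s   = zeros(1, N);
for j = 1:J
    w = 2*pi*j*f0/fs;                     % harmonic frequency w_j (rad/sample)
    s = s + A(j) * cos(w*n + phi(j));     % accumulate the j-th harmonic
end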

A-by-S Harmonic Vocoder

Bit Allocation

Frame size = 20 ms, bit rate = 4 Kbps

Towards the Fundamental Limit
• How much information can one convey in one minute?
  • It depends on how fast one speaks
  • It depends on which language one speaks
  • It surely also depends on the speech content
• Model-based speech coders
  • Can compress speech down to 300-500 bits/second, but then one can't tell who speaks it – no intonation, stress, or gender difference
• A theoretically optimal approach: … + speaker recognition + speaker-dependent …