Lecture 16 – Perceptual Audio Coding

EECS 225D Audio Signal Processing in Humans and Machines Lecture 16 – Perceptual Audio Coding 2012-3-14 Professor Nelson Morgan today’s lecture by John Lazzaro www.icsi.berkeley.edu/eecs225d/spr12/ “Hero” sound Play EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB Today’s lecture: Audio Coding Compression: Lossless and Lossy Quantization and Noise Psychoacoustic Masking Time-Frequency Tradeoffs Research Topics EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB OS X System Sound: Hero.aiff 1 second of 44.1 kHz, 16-bit, stereo audio 186, 450 byte disk file, 1.4112 Mb/s b = bit, B = 8 bit bytes Play EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB How well does gzip work on audio files? File size reduced by a factor of 1.56 (measured in units of 4KB disk blocks) “Lossless” Lossless algorithms remove compression. redundant bits -- bits that are not (decompression needed to exactly reconstruct is bit-accurate). the original file. “Shorten”: Redundancy removal can be Tony Robinson, improved if the algorithm can be Cambridge, 1992. specialized for audio waveforms. EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB Apple Lossless (after shorten, FLAC, ...) File size reduced by a factor of 3 . 1 (double the performance of gzip on the same file) Lossless, just like gzip. To reduce file size by larger factors, we need to go beyond removing redundancy. One approach: Remove information that is irrelevant information for a particular use case. Example: Remove audio information whose loss a human listener cannot perceive. EECS 225D L16: Perceptual Audio Coding Lossy perceptual audio coding. UC Regents Spring 2012 © UCB MPEG 4 Advanced Audio Codec (AAC) Request 128 kb/s Encoder adjusts quality to meet request File size reduced by a factor of 9.4 Lossy: Decompression does not restore original file. Today’s Lecture How it works Listening Test: Original: Play 128 kb/s (9.4X): Play 16 kb/s (23.5X): Play EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB Today’s lecture: Audio Coding Compression: Lossless and Lossy Quantization and Noise Psychoacoustic Masking Time-Frequency Tradeoffs Research Topics EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB Quantization, noise, and compression ... To compress a real-valued discrete-time waveform, quantize the samples to reduce bits/sample, and then apply lossless compression. s(t) Play s(t) + e(t) Play Quantization corrupts the signal s(t) with noise term e(t). In this example, quantizing to 1 bit is clearly objectionable. However, a 40 dB reduction in e(t) yield a better result. Quantizing with more bits s(t) + 0.01*e(t) Play acts to reduce e(t). s(t) + 0.1*e(t) Play EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB 540 CHAPTER 35 PERCEPTUAL AUDIO CODING 540 CHAPTERWhich 35 PERCEPTUAL leads AUDIO CODING to this architecture ... Encoder Bandpass Downsample Quantize Subband 1 quantized Filter bank splits Encoder f M Q[•] samples Bandpass Downsample Quantize Subband 1 Input quantized audio input into Analysisf M (M subbands)Q[•] filters samples M sub-bands. Input Analysis (M subbands) Subband M filters MQ[•]quantized f samples Quantize to Subband M quantized f MQ[•] minimize the Psychoacoustic samples model Minimum masking number of bits Psychoacoustic thresholds needed across model Minimum masking thresholds Decoder all M channels. Subband 1 quantized Q-1[•] M samplesDecoder f Subband 1 Constraint: quantized Q-1[•] M Output Dequantize Upsample Reconstructionf + Human samples filters Output imperceptibility Subband M Dequantize Upsample Reconstruction + quantized Q-1[•] M filters samples f of the encode -> Subband M quantized Q-1[•] M decode process. FIGURE 35.7 Structure of a subband coder andf corresponding decoder. The input signal samplesis dividedEECS into 225DM bandlimited L16: Perceptual Audio signals, Coding each accounting for 1/Mth of the total bandwidth,UC Regents Spring 2012 © UCB FIGUREwhich are 35.7 thenStructure downsampled of a subband by a factor coder of andM (critically corresponding sampled), decoder. quantized The input based signal on the is dividedpsychoacoustic into M bandlimited model’s estimate signals, of each the tolerable accounting quantization for 1/Mth of noise the totalat that bandwidth, moment in that whichband, are and then transmitted. downsampled At the by decoding a factor of end,M (critically the dequantized sampled), samples quantized are interpolated based on the up psychoacousticto full sampling model’s rate, again estimate bandpass-filtered of the tolerable to quantization recover the appropriate noise at that frequencies, moment in that then band,summed and transmitted. to reconstruct At the the decoding full-bandwidth end, signal.the dequantized samples are interpolated up to full sampling rate, again bandpass-filtered to recover the appropriate frequencies, then summed to reconstruct the full-bandwidth signal. full band, and the effort to approach such ideal “brick-wall” filters would result in time- domain properties that were increasingly problematic – long ringing that would compromise fullthe band, time and localization the effort we to approach wish to achieve.such ideal Practical “brick-wall” filters filters with would finite-duration result in impulse time- domainresponses properties will exhibit that were a finite increasingly transition problematic region between – long the ringing passband that would and stopband, compromise and if thethe time passbands localization are set we up wish to fully to achieve. capture all Practical the original filters signal, with finite-durationthis transition region impulse will responsesend up straying will exhibit beyond a finite the transition1/Mth of the region spectrum between associated the passband with the and particular stopband, subband, and if theas passbands illustrated are in Figureset up to 35.8. fully Decimation, capture all however, the original will signal, alias (fold) this transition any signal region components will endoutside up straying the principal beyond subband the 1/Mth backof the into spectrum that band, associated leading withto a particularly the particular nasty subband, kind of asdistortion. illustrated in Figure 35.8. Decimation, however, will alias (fold) any signal components outside theThis principal problem subband is solved back through into alias that band, cancellation leading [12]. to a Although particularly a single nasty decimated kind of distortion.subband will introduce alias terms resulting from energy just beyond the ideal edge of theThis band, problem the corresponding is solved through imperfections alias cancellation in the reconstruction [12]. Although filters a single in the decoderdecimated will subband will introduce alias terms resulting from energy just beyond the ideal edge of the band, the corresponding imperfections in the reconstruction filters in the decoder will 540 CHAPTER 35 PERCEPTUAL AUDIO CODING Encoder Bandpass Downsample Quantize Subband 1 quantized f M Q[•] samples Input Analysis (M subbands) filters Subband M MQ[•]quantized f samples 540 CHAPTER 35 PERCEPTUAL AUDIO CODING Psychoacoustic model Minimum masking Quantization noise in a sub-band ... thresholds 538 CHAPTER 35 PERCEPTUAL AUDIO CODING Encoder Decoder Bandpass Downsample QuantizeSubband 1 Subband 1 quantized quantizedQ-1[•] M f M samplesQ[•] samples f Input Output Analysis (M subbands) DequantizeA tone Upsample in the sub-band,Reconstruction + filters filters Masking tone scaled to fully use B Subband M Subband M MQ[•]quantized quantizedQ-1quantize[•] M bits. f samples samples f The noise floor level FIGURE 35.7 StructureMasked of a subband coder and corresponding decoder. The input signal is 6*B PsychoacousticdB is divided into M bandlimitedthreshold signals, each accounting for 1/Mth of the total bandwidth, model Minimumwhich masking are then downsampled by a factor of M (critically sampled), quantized based on the thresholds below the tone. SNR ~ 6·B psychoacoustic model’s estimate of the tolerable quantization noise at that moment in that band, and transmitted. At the decodingSafe noise end, the level dequantized samples are interpolated up Decoder to full sampling rate, again bandpass-filtered to recover the appropriate frequencies, then Subband 1 summed to reconstruct the full-bandwidth signal. quantized Q-1[•] M freq samples Quantization noise f Output Dequantize Upsample ReconstructionSubband + N filtersfull band, and the effort to approach such ideal “brick-wall” filters would result in time- FIGURE 35.6 Quantizationdomain properties noise that were within increasingly one problematic subband – long should ringing that be would kept compromise below the estimated Subband M (Approximate result.the See time localizationthe book we for wish the to fine achieve. print) Practical filters with finite-duration impulse quantized Q-1[•] M masked threshold. responsesf will exhibit a finite transition region between the passband and stopband, and if samples EECS 225D L16: Perceptual Audio Coding UC Regents Spring 2012 © UCB the passbands are set up to fully capture all the original signal, this transition region will FIGURE 35.7 Structure of a subband coder andend corresponding up straying decoder. beyond the The1 input/Mth signalof the spectrum associated with the particular subband, is divided into M bandlimited signals, each accounting for 1/Mth of the total bandwidth, as illustrated in Figure 35.8. Decimation, however, will alias (fold) any signal components which are then downsampled

Load more