Audio Coding
Friday, March 29, 2019
Audio Coding
ELEC-E5620 - Audio Signal Processing, Lecture #10
Vesa Välimäki
©2001-2019 Vesa Välimäki

Sound check

Course Schedule in 2019 (Periods III-VI)
0. General issues (Vesa & Benoit), 11.1.2019
1. History and future of audio DSP (Vesa), 18.1.2019
2. Digital filters in audio (Vesa), 25.1.2019
3. Audio filter design (Vesa), 1.2.2019
4. Analysis of audio signals (Vesa), 8.2.2019
5. Audio effects processing (Benoit Vesa), 15.2.2019
* No lecture (Evaluation week for Period III), 22.2.2019
6. Synthesis of audio signals (Fabian), 1.3.2019
7. Reverberation and 3-D sound (Benoit), 8.3.2019
8. Physics-based sound synthesis (Vesa), 15.3.2019
9. Sampling rate conversion (Vesa), 22.3.2019
10. Audio coding (Vesa), 29.3.2019

Outline
• Introduction
• Lossless Audio Coding
• Perceptual (Lossy) Audio Coding
  – Subband coding, time-to-frequency mapping, psychoacoustic models, parametric coding
• MPEG standards and some new codecs
  – MP3, AAC, USAC, Opus
• Applications

Part of this lecture material was produced by Ms. Azadeh Haghparast (TKK Dept.
of Signal Processing and Acoustics, 2007)

Bit-rate of Audio Signals
• Bit-rate of one audio channel without compression (at 44.1 kHz): 16 bit × 44100 samples/s = 705.6 kbit/s ≈ 700 kbit/s
• Stereo signal (44.1 kHz): 2 × 16 bit × 44100 samples/s ≈ 1.4 Mbit/s
• Additionally, bits are needed for error correction and synchronization
  – On a CD, 33 extra bits are needed for each 16 bits of audio data, so the total bit-rate is (49/16) × 1.41 Mbit/s ≈ 4.3 Mbit/s

Bit-Rate for Various Digital Audio Schemes

Application                    Format                Sample rate  Audio bit-rate  Overhead bit-rate  Total bit-rate
Compact Disc (CD)              PCM                   44.1 kHz     1.41 Mb/s       2.91 Mb/s          4.32 Mb/s
Digital Audio Tape (DAT)       PCM                   44.1 kHz     1.41 Mb/s       1.67 Mb/s          3.08 Mb/s
MiniDisc (MD)                  ATRAC                 44.1 kHz     292 kb/s        718 kb/s           1.01 Mb/s
Digital Audio Broadcast (DAB)  MPEG-1 Layer II, III  48 kHz       256 kb/s        256 kb/s           512 kb/s

Applications of Audio Coding
• Storage
  – Music archives
  – Movie soundtracks
  – Music for electronic games
• Communication
  – Mobile multimedia
  – Internet streaming
• Broadcasting
  – Digital radio and TV
• Wireless audio
  – Hands-free headphones and headsets
  – Wireless speakers

Classification of Audio Coding Techniques
• Lossless Audio Coding
  – Reduces the size of the audio signal by redundancy reduction, e.g., by exploiting the distribution of sample values
  – The original signal values can be recovered exactly by decoding
  – Shorten, FLAC, Monkey’s Audio, MPEG-4 ALS, Windows Media Audio 9 Lossless, RealAudio Lossless, APT-X…
• Lossy Audio Coding
  – Reduces the size of the audio signal by irrelevancy reduction
  – Exploits the limitations of human hearing, e.g., auditory masking
  – MPEG Audio (Layer 1, 2, 3), Dolby Digital, Ogg Vorbis, MPEG-AAC, HILN (MPEG-4 Parametric Audio Coding), WMA (Windows Media Audio), …

Applications of Lossless Audio Coding
• Archiving of original
recordings
• Studio operations, such as mixing
• Digital music distribution over the Internet
• Portable music players/recorders
• Multi-channel audio (e.g., DVD-Audio)
• Bluetooth audio (headsets, speakers)

Principles of Lossless Audio Coding
• A lossless audio coder comprises three main blocks:
  – Framing: divides the audio signal into frames of, e.g., 100 ms
  – Decorrelation: removes redundancy (spectral whitening)
  – Entropy encoding: statistically efficient code book
• The histogram of audio signals is often close to the Laplace distribution: more small sample values than large ones
  – Short code for common sample values
  – Long code for rare sample values

Principles of Lossless Audio Coding
• Two approaches to decorrelate the audio signal
  1. Linear predictive model
     – Lossless representation: predictor coefficients + error signal
  2. Linear transform model
     – Lossless representation: transform coefficients + error signal

Linear Prediction
• Predictor coefficients are determined
  – Usually using the autocorrelation or covariance method
• Each sample is estimated from its previous samples using the predictor coefficients:
  $\hat{x}(n) = Q\left\{ \sum_{k=1}^{M} a_k \, x(n-k) \right\}$,
  where $M$ is the predictor order, $a_k$ are the predictor coefficients, and $Q\{\cdot\}$ denotes quantization of the prediction

Decorrelation With a Polynomial Predictor
• Try several simple polynomial predictors and select the best one
  – The best predictor is the one that produces the error signal with the smallest amplitude
  – The spectrum is whitened
  – Integer coefficients are used to avoid rounding errors
  – For example, try the following polynomial predictors
    • xp0(n) = 0 (next sample is zero)
    • xp1(n) = x(n - 1) (same value repeats)
    • xp2(n) = 2x(n - 1) - x(n - 2) (linear trend)
    • xp3(n) = 3x(n - 1) - 3x(n - 2) + x(n - 3) (Ref.
Hans and Schafer 2001)

Principles of Lossless Audio Coding
• Decorrelation by Linear Transform Model

LTAC
• LTAC: Lossless Transform Audio Coding
• Fixed or variable frame length
• Orthonormal Discrete Cosine Transform (DCT)
• Groups of 32 adjoining transform coefficients
• Rice coding for transform coefficients
• Arithmetic coding for the error signal

MPEG-4 ALS
• MPEG-4 Audio Lossless Coding standard, 2005-

MPEG-4 ALS
• Based on Linear Prediction
  – Optimal predictor coefficients are calculated with an iterative procedure
• Optimal order of predictor → optimal predictor coefficients → the smallest bit-rate
• Coefficients converted to arcsine
• Linear 8-bit quantization of arcsine coefficients
• Rice entropy coding

Comparison of Lossless Audio Codecs
• ≈ 50%
• Ref: Coalson, 2005. http://flac.sourceforge.net/comparison.html

Lossy Audio Coding
• High compression ratios can be achieved when the signal is allowed to change
  – The goal is minimal disturbance for human listeners
• Technology for end users
  – Listen to the coded material as is (no further processing, EQ, etc.)
  – Unsuitable for high-quality recordings or archiving
• Subband audio coding
  – MP3, Dolby AC-3, Vorbis, WMA (Windows Media Audio)
• Parametric audio coding
  – HILN (MPEG-4)

Applications of Lossy Audio Coding
• Portable music players and mobile phones
  – Also MiniDisc players
• Internet audio
• Digital TV
• Digital radio
• Movie soundtracks

Subband Audio Coding
• A.k.a.
perceptual audio coding
• Frequency-domain representation of the audio signal
• Based on a psychoacoustic model
  – Model of the threshold of hearing
  – Shape the quantization noise below the threshold of hearing

Subband Audio Coding
• General block diagram of subband coder

Time to Frequency Mapping
• Time to frequency mapping techniques
  – The simplest technique: Fourier transform (FFT)
  – Filter bank technique
  – Pseudo-Quadrature Mirror Filter bank (PQMF)
  – Modified Discrete Cosine Transform (MDCT)

Filter Bank
• N-channel filter bank: N parallel bandpass filters
• Uniform or non-uniform bandwidth
• Magnitude response of a uniform bandwidth N-channel filter bank

Filter Bank
• Analysis-synthesis filter bank → Perfect Reconstruction filter bank

Filter Bank
• Down-sampling
  – Preserves data rate
  – Problem: limiting the spectral bandwidth → aliasing (folding)
• Up-sampling
  – Restores data rate
  – Problem: expanding the spectral bandwidth → imaging distortion

Pseudo-Quadrature Mirror Filter Bank
• Pseudo-Quadrature Mirror Filter (PQMF) Bank
• Design a narrow lowpass filter: the prototype filter
• Other filters are obtained by cosine modulation of the prototype filter
• MPEG-1 and MPEG-2
  – 32 channels
  – Prototype filter of order 512

2-Channel PQMF
• Design of a 2-channel analysis-synthesis filter bank
• Challenge:
  – Define the filters $H_0(z)$, $H_1(z)$, $G_0(z)$, $G_1(z)$
  – For Perfect Reconstruction: $G_0(z) = H_1(-z)$, $G_1(z) = -H_0(-z)$,
  – and also $H_1(z) = H_0(-z)$.
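The 2-channel conditions can be checked numerically. The sketch below assumes the standard alias-cancelling choice $G_0(z) = H_1(-z)$, $G_1(z) = -H_0(-z)$ with the mirror relation $H_1(z) = H_0(-z)$, and uses a 2-tap Haar lowpass as the prototype. The prototype and the helper names (`convolve`, `modulate`, `qmf_analysis_synthesis`) are illustrative assumptions, not taken from the lecture. With this prototype the output should equal the input delayed by one sample.

```python
import math

def convolve(x, h):
    """Full linear convolution of two sequences given as Python lists."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y

def modulate(h):
    """Coefficients of H(-z): negate every odd-indexed tap of h."""
    return [c if k % 2 == 0 else -c for k, c in enumerate(h)]

def qmf_analysis_synthesis(x):
    """Pass x through a 2-channel analysis-synthesis bank (assumed Haar prototype)."""
    h0 = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # lowpass prototype (assumption)
    h1 = modulate(h0)                          # H1(z) = H0(-z)
    g0 = modulate(h1)                          # G0(z) = H1(-z)
    g1 = [-c for c in modulate(h0)]            # G1(z) = -H0(-z)

    y = None
    for h, g in ((h0, g0), (h1, g1)):
        sub = convolve(x, h)[::2]              # analysis filter, then down-sample by 2
        up = [0.0] * (2 * len(sub))
        up[::2] = sub                          # up-sample by 2 (zero insertion)
        branch = convolve(up, g)               # synthesis filter
        y = branch if y is None else [a + b for a, b in zip(y, branch)]
    return y

x = [1.0, -2.0, 3.0, 0.5, -1.5, 4.0, 2.0]
y = qmf_analysis_synthesis(x)

# Compare the input with the output shifted by the one-sample system delay.
delayed = y[1:1 + len(x)]
max_error = max(abs(a - b) for a, b in zip(x, delayed))
print("max reconstruction error:", max_error)
```

The aliasing created by down-sampling in each branch cancels when the two branches are summed, so only a pure delay remains. Longer prototypes, such as the order-512 filter mentioned above, give sharper band separation but in PQMF designs the reconstruction is only approximately perfect.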

Modified DCT Filter Bank
• Modified Discrete Cosine Transform Filter Bank
• Also called Time-Domain Aliasing Cancellation (TDAC)
• Special case of PQMF
  – Length of the prototype filter is twice that of the PQMF
  – 50% overlap with the previous frame
• Prototype filter: sine function
• Choice of window length
  – Long window length is good for stationary signals
  – Short window length is suitable for transients

Subband Coding
• General block diagram of subband coder

Psychoacoustics
• Absolute threshold of hearing
• Masking phenomenon
  – Simultaneous masking, also called frequency masking
  – Non-simultaneous masking, also called temporal masking
• Critical bandwidth
• Spread of masking

Sound Pressure Level
• Quantity for measuring the sound pressure P 2 SPL(dB)10log10 ( ) ,