Friday, March 29, 2019
29.3.2019
Audio Coding
ELEC-E5620 - Audio Signal Processing, Lecture #10
Vesa Välimäki
Course Schedule in 2019 (Periods III-IV)
0. General issues (Vesa & Benoit), 11.1.2019
1. History and future of audio DSP (Vesa), 18.1.2019
2. Digital filters in audio (Vesa), 25.1.2019
3. Audio filter design (Vesa), 1.2.2019
4. Analysis of audio signals (Vesa), 8.2.2019
5. Audio effects processing (Benoit & Vesa), 15.2.2019
* No lecture (evaluation week for Period III), 22.2.2019
6. Synthesis of audio signals (Fabian), 1.3.2019
7. Reverberation and 3-D sound (Benoit), 8.3.2019
8. Physics-based sound synthesis (Vesa), 15.3.2019
9. Sampling rate conversion (Vesa), 22.3.2019
10. Audio coding (Vesa), 29.3.2019
©2001-2019 Vesa Välimäki 29.3.2019 2
Outline
• Introduction
• Lossless audio coding
• Perceptual (lossy) audio coding
  – Subband coding, time-to-frequency mapping, psychoacoustic models, parametric coding
• MPEG standards and some new codecs
  – MP3, AAC, USAC, Opus
• Applications
Part of this lecture material was produced by Ms. Azadeh Haghparast (TKK Dept. of Signal Processing and Acoustics, 2007).
Bit-rate of Audio Signals
• Bit-rate of one uncompressed audio channel (16 bits, 44.1 kHz): 16 bits × 44 100 samples/s ≈ 706 kbit/s
• Stereo signal (44.1 kHz): 2 × 16 bits × 44 100 samples/s ≈ 1.41 Mbit/s
• Additional bits are needed for error correction and synchronization
  – On a CD, 33 extra bits are needed for every 16 bits of audio data, so the total bit-rate is about 4.3 Mbit/s
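The arithmetic above can be checked with a few lines of Python. This is only a sketch; the 33-per-16 overhead factor is the CD figure quoted on the slide. It prints 705.6 kbit/s, 1.4112 Mbit/s and 4.32 Mbit/s, which match the rounded values above and the CD row of the next table.

```python
# Uncompressed PCM bit-rates for CD-format audio (16 bits, 44.1 kHz)
FS = 44_100          # samples per second per channel
BITS = 16            # bits per sample

mono_bps = BITS * FS                 # one channel
stereo_bps = 2 * mono_bps            # two channels
# On a CD, 33 extra bits accompany every 16 audio bits (error correction, sync)
cd_total_bps = stereo_bps * (16 + 33) / 16

print(f"mono:     {mono_bps / 1e3:.1f} kbit/s")
print(f"stereo:   {stereo_bps / 1e6:.4f} Mbit/s")
print(f"CD total: {cd_total_bps / 1e6:.2f} Mbit/s")
```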
Bit-Rate for Various Digital Audio Schemes
Application | Format | Sample rate | Audio bit-rate | Overhead bit-rate | Total bit-rate
Compact Disc (CD) | PCM | 44.1 kHz | 1.41 Mb/s | 2.91 Mb/s | 4.32 Mb/s
Digital Audio Tape (DAT) | PCM | 44.1 kHz | 1.41 Mb/s | 1.67 Mb/s | 3.08 Mb/s
MiniDisc (MD) | ATRAC | 44.1 kHz | 292 kb/s | 718 kb/s | 1.01 Mb/s
Digital Audio Broadcast (DAB) | MPEG-1 Layer II, III | 48 kHz | 256 kb/s | 256 kb/s | 512 kb/s
Applications of Audio Coding
• Storage
  – Music archives
  – Movie soundtracks
  – Music for electronic games
• Communication
  – Mobile multimedia
  – Internet streaming
• Broadcasting
  – Digital radio and TV
• Wireless audio
  – Hands-free headphones and headsets
  – Wireless speakers
Classification of Audio Coding Techniques
• Lossless audio coding
  – Reduces the size of the audio signal by removing redundancy, e.g., by exploiting the statistical distribution of sample values
  – Decoding recovers the original sample values exactly
  – Shorten, FLAC, Monkey’s Audio, MPEG-4 ALS, Windows Media Audio 9 Lossless, RealAudio Lossless, …
• Lossy audio coding
  – Reduces the size of the audio signal by removing irrelevant information
  – Exploits the limitations of human hearing, e.g., auditory masking
  – MPEG Audio (Layer 1, 2, 3), Dolby Digital, Ogg Vorbis, MPEG-AAC, HILN (MPEG-4 Parametric Audio Coding), WMA (Windows Media Audio), APT-X, …
Applications of Lossless Audio Coding
• Archiving of original recordings
• Studio operations, such as mixing
• Digital music distribution over the Internet
• Portable music players/recorders
• Multi-channel audio (e.g., DVD-Audio)
• Bluetooth audio (headsets, speakers)
Principles of Lossless Audio Coding
• A lossless audio coder consists of three main blocks:
  – Framing: divides the audio signal into frames, e.g., 100 ms long
  – Decorrelation: removes redundancy (spectral whitening)
  – Entropy coding: a statistically efficient code book
• The histogram of an audio signal is often close to a Laplace distribution: small sample values are more common than large ones
  – Short codes for common sample values
  – Long codes for rare sample values
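Rice codes are a common way to give short codes to small (frequent) values and long codes to large (rare) ones. LTAC and MPEG-4 ALS, discussed later, both use Rice coding. The sketch below assumes a fixed Rice parameter k per block and maps signed values to non-negative integers with a zig-zag mapping. Both choices are illustrative and do not reproduce any particular codec's bitstream format.

```python
def zigzag(v: int) -> int:
    """Map signed ints to non-negative ints: 0,-1,1,-2,2,... -> 0,1,2,3,4,..."""
    return 2 * v if v >= 0 else -2 * v - 1


def unzigzag(u: int) -> int:
    return u // 2 if u % 2 == 0 else -(u + 1) // 2


def rice_encode(values, k: int) -> str:
    """Rice code: quotient in unary (q ones + a terminating zero), then k remainder bits."""
    out = []
    for v in values:
        u = zigzag(v)
        q, r = u >> k, u & ((1 << k) - 1)
        out.append("1" * q + "0" + (format(r, f"0{k}b") if k > 0 else ""))
    return "".join(out)


def rice_decode(bits: str, k: int, count: int):
    values, i = [], 0
    for _ in range(count):
        q = 0
        while bits[i] == "1":        # unary part
            q += 1
            i += 1
        i += 1                       # skip the terminating zero
        r = int(bits[i:i + k], 2) if k > 0 else 0
        i += k
        values.append(unzigzag((q << k) | r))
    return values


residual = [0, -1, 2, 1, -3, 10]
code = rice_encode(residual, k=1)
assert rice_decode(code, k=1, count=len(residual)) == residual
```

With k = 1, the value 0 costs 2 bits while 10 costs 12 bits. Choosing k per block to fit the residual's spread keeps the average code length low.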
Principles of Lossless Audio Coding
• Two approaches to decorrelating the audio signal:
1. Linear predictive model
  – Lossless representation: predictor coefficients + error signal
2. Linear transform model
  – Lossless representation: transform coefficients + error signal
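Linear Prediction
• Predictor coefficients are usually determined with the autocorrelation or covariance method
• Each sample is estimated from its M previous samples using the predictor coefficients a_k. The prediction is quantized (Q) to an integer, so the decoder can reproduce it exactly:
$$\hat{x}(n) = Q\left\{\sum_{k=1}^{M} a_k\, x(n-k)\right\}$$
• The error signal e(n) = x(n) − x̂(n) is transmitted together with the coefficients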
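Here is a minimal sketch of these steps. It assumes the autocorrelation method solved with the Levinson-Durbin recursion, and it takes Q{·} to be rounding to the nearest integer. The first M samples are passed through unpredicted as a warm-up. Real codecs such as MPEG-4 ALS also quantize and transmit the coefficients. That step is omitted here, so the decoder simply reuses the same floating-point a_k.

```python
import numpy as np


def lpc_autocorrelation(x, M):
    """Predictor coefficients a_1..a_M by the autocorrelation method (Levinson-Durbin)."""
    x = np.asarray(x, dtype=float)
    r = np.array([np.dot(x[: len(x) - m], x[m:]) for m in range(M + 1)])
    a = np.zeros(M + 1)                  # a[k] multiplies x(n-k); a[0] is unused
    err = r[0]
    for i in range(1, M + 1):
        if err <= 0.0:                   # silent or perfectly predictable: stop
            break
        k = (r[i] - np.dot(a[1:i], r[i - 1:0:-1])) / err   # reflection coefficient
        a_prev = a.copy()
        a[i] = k
        a[1:i] = a_prev[1:i] - k * a_prev[i - 1:0:-1]
        err *= 1.0 - k * k
    return a[1:]


def predict(x, a, n):
    """Q{ sum_k a_k x(n-k) }, with Q = rounding to the nearest integer."""
    M = len(a)
    return int(np.round(np.dot(a, x[n - M:n][::-1])))


def lpc_encode(x, a):
    x = np.asarray(x, dtype=np.int64)
    e = x.copy()                         # the first M samples are sent unpredicted
    for n in range(len(a), len(x)):
        e[n] = x[n] - predict(x, a, n)
    return e


def lpc_decode(e, a):
    x = np.array(e, dtype=np.int64)
    for n in range(len(a), len(x)):
        x[n] = e[n] + predict(x, a, n)   # same prediction as in the encoder
    return x


rng = np.random.default_rng(0)
n = np.arange(4000)
x = np.round(3000 * np.sin(0.05 * n) + 1000 * np.sin(0.31 * n)
             + rng.normal(0, 20, n.size)).astype(np.int64)
a = lpc_autocorrelation(x, M=8)
e = lpc_encode(x, a)
assert np.array_equal(lpc_decode(e, a), x)   # lossless round trip
```

The residual e(n) has much less energy than x(n), so it needs fewer bits after entropy coding.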
Decorrelation With a Polynomial Predictor
• Try several simple polynomial predictors and select the best one
  – The best predictor is the one that produces the error signal with the smallest amplitude
  – The spectrum of the error signal is whitened
  – Integer coefficients avoid rounding errors
  – For example, try the following polynomial predictors:
    • xp0(n) = 0 (next sample is zero)
    • xp1(n) = x(n − 1) (same value repeats)
    • xp2(n) = 2x(n − 1) − x(n − 2) (linear trend)
    • xp3(n) = 3x(n − 1) − 3x(n − 2) + x(n − 3) (quadratic trend)
(Ref. Hans and Schafer 2001)
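The selection step can be sketched as follows. "Smallest amplitude" is measured here as the sum of absolute residual values, which is an illustrative choice. Only integer arithmetic is used, so no rounding errors can occur.

```python
import numpy as np


def polynomial_residuals(x):
    """Residuals e_p(n) = x(n) - xp_p(n) of the fixed predictors p = 0..3 (n >= 3)."""
    x = np.asarray(x, dtype=np.int64)
    xn, x1, x2, x3 = x[3:], x[2:-1], x[1:-2], x[:-3]   # x(n), x(n-1), x(n-2), x(n-3)
    return [
        xn,                                   # xp0(n) = 0
        xn - x1,                              # xp1(n) = x(n-1)
        xn - (2 * x1 - x2),                   # xp2(n) = 2x(n-1) - x(n-2)
        xn - (3 * x1 - 3 * x2 + x3),          # xp3(n) = 3x(n-1) - 3x(n-2) + x(n-3)
    ]


def best_polynomial_predictor(x):
    """Return the predictor order with the smallest total absolute residual, and all costs."""
    costs = [int(np.sum(np.abs(e))) for e in polynomial_residuals(x)]
    return int(np.argmin(costs)), costs


order, costs = best_polynomial_predictor([3 * n + 5 for n in range(100)])
print(order, costs)
```

For a linear ramp, the order-2 predictor (linear trend) already gives a zero residual, so order 2 is selected. np.argmin returns the first of the tied minima.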
Principles of Lossless Audio Coding • Decorrelation by Linear Transform Model
LTAC
• LTAC: Lossless Transform Audio Coding
• Fixed or variable frame length
• Orthonormal Discrete Cosine Transform (DCT)
• Groups of 32 adjoining transform coefficients
• Rice coding for the transform coefficients
• Arithmetic coding for the error signal
MPEG-4 ALS • MPEG-4 Audio Lossless Coding standard, 2005-
MPEG-4 ALS
• Based on linear prediction
  – Optimal predictor coefficients are calculated with an iterative procedure
• Optimal predictor order → optimal predictor coefficients → smallest bit-rate
• Coefficients converted to the arcsine domain
• Linear 8-bit quantization of the arcsine coefficients
• Rice entropy coding
Comparison of Lossless Audio Codecs
[Figure: compression results of lossless audio codecs, around 50%]
Ref: Coalson, 2005. http://flac.sourceforge.net/comparison.html
Lossy Audio Coding
• High compression ratios can be achieved when the signal is allowed to change
  – The goal is minimal audible disturbance for human listeners
• Technology for end users
  – The coded material is meant to be listened to as is (no further processing, EQ, etc.)
  – Unsuitable for high-quality recordings or archiving
• Subband audio coding
  – MP3, Dolby AC-3, Vorbis, WMA (Windows Media Audio)
• Parametric audio coding
  – HILN (MPEG-4)
Applications of Lossy Audio Coding
• Portable music players and mobile phones
  – Also MiniDisc players
• Internet audio
• Digital TV
• Digital radio
• Movie soundtracks
Subband Audio Coding
• A.k.a. perceptual audio coding
• Frequency-domain representation of the audio signal
• Based on a psychoacoustic model
  – Model of the threshold of hearing
  – The quantization noise is shaped to stay below the threshold of hearing
Subband Audio Coding
• General block diagram of a subband coder
Time to Frequency Mapping
• Time-to-frequency mapping techniques:
  – The simplest technique: Fourier transform (FFT)
  – Filter bank techniques:
    – Pseudo-Quadrature Mirror Filter bank (PQMF)
    – Modified Discrete Cosine Transform (MDCT)
Filter Bank
• N-channel filter bank: N parallel bandpass filters
• Uniform or non-uniform bandwidth
• Magnitude response of a uniform-bandwidth N-channel filter bank
Filter Bank
• Analysis-synthesis filter bank: with a perfect reconstruction filter bank, the output equals the input apart from a delay
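Filter Bank
• Down-sampling
  – Keeps the total data rate (over all channels) equal to that of the input signal
  – Problem: the spectral bandwidth must be limited, otherwise aliasing (folding) occurs
• Up-sampling
  – Restores the original sample rate
  – Problem: the spectrum is expanded, which causes imaging distortion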
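Here is a small numerical illustration of the aliasing problem. The values are assumed: a 48 kHz input, an 18 kHz tone, and plain decimation by 2 with no anti-aliasing filter. After decimation the new Nyquist frequency is 12 kHz, so the tone folds down to 24 − 18 = 6 kHz.

```python
import numpy as np

fs = 48_000                      # original sample rate (assumed)
f_tone = 18_000                  # tone above the new Nyquist frequency of 12 kHz
n = np.arange(4800)              # 0.1 s of signal
x = np.sin(2 * np.pi * f_tone * n / fs)

y = x[::2]                       # down-sampling by 2 WITHOUT a lowpass filter
fs_y = fs // 2

spectrum = np.abs(np.fft.rfft(y))
freqs = np.fft.rfftfreq(len(y), d=1 / fs_y)
peak_hz = freqs[np.argmax(spectrum)]
print(f"Tone at {f_tone} Hz appears at {peak_hz:.0f} Hz after decimation")
```

In a filter bank, the analysis filters limit each band before decimation. Any residual aliasing must then be cancelled by the synthesis stage.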
Pseudo-Quadrature Mirror Filter Bank
• Pseudo-Quadrature Mirror Filter (PQMF) bank
• Design a narrow lowpass filter: the prototype filter
• The other filters are obtained by cosine modulation of the prototype filter
• MPEG-1 and MPEG-2
  – 32 channels
  – 512-tap prototype filter
2-Channel PQMF
• Design of a 2-channel analysis-synthesis filter bank
• Challenge:
  – Define the filters H0(z), H1(z), G0(z), G1(z)
  – For alias cancellation (a prerequisite for perfect reconstruction):
$$G_0(z) = H_1(-z), \qquad G_1(z) = -H_0(-z),$$
  – and also
$$H_1(z) = H_0(-z).$$
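The conditions can be checked numerically. This sketch uses the 2-tap Haar lowpass H0(z) = (1 + z^-1)/√2 as an assumed prototype. It is a textbook choice, not the filter of any MPEG standard. With this short prototype, the bank is also perfectly reconstructing: the output equals the input delayed by one sample. Longer FIR prototypes under the same conditions cancel aliasing but, in general, reconstruct the input only approximately.

```python
import numpy as np


def two_channel_qmf(x, h0):
    """Analysis + synthesis with H1(z)=H0(-z), G0(z)=H1(-z), G1(z)=-H0(-z)."""
    h0 = np.asarray(h0, dtype=float)
    sign = (-1.0) ** np.arange(len(h0))   # multiplying taps by (-1)^n maps H(z) to H(-z)
    h1 = h0 * sign                        # H1(z) = H0(-z)
    g0 = h1 * sign                        # G0(z) = H1(-z)  (= H0(z))
    g1 = -h0 * sign                       # G1(z) = -H0(-z) (= -H1(z))
    y = np.zeros(len(x) + 2 * (len(h0) - 1))
    for h, g in ((h0, g0), (h1, g1)):
        v = np.convolve(x, h)             # analysis filter
        v[1::2] = 0.0                     # down-sample by 2, then up-sample by 2 (zero insertion)
        y += np.convolve(v, g)            # synthesis filter
    return y


h0_haar = np.array([1.0, 1.0]) / np.sqrt(2.0)
x = np.random.default_rng(0).standard_normal(64)
y = two_channel_qmf(x, h0_haar)
print(np.allclose(y[1:len(x) + 1], x))   # True: output = input delayed by one sample
```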
Modified DCT Filter Bank
• Modified Discrete Cosine Transform (MDCT) filter bank
• Also called Time-Domain Aliasing Cancellation (TDAC)
• Special case of PQMF
  – The length of the prototype filter is twice the number of channels
  – 50% overlap with the previous frame
• Prototype filter: sine window
• Choice of window length
  – A long window is good for stationary signals
  – A short window is suitable for transients
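This compact sketch of the MDCT filter bank with the sine window demonstrates time-domain aliasing cancellation. Each frame of 2N samples yields only N coefficients, yet overlap-adding the inverse transforms (hop N, 50% overlap) restores the input exactly. The direct matrix form is an illustrative choice, as is the 2/N inverse scaling, which is needed with a window satisfying w²(n) + w²(n + N) = 1. Practical coders use FFT-based fast algorithms instead.

```python
import numpy as np


def mdct_basis(N):
    """N x 2N matrix of cos[pi/N (n + 1/2 + N/2)(k + 1/2)]."""
    n = np.arange(2 * N)
    k = np.arange(N)
    return np.cos(np.pi / N * np.outer(k + 0.5, n + 0.5 + N / 2))


def mdct_analysis_synthesis(x, N):
    """Windowed MDCT -> IMDCT -> overlap-add; returns the reconstructed signal."""
    x = np.asarray(x, dtype=float)
    C = mdct_basis(N)
    w = np.sin(np.pi * (np.arange(2 * N) + 0.5) / (2 * N))   # sine window (prototype)
    pad_end = N + (-len(x)) % N
    xp = np.concatenate([np.zeros(N), x, np.zeros(pad_end)])  # every sample lies in two frames
    y = np.zeros_like(xp)
    for start in range(0, len(xp) - 2 * N + 1, N):             # hop = N (50% overlap)
        X = C @ (w * xp[start:start + 2 * N])                   # N coefficients per frame
        y[start:start + 2 * N] += w * (2.0 / N) * (C.T @ X)     # IMDCT, window, overlap-add
    return y[N:N + len(x)]


x = np.random.default_rng(1).standard_normal(1000)
print(np.allclose(mdct_analysis_synthesis(x, N=64), x))   # True
```

Each inverse-transformed frame on its own contains time-reversed alias components. These cancel only when the neighbouring frames are added, which is why the technique is called TDAC.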
Subband Coding
• General block diagram of a subband coder
Psychoacoustics
• Absolute threshold of hearing
• Masking phenomenon
  – Simultaneous masking, also called frequency masking
  – Non-simultaneous masking, also called temporal masking
• Critical bandwidth
• Spread of masking
Sound Pressure Level
• Logarithmic measure of the sound pressure relative to a reference pressure:
$$\mathrm{SPL}\ (\mathrm{dB}) = 10\log_{10}\left(\frac{P}{P_0}\right)^{2}$$
• P: sound pressure
• P0: reference sound pressure = 20 μPa
  – The sound pressure at the hearing threshold when f = 2 kHz
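Here is a direct evaluation of the SPL formula. The example pressures are arbitrary. 20 μPa gives 0 dB, 1 Pa gives about 94 dB, and 20 Pa gives 120 dB, roughly the threshold of pain.

```python
import math

P0 = 20e-6  # reference sound pressure in pascals (20 μPa)


def spl_db(p: float) -> float:
    """Sound pressure level in dB re 20 μPa: 10*log10((P/P0)^2) = 20*log10(P/P0)."""
    return 10 * math.log10((p / P0) ** 2)


for p in (20e-6, 2e-3, 1.0, 20.0):
    print(f"{p:g} Pa -> {spl_db(p):.1f} dB SPL")
```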
Limits of Human Hearing
• Frequency range: 20 Hz … 20 kHz • Most sensitive range: 1 kHz … 5 kHz
• Dynamic range: 20 dB … 95 dB (for safe listening) • Threshold of pain: About 120 dB
Absolute Threshold of Hearing
• Also called threshold in quiet
• Minimum sound pressure level of a pure tone that an average listener can detect in noiseless conditions
  – About 0 dB SPL at mid frequencies
  – Frequency components below this curve are inaudible
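Perceptual coders often approximate the threshold in quiet with Terhardt's formula, which appears, e.g., in the Painter and Spanias tutorial listed in the literature. Its use here is an assumption; it is not necessarily the exact curve shown in the figure. The approximation is about 3.4 dB at 1 kHz, dips to about −5 dB near 3.3 kHz, and rises steeply at both ends of the frequency range.

```python
import numpy as np


def threshold_in_quiet_db(f_hz):
    """Terhardt's approximation of the absolute threshold of hearing (dB SPL)."""
    f = np.asarray(f_hz, dtype=float) / 1000.0     # frequency in kHz
    return 3.64 * f ** -0.8 - 6.5 * np.exp(-0.6 * (f - 3.3) ** 2) + 1e-3 * f ** 4


for f in (50, 100, 1000, 3300, 10_000, 16_000):
    print(f"{f:>6} Hz: {float(threshold_in_quiet_db(f)):6.1f} dB SPL")
```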
Threshold of Hearing
Masking Phenomenon • The most important psychoacoustic concept for transparent audio coding
• Frequency masking – Concurrent masker and maskee sounds
• Temporal masking – Extends beyond the time duration in which the masker occurs
Frequency Masking • The threshold of audibility of one sound is raised in the presence of sound energy at neighboring frequencies
Frequency Masking Examples
• Examples of how the hearing threshold is raised by different maskers: sine, narrow-band noise, and white noise
Masking Curves • Four masking curves for measurements – Narrow-band noise masking narrow-band noise (NMN) – Narrow-band noise masking tone (NMT) – Tone masking narrow-band noise (TMN) – Tone masking tone (TMT)
Narrow-Band Noise Masking Tones (NMT) • Shape of the masking curve depends on frequency
Narrow-Band Noise Masking Tones (NMT) • Shape of the masking curve also depends on sound level
Tone Masking Tone (TMT)
Temporal Masking
• Difficult to employ in audio coding
[Figure: temporal masking around a masker; pre-masking lasts a few ms, post-masking up to about 100 ms]
Critical Bandwidth
• The frequency range around a masker frequency in which the masking curve remains flat is called the critical bandwidth
• Each critical bandwidth corresponds to a constant distance on the basilar membrane
• The unit of the critical-band scale is the Bark:
$$z/\mathrm{Bark} = 13\arctan\left(0.76\,\frac{f}{\mathrm{kHz}}\right) + 3.5\arctan\left[\left(\frac{f}{7.5\,\mathrm{kHz}}\right)^{2}\right]$$
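The Bark formula can be evaluated directly. With it, 1 kHz maps to about 8.5 Bark and 20 kHz to about 24.6 Bark, so the audible range spans roughly 24 critical bands.

```python
import numpy as np


def hz_to_bark(f_hz):
    """Critical-band rate z (Bark) from the formula above."""
    f_khz = np.asarray(f_hz, dtype=float) / 1000.0
    return 13.0 * np.arctan(0.76 * f_khz) + 3.5 * np.arctan((f_khz / 7.5) ** 2)


for f in (100, 500, 1000, 4000, 10_000, 20_000):
    print(f"{f:>6} Hz -> {float(hz_to_bark(f)):5.2f} Bark")
```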
Critical Bands on a Linear Frequency Scale
[Figure: critical bands on a linear frequency axis, 0 to 20 kHz]
Critical Bands on a Log Frequency Scale
• Constant bandwidth up to 500 Hz; above that, about 1/3 octave
[Figure: critical bands on a logarithmic frequency axis, 0.1 to 10 kHz]
Spread of Masking
• The effect of frequency masking is not limited to one critical bandwidth
• Various analytical models of the masking spreading function
  – Triangle function
  – Schroeder function
  – …
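As one example, the Schroeder spreading function is commonly written (e.g., in the Painter and Spanias tutorial) as SF(Δz) = 15.81 + 7.5(Δz + 0.474) − 17.5√(1 + (Δz + 0.474)²) dB. Here Δz is the maskee Bark value minus the masker Bark value. The sketch evaluates this form. The function decays more slowly toward higher frequencies, which is the upward spread of masking: about −4.3 dB at +1 Bark versus −7.9 dB at −1 Bark.

```python
import numpy as np


def schroeder_spreading_db(dz_bark):
    """Schroeder spreading function in dB; dz = maskee Bark minus masker Bark."""
    y = np.asarray(dz_bark, dtype=float) + 0.474
    return 15.81 + 7.5 * y - 17.5 * np.sqrt(1.0 + y ** 2)


for dz in (-3, -2, -1, 0, 1, 2, 3):
    print(f"dz = {dz:+d} Bark: {float(schroeder_spreading_db(dz)):6.1f} dB")
```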
Perceptual Entropy & Bit Allocation
• Perceptual entropy: a lower bound on the number of bits needed for transparent quality
• Bit allocation algorithms
  – Allocate bits according to the signal-to-mask ratio (SMR)
  – Goal: the noise-to-mask ratio (NMR) stays below 0 dB, i.e., the quantization noise remains below the masking threshold
• It is well known that SNR (signal-to-noise ratio) is not a meaningful measure in perceptual coding
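Below is a greedy bit-allocation loop in the spirit of these bullets. It assumes the common rule of thumb that each extra bit of a uniform quantizer raises the SNR by about 6.02 dB, so NMR(dB) ≈ SMR − 6.02 · bits. The loop stops when every band's noise is below its masking threshold (NMR ≤ 0 dB) or the bit budget runs out. Actual MPEG encoders use tabulated SNR values and signal the allocation in the bitstream.

```python
import numpy as np


def allocate_bits(smr_db, bit_budget, max_bits=15):
    """Give one bit at a time to the band with the largest noise-to-mask ratio."""
    smr_db = np.asarray(smr_db, dtype=float)
    bits = np.zeros(len(smr_db), dtype=int)
    for _ in range(bit_budget):
        nmr_db = smr_db - 6.02 * bits          # NMR = SMR - SNR(bits)
        nmr_db[bits >= max_bits] = -np.inf     # band cannot take more bits
        band = int(np.argmax(nmr_db))
        if nmr_db[band] <= 0.0:                # all noise already masked
            break
        bits[band] += 1
    return bits


print(allocate_bits([20.0, 5.0, -3.0], bit_budget=10))   # [4 1 0]
```

Only 5 of the 10 available bits are used here, because after that every band's quantization noise is already masked.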
13-dB Miracle
• Johnston and Brandenburg showed in 1991 that a lot of noise can be hidden in an audio signal when the noise is shaped according to the masking curve
• When the SNR per frame is only 13 dB, the noise remains inaudible!
• Example: original, 13-dB SNR, 0-dB SNR, −10-dB SNR, noise only
For comparison, unmasked noise at 13-dB SNR
Demo by Kai Jussila & Pekka Rönkkö, Aalto Univ. 2012
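The slides do not describe how the demo noise was generated. One simple way to approximate the idea, assumed here for illustration, is to give the noise the same magnitude spectrum as the signal (with random phase) and scale it to the target SNR. This sketch processes a single block. A per-frame version would follow the signal's spectral changes over time, and a real coder would shape the noise with a masking model rather than with the raw signal spectrum.

```python
import numpy as np


def spectrally_shaped_noise(x, snr_db, seed=0):
    """Noise with the magnitude spectrum of x (random phase), scaled to snr_db."""
    x = np.asarray(x, dtype=float)
    rng = np.random.default_rng(seed)
    X = np.fft.rfft(x)
    phase = np.exp(2j * np.pi * rng.random(X.size))
    noise = np.fft.irfft(np.abs(X) * phase, n=x.size)
    gain = np.sqrt(np.sum(x ** 2) / (np.sum(noise ** 2) * 10 ** (snr_db / 10)))
    return gain * noise


fs = 44_100
t = np.arange(fs) / fs
x = sum(np.sin(2 * np.pi * 220 * h * t) / h for h in range(1, 8))   # harmonic test tone
noisy = x + spectrally_shaped_noise(x, snr_db=13.0)
```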
Parametric Audio Coding
• Based on parametric modeling of audio signals
  – Cf. parametric sound synthesis using the MIDI file format
• Very low bit-rate applications
  – Mobile multimedia/gaming, Internet streaming
  – 40 kbit/s and lower
• HILN (Harmonic and Individual Lines plus Noise)
  – Sinusoids plus noise modeling
  – Parametric audio coding within the MPEG-4 standard
  – Minimum bit-rate: 4 kbit/s
HILN Encoder
Parametric Model of Audio Source in HILN
• Decomposition of the audio signal into its components:
  – Individual sinusoids: frequency and amplitude
  – Harmonic tones: fundamental frequency, amplitude, spectral envelope of the partials
  – Noise: amplitude and spectral envelope
  – Transients: optional parameters, such as the temporal envelope
HILN Perception Model
• Different from the perception model used in subband coding
• Effect of parameter deviations on signal quality → bit allocation for quantization
• Influence of different parameters on the quality of the decoded signal → choice of the model parameters to transmit
HILN Decoder • Harmonics + sines + noise synthesis
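To make the decoder's three component types concrete, here is a toy synthesis of one frame from hypothetical parameters. The function and parameter names are illustrative and do not follow the MPEG-4 HILN syntax. For simplicity, the noise is white and all parameters are constant within the frame. HILN itself shapes the noise with a spectral envelope and interpolates parameters between frames.

```python
import numpy as np


def synthesize_frame(fs, duration_s, f0, harmonic_amps, sines, noise_amp, seed=0):
    """Harmonic tone + individual sinusoids + noise, with constant parameters."""
    t = np.arange(int(fs * duration_s)) / fs
    y = np.zeros_like(t)
    for h, amp in enumerate(harmonic_amps, start=1):   # harmonic partials at h * f0
        if h * f0 < fs / 2:                             # skip partials above Nyquist
            y += amp * np.sin(2 * np.pi * h * f0 * t)
    for freq, amp in sines:                             # individual sinusoids
        y += amp * np.sin(2 * np.pi * freq * t)
    rng = np.random.default_rng(seed)
    y += noise_amp * rng.standard_normal(t.size)        # white noise component
    return y


frame = synthesize_frame(fs=8000, duration_s=0.5, f0=200.0,
                         harmonic_amps=[1.0, 0.5, 0.25], sines=[(1234.0, 0.3)],
                         noise_amp=0.01)
```

Only a handful of numbers (f0, a few amplitudes, one frequency and a noise level) describe this whole frame, which is why parametric coding reaches very low bit-rates.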
MPEG Standards 1988-
• The MPEG working group (Moving Picture Experts Group) focuses on the international standardisation of video and audio technology
  – Official name: ISO/IEC JTC1 SC29 WG11
• The best-known standards are MPEG-1 and MPEG-2
  – MPEG-1 Layer 3 = MP3
  – MPEG-1 Layer 2 is used in the European digital radio standard (DAB)
• Also MPEG-4, MPEG-7, and MPEG-21
  – New multimedia standards, not only for coding
• MPEG-D (2007): MPEG audio technologies
  – Includes MPEG Surround, Spatial Audio Object Coding (SAOC), and Unified Speech and Audio Coding (USAC)
MP3
• MPEG-1 Layer 3 is MP3
  – Old technology; standard from 1991
• Still one of the most common codecs for “almost” CD-quality audio
MP3
Ref. K. Brandenburg, 1999
Examples of MP3 Audio at Various Bit-rates
• One of the standard MPEG test signals: Suzanne Vega, “Tom’s Diner” (1987), duration 2 min 11 s
Bit-rate | Compression | Bits/sample | File size | Quality
1411 kbit/s | 1:1 | 16 | 22565 kB | Original CD
128 kbit/s | 1:11 | 1.5 | 2048 kB | MP3 CD quality
96 kbit/s | 1:15 | 1.1 | 1536 kB | Almost CD quality
64 kbit/s | 1:22 | 0.73 | 1026 kB | FM radio quality
32 kbit/s | 1:44 | 0.36 | 514 kB | Very low
8 kbit/s | 1:176 | 0.09 | 130 kB | Not recommended
• The signal-to-noise ratio does not describe the sound quality of lossy audio coding well, since the errors are not just noise
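The derived table columns follow from the bit-rate and the 131-s duration. This sketch assumes stereo 44.1 kHz material, bits per sample counted per channel, and kB = 1024 bytes. Under these assumptions, the file sizes in the table agree with the computed estimates to within about 3 kB.

```python
CD_KBPS = 2 * 16 * 44.1        # 1411.2 kbit/s (stereo, 16 bits, 44.1 kHz)
DURATION_S = 2 * 60 + 11       # "Tom's Diner": 2 min 11 s


def table_row(kbps):
    ratio = CD_KBPS / kbps                          # compression ratio 1:ratio
    bits_per_sample = kbps * 1000 / (2 * 44_100)    # per channel
    size_kb = kbps * 1000 * DURATION_S / 8 / 1024   # kB = 1024 bytes (assumed)
    return ratio, bits_per_sample, size_kb


for kbps in (CD_KBPS, 128, 96, 64, 32, 8):
    ratio, bps, size = table_row(kbps)
    print(f"{kbps:7.1f} kbit/s  1:{ratio:5.1f}  {bps:5.2f} bit/sample  {size:8.0f} kB")
```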
Typical Problems in MP3-Encoded Signals
• Pre-echo, limited frequency range, additional noise
  – Castanets are a well-known problematic signal
[Figure: spectrograms (frequency 0 to 20 kHz, time 0 to 1 s) of the original and of MP3 at 128 kbit/s, with the pre-echo marked]
MPEG-2 AAC
• AAC = Advanced Audio Coding
  – One of the most popular audio codecs (phones, YouTube, Apple, PlayStation, …)
• Second-generation MPEG audio coding standard from 1997
  – In half-rate mode, the bit-rate is 50% of that of MPEG-1 Layer 1 at the same quality
  – MP3 quality at a 30% lower bit-rate (96 kbit/s)
• Multi-channel sound
  – Mono/stereo/multi-channel
  – 1-48 channels and 0-16 effects channels (e.g., 5.1 movie sound)
• Many sample rates between 8 and 96 kHz
  – 8 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2 kHz, 96 kHz
• Various bit-rates and variable bit-rate
  – For stereo signals: 16 … 192 kbit/s
New Features in MPEG-2 AAC
• Improvements over MP3
  – Better frequency resolution: max. 1024 bins (MP3 only 576)
  – Multi-symbol Huffman coding, in which four frequency points are combined
• Cleaner transitions between frames
  – Modified Discrete Cosine Transform (MDCT) filter bank in which the impulse response length is only 5.3 ms (in MP3, 18.6 ms)
  – Reduces pre-echo
• Temporal noise shaping (TNS)
  – Allows the noise level to rise in those parts of the frame where it is not heard
• Original vs. AAC (mono, 32 kb/s)
Other Lossy Coding Methods
• Dolby AC-3 or “Dolby Digital”
  – 5.1 movie sound: left, center, right, left-surround, right-surround, and LFE (Low-Frequency Effects, frequencies < 120 Hz)
• Dolby E
  – A codec for professional systems, such as TV production
• DTS (Digital Theatre Systems)
  – Movie and home theatre sound
• Sony ATRAC (Adaptive Transform Acoustic Coding) and SDDS
  – ATRAC 4 is the coding method for MiniDisc (compression ratio only 1:5)
• Microsoft Windows Media Audio
• Lucent PAC (versions 1-4) and EPAC
  – PAC = Perceptual Audio Codec; EPAC = Enhanced PAC
  – Used in the US for commercial satellite radio broadcasting (XM, Sirius)
5.1 Home Theatre Sound
• Home theatre imitates the sound system of movie theatres
[Figure: loudspeaker layout with center, left, right, LFE (subwoofer), left surround, and right surround]
Ogg Vorbis
• Open source, IPR free (no patents)
• Almost the same quality as MP3, but depends on the audio material
  – Pre-echoes are audible in short transients (e.g., castanets)
• Variable bit-rate, but can be chosen to be practically constant
  – 45…500 kbit/s (stereo)
• Hardware support in early portable music players
  – Rio Karma 20GB, Cowon iAudio X5
• Spotify and Wikipedia use Ogg Vorbis!
  – 96…320 kbit/s (stereo)
• Also used in many computer games (Epic Games)
Source: http://en.wikipedia.org/wiki/Vorbis
Subjective Comparison
• The quality differences between audio coding methods are best evaluated with listening tests
• Top 5 audio codecs according to a listening test (in comparison against CD) (Soulodre et al., Journal of the AES, 1998):
1. MPEG-2 AAC
2. Lucent PAC
3. MP3 (MPEG-1 Layer 3)
4. AC-3
5. MPEG-1 Layer 2
• Other factors must also be accounted for
  – Computational cost, sensitivity to bit errors, compatibility with other systems
Audio Codec Comparison from Year 1998
• It is not easy to evaluate the results
  – Quality depends on bit-rate
• Coders are designed for different purposes (bit-rates)
Source: http://www.aac-audio.com/technology/aac.rp.0002.xprtLsnr.html
Bandwidth Expansion
• Bandwidth extension is used in some perceptual coders
  – mp3PRO, AAC+, AMR-WB+ (Adaptive Multi-Rate Wideband by Nokia)
• The highest octave is not coded; only some of its features are transmitted
  – This trick alone can reduce the bit-rate by almost 50%!
• The highest octave is reproduced by letting the lower frequencies image into it (spectral band replication)
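The principle can be sketched in the frequency domain under strong simplifying assumptions. There is one spectral frame, and the upper half of the spectrum stands in for the "highest octave." The low band is copied (translated) upward as the patch, and a few band energies are the only side information. Real SBR in mp3PRO and AAC+ operates on a QMF filter bank with time-varying envelopes and tonality and noise controls. The function names here are hypothetical.

```python
import numpy as np


def sbr_encode(spectrum, n_bands):
    """Keep the low half of the spectrum; send only energies of the high half."""
    half = len(spectrum) // 2
    high_bands = np.array_split(spectrum[half:], n_bands)
    envelope = np.array([np.sum(np.abs(b) ** 2) for b in high_bands])
    return spectrum[:half].copy(), envelope, len(spectrum)


def sbr_decode(low, envelope, length):
    """Patch the high half with a copy of the low band, then match band energies."""
    half = len(low)
    patch = low[np.arange(length - half) % half]        # translate low band upward
    bands = []
    for band, target in zip(np.array_split(patch, len(envelope)), envelope):
        energy = np.sum(np.abs(band) ** 2)
        bands.append(band * np.sqrt(target / energy) if energy > 0 else band)
    return np.concatenate([low] + bands)


rng = np.random.default_rng(0)
X = np.fft.rfft(rng.standard_normal(1024))             # 513 spectral bins
low, env, length = sbr_encode(X, n_bands=8)
X_hat = sbr_decode(low, env, length)
```

The decoded high band has the right coarse energy envelope but not the original fine structure. This works perceptually only because the ear is relatively insensitive to spectral detail at high frequencies.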
Bandwidth Expansion in mp3PRO
Source: Ziegler et al., 2002
New Now and in the Near Future
• Low-delay coding
  – Most coders have too much delay for two-way communication
  – MPEG-4 AAC-LD (Low Delay)
• Flexible multi-channel audio coding
  – Binaural Cue Coding (BCC): transmit only one channel plus low-bit-rate side information (inter-channel level difference, time difference, and correlation)
  – Directional Audio Coding (DirAC) by Pulkki et al. (Aalto Univ.)
  – The number and positions of loudspeakers can be chosen freely
Bluetooth Audio
• The bit-rate of Bluetooth transmission (about 500 kbit/s) is insufficient for uncompressed hi-fi audio => compression is needed!
[Figure: Bluetooth audio applications: voice (hands-free headsets, voice-based navigation) and music (wireless headphones and speakers, ring tones)]
Bluetooth Audio
• The SBC (Sub-Band Codec) is supported at bit-rates of 132…345 kbit/s (de Bont et al., 1995)
  – Much simpler than MP3 and other CD-quality coders
  – 4 or 8 subbands
  – Various sample rates 16…48 kHz, mono or stereo
  – Low latency: as low as 16 samples
• The APT-X codec (near-lossless, ADPCM-based) can be used for high-quality applications, such as movie soundtracks
Unified Speech and Audio Coding (USAC)
• A fairly recent codec (April 2012) for both music and speech at low bit-rates of 12 … 64 kbit/s
• Published as Part 3 of the MPEG-D standard family (MPEG-D dates from 2007)
• Uses LPC and residual coding for speech
• Uses MDCT-based tools for audio
• Includes MPEG-4 Spectral Band Replication, MPEG Surround (for multi-channel audio coding), and Parametric Stereo
• Performs as well as or better than the previous best speech or audio codecs
Opus
• An open source IETF audio coding standard for interactive real-time applications over the Internet (Internet Engineering Task Force, Sept. 2012)
• Opus uses Skype’s VoIP speech codec SILK (linear prediction) and CELT (Constrained-Energy Lapped Transform), a low-latency MDCT codec
  – Either or both can be used
  – When both are used, SILK codes the lower frequency band (< 8 kHz) and CELT codes the higher frequency band (8-20 kHz)
• Very low latency: 22.5 ms by default
Source: Antti Pakarinen, MSc thesis, 2012
Sound Examples: USAC and Opus
• Band
  – Original
  – USAC 64 kbps
  – Opus 64 kbps
  – Opus 14 kbps (mono)
• Bells
  – Original
  – USAC 64 kbps
  – Opus 64 kbps
  – Opus 14 kbps (mono)
Source: Antti Pakarinen, MSc thesis, 2012
Use of Lossy Audio Coding
• Only suitable for end users
  – For audio that is listened to without further processing
  – Real-time streaming over a slow network
  – Do not equalize!
  – Errors may become audible with low-quality loudspeakers
  – Do not re-compress (tandem coding)
• Very useful DSP technology when either the bit-rate or the memory capacity is limited
Conclusion
• The best compression ratio is obtained with lossy coding
  – Part of the signal is discarded, accounting for the limitations of hearing
• Lossless coding is an alternative for archiving precious audio
  – A 50% reduction is obtained using the statistical properties of audio signals
• The most common method has been MPEG-1 Layer 3 (MP3)
  – Later perhaps mp3PRO or AAC+ or something else
• Audio coding is useful in many applications
  – Phones, streaming, digital TV and radio, movies, wireless audio
• CD quality can now be obtained with less than 1 bit per sample
  – Nearly so at 48 kbit/s (stereo), i.e., about 0.54 bits per sample per channel
Literature (1)
• F. de Bont, M. Groenewegen, and W. Oomen, “A High Quality Audio-Coding System at 128 kb/s,” presented at the 98th AES Convention, Paris, France, Feb. 25-28, 1995.
• K. Brandenburg, “MP3 and AAC explained,” in Proc. AES 17th International Conference on High Quality Audio Coding, 1999.
• J. Coalson, “FLAC - Free Lossless Audio Codec,” 2005. http://flac.sourceforge.net/index.html.
• M. Hans and R. Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Processing Magazine, pp. 21-32, July 2001.
• J. D. Johnston, “Perceptual coding of audio signals—a tutorial,” presented at the AES Convention, New York, Sept. 1997.
• P. Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59-81, Sept. 1997.
• T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proc. IEEE, vol. 88, no. 4, pp. 451-513, April 2000.
• A. Pakarinen, Multi-core Platforms for Audio and Multimedia Coding Algorithms in Telecommunications, Master’s thesis, Aalto University, Espoo, Finland, 2012.
• K. C. Pohlmann, Principles of Digital Audio, Fourth Edition, McGraw-Hill, 2000.
Literature (2)
• T. Ziegler, A. Ehret, P. Ekstrand, and M. Lutzky, “Enhancing mp3 with SBR: Features and capabilities of the new mp3PRO Algorithm,” Audio Engineering Society Convention Paper 5560, Munich, May 2002.
• U. Zölzer, Digital Audio Signal Processing, Wiley, 1997. Chapter 9, “Data Compression,” pp. 249-265.
Homework #5
• Description: Analyze your use of time during the course with respect to the schedule you planned earlier (Homework #1).
  o Did anything go wrong? Why?
  o Was the original time plan realistic?
  o Were there tasks that demanded much more/less effort than expected?
  o Matlab implementations: how much time did they take w.r.t. the whole learning diaries?
  o All feelings welcome!
• Due date: Wednesday, April 3rd, 2019, at 16.00
• Length should be about 1 page.
Course feedback
• Please give feedback about the Audio Signal Processing course: https://www.webropolsurveys.com/...
• The deadline is April 11, 2019