
Friday, March 29, 2019


Audio Coding

ELEC-E5620 - Audio Signal Processing, Lecture #10

Vesa Välimäki

Course Schedule in 2019 (Periods III-VI)

0. General issues (Vesa & Benoit) 11.1.2019
1. History and future of audio DSP (Vesa) 18.1.2019
2. Digital filters in audio (Vesa) 25.1.2019
3. design (Vesa) 1.2.2019
4. Analysis of audio signals (Vesa) 8.2.2019
5. Audio effects processing (Benoit & Vesa) 15.2.2019
* No lecture (Evaluation week for Period III) 22.2.2019
6. Synthesis of audio signals (Fabian) 1.3.2019
7. and 3-D sound (Benoit) 8.3.2019
8. Physics-based sound synthesis (Vesa) 15.3.2019
9. Sampling rate conversion (Vesa) 22.3.2019
10. Audio coding (Vesa) 29.3.2019

©2001-2019 Vesa Välimäki

Outline
• Introduction
• Lossless Audio Coding
• Perceptual (Lossy) Audio Coding
  – Subband coding, time-to-frequency mapping, psychoacoustic models, parametric coding
• MPEG standards and some newer codecs
  – MP3, AAC, USAC
• Applications

Part of this lecture material was produced by Ms. Azadeh Haghparast (TKK, 2007)

Bit-rate of Audio Signals
• Bit-rate of one audio channel without compression (at 44.1 kHz): 16 bit × 44100 samples/s ≈ 700 kbit/s
• Stereo signal (44.1 kHz): 2 × 16 bit × 44100 samples/s ≈ 1.4 Mbit/s
• Additionally, extra bits are needed for error correction and synchronization
  – On a CD, 33 extra bits are needed for each 16 bits of audio, so the total bit-rate is ≈ 4.3 Mbit/s
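The arithmetic on this slide can be checked with a few lines of Python. This is only a sketch; the function name `pcm_bit_rate` is made up for illustration.

```python
def pcm_bit_rate(bits_per_sample, sample_rate_hz, channels=1):
    """Raw PCM bit-rate in bit/s (no compression, no overhead)."""
    return bits_per_sample * sample_rate_hz * channels

mono = pcm_bit_rate(16, 44100)        # 705 600 bit/s, i.e. about 700 kbit/s
stereo = pcm_bit_rate(16, 44100, 2)   # 1 411 200 bit/s, i.e. about 1.4 Mbit/s

# 33 extra bits per 16 audio bits: 49 bits are stored for every 16 bits of audio.
cd_total = stereo * (16 + 33) / 16    # about 4.32 Mbit/s

print(mono, stereo, cd_total)
```

The last value agrees with the total CD bit-rate of about 4.3 Mbit/s quoted above.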


Bit-Rate for Various Schemes

Application                    Format               Sample rate  Audio bit-rate  Overhead bit-rate  Total bit-rate
Compact Disc (CD)              PCM                  44.1 kHz     1.41 Mb/s       2.91 Mb/s          4.32 Mb/s
Digital Audio Tape (DAT)       PCM                  44.1 kHz     1.41 Mb/s       1.67 Mb/s          3.08 Mb/s
MiniDisc (MD)                  ATRAC                44.1 kHz     292 kb/s        718 kb/s           1.01 Mb/s
Digital Audio Broadcast (DAB)  MPEG-1 Layer II,III  48 kHz       256 kb/s        256 kb/s           512 kb/s

Applications of Audio Coding
• Storage
  – Archives
  – Movie soundtracks
  – Music for electronic games
• Communication
  – Mobile streaming
• Broadcasting
  – Digital radio and TV
• Wireless audio
  – Hands-free and headsets
  – Wireless speakers

Classification of Audio Coding Techniques
• Lossless Audio Coding
  – Reducing the size of audio signal using redundancy reduction, such as sample value distribution
  – The original signal values can be obtained by decoding
  – FLAC, Monkey’s Audio, MPEG-4 ALS, Windows Media Audio 9 Lossless, RealAudio Lossless, APT-X…
• Lossy Audio Coding
  – Reducing the size of audio signal using irrelevancy reduction
  – Use of limitations of human hearing, e.g., auditory masking
  – MPEG Audio (Layers 1, 2, 3), MPEG-AAC, HILN (MPEG-4 Parametric Audio Coding), WMA, …

Applications of Lossless Audio Coding
• Archiving of original recordings
• Studio operations, such as mixing
• Digital music distribution over the Internet
• Portable music players/recorders
• Multi-channel audio (e.g., DVD-Audio)
• Wireless audio (headsets, speakers)

Principles of Lossless Audio Coding
• A lossless audio coder comprises three main blocks:
  – Framing → Divides the audio signal into frames, e.g., 100 ms
  – Decorrelation → Removes redundancy (spectral whitening)
  – Entropy coding → Statistically efficient codebook
    → The histogram of audio signals is often close to the Laplace distribution: more small sample values than large ones
    → Short code for common sample values
    → Long code for rare sample values

Principles of Lossless Audio Coding
• Two approaches to decorrelate the audio signal
  1. Linear predictive model
     – Lossless representation → Predictor coefficients + Error signal
  2. Linear transform model
     – Lossless representation → Transform coefficients + Error signal

Linear Prediction
• Predictor coefficients are determined
  – Usually using the autocorrelation or covariance method
• Each sample is estimated from its previous samples using the predictor coefficients

$$\hat{x}(n) = Q\left\{ \sum_{k=1}^{M} a_k\, x(n-k) \right\}$$

where M is the predictor order and Q{·} denotes quantization of the prediction to the word length of the signal.

Decorrelation With a Polynomial Predictor
• Try several simple polynomial predictors and select the best one
  – The best predictor is the one that produces the error signal with the smallest amplitude
  – The spectrum is whitened
  – Integer coefficients to avoid rounding errors
  – For example, try the following polynomial predictors
    • xp0(n) = 0 (next sample is zero)
    • xp1(n) = x(n – 1) (same value repeats)
    • xp2(n) = 2x(n – 1) – x(n – 2) (linear trend)
    • xp3(n) = 3x(n – 1) – 3x(n – 2) + x(n – 3)

(Ref. Hans and Schafer 2001)
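The predictor selection above can be sketched in Python as follows. The names `residual`, `reconstruct`, and `best_order` are illustrative. "Smallest amplitude" is taken here to mean the smallest sum of absolute error values, and samples before the start of the frame are assumed to be zero. Both are assumptions of this sketch, not rules stated on the slide.

```python
# Integer coefficients of the four polynomial predictors xp0 ... xp3.
POLY = {0: [], 1: [1], 2: [2, -1], 3: [3, -3, 1]}

def _predict(history, n, coeffs):
    """Prediction of sample n from earlier samples (zero before the frame start)."""
    return sum(c * (history[n - 1 - i] if n - 1 - i >= 0 else 0)
               for i, c in enumerate(coeffs))

def residual(x, order):
    """Error signal e(n) = x(n) - xp(n) for the chosen predictor order."""
    return [x[n] - _predict(x, n, POLY[order]) for n in range(len(x))]

def reconstruct(e, order):
    """Decoder side: rebuild x(n) exactly from the error signal."""
    x = []
    for n in range(len(e)):
        x.append(e[n] + _predict(x, n, POLY[order]))
    return x

def best_order(x):
    """Pick the predictor whose error signal has the smallest total magnitude."""
    return min(POLY, key=lambda p: sum(abs(v) for v in residual(x, p)))

ramp = [3 * n for n in range(10)]          # linear trend
print(best_order(ramp))                    # the linear-trend predictor xp2 wins
print(reconstruct(residual(ramp, 3), 3) == ramp)
```

Because all coefficients are integers, the decoder reproduces the input exactly, which is the lossless property the slide refers to.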


Principles of Lossless Audio Coding
• Decorrelation by Linear Transform Model

LTAC
• LTAC: Lossless Transform Audio Coding
• Fixed or variable length
• Orthonormal Discrete Cosine Transform (DCT)
• Groups of 32 adjoining transform coefficients
• Rice coding for transform coefficients
• for the error signal

MPEG-4 ALS
• MPEG-4 Audio Lossless Coding, 2005-

MPEG-4 ALS
• Based on Linear Prediction
  – Optimal predictor coefficients are calculated based on an iterative procedure
• Optimal order of predictor → optimal predictor coefficients → the smallest bit-rate
• Coefficients converted to arcsine
• Linear 8-bit quantization of arcsine coefficients
• Rice entropy coding
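Rice coding gives short codes to small prediction errors and long codes to rare large ones. The sketch below is a generic textbook Rice coder, not the exact bitstream syntax of MPEG-4 ALS. The zigzag mapping of signed values and the unary convention (q ones followed by a zero) are assumptions of this example.

```python
def zigzag(v):
    """Map signed integers to non-negative ones: 0, -1, 1, -2, 2 -> 0, 1, 2, 3, 4."""
    return 2 * v if v >= 0 else -2 * v - 1

def rice_encode(v, k):
    """Rice code with parameter k: unary quotient, then k-bit binary remainder."""
    u = zigzag(v)
    q, r = u >> k, u & ((1 << k) - 1)
    remainder = format(r, "b").zfill(k) if k > 0 else ""
    return "1" * q + "0" + remainder

def rice_decode(bits, k):
    """Inverse of rice_encode for a single code word."""
    q = bits.index("0")                          # length of the unary part
    r = int(bits[q + 1:q + 1 + k], 2) if k > 0 else 0
    u = (q << k) | r
    return u // 2 if u % 2 == 0 else -(u + 1) // 2

for value in (0, -1, 5, 100):
    print(value, rice_encode(value, 2))
```

With k = 2 the common small values need 3 bits each, while the rare value 100 needs 53 bits, so the scheme pays off when the error histogram is peaked around zero.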


Comparison of Lossless Audio Codecs

≈ 50%
Ref: Coalson, 2005. http://flac.sourceforge.net/comparison.html

Lossy Audio Coding
• High compression ratios can be achieved when the signal is allowed to change
  – The goal is minimal disturbance for human listeners
• Technology for end users
  – Listen to the coded material as is (no further processing, EQ etc.)
  – Unsuitable for high-quality recordings or archiving
• Subband audio coding
  – MP3, Dolby AC-3, Vorbis, WMA
• Parametric audio coding
  – HILN (MPEG-4)

Applications of Lossy Audio Coding
• Portable music players and mobile phones
  – Also MiniDisc players
• Internet audio
• Digital TV
• Digital radio
• Movie soundtracks

Subband Audio Coding

• A.k.a. perceptual audio coding
• Frequency-domain representation of audio signal
• Based on a psychoacoustic model
  – Model of the threshold of hearing
  – Shape the quantization noise so that it stays below the threshold of hearing

Subband Audio Coding
• General block diagram of subband coder

Time to Frequency Mapping

• Time to frequency mapping techniques
  – The simplest technique → Fast Fourier Transform (FFT)
  – Filter bank technique
  – Pseudo-Quadrature Mirror Filter bank (PQMF)
  – Modified Discrete Cosine Transform (MDCT)

Filter Bank

• N-channel filter bank → N parallel bandpass filters
• Uniform or non-uniform
• Magnitude response of a uniform bandwidth N-channel filter bank

Filter Bank

• Analysis-synthesis filter bank → Perfect Reconstruction filter bank

Filter Bank

• Down-sampling
  – Preserves data rate
  – Problem: limiting the spectral bandwidth → aliasing (folding)
• Up-sampling
  – Restores data rate
  – Problem: expanding the spectral bandwidth → imaging distortion
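The folding caused by down-sampling can be seen with a single sine wave. In this sketch the frequencies and variable names are chosen freely for illustration: a 3 kHz tone sampled at 8 kHz is decimated by 2 without any anti-aliasing filter, so the new Nyquist limit is 2 kHz and the tone folds down to 1 kHz.

```python
import math

FS = 8000          # original sample rate in Hz
F_TONE = 3000      # tone above the new Nyquist limit (2 kHz)

x = [math.sin(2 * math.pi * F_TONE * n / FS) for n in range(64)]
y = x[::2]         # keep every second sample: new rate is 4 kHz

# A 1 kHz sine (with inverted sign) sampled directly at 4 kHz.
alias = [-math.sin(2 * math.pi * 1000 * m / 4000) for m in range(len(y))]

max_diff = max(abs(a - b) for a, b in zip(y, alias))
print(max_diff)    # practically zero: the two sequences are indistinguishable
```

After decimation the samples of the 3 kHz tone are the same as those of a 1 kHz tone, which is why each subband must be band-limited before it is down-sampled.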


Pseudo-Quadrature Mirror Filter Bank

• Pseudo-Quadrature Mirror Filter (PQMF) Bank
• Design a narrow lowpass filter → Prototype filter
• Other filters obtained by cosine modulation of the prototype filter
• MPEG-1 and MPEG-2
  – 32 channels
  – Prototype filter of order 512

2-Channel PQMF
• Design of a 2-channel analysis-synthesis filter bank
• Challenge:
  – Define the filters $H_0(z)$, $H_1(z)$, $G_0(z)$, $G_1(z)$
  – For Perfect Reconstruction:
    $$G_0(z) = H_1(-z), \qquad G_1(z) = -H_0(-z)$$
  – and also
    $$H_1(z) = H_0(-z)$$
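A two-channel bank of this kind can be tried out numerically. The sketch below assumes the filter relations G0(z) = H1(−z), G1(z) = −H0(−z), and H1(z) = H0(−z), and uses the two-tap Haar lowpass filter as H0(z), which is a choice made only for this example. The function names are illustrative. Down-sampling followed by up-sampling is modeled by zeroing every second sample.

```python
import math

def convolve(a, b):
    """Plain FIR convolution of two coefficient or sample lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def modulate(h):
    """Coefficients of H(-z): flip the sign of the odd-indexed taps."""
    return [c if n % 2 == 0 else -c for n, c in enumerate(h)]

def two_channel_bank(x, h0):
    h1 = modulate(h0)                    # H1(z) = H0(-z)
    g0 = modulate(h1)                    # G0(z) = H1(-z)
    g1 = [-c for c in modulate(h0)]      # G1(z) = -H0(-z)
    y = None
    for h, g in ((h0, g0), (h1, g1)):
        v = convolve(x, h)                                        # analysis filter
        u = [s if n % 2 == 0 else 0.0 for n, s in enumerate(v)]   # down- and up-sample by 2
        branch = convolve(u, g)                                   # synthesis filter
        y = branch if y is None else [p + q for p, q in zip(y, branch)]
    return y

haar = [1 / math.sqrt(2), 1 / math.sqrt(2)]
x = [1.0, -2.0, 3.5, 0.25, -1.0]
y = two_channel_bank(x, haar)
print([round(v, 12) for v in y])         # the input delayed by one sample
```

With the Haar filter the aliasing components of the two branches cancel and the output is the input delayed by one sample. For longer prototype filters the same relations still cancel the aliasing, but the overall response depends on how H0(z) is designed.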


Modified DCT Filter Bank
• Modified Discrete Cosine Transform Filter Bank
• Also called Time-Domain Aliasing Cancellation (TDAC)
• Special case of PQMF
  – Length of the prototype filter is twice that of the PQMF
  – 50% overlap with the previous frame
• Prototype filter → Sine function
• Choice of window length
  – Long window length → good for stationary signal
  – Short window length → suitable for transients
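Time-domain aliasing cancellation can be demonstrated with a direct (slow) MDCT. This is a minimal sketch: the frame size N = 8, the zero padding at both ends, the 2/N scaling of the inverse transform, and all function names are assumptions of this example. The sine window is applied both before the MDCT and after the inverse MDCT, and frames overlap by 50%.

```python
import math

def sine_window(N):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi * (n + 0.5) / (2 * N)) for n in range(2 * N)]

def mdct(frame, N):
    """2N input samples -> N coefficients."""
    return [sum(frame[n] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                for n in range(2 * N)) for k in range(N)]

def imdct(coeffs, N):
    """N coefficients -> 2N time-aliased samples."""
    return [2.0 / N * sum(coeffs[k] * math.cos(math.pi / N * (n + 0.5 + N / 2) * (k + 0.5))
                          for k in range(N)) for n in range(2 * N)]

def tdac_roundtrip(x, N):
    """Analysis and synthesis with 50% overlap-add; len(x) must be a multiple of N."""
    w = sine_window(N)
    padded = [0.0] * N + list(x) + [0.0] * N
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - N, N):
        frame = [w[n] * padded[start + n] for n in range(2 * N)]
        y = imdct(mdct(frame, N), N)
        for n in range(2 * N):
            out[start + n] += w[n] * y[n]       # overlap-add cancels the aliasing
    return out[N:N + len(x)]

x = [float((7 * n) % 5 - 2) for n in range(32)]
y = tdac_roundtrip(x, 8)
print(max(abs(a - b) for a, b in zip(x, y)))    # reconstruction error, close to zero
```

Each frame of 2N samples yields only N coefficients, so a single frame cannot be inverted on its own. The aliasing in one frame is cancelled by the overlapping halves of its neighbours.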


Subband Coding

• General block diagram of subband coder


Psychoacoustics
• Absolute threshold of hearing
• Masking phenomenon
  – Simultaneous masking, also called frequency masking
  – Non-simultaneous masking, also called temporal masking
• Critical bandwidth
• Spread of masking

Sound Pressure Level
• Quantity for measuring the sound pressure

$$\mathrm{SPL}\,(\mathrm{dB}) = 10 \log_{10}\left(\frac{P}{P_0}\right)^{2}$$

• P: pressure of sound
• P0: standard pressure level = 20 μPa
  – Sound pressure level at the hearing threshold, when f = 2 kHz
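The definition translates directly into code, since 10 log10((P/P0)^2) equals 20 log10(P/P0). The function name `spl_db` is illustrative.

```python
import math

P0 = 20e-6   # reference pressure in pascals (20 micropascals)

def spl_db(pressure_pa):
    """Sound pressure level in dB for a sound pressure given in pascals."""
    return 20 * math.log10(pressure_pa / P0)

print(spl_db(20e-6))   # the reference pressure itself: 0 dB
print(spl_db(20.0))    # a million times the reference pressure: about 120 dB
```

A pressure of 20 Pa is a factor of 10^6 above the reference and therefore corresponds to about 120 dB.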


Limits of Human Hearing

• Frequency range: 20 Hz … 20 kHz • Most sensitive range: 1 kHz … 5 kHz

• Sound pressure level: 20 dB … 95 dB (for safe listening)
• Threshold of pain: About 120 dB

Absolute Threshold of Hearing

• Also called Threshold in Quiet
• Minimum level of sound of a pure tone perceived by an average human being in noiseless conditions
  – About 0 dB in mid frequencies
  – Frequency components below this curve are inaudible

Threshold of Hearing


Masking Phenomenon • The most important psychoacoustic concept for transparent audio coding

• Frequency masking – Concurrent masker and maskee

• Temporal masking – Extends beyond the time duration in which the masker occurs


Frequency Masking
• The threshold of audibility of one sound is raised in the presence of sound energy at neighboring frequencies

Frequency Masking Examples
• Examples of the raising of the hearing threshold
  – Maskers: sine, narrow-band noise, white noise

Masking Curves
• Four masking curves for measurements
  – Narrow-band noise masking narrow-band noise (NMN)
  – Narrow-band noise masking tone (NMT)
  – Tone masking narrow-band noise (TMN)
  – Tone masking tone (TMT)

Narrow-Band Noise Masking Tones (NMT)
• Shape of the masking curve depends on frequency

Narrow-Band Noise Masking Tones (NMT)
• Shape of the masking curve also depends on sound level

Tone Masking Tone (TMT)


Temporal Masking
• Difficult to employ in audio coding

[Figure: temporal masking; masking before the masker lasts a few ms, masking after the masker about 100 ms]

Critical Bandwidth
• The frequency range around a masker frequency, in which the masking curve remains flat → Critical Bandwidth
• Each Critical Bandwidth corresponds to a constant distance on the basilar membrane
• Unit: Bark

$$z/\mathrm{Bark} = 13 \arctan\left(0.76\, f/\mathrm{kHz}\right) + 3.5 \arctan\left[\left(f / 7.5\,\mathrm{kHz}\right)^{2}\right]$$
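The Bark formula is easy to evaluate numerically. The function name `hz_to_bark` is illustrative, and the input is taken in Hz and converted to kHz inside the function.

```python
import math

def hz_to_bark(f_hz):
    """Critical-band rate z in Bark for a frequency given in Hz."""
    f_khz = f_hz / 1000.0
    return 13 * math.atan(0.76 * f_khz) + 3.5 * math.atan((f_khz / 7.5) ** 2)

for f in (100, 500, 1000, 5000, 20000):
    print(f, round(hz_to_bark(f), 2))
```

The mapping is close to linear at low frequencies and compresses the high frequencies, so the whole audible range up to 20 kHz spans only about 25 Bark.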


Critical Bands on a Linear Frequency Scale

[Figure: critical bands on a linear frequency axis, 0 to 20 kHz]

Critical Bands on a Log Frequency Scale
• Constant bandwidth up to 500 Hz; then 1/3 octave

[Figure: critical bands on a logarithmic frequency axis, 0.1 to 10 kHz]

Spread of Masking
• The effect of frequency masking is not limited to within one critical bandwidth
• Various analytical models of the masking spread function
  – Triangle function
  – Schroeder function
  – …

Perceptual Entropy & Bit Allocation

• Perceptual Entropy → lower bound of the number of bits needed for transparent quality
• Bit Allocation
  – Allocate the number of bits according to the signal-to-mask ratio (SMR)
  – The quantization noise must remain below the masking threshold, i.e., the noise-to-mask ratio (NMR) stays below 0 dB
• It is well known that SNR (signal-to-noise ratio) is not a meaningful measure in perceptual coding
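A much simplified view of SMR-driven bit allocation is sketched below. It assumes the usual rule of thumb that each quantizer bit improves the SNR by about 6.02 dB, and it gives every band just enough bits to push the noise below the mask. Real encoders use iterative allocation under a bit budget, so the function names and the example SMR values here are purely illustrative.

```python
import math

DB_PER_BIT = 6.02   # approximate SNR gain per quantizer bit (rule of thumb)

def allocate_bits(smr_db):
    """Smallest number of bits per band such that SNR >= SMR (masked bands get 0)."""
    return [max(0, math.ceil(s / DB_PER_BIT)) for s in smr_db]

def nmr_db(smr_db, bits):
    """Noise-to-mask ratio per band: NMR = SMR - SNR."""
    return [s - DB_PER_BIT * b for s, b in zip(smr_db, bits)]

smr = [20.0, 13.0, 5.0, -3.0]     # hypothetical per-band SMR values in dB
bits = allocate_bits(smr)
print(bits)                       # [4, 3, 1, 0]
print(nmr_db(smr, bits))          # all values at or below 0 dB
```

A band whose SMR is negative is already masked and receives no bits at all, which is where most of the savings of perceptual coding come from.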


13-dB Miracle

• Johnston and Brandenburg showed in 1991 that a lot of noise can be hidden in an audio signal when the noise is shaped according to the masking curve
• When the SNR per frame is only 13 dB, the noise remains inaudible!
• Example: original, 13-dB SNR, 0-dB SNR, -10-dB SNR, noise only

For comparison, unmasked noise at 13-dB SNR

Demo by Kai Jussila & Pekka Rönkkö, Aalto Univ. 2012


Parametric Audio Coding
• Based on parametric modeling of audio signals
  – Cf. parametric sound synthesis using the MIDI format
• Very low bit-rate applications
  – Mobile multimedia/gaming, internet streaming
  – 40 kbit/s and lower
• HILN (Harmonic and Individual Lines plus Noise)
  – Sinusoids plus noise modeling
  – Parametric audio coding within the MPEG-4 standard
  – Minimum bit-rate: 4 kbit/s

HILN Encoder


Parametric Model of Audio Source in HILN

• Decomposition of the audio signal into its components
  – Individual sinusoid → frequency and amplitude
  – Harmonic tone → fundamental frequency, amplitude, spectral envelope of partials
  – Noise → amplitude and spectral envelope
  – Transients → optional parameter, such as temporal envelope

HILN Perception Model • Different from the perception model used in subband coding

• Effect of parameter deviation on signal quality – Bit allocation for quantization

• Influence of different parameters on the quality of decoded signal – Choice of model parameter for transmission


HILN Decoder
• Harmonics + sines + noise synthesis

MPEG Standards 1988-
• The MPEG working group (Moving Picture Experts Group) focuses on international standardisation of video and audio technology
  – Official name: ISO/IEC JTC1 SC29 WG11
• The most well known standards are MPEG-1 and MPEG-2
  – MPEG-1 Layer 3 = MP3
  – MPEG-1 Layer 2 is used in the European digital radio standard (DAB)
• Also MPEG-4, MPEG-7, and MPEG-21
  – New multimedia standards, not only coding
• MPEG-D (2007): MPEG audio technologies
  – Includes MPEG Surround, Spatial Audio Object Coding (SAOC), and Unified Speech and Audio Coding (USAC)

MP3
• MPEG-1 Layer 3 is MP3
  – Old technology; standard from 1991
• Still one of the most common formats for ”almost” CD-quality audio

MP3

Ref. K. Brandenburg, 1999

Examples of MP3 Audio at Various Bit-rates
• One of the standard MPEG test signals: Suzanne Vega, “Tom’s Diner” (1987)
  – Duration 2 min 11 sec

Bit-rate     Compression  Bit/sample  File size  Quality
1411 kbit/s  1:1          16          22565 kB   Original CD
128 kbit/s   1:11         1.5         2048 kB    MP3 CD-quality
96 kbit/s    1:15         1.1         1536 kB    Almost CD-quality
64 kbit/s    1:22         0.73        1026 kB    FM-radio quality
32 kbit/s    1:44         0.36        514 kB     Very low
8 kbit/s     1:176        0.09        130 kB     Not recommended

• Signal-to-noise ratio does not describe well the quality of lossy audio coding, since errors are not only noise
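The columns of the table follow from the bit-rate and the duration of the test signal (2 min 11 s = 131 s). The sketch below recomputes them under the assumptions of 44.1 kHz stereo audio and 1 kB = 1024 bytes; the function name `mp3_row` is illustrative. Small differences from the table are to be expected, since real files also contain headers.

```python
def mp3_row(bit_rate_kbps, duration_s=131, sample_rate=44100, channels=2):
    """Compression ratio, bits per sample, and approximate file size in kB."""
    pcm_kbps = 16 * sample_rate * channels / 1000        # 1411.2 kbit/s for CD audio
    ratio = pcm_kbps / bit_rate_kbps
    bits_per_sample = bit_rate_kbps * 1000 / (sample_rate * channels)
    size_kb = bit_rate_kbps * 1000 * duration_s / 8 / 1024
    return ratio, bits_per_sample, size_kb

for rate in (128, 96, 64, 32, 8):
    ratio, bps, size = mp3_row(rate)
    print(f"{rate} kbit/s  1:{ratio:.0f}  {bps:.2f} bit/sample  {size:.0f} kB")
```

At 128 kbit/s this gives a ratio of about 1:11, roughly 1.5 bits per sample, and a file of about 2047 kB, in line with the second row of the table.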


Typical Problems in MP3 Encoded Signals
• Pre-echo, limited frequency range, additional noise
  – Castanets are a well known problematic signal

[Figure: spectrograms (frequency vs. time, 0 to 1 s) of the original signal and of the MP3 version at 128 kbps, where pre-echo is visible]

MPEG-2 AAC
• AAC = Advanced Audio Coding
  – One of the most popular audio codecs (phones, YouTube, Apple, PlayStation…)
• Second-generation MPEG audio coding standard from 1997
  – In half-rate mode, the bit-rate is 50% of that of MPEG-1 Layer 1 at the same quality
  – MP3 quality at 30% lower bit-rate (96 kbit/s)
• Multi-channel sound
  – Mono/stereo/multi-channel
  – 1-48 channels and 0-16 effects channels (e.g. 5.1 movie sound)
• Many sample rates between 8 and 96 kHz
  – 8 kHz, 24 kHz, 32 kHz, 44.1 kHz, 48 kHz, 64 kHz, 88.2 kHz, 96 kHz
• Various bit-rates and variable bit-rate
  – For stereo signals: 16 ... 192 kbit/s

New Features in MPEG-2 AAC
• Improved solutions over MP3
  – Better frequency resolution: max 1024 bins (in MP3 only 576)
  – Multi-symbol Huffman coding, in which four frequency points are combined
• Cleaner transition of frames
  – Modified Discrete Cosine Transform (MDCT) Filter Bank in which the impulse response length is only 5.3 ms (in MP3 18.6 ms)
  – Reduces pre-echo
• Temporal Noise Shaping (TNS)
  – Let the noise level rise in those parts of the frame where it is not heard
• Original vs. AAC (mono, 32 kb/s)

Other Lossy Coding Methods
• Dolby AC-3 or ”Dolby Digital”
  – 5.1 movie sound: left, center, right, left-surround, right-surround, and LFE (Low-Frequency Effects, low frequencies < 120 Hz)
• Dolby E
  – A codec for professional systems, such as TV production
• DTS (Digital Theatre Systems)
  – Movie and home theatre sound
• ATRAC (Adaptive Transform Acoustic Coding) and SDDS
  – ATRAC 4 is the coding method for MiniDisc (compression ratio only 1:5)
• Windows Media Audio
• PAC (versions 1-4) and EPAC
  – PAC = Perceptual Audio Coder; EPAC = Enhanced PAC
  – Used in the US for commercial broadcasting (XM, Sirius)

5.1 Home Theatre Sound
• Home theatre imitates the sound system of movie theatres

[Figure: 5.1 loudspeaker layout with left, center, right, left surround, right surround, and LFE channels]

Ogg Vorbis
• Open source, IPR free (no patents)
• Almost the same quality as MP3, but depends on audio material
  – Pre-echoes occur in short transients (e.g. castanets)
• Variable bit-rate, but can be chosen to be practically constant
  – 45…500 kbit/s (stereo)
• Hardware support in early portable music players
  – 20GB, iAudio X5
• Spotify and Wikipedia use Ogg Vorbis!
  – 96…320 kbit/s (stereo)
• Used also in many games (Epic Games)

Source: http://en.wikipedia.org/wiki/Vorbis

Subjective Comparison
• The quality differences between audio coding methods can be best evaluated with listening tests
• Top 5 audio codecs according to a listening test (in comparison against CD) (Soulodre et al., Journal of AES, 1998)
  1. MPEG-2 AAC
  2. Lucent PAC
  3. MP3 (MPEG-1 Layer 3)
  4. AC-3
  5. MPEG-1 Layer 2
• Other factors must also be accounted for
  – Computational cost, sensitivity to bit errors, compatibility with other systems

Audio Codec Comparison from Year 1998
• It is not easy to evaluate the results
  – Quality depends on bit-rate
• Coders are designed for different purposes (bit-rates)

Source: http://www.aac-audio.com/technology/aac.rp.0002.xprtLsnr.html

Bandwidth Expansion
• Bandwidth extension is used in some perceptual coders
  – mp3PRO, AAC+, AMR-WB+ (Adaptive Multi-Rate Wideband)
• The highest octave is not coded, but only some features are transmitted
  – This trick alone can reduce the bit rate by almost 50%!
• Reproduced by allowing lower frequencies to image (spectral band replication)

Bandwidth Expansion in mp3PRO

Source: Ziegler et al., 2002

New Now and in Near Future
• Low-delay coding
  – Usually coders have too much delay for two-directional communication
  – MPEG-4 AAC-LC
• Flexible multi-channel audio coding
  – Binaural Cue Coding (BCC): transmit only one channel and low bit-rate side information (Level/Time-Difference/Correlation)
  – Directional Audio Coding (DirAC) by Pulkki et al. (Aalto Univ.)
  – Number and position of loudspeakers can be chosen freely

Bluetooth Audio
• The bitrate in Bluetooth transmission (about 500 kbit/s) is insufficient for full-blown hi-fi audio etc. => Must compress!

[Figure: Bluetooth audio use cases: voice (hands-free headsets, voice-based navigation), music (wireless headphones and speakers), ring tones]

Bluetooth Audio
• The SBC (Sub-Band Codec) is supported for bitrates 132…345 kbit/s (de Bont et al., 1995)
  – Much simpler than MP3 and other CD-quality coders
  – 4 or 8 subbands
  – Various sample rates 16…48 kHz, mono or stereo
  – Low delay: even 16 samples
• APT-X lossless audio codec can be used for high-quality applications, such as movie soundtracks

Unified Speech and Audio Coding (USAC)
• A fairly recent codec (April 2012) for both music and speech using low bit rates 12 … 64 kbit/s
• Part of the MPEG-D Part 3 standard (2007)
• Uses LPC and residual coding for speech
• Uses MDCT-based tools for audio
• Includes MPEG-4 Spectral Band Replication, MPEG Surround (for multi-channel audio coding), and Parametric Stereo
• Performs as well as or better than the previous best speech or audio codecs

Opus
• An open source, IETF audio coding standard for interactive real-time applications over the Internet (Internet Engineering Task Force, Sept. 2012)
• Opus uses Skype’s VoIP speech codec SILK (linear prediction) and CELT (Constrained-Energy Lapped Transform), a low-latency MDCT codec
  – Either or both can be used
  – When both are used, SILK codes the lower frequency band (< 8 kHz) and CELT codes the higher frequency band (8-20 kHz)
• Very low latency: 22.5 ms by default

Source: Antti Pakarinen, MSc thesis, 2012


Sound Examples: USAC and Opus
• Band
  – Original
  – USAC 64 kbps
  – Opus 64 kbps
  – Opus 14 kbps (mono)
• Bells
  – Original
  – USAC 64 kbps
  – Opus 64 kbps
  – Opus 14 kbps (mono)

Source: Antti Pakarinen, MSc thesis, 2012


Use of Lossy Audio Coding
• Only suitable for end users
  – For audio signals that are listened to without further processing
  – Real-time streaming over a slow network
  – Do not equalize!
  – Errors may become audible with low-quality loudspeakers
  – Do not re-compress (tandem coding)
• Very useful DSP technology when either the bit-rate or memory capacity is limited

Conclusion
• The best compression ratio is obtained with lossy coding
  – Part of the signal is discarded, accounting for the limitations of hearing
• Lossless coding is an alternative for archiving precious audio
  – 50% reduction is obtained using statistical properties of audio signals
• The most common method has been MPEG-1 Layer 3 (MP3)
  – Later perhaps mp3PRO or AAC+ or something else
• Audio coding is useful in many applications
  – Phones, streaming, digital TV and radio, movies, wireless audio
• CD quality can now be obtained with less than 1 bit per sample
  – Almost at 48 kbit/s (stereo)

Literature (1)
• F. de Bont, M. Groenewegen and W. Oomen, “A High Quality Audio-Coding System at 128 kb/s,” presented at the 98th AES Convention, Paris, France, Feb. 25-28, 1995.
• K. Brandenburg, “MP3 and AAC explained,” in Proc. AES 17th International Conference on High Quality Audio Coding, 1999.
• Josh Coalson, “FLAC - Free Lossless Audio Codec”, 2005. http://flac.sourceforge.net/index.html.
• M. Hans and R. Schafer, “Lossless Compression of Digital Audio,” IEEE Signal Processing Magazine, pp. 21-32, July 2001.
• J. D. Johnston, “Perceptual coding of audio signals - a tutorial,” presented at the AES Convention, New York, Sept. 1997.
• P. Noll, “MPEG digital audio coding,” IEEE Signal Processing Magazine, vol. 14, no. 5, pp. 59–81, Sept. 1997.
• T. Painter and A. Spanias, “Perceptual coding of digital audio,” Proc. IEEE, vol. 88, no. 4, pp. 451–513, April 2000.
• Antti Pakarinen, Multi-core Platforms for Audio and Multimedia Coding Algorithms in , Master’s thesis, Aalto University, Espoo, Finland, 2012.
• K. C. Pohlmann, Principles of Digital Audio. Fourth Edition, McGraw-Hill, 2000.

Literature (2)
• T. Ziegler, A. Ehret, P. Ekstrand, and M. Lutzky, “Enhancing mp3 with SBR: Features and capabilities of the new mp3PRO algorithm,” Audio Engineering Society Convention Paper 5560, Munich, May 2002.
• Udo Zölzer, Digital Audio Signal Processing, Wiley, 1997. Chapter 9, “Audio Coding”, pp. 249–265.

Homework #5

• Description: Analyze your use of time during the course with respect to the schedule you planned earlier (Homework #1).
  o Did anything go wrong? Why?
  o Was the original time plan realistic?
  o Were there tasks that demanded much more/less effort than expected?
  o Matlab implementations: how much time did they take w.r.t. the whole learning diaries?
  o All feelings welcomed!!
• Due date: Wednesday, April 3rd, 2019, at 16.00
• Length should be about 1 page.

Course feedback

• Please give feedback about the Audio Signal Processing course https://www.webropolsurveys.com/...
• The deadline is April 11, 2019