Part II: MPEG-4 Audio
TSBK Image Coding and Data Compression
Lecture
Jörgen Ahlberg

MPEG-4 Audio: Outline
• Psycho-acoustic models
• Overview of MPEG-4 Audio
• AAC - Advanced Audio Coding
• Specialized coders
• Synthetic (structured) audio
Psycho-acoustic models
• A psycho-acoustic model describes how humans perceive sound.
• In the compression context, its main use is to tell us which parts of the signal we can remove.
Hearing Threshold
[Figure: hearing threshold, level in dB versus frequency in kHz. Annotation: "Will not be heard anyway, discard."]
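The idea of discarding components below the hearing threshold can be sketched as follows. This is only an illustration: it assumes Terhardt's widely used closed-form approximation of the threshold in quiet, levels given in dB SPL, and hypothetical function names; the slide does not say which threshold curve is used.

```python
import math


def threshold_in_quiet_db(freq_hz):
    """Approximate hearing threshold in quiet (dB SPL), Terhardt's formula.

    Assumption: this closed form stands in for the curve on the slide.
    freq_hz must be positive.
    """
    f = freq_hz / 1000.0  # frequency in kHz
    return (3.64 * f ** -0.8
            - 6.5 * math.exp(-0.6 * (f - 3.3) ** 2)
            + 1e-3 * f ** 4)


def audible_components(components):
    """Keep only (freq_hz, level_db) pairs that lie above the threshold."""
    return [(f, level) for f, level in components
            if level > threshold_in_quiet_db(f)]


# Three tones: only the 1 kHz tone is above the approximate threshold.
tones = [(100.0, 10.0), (1000.0, 10.0), (16000.0, 30.0)]
kept = audible_components(tones)
```

The curve is lowest around 3 to 4 kHz, so a quiet tone survives there more easily than at very low or very high frequencies.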
Frequency Masking
[Figure: energy versus frequency.]
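A rough sketch of frequency masking: a strong tone raises the audibility threshold for nearby frequencies. Everything numeric here is an assumption for illustration (Zwicker's Bark-scale approximation, a triangular spreading shape, the slopes and the offset); it is not the model used in the lecture or in any MPEG standard.

```python
import math


def hz_to_bark(freq_hz):
    """Zwicker's approximation of the Bark (critical band) scale."""
    return (13.0 * math.atan(0.00076 * freq_hz)
            + 3.5 * math.atan((freq_hz / 7500.0) ** 2))


def masking_threshold_db(masker_hz, masker_db, probe_hz,
                         lower_slope=27.0, upper_slope=12.0, offset_db=10.0):
    """Masked threshold at probe_hz caused by one tonal masker.

    Illustrative triangular spreading function: the threshold falls by
    lower_slope dB/Bark below the masker and upper_slope dB/Bark above it.
    """
    distance = hz_to_bark(probe_hz) - hz_to_bark(masker_hz)
    if distance < 0:
        drop = lower_slope * -distance
    else:
        drop = upper_slope * distance
    return masker_db - offset_db - drop


def is_masked(masker_hz, masker_db, probe_hz, probe_db):
    """True if the probe tone falls below the masked threshold."""
    return probe_db < masking_threshold_db(masker_hz, masker_db, probe_hz)
```

With an 80 dB masker at 1 kHz, a 40 dB tone at 1.1 kHz is masked under these assumptions, while the same tone at 8 kHz is not.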
Temporal Masking
[Figure: energy versus time around a strong sound (the "masker"). Labels: backward (pre) masking, duration in ms; forward (post) masking, approximate duration in ms.]

Psycho-acoustic Model Demo
Music without distortion
Music with white noise
Music with perceptually distributed noise
Parts of MPEG-4 Audio
• General natural audio
  – AAC
  – BSAC
  – TwinVQ
  – HILN (parametric)
• Natural speech
  – CELP
  – HVXC (parametric)
• Synthetic audio
  – TTS
  – SAOL
  – SASL
• Composition
  – Mixing
  – Re-sampling
  – 3D-rendering
Parts of MPEG-4 Audio (cont.)
• Error Protection
  – CRC
  – FEC
    • Block code
    • Convolution code
  – Interleaving
• Error Resilience
  – Error resilient bitstreams
  – Error concealment
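Interleaving, one of the error protection tools listed above, can be illustrated with a minimal block interleaver. This is a generic sketch with hypothetical function names and sizes, not the interleaving scheme specified by MPEG-4: symbols are written row by row and read column by column, so a burst of channel errors is spread over several rows (code words).

```python
def interleave(symbols, rows, cols):
    """Write symbols row-wise into a rows x cols block, read column-wise."""
    if len(symbols) != rows * cols:
        raise ValueError("block size must equal rows * cols")
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]


def deinterleave(symbols, rows, cols):
    """Invert interleave(): the same operation with the roles swapped."""
    return interleave(symbols, cols, rows)


data = list(range(12))               # three code words of four symbols each
sent = interleave(data, rows=3, cols=4)

# A burst hits three consecutive symbols on the channel...
burst = sent[3:6]
# ...but after deinterleaving they belong to three different code words.
rows_hit = sorted(symbol // 4 for symbol in burst)
```

Each code word then sees a single error, which a modest FEC code has a much better chance of correcting than a burst of three.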
Natural Audio Coders
[Figure: quality (Cellular, Telephone, AM, FM, CD) versus bitrate in kbit/s for the coder families: general audio (AAC, TwinVQ), parametric audio (HILN), parametric speech (HVXC), and high quality speech (CELP).]
MPEG-4 AAC (Advanced Audio Coding)
• DCT-based (MDCT) time/frequency coder.
• Typically 16 – 64 kbit/s/channel.
• "Expert listener quality" at 128 kbit/s.
• Added to MPEG-2, but without MPEG-4 features.
• About half the bitrate of mp3 for comparable quality, mainly due to an improved psycho-acoustic model.

Demo clips (Haydn, Tracy Chapman):

            kbit/s   kHz
  Mono        16      16
  Stereo      32      16
  Stereo      64      32
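The time/frequency transform in AAC is a modified DCT (MDCT) applied to 50 % overlapping, windowed blocks. The sketch below is a plain, slow reference version with a sine window and hypothetical helper names; it only shows that windowed MDCT analysis followed by overlap-add synthesis returns the input (no quantization, no psycho-acoustic model, and a far smaller block size than a real coder would use).

```python
import math


def sine_window(n_coeffs):
    """Sine window of length 2N; satisfies w[n]^2 + w[n+N]^2 = 1."""
    length = 2 * n_coeffs
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]


def _basis(n, k, n_coeffs):
    return math.cos(math.pi / n_coeffs * (n + 0.5 + n_coeffs / 2) * (k + 0.5))


def mdct(block, window):
    """2N windowed samples -> N coefficients."""
    n_coeffs = len(block) // 2
    return [sum(window[n] * block[n] * _basis(n, k, n_coeffs)
                for n in range(2 * n_coeffs))
            for k in range(n_coeffs)]


def imdct(coeffs, window):
    """N coefficients -> 2N windowed, time-aliased samples."""
    n_coeffs = len(coeffs)
    return [window[n] * (2.0 / n_coeffs)
            * sum(coeffs[k] * _basis(n, k, n_coeffs) for k in range(n_coeffs))
            for n in range(2 * n_coeffs)]


def analysis_synthesis(signal, n_coeffs):
    """Run MDCT + IMDCT with overlap-add; the aliasing cancels out."""
    window = sine_window(n_coeffs)
    padded = [0.0] * n_coeffs + list(signal) + [0.0] * n_coeffs
    while len(padded) % n_coeffs:
        padded.append(0.0)
    out = [0.0] * len(padded)
    for start in range(0, len(padded) - 2 * n_coeffs + 1, n_coeffs):
        block = padded[start:start + 2 * n_coeffs]
        for n, value in enumerate(imdct(mdct(block, window), window)):
            out[start + n] += value
    return out[n_coeffs:n_coeffs + len(signal)]


signal = [math.sin(0.3 * i) + 0.5 * math.cos(1.1 * i) for i in range(40)]
rebuilt = analysis_synthesis(signal, n_coeffs=8)
```

Each block of 2N samples yields only N coefficients, so the overlap costs no extra data; a coder would quantize the coefficients between `mdct` and `imdct`.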
MPEG-4 Extensions to the AAC
• TwinVQ (Transform-domain Weighted Interleave Vector Quantization)
  – Improves performance for low bitrates (6 – 18 kbit/s).
• PNS (Perceptual Noise Substitution)
  – Allows coding "noise-like" parts parametrically.
• LTP (Long-term prediction)
  – Allows "tone-like" parts to be coded with higher accuracy at a lower bitrate.
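The parametric idea behind PNS can be sketched like this: for a band judged "noise-like", only its energy is transmitted, and the decoder substitutes random noise with the same energy. The function names are hypothetical, and the decision of which bands count as noise-like (the hard part) is left out.

```python
import math
import random


def encode_noise_band(coeffs):
    """Encoder side: replace the band's coefficients by their energy."""
    return sum(c * c for c in coeffs)


def decode_noise_band(energy, n_coeffs, rng):
    """Decoder side: generate noise and scale it to the transmitted energy."""
    noise = [rng.gauss(0.0, 1.0) for _ in range(n_coeffs)]
    noise_energy = sum(x * x for x in noise)
    scale = math.sqrt(energy / noise_energy)
    return [x * scale for x in noise]


band = [0.3, -1.2, 0.8, 0.1, -0.5, 0.9, -0.7, 0.4]
energy = encode_noise_band(band)          # one number instead of eight
substitute = decode_noise_band(energy, len(band), random.Random(0))
```

The decoded band differs from the original sample by sample, but it has the same energy, which is what matters perceptually for noise-like content.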
MPEG-4 Extensions to the AAC
• BSAC (Bit-sliced Arithmetic Coding)
  – Adds scalability to the bitstream.
  – 16 – 64 kbit/s in steps of 1 kbit/s.
• Demo: 60, 40, and 20 kbit/s.
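Why bit slicing gives a scalable bitstream can be shown with a toy example: if quantized values are sent one bit plane at a time, most significant plane first, the stream can be cut after any plane and still decoded to a coarser approximation. This sketch assumes non-negative integers and hypothetical function names, and it omits the arithmetic coding of the slices that BSAC actually performs.

```python
def to_bit_planes(values, n_bits):
    """Split non-negative integers into bit planes, MSB plane first."""
    return [[(v >> bit) & 1 for v in values]
            for bit in range(n_bits - 1, -1, -1)]


def from_bit_planes(planes, n_bits, n_values):
    """Rebuild values from however many leading planes were received."""
    values = [0] * n_values
    for plane_index, plane in enumerate(planes):
        shift = n_bits - 1 - plane_index
        for i, bit in enumerate(plane):
            values[i] |= bit << shift
    return values


coeffs = [13, 2, 7, 0]
planes = to_bit_planes(coeffs, n_bits=4)

full = from_bit_planes(planes, 4, len(coeffs))        # all four planes
coarse = from_bit_planes(planes[:2], 4, len(coeffs))  # stream cut in half
```

Dropping the two least significant planes leaves the top two bits of every value, so quality degrades gradually as the bitrate is reduced.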
Other MPEG-4 Natural Audio Coders
• Speech coders
  – High bitrate speech coder (CELP)
  – Low bitrate speech coder (HVXC)
• HILN low bitrate parametric coder
  – Harmonic and Individual Lines plus Noise
  – 4 – 16 kbit/s
  – Subband coder that codes each subband as a tone or as shaped noise.
MPEG-4 High Bitrate Speech Coder
• High quality CELP coder.
• 8 or 16 kHz sampling (NB or WB mode).
• 4 – 24 kbit/s.
[Figure: Basic principle of CELP coder. Block labels: codebook index k, g_k, x_k(n), LPC filter, s(n), e(n), perceptual w. filter.]
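The analysis-by-synthesis loop behind the CELP diagram can be sketched as a codebook search: every candidate excitation is passed through the LPC synthesis filter, scaled by its best gain, and compared with the target speech frame; only the winning index and gain are transmitted. This is a simplified illustration with hypothetical names and a toy codebook, and it leaves out the perceptual weighting filter applied to the error in a real coder.

```python
def lpc_synthesize(excitation, lpc_coeffs):
    """All-pole synthesis filter: s[n] = x[n] + sum_i a_i * s[n - i]."""
    output = []
    for n, sample in enumerate(excitation):
        acc = sample
        for i, a in enumerate(lpc_coeffs, start=1):
            if n - i >= 0:
                acc += a * output[n - i]
        output.append(acc)
    return output


def search_codebook(target, codebook, lpc_coeffs):
    """Return (index, gain, squared_error) of the best codebook entry."""
    best = None
    for index, vector in enumerate(codebook):
        synth = lpc_synthesize(vector, lpc_coeffs)
        energy = sum(y * y for y in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot match anything
        # Least-squares optimal gain for this candidate.
        gain = sum(t * y for t, y in zip(target, synth)) / energy
        error = sum((t - gain * y) ** 2 for t, y in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Toy setup (assumed values): 4-entry codebook, first-order LPC filter.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 1, -1], [1, 1, 0, 0]]
lpc = [0.9]
target = [0.5 * y for y in lpc_synthesize(codebook[2], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
```

Here the target was built from entry 2 with gain 0.5, so the search recovers exactly that pair.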
[Demo labels: PCM (uncompressed); bitrates in kbit/s.]

MPEG-4 Low Bitrate Speech Coder
• HVXC – Harmonic Vector eXcitation Coder.
• 8 kHz sampling, 2 – 4 kbit/s.
• Down to 1.2 kbit/s in variable rate mode.
• Sinusoidal coding for voiced parts and CELP coding for unvoiced parts.
• HVXC can be combined with HILN.
  – Automatic switching between the coders
  – Produces one bitstream.
MPEG-4 Natural Audio Coders Demo

                 Original   Music coder   Music coder   Speech coder   Speech coder
                 audio      (TwinVQ)      (HILN)        (CELP)         (HVXC)
                            6 kbit/s      6 kbit/s      6 kbit/s       2 kbit/s
  Speech
  Simple music
  Complex music
Speed Change
• The audio can be decoded at an arbitrary speed without changing the pitch.

Demo:
Original
Music, faster
Synthetic Audio
• TTS – Text-To-Speech
  – MPEG-4 defines an interface, not the TTS itself
• SAOL - Structured Audio Orchestra Language
  – SAOL describes how to generate instruments
• SASL - Structured Audio Score Language
  – SASL describes which instruments to play when
  – MIDI is a subset of SASL
• Demo:
  – Orchestra: Initially 80 kB instrument descriptions (SAOL)
  – While playing: 1 kbit/s (SASL)
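The orchestra/score split can be mimicked in a few lines: the "orchestra" holds instrument definitions (how to generate sound) and the "score" lists which instrument plays when. This is an analogy written in Python, not SAOL or SASL syntax; the instrument, the event format, and the sample rate are all assumptions for illustration.

```python
import math

SAMPLE_RATE = 8000  # assumed, kept low to make the example small


def decaying_sine(pitch_hz, duration_s, sample_rate=SAMPLE_RATE):
    """Orchestra entry: a sine tone with a linear decay envelope."""
    n = int(duration_s * sample_rate)
    return [math.sin(2 * math.pi * pitch_hz * i / sample_rate) * (1 - i / n)
            for i in range(n)]


# "Orchestra": sent once, describes how each instrument generates sound.
ORCHESTRA = {"tone": decaying_sine}

# "Score": a small stream of events (start_s, instrument, duration_s, pitch_hz).
SCORE = [(0.0, "tone", 0.5, 440.0), (0.25, "tone", 0.5, 660.0)]


def render(score, orchestra, sample_rate=SAMPLE_RATE):
    """Synthesize every score event and mix the notes into one signal."""
    notes = []
    for start_s, name, duration_s, pitch_hz in score:
        samples = orchestra[name](pitch_hz, duration_s, sample_rate)
        notes.append((int(start_s * sample_rate), samples))
    total = max((offset + len(samples) for offset, samples in notes), default=0)
    out = [0.0] * total
    for offset, samples in notes:
        for i, value in enumerate(samples):
            out[offset + i] += value
    return out


audio = render(SCORE, ORCHESTRA)
```

The score is tiny compared with the audio it produces, which is why the bitrate while playing can be so low once the instrument descriptions have been delivered.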
BIFS – Binary Format for Scene Description
• All the sound you hear is coded at 16 kbit/s.
• Initial voice coded using TTS.
• Current voice coded using parametric speech coder (HVXC).
• Background "music" coded using Structured Audio.
• Post-production specified using BIFS, using the Structured Audio tools.
A Scene Graph
[Figure: scene graph with an AudioMix node ("Mix the sounds"), an AudioFX node ("Add reverb"), and two AudioSource leaves: Speech (CELP-coder) and Hand claps (SA decoder).]
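A scene graph of this kind can be pictured as a tree of nodes that each render audio from their children. The classes below borrow the node names from the slide, but they are a hypothetical Python sketch, not the BIFS node interface; the single echo stands in for reverb, and which source the effect is attached to is an assumption.

```python
class AudioSource:
    """Leaf node: holds already decoded samples."""

    def __init__(self, samples):
        self.samples = list(samples)

    def render(self):
        return list(self.samples)


class AudioFX:
    """Applies an effect function to the output of one child node."""

    def __init__(self, child, effect):
        self.child = child
        self.effect = effect

    def render(self):
        return self.effect(self.child.render())


class AudioMix:
    """Sums the outputs of its children, zero-padding shorter ones."""

    def __init__(self, children):
        self.children = list(children)

    def render(self):
        rendered = [child.render() for child in self.children]
        length = max((len(r) for r in rendered), default=0)
        return [sum(r[i] for r in rendered if i < len(r))
                for i in range(length)]


def echo(samples, delay, gain):
    """Very crude stand-in for reverb: add one delayed, attenuated copy."""
    out = list(samples) + [0.0] * delay
    for i, value in enumerate(samples):
        out[i + delay] += gain * value
    return out


speech = AudioSource([1.0, 0.0, 0.0, 0.0])       # e.g. output of a speech decoder
hand_claps = AudioSource([0.0, 1.0, 0.0, 0.0])   # e.g. output of an SA decoder
scene = AudioMix([AudioFX(speech, lambda s: echo(s, delay=2, gain=0.5)),
                  hand_claps])
mixed = scene.render()
```

Because every node only sees its children's output, sources coded with different coders can be combined and post-processed in one graph.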
[Figure: larger scene graph built from two AudioMix nodes, three AudioFX nodes, one AudioDelay node, and three AudioSource leaves: Piano, Bass (SA), and Finger snaps.]

That was the last slide!