MPEG-4 Speech Coding

Masayuki Nishiguchi
Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation

Outline
• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP
  - Algorithm
  - Performance
  - Demonstration
• MPEG-4 HVXC
  - Algorithm
  - Performance
  - Demonstration
• Summary

Background
• Most of the existing speech coding standards support only a single “compression” functionality.
• Speech coding algorithms with high coding efficiency and multiple functionalities play an important role in the efficient use of bandwidth and in emerging multimedia applications.

MPEG-4 Speech Coding - features
• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)
• Multi bit-rates - 1.5 to 24 kbps
• Narrow-band and wide-band - CELP
• Lowest bit-rate as an international standard coding - HVXC
  - 2.0 kbps (fixed), average 1.5 kbps (variable)
• New functionalities
  - Speed / pitch change - HVXC
  - Bit-rate scalability - HVXC, CELP
  - Bandwidth scalability - CELP

[Figure: MPEG-4 version-1 Natural Audio. Quality (cellular phone, telephone, AM, FM, CD) versus bit-rate (2 to 64 kbps) for HVXC, CELP (NB-CELP, WB-CELP) and GA (AAC, TwinVQ).]

MPEG-4 CELP
• Narrow band: 3.85-12.2 kbps, 10-40 ms frame
• Wide band: 10.9-23.8 kbps, 10-20 ms frame
• Multi-rate - 200 to 800 bps step
• Bit-rate scalability - 2.0 kbps (NB), 4.0 kbps (WB) step
• Bandwidth scalability
• Fine rate control
• Regular pulse - WB: low complexity
• Multi pulse - WB, NB: high coding efficiency

Block diagram of the CELP encoder
[Figure: block diagram of the CELP encoder. Labels: speech input, bit-rate input, LPC analysis, LSP VQ, coefficient interpolation, codebook control, excitation codebook (MPE/RPE), long-term synthesis filter, LPC synthesis filter, weighted error calculation. Outputs: LPC parameters, excitation parameters.]
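The CELP encoder diagram describes an analysis-by-synthesis loop: each candidate excitation is passed through the synthesis filter and the candidate with the smallest error is kept. The following is a minimal sketch of that idea only. The function names, the tiny codebook and the LPC sign convention are assumptions made for illustration, and the long-term filter, the perceptual weighting and the MPE/RPE pulse structure are left out.

```python
def synthesize(excitation, lpc):
    """All-pole LPC synthesis: y[n] = x[n] - sum_k lpc[k] * y[n-1-k].

    The sign convention for `lpc` is an assumption of this sketch.
    """
    out = []
    for n, x in enumerate(excitation):
        acc = x
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def search_codebook(target, codebook, lpc):
    """Return (index, gain, error) of the codevector that best matches target.

    Plain squared error is used here; a real CELP coder minimizes a
    perceptually weighted error instead.
    """
    best = None
    for index, codevector in enumerate(codebook):
        synth = synthesize(codevector, lpc)
        energy = sum(s * s for s in synth)
        if energy == 0.0:
            continue  # an all-zero candidate cannot be scaled to fit
        corr = sum(t * s for t, s in zip(target, synth))
        gain = corr / energy  # optimal gain for this candidate
        error = sum((t - gain * s) ** 2 for t, s in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


# Hypothetical 3-entry codebook and a first-order synthesis filter.
codebook = [[1, 0, 0, 0], [0, 1, 0, 0], [1, -1, 0, 0]]
lpc = [-0.5]
target = [2 * s for s in synthesize(codebook[1], lpc)]
index, gain, error = search_codebook(target, codebook, lpc)
```

Searching gain jointly with the codevector, as done here, keeps the sketch short; practical coders usually quantize the gain with a separate codebook.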
Structure of the bit-rate scalable coding
[Figure: structure of the bit-rate scalable coding. The encoder splits the speech input into a 6 kbps base layer, three 2 kbps enhancement layers and a 10 kbps layer. Decoder-1 outputs basic speech at 6 kbps, Decoder-2 high-quality speech at 8 kbps, Decoder-3 high-quality speech at 12 kbps, and Decoder-4 wideband speech at 22 kbps.]

Performance
The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.
• 15 Japanese items were evaluated by 16 Japanese listeners in the Japanese lab.
• 15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European labs.
* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

[Figure: MOS with 95% CI, narrow-band CELP, Japanese. Conditions: GSM-EFR 12.2 kbps, G.729 8.0 kbps, G.723.1 6.3 kbps, CELP and scalable CELP (labels: 12.0, 12.0, 8.3, 8.0 and 6.0 kbps), MNRU at 40, 30, 20 and 10 dB. MOS axis from 1 to 5.]

[Figure: MOS with 95% CI, wideband CELP, Japanese. Conditions: MPE 17.9 kbps, G.722 at 56.0 and 48.0 kbps, Layer III 24 kbps, BW scalable and RPE (labels: 16.0 and 18.1 kbps), MNRU at 40, 30, 20 and 10 dB. MOS axis from 1 to 5.]

MPEG-4 CELP demonstration
• 6 kbps NB CELP
• 12 kbps NB CELP
• 22 kbps WB CELP (BW-scalable CELP)
Demo samples were generated by NEC.

MPEG-4 HVXC
• Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps
• Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream
• Speed change & pitch change
  - Attractive for fast speech database search & browsing

Approach
Two different types of coding schemes are combined. One is suitable for voiced segments and the other for unvoiced segments.
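The arithmetic behind the bit-rate scalable CELP structure shown earlier can be sketched as follows. The layer sizes are an assumption read off the figure labels (a 6 kbps base layer, three 2 kbps enhancement layers and a 10 kbps layer); the function name is hypothetical. A decoder that receives the first k layers operates at the sum of their rates.

```python
from itertools import accumulate

# Assumed layer sizes in kbps, read from the figure labels:
# base layer, three narrow-band enhancement layers, one 10 kbps layer.
LAYERS_KBPS = [6, 2, 2, 2, 10]


def decodable_rates(layers):
    """Return the total bit-rate available after each successive layer."""
    return list(accumulate(layers))


rates = decodable_rates(LAYERS_KBPS)  # [6, 8, 10, 12, 22]
```

The totals 6, 8, 12 and 22 kbps match the rates labelled at Decoder-1 through Decoder-4; the same idea applies to HVXC, where the 2.0 kbps stream is embedded in the 4.0 kbps stream.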
• Voiced: Phase information is discarded by representing the power spectrum of the LPC residual with its harmonics. Frequency-domain analysis / synthesis.
• Unvoiced: Crisp consonants are obtained by CELP coding. Time-domain analysis / synthesis.

Encoder
[Figure: block diagram of the HVXC encoder. Common path: input, LPC analysis, LSP VQ, LPC inverse filter, V / UV / MV decision. Voiced path: FFT, pitch detection, harmonic magnitudes estimation, dimension conversion, weighted VQ. Unvoiced path: CELP coding. Outputs: LSP, V / UV / MV, pitch, spectral shape & gain, stochastic codebook shape & gain.]

Harmonic spectral magnitudes and fine pitch estimation
[Figure: magnitude versus frequency, showing the pitch frequency and the harmonic spectral envelope.]
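The harmonic representation of a voiced frame can be illustrated with a small sketch: the spectrum of the LPC residual is sampled at multiples of the pitch frequency and only the magnitudes are kept, so phase is discarded. This is not the estimation procedure of the standard. The function name, the 8 kHz sampling rate and the direct DFT evaluation (instead of an FFT followed by fine pitch refinement) are assumptions made for illustration.

```python
import cmath
import math


def harmonic_magnitudes(residual, pitch_hz, sample_rate=8000):
    """Return the magnitude of each pitch harmonic below the Nyquist frequency.

    The DFT is evaluated directly at m * pitch_hz; phase is thrown away.
    """
    n = len(residual)
    count = math.ceil((sample_rate / 2) / pitch_hz) - 1  # harmonics below Nyquist
    magnitudes = []
    for m in range(1, count + 1):
        omega = 2 * math.pi * m * pitch_hz / sample_rate
        acc = sum(x * cmath.exp(-1j * omega * k) for k, x in enumerate(residual))
        magnitudes.append(2 * abs(acc) / n)  # scale so a cosine of amplitude A gives A
    return magnitudes


# Synthetic "residual": two harmonics of a 1000 Hz pitch over 80 samples.
frame = [
    0.5 * math.cos(2 * math.pi * 1000 * k / 8000)
    + 0.25 * math.cos(2 * math.pi * 2000 * k / 8000)
    for k in range(80)
]
mags = harmonic_magnitudes(frame, 1000.0)
```

The number of harmonics depends on the pitch, which is why the encoder needs the dimension conversion step before vector quantization.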
Dimension conversion of harmonic spectral magnitudes
[Figure: three magnitude-versus-frequency panels illustrating the dimension conversion.]

Vector quantization of harmonic spectral envelope - base layer -
[Figure: the fixed dimension harmonic spectrum is quantized using shape codebook-0, shape codebook-1 and a gain, with weighting and energy estimation.]

Scalable vector quantization of spectral envelope - base & enhancement layer -
[Figure: labels: input, dimension conversion, VQ of SE shape0, VQ of SE shape1, SE gain, weighted distortion, VQ of SE shape2, shape3, ..., shape5. Each VQ outputs an index.]

Scalable CELP encoder for unvoiced segments - base and enhancement layer -
[Figure: input speech, LPC analysis, VQ of LSP, perceptual weighting filter W(z) and subtraction of the zero-input response of H(z). Base layer: stochastic codebook (6 bits), gain codebook (4 bits), perceptually weighted LPC synthesis filter H(z), calculation of error. Enhancement layer, operating on the quantization error: stochastic codebook (5 bits), gain codebook (3 bits), perceptually weighted LPC synthesis filter H(z), calculation of error.]
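The dimension conversion step described earlier maps a pitch-dependent number of harmonic magnitudes onto a fixed-length vector so that a single codebook can quantize every frame. A minimal sketch follows. Plain linear interpolation is swapped in here for simplicity and is an assumption, as are the function name and the target dimension used in the example; the slides do not state which interpolation the standard uses.

```python
def convert_dimension(magnitudes, target_dim):
    """Resample a variable-length magnitude vector to target_dim points.

    The first and last input values are preserved; values in between are
    linearly interpolated (an assumption of this sketch).
    """
    n = len(magnitudes)
    if n == 0 or target_dim < 1:
        raise ValueError("need at least one magnitude and one output point")
    if n == 1 or target_dim == 1:
        return [magnitudes[0]] * target_dim
    out = []
    for i in range(target_dim):
        pos = i * (n - 1) / (target_dim - 1)  # fractional index into the input
        lo = min(int(pos), n - 2)
        frac = pos - lo
        out.append(magnitudes[lo] * (1 - frac) + magnitudes[lo + 1] * frac)
    return out


fixed = convert_dimension([0.0, 2.0, 4.0], 5)  # [0.0, 1.0, 2.0, 3.0, 4.0]
```

The decoder runs the same conversion in the opposite direction, from the fixed dimension back to the number of harmonics implied by the decoded pitch.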
Decoder
[Figure: block diagram of the HVXC decoder. Inputs: LSP, V / UV / MV, pitch, spectral shape & gain, stochastic shape, stochastic gain. Blocks: inverse VQ of LSP, parameter interpolation for speed control, inverse VQ of spectral shape & gain, dimension conversion, harmonic synthesis, noise generation, stochastic codebook, windowing, LPC synthesis filter, postfilter. Output: speech.]

Harmonic synthesis for voiced excitation

$$f(t) = \sum_{m} A_m(t)\,\cos\bigl(\theta_m(t)\bigr)$$

$$\theta_m(t) = \int_0^t \omega_m(\tau)\,d\tau + \phi_m$$
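The harmonic synthesis formula sums one cosine per harmonic, with the phase of each obtained by integrating its instantaneous frequency. A discrete-time sketch of one frame is given below. The linear interpolation of amplitude and pitch across the frame, the 8 kHz sampling rate, the zero default for the initial phases and the function name are all assumptions made for illustration.

```python
import math


def synthesize_voiced(amps_start, amps_end, f0_start, f0_end, num_samples,
                      sample_rate=8000, phases=None):
    """Synthesize one frame of voiced excitation as a sum of harmonics.

    Returns (samples, final_phases) so the next frame can continue the phase.
    """
    count = min(len(amps_start), len(amps_end))
    theta = list(phases) if phases is not None else [0.0] * count
    samples = []
    for n in range(num_samples):
        mix = n / num_samples  # position inside the frame, 0 <= mix < 1
        f0 = f0_start + (f0_end - f0_start) * mix
        value = 0.0
        for m in range(count):
            amp = amps_start[m] + (amps_end[m] - amps_start[m]) * mix
            value += amp * math.cos(theta[m])
            # Running sum approximates the integral of the harmonic frequency.
            theta[m] += 2 * math.pi * (m + 1) * f0 / sample_rate
        samples.append(value)
    return samples, theta


# One harmonic at a constant 1000 Hz: one full period in 8 samples.
frame, end_phase = synthesize_voiced([1.0], [1.0], 1000.0, 1000.0, 8)
```

Carrying the final phases into the next call keeps the waveform continuous across frame boundaries even when the pitch changes.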
Parameter interpolation for speed control
• Arrays of original parameters: param[n]
• Arrays of interpolated parameters: mdf_param[m]
• Time index before the time scale modification: n
• Time index after the time scale modification: m
• Ratio of speed change: spd
  - spd > 1: speed up
  - spd < 1: speed down

Define:

$$fr_0 = \lfloor m \cdot spd \rfloor, \qquad fr_1 = fr_0 + 1$$

$$l = m \cdot spd - fr_0, \qquad r = fr_1 - m \cdot spd$$
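The definitions above locate each output frame m between two original frames fr0 and fr1. A minimal sketch of how they could be used follows. The linear blend of param[fr0] and param[fr1] weighted by r and l is an assumption, since the slide text ends before the interpolation formula; the function name and the clamping at the last frame are also hypothetical.

```python
import math


def time_scale(param, spd):
    """Resample a per-frame parameter track for a speed ratio spd.

    spd > 1 shortens the track (speed up); spd < 1 lengthens it (speed down).
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    if not param:
        return []
    last = len(param) - 1
    mdf_param = []
    m = 0
    while m * spd <= last:
        pos = m * spd
        fr0 = math.floor(pos)
        fr1 = min(fr0 + 1, last)  # clamp at the final frame (assumption)
        left = pos - fr0          # l in the definitions above
        right = 1.0 - left        # r, equal to (fr0 + 1) - m * spd
        # Assumed blend: the nearer frame gets the larger weight.
        mdf_param.append(param[fr0] * right + param[fr1] * left)
        m += 1
    return mdf_param


slower = time_scale([0.0, 10.0, 20.0], 0.5)              # [0.0, 5.0, 10.0, 15.0, 20.0]
faster = time_scale([0.0, 10.0, 20.0, 30.0, 40.0], 2.0)  # [0.0, 20.0, 40.0]
```

Because only the decoded parameters are resampled and the waveform is synthesized afterwards, the pitch is unaffected by the speed change.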
