MPEG-4

Masayuki Nishiguchi

Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation

1 Outline

• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP
  - Algorithm
  - Performance
  - Demonstration
• MPEG-4 HVXC
  - Algorithm
  - Performance
  - Demonstration
• Summary

2 Background

Most of the existing speech coding standards support only a single “compression” functionality.

Speech coding algorithms that combine high coding efficiency with multiple functionalities play an important role in the efficient use of bandwidth and in emerging multimedia applications.

3 MPEG-4 Speech Coding - features

• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)

• Multiple bit-rates: 1.5 to 24 kbps

• Narrow-band and wide-band: CELP

• Lowest bit-rate among international coding standards: HVXC at 2.0 kbps (fixed rate) or 1.5 kbps on average (variable rate)

• New functionalities
  - Speed / pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP

4 MPEG-4 version-1 Natural Audio

[Figure: quality versus bit-rate. Vertical axis (Quality): Cellular phone, Telephone, AM, FM, CD. Horizontal axis: Bit-rate (kbps), 2 to 64. Regions shown: HVXC, CELP (NB-CELP, WB-CELP), and GA (AAC, TwinVQ).]

5 MPEG-4 CELP

Narrow band 3.85-12.2 kbps 10-40 ms frame

Wide band 10.9-23.8 kbps 10-20 ms frame

Multi-rate 200 - 800 bps step

Bit-rate scalability - 2.0kbps(NB), 4.0kbps(WB) step

Bandwidth scalability

Fine rate control

Regular pulse - WB: Low complexity

Multi pulse - WB, NB: High coding efficiency

6 Block diagram of the CELP encoder

[Figure: block diagram of the CELP encoder. Labels recoverable from the figure: speech input; LPC analysis; LSP VQ; coefficient interpolation; LPC parameters; bit-rate control input; MPE/RPE codebook; long-term synthesis filter; LPC synthesis filter; weighted error calculation; excitation parameters.]

7 Structure of the bit-rate scalable coding

[Figure: structure of the bit-rate scalable coding. The encoder takes the speech input and emits a 6 kbps layer, three 2 kbps layers, and a 10 kbps layer. Decoder-1 produces basic speech at 6 kbps, Decoder-2 high-quality speech at 8 kbps, Decoder-3 high-quality speech at 12 kbps, and Decoder-4 wideband speech at 22 kbps.]
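The layered structure on this slide can be sketched as a list of layer sizes whose running sum gives the rates at which a decoder may stop. This is a minimal illustration, not the standard's bitstream syntax: the layer order (a 6 kbps core, three 2 kbps enhancement layers, then a 10 kbps bandwidth layer) is read off the figure labels, and the names `LAYERS_KBPS`, `decodable_rates`, and `layers_for_budget` are hypothetical.

```python
from itertools import accumulate

# Layer sizes in kbps as read from the slide (assumed order):
# 6 kbps core, three 2 kbps bit-rate enhancement layers,
# and one 10 kbps bandwidth-extension layer.
LAYERS_KBPS = [6, 2, 2, 2, 10]


def decodable_rates(layers_kbps):
    """Cumulative bit-rates at which a decoder can stop reading layers."""
    return list(accumulate(layers_kbps))


def layers_for_budget(layers_kbps, budget_kbps):
    """Return (number of leading layers, total rate) that fit the budget."""
    count = 0
    total = 0
    for rate in layers_kbps:
        if total + rate > budget_kbps:
            break
        total += rate
        count += 1
    return count, total


if __name__ == "__main__":
    print(decodable_rates(LAYERS_KBPS))
    print(layers_for_budget(LAYERS_KBPS, 9))
```

The cumulative rates include the 6, 8, 12, and 22 kbps operating points attached to Decoder-1 through Decoder-4 in the figure.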

8 Performance

The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in Aug. 1998 at two European labs and one Japanese lab*.

15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.

15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.

* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

9 Performance

[Figure: MOS with 95% confidence intervals, narrow-band CELP, Japanese items. MOS axis from 1 to 5. Conditions shown: GSM-EFR at 12.2 kbps; G.729 at 8.0 kbps; G.723.1 at 6.3 kbps; MPEG-4 CELP and scalable CELP at 6.0, 8.0, 8.3, and 12.0 kbps; MNRU references at 10, 20, 30, and 40 dB.]

ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

10 MOS with 95% CI: Wideband CELP - Japanese

[Figure: MOS with 95% confidence intervals, wideband CELP, Japanese items. MOS axis from 1 to 5. Conditions shown: G.722 at 48.0 and 56.0 kbps; Layer III at 24 kbps; MPEG-4 wideband CELP variants (MPE, RPE, BW scalable) at 16.0, 17.9, and 18.1 kbps; MNRU references at 10, 20, 30, and 40 dB.]

ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

11 MPEG-4 CELP demonstration

6 kbps NB CELP

12 kbps NB CELP

22 kbps WB CELP (BW-scalable)

The CELP demo samples were generated by NEC.

12 MPEG-4 HVXC

• Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps

• Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream

• Speed change & pitch change
  - Attractive for fast speech database search & browsing

13 Approach

Two different types of coding schemes are combined. One is suitable for voiced segments and the other for unvoiced segments.

Voiced: Phase information is discarded by representing the power spectrum of the LPC residual with its harmonic magnitudes. Frequency domain analysis / synthesis.

Unvoiced: Crisp consonants are obtained by CELP coding. Time domain analysis / synthesis.

14 Encoder

[Figure: block diagram of the HVXC encoder. Common path: input; LPC analysis; LSP VQ (output: LSP); LPC inverse filter; V / UV / MV decision (output: V / UV / MV). Voiced path: FFT; pitch detection (output: pitch); harmonic magnitudes estimation; dimension conversion; weighted VQ (output: spectral shape & gain). Unvoiced path: CELP coding (output: stochastic codebook shape & gain).]

19 Harmonic spectral magnitudes and fine pitch estimation

[Figure: magnitude versus frequency, showing the harmonic spectral envelope sampled at multiples of the pitch frequency.]

21 Dimension conversion of harmonic spectral magnitudes

[Figure: three magnitude versus frequency plots illustrating the conversion of a harmonic magnitude vector to a different dimension.]
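Dimension conversion maps a harmonic magnitude vector, whose length depends on the pitch, onto a fixed number of points so that a fixed-dimension codebook can be used. The sketch below illustrates the idea with plain linear interpolation. Both the interpolation method and the function name `convert_dimension` are assumptions made for illustration; the slides do not state which interpolation the codec uses.

```python
def convert_dimension(magnitudes, target_dim):
    """Resample a harmonic magnitude vector to target_dim points.

    Linear interpolation over the index axis is an assumed stand-in here,
    not necessarily the interpolation used by the codec.
    """
    n = len(magnitudes)
    if n == 0 or target_dim < 1:
        raise ValueError("need at least one input and one output point")
    if n == 1 or target_dim == 1:
        return [float(magnitudes[0])] * target_dim
    out = []
    for k in range(target_dim):
        # Position of output point k on the input index axis [0, n - 1].
        pos = k * (n - 1) / (target_dim - 1)
        i = min(int(pos), n - 2)   # left neighbour, kept inside the vector
        frac = pos - i             # distance from the left neighbour
        out.append((1.0 - frac) * magnitudes[i] + frac * magnitudes[i + 1])
    return out


if __name__ == "__main__":
    # A low-pitched frame has many harmonics, a high-pitched frame few;
    # both are mapped to the same (hypothetical) fixed dimension of 8.
    print(convert_dimension([1.0, 0.8, 0.5, 0.2], 8))
    print(convert_dimension([1.0, 0.9, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1, 0.05, 0.02], 8))
```

The decoder would apply the same kind of mapping in reverse, from the fixed dimension back to the number of harmonics implied by the decoded pitch.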

23 Vector quantization of harmonic spectral envelope - base layer -

[Figure: VQ of the fixed-dimension harmonic spectrum. Labels recoverable from the figure: shape codebook-0; shape codebook-1; gain; weighting; energy estimation.]

24 Scalable VQ of spectral envelope - base & enhancement layer -

[Figure: scalable VQ of the spectral envelope (SE). Base layer: input; dimension conversion; VQ of SE shape0; VQ of SE shape1; SE gain; weighted distortion; indices. Enhancement layer: the base-layer quantization residual passes through dimension conversion to VQ of SE shape2, shape3, ..., shape5, each producing an index.]

26 Scalable CELP encoder for unvoiced segments - base and enhancement layer -

[Figure: scalable CELP encoder for unvoiced segments. Common path: input speech; LPC analysis; VQ of LSP; perceptual weighting filter W(z) and subtraction of the zero-input response of H(z). Base layer: stochastic codebook (6 bits); gain codebook (4 bits); LPC synthesis filter H(z); calculation of perceptually weighted error. Enhancement layer, driven by the base-layer quantization error: stochastic codebook (5 bits); gain codebook (3 bits); LPC synthesis filter H(z); calculation of perceptually weighted error.]

27 Decoder

[Figure: block diagram of the HVXC decoder. Inputs: LSP; V / UV / MV; pitch; spectral shape & gain; stochastic shape; stochastic gain. Blocks: inverse VQ of LSP; inverse VQ of spectral shape & gain; dimension conversion; noise generation; harmonic synthesis; stochastic codebook; windowing; parameter interpolation for speed control; LPC synthesis filter; postfilter; output.]

32 Harmonic synthesis for voiced excitation

$$f(t) = \sum_m A_m(t)\cos\bigl(\theta_m(t)\bigr), \qquad \theta_m(t) = \int_0^t \omega_m(\tau)\,d\tau + \phi_m$$
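A discrete-time sketch of the synthesis equation on this slide: each harmonic is a cosine whose phase is the running sum of its instantaneous frequency. Several details are assumptions made for illustration and are not taken from the slides: the m-th harmonic frequency is taken as m times the fundamental, amplitudes and the fundamental are interpolated linearly across the frame, and the function name `synthesize_voiced` is hypothetical.

```python
import math


def synthesize_voiced(amps0, amps1, w0_start, w0_end, n_samples, phases=None):
    """Sum of harmonics with per-sample interpolated amplitude and pitch.

    amps0, amps1: harmonic amplitudes A_m at frame start and end.
    w0_start, w0_end: fundamental frequency in radians per sample.
    phases: initial phases phi_m (zeros if omitted).
    Returns the samples and the final phases, so that the next frame
    can continue without a phase discontinuity.
    """
    n_harm = len(amps0)
    if len(amps1) != n_harm:
        raise ValueError("amps0 and amps1 must have the same length")
    theta = list(phases) if phases is not None else [0.0] * n_harm
    out = []
    for n in range(n_samples):
        a = n / n_samples  # interpolation weight across the frame
        w0 = (1.0 - a) * w0_start + a * w0_end
        sample = 0.0
        for m in range(n_harm):
            amp = (1.0 - a) * amps0[m] + a * amps1[m]
            sample += amp * math.cos(theta[m])
            # Discrete counterpart of the integral of omega_m(tau),
            # with omega_m assumed to be (m + 1) * w0 for list index m.
            theta[m] += (m + 1) * w0
        out.append(sample)
    return out, theta


if __name__ == "__main__":
    frame, end_phase = synthesize_voiced(
        [1.0, 0.5], [0.8, 0.4], 2 * math.pi / 80, 2 * math.pi / 76, 160
    )
    print(len(frame), end_phase)
```

Returning the final phases lets the caller pass them as `phases` for the next frame, which keeps the waveform continuous across frame boundaries.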

34 Parameter interpolation for speed control

arrays of original parameters : param[n]
arrays of interpolated parameters : mdf_param[m]
time index before the time scale modification : n
time index after the time scale modification : m
ratio of speed change : spd (spd > 1: speed up; spd < 1: speed down)

define:

$$fr_0 = \lfloor m \cdot spd \rfloor, \qquad fr_1 = fr_0 + 1$$

$$l = m \cdot spd - fr_0, \qquad r = fr_1 - m \cdot spd$$

time scale modified parameters are approximated as:

$$\mathit{mdf\_param}[m] = \mathit{param}[fr_0] \cdot r + \mathit{param}[fr_1] \cdot l$$
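The interpolation rule on this slide amounts to linear interpolation of the parameter track at the fractional frame position m * spd. A minimal sketch follows, using scalar parameters for brevity. The function name `time_scale_params`, the choice of output length, and the clamping at the last frame are assumptions that the slide does not specify.

```python
import math


def time_scale_params(param, spd):
    """Linearly interpolate a parameter track for a speed ratio spd.

    spd > 1 speeds playback up (fewer output frames);
    spd < 1 slows it down (more output frames).
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    n_in = len(param)
    if n_in == 0:
        return []
    # Assumed output length: every m whose position m * spd
    # still falls inside the input track.
    n_out = int(math.floor((n_in - 1) / spd)) + 1
    mdf_param = []
    for m in range(n_out):
        pos = m * spd
        fr0 = min(int(math.floor(pos)), n_in - 1)
        fr1 = min(fr0 + 1, n_in - 1)  # clamp at the last frame (assumption)
        l = pos - fr0                 # weight of the right neighbour fr1
        r = 1.0 - l                   # equals (fr0 + 1) - pos
        mdf_param.append(param[fr0] * r + param[fr1] * l)
    return mdf_param


if __name__ == "__main__":
    track = [0.0, 10.0, 20.0, 30.0]
    print(time_scale_params(track, 0.5))  # half speed
    print(time_scale_params(track, 2.0))  # double speed
```

Because only the decoded parameters are resampled, the frame length and the synthesis itself stay unchanged, which is why the speed can be changed without altering the pitch.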

35 Harmonics spectra interpolation for speed control

[Figure: harmonic spectra over time at normal speed and after speed up.]

36 Harmonics spectra interpolation for speed control

[Figure: harmonic spectra over time after speed down.]

37 Performance

The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in Aug. 1998 at two European labs and one Japanese lab*.

15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.

15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.

* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

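38 MOS with 95% CI: HVXC - Japanese

[Figure: MOS with 95% confidence intervals, HVXC, Japanese items. MOS axis from 1 to 5. Conditions shown: HVXC at 2.0 kbps; HVXC at 4.0 kbps; FS1016 at 4.8 kbps; MNRU references at 10, 20, 30, and 40 dB.]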

ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

39 MPEG-4 HVXC Demonstration

FS1016 4.8kbps CELP

2kbps HVXC

4kbps HVXC

40 Demonstration

Real-time software decoding on a PC

4kbps HVXC pitch change

4kbps HVXC speed change

41 Summary

• HVXC at 2.0kbps and 4.0kbps > FS1016 CELP at 4.8 kbps.

• NB CELP ≈ existing standards in the same bit-rate range, while providing flexible bit-rate control and scalability.

• WB CELP at 18kbps ≈ G.722 at 48 to 56 kbps.

• MPEG-4 speech coding provides new functionalities
  - speed and pitch change
  - bit-rate / bandwidth scalability
  - bit-rate controllability

• International Standard in November 1999

42 References

[1] ISO/IEC JTC1/SC29/WG11 N2503, “Final Draft International Standard of ISO/IEC 14496-3,” Dec. 1998
[2] M. Nishiguchi, K. Iijima, J. Matsumoto, “Harmonic Vector Excitation Coding of Speech at 2.0 kbps,” IEEE Workshop on Speech Coding, Sep. 1997
[3] T. Nomura, M. Iwadare, M. Serizawa, K. Ozawa, “A Bitrate and Bandwidth Scalable CELP Coder,” Proc. ICASSP-98, pp. I-341-344, May 1998
[4] T. Nomura, M. Iwadare, N. Tanaka, “MPEG-4/CELP Speech Coding Algorithm,” Tech. Report of IEICE, SP98-89, Nov. 1998
[5] M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto, “Parametric Speech Coding - HVXC at 2.0-4.0 kbps,” IEEE Workshop on Speech Coding, June 1999
[6] N. Tanaka, et al., “A Multi-mode Variable Rate Speech Coder for CDMA Cellular Systems,” Proc. IEEE VTC, pp. 198-202, Apr. 1996
[7] D. W. Griffin and J. S. Lim, “Multiband Excitation Vocoder,” IEEE Trans. ASSP, Vol. 36, pp. 1223-1235, Aug. 1988
[8] M. Nishiguchi, J. Matsumoto, S. Ono, R. Wakatsuki, “Vector Quantized MBE with Simplified V/UV Division at 3.0 kbps,” Proc. ICASSP-93, pp. II-151-154, Apr. 1993
[9] M. Nishiguchi, J. Matsumoto, “Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization,” Proc. ICASSP-95, pp. I-484-487, May 1995
[10] M. Nishiguchi, K. Iijima, J. Matsumoto, “Low bit rate speech coding by Harmonic Vector Excitation Coding,” Proc. ASJ 1-2-4, Sep. 1997
[11] ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, “Report on the MPEG-4 speech codec verification tests,” Oct. 1998

43 END
