MPEG-4 Speech coding
Masayuki Nishiguchi
Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation
1 Outline
• Background
• MPEG-4 Speech coding - features
• MPEG-4 CELP: Algorithm, Performance, Demonstration
• MPEG-4 HVXC: Algorithm, Performance, Demonstration
• Summary
2 Background
Most of the existing speech coding standards support only a single “compression” functionality.
Speech coding algorithms with high coding efficiency and multiple functionalities play an important role in the efficient use of bandwidth and in emerging new applications of multimedia systems.
3 MPEG-4 Speech Coding - features
• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)
• Multiple bit-rates: 1.5 to 24 kbps
• Narrow-band and wide-band: CELP
• Lowest bit-rate among international coding standards: HVXC
  - 2.0 kbps (fixed), 1.5 kbps average (variable)
• New functionalities
  - Speed / pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP
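4 [Figure: quality versus bit-rate for MPEG-4 version-1 natural audio. Horizontal axis: bit-rate (kbps), ticks at 2, 4, 8, 16, 32, 64. Vertical axis: quality, with levels labeled cellular phone, telephone, AM, FM, CD. Coders plotted: HVXC, CELP (NB-CELP, WB-CELP), GA (AAC, TwinVQ).]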
5 MPEG-4 CELP
Narrow band: 3.85-12.2 kbps, 10-40 ms frame
Wide band: 10.9-23.8 kbps, 10-20 ms frame
Multi-rate: 200-800 bps steps
Bit-rate scalability: 2.0 kbps (NB), 4.0 kbps (WB) steps
Bandwidth scalability
Fine rate control
Regular pulse excitation (RPE) - WB: low complexity
Multi-pulse excitation (MPE) - WB, NB: high coding efficiency
6 Block diagram of the CELP encoder
[Figure: block diagram of the CELP encoder. Labels: speech input, LPC analysis, LSP VQ, coefficient interpolation, LPC parameters, bit-rate control input, codebook, MPE/RPE codebook, long-term synthesis filter, LPC synthesis filter, weighted error calculation, excitation parameters.]
7 Structure of the bit-rate scalable coding
[Figure: speech input feeds one encoder whose layered bit-stream (6 kbps, 2 kbps, 2 kbps, 2 kbps, 10 kbps) serves four decoders.]
Decoder-1: 6 kbps, basic speech
Decoder-2: 8 kbps, high quality speech
Decoder-3: 12 kbps, high quality speech
Decoder-4: 22 kbps, wideband speech
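The decoder rates on this slide are running sums of the layer sizes. The sketch below illustrates that arithmetic. It assumes the figure reads as a 6 kbps base layer, three 2 kbps enhancement layers, and a 10 kbps layer for the wideband decoder. The function names are hypothetical helpers, not part of the standard.

```python
from itertools import accumulate

# Layer sizes in kbps as read from the slide (an assumption about the
# flattened figure): base layer, three enhancement layers, wideband layer.
LAYER_KBPS = [6, 2, 2, 2, 10]


def decodable_rates(layers):
    """Cumulative bit-rates obtained by decoding the first k layers."""
    return list(accumulate(layers))


def layers_for_budget(layers, budget_kbps):
    """Return (layer count, total kbps) of the leading layers that fit the budget."""
    total = 0
    count = 0
    for size in layers:
        if total + size > budget_kbps:
            break
        total += size
        count += 1
    return count, total


print(decodable_rates(LAYER_KBPS))       # [6, 8, 10, 12, 22]
print(layers_for_budget(LAYER_KBPS, 9))  # (2, 8)
```

The 6, 8, 12, and 22 kbps entries correspond to Decoder-1 through Decoder-4. The 10 kbps entry is an intermediate operating point that the slide does not draw a decoder for.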
8 Performance
The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.
15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.
15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.
* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
9 Performance
[Figure: MOS with 95% CI, narrow-band CELP, Japanese. MOS axis from 1 to 5. Conditions: GSM-EFR at 12.2 kbps, G.729 at 8.0 kbps, G.723.1 at 6.3 kbps, MPEG-4 CELP and scalable CELP at 6.0, 8.0, 8.3, and 12.0 kbps, and MNRU references at 10, 20, 30, and 40 dB.]
ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
10 [Figure: MOS with 95% CI, wideband CELP, Japanese. MOS axis from 1 to 5. Conditions: G.722 at 48.0 and 56.0 kbps, MNRU references at 10, 20, 30, and 40 dB, and four conditions labeled Layer III, MPE, RPE, and BW Scalable, with rates of 16.0, 17.9, 18.1, and 24 kbps shown.]
ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
11 MPEG-4 CELP demonstration
6 kbps NB CELP
12 kbps NB CELP
22 kbps WB CELP (BW-scalable)
CELP Demo samples are generated by NEC
12 MPEG-4 HVXC
Low bit-rate / good quality
  - 2.0 / 4.0 kbps (fixed), 1.5 / 3.0 kbps average (variable)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps
Bit-rate scalability
  - 2.0 kbps decoding is possible using the 4.0 kbps bit-stream
Speed change & pitch change
  - Attractive for fast speech database search & browsing
13 Approach
Two different types of coding schemes are combined. One is suitable for voiced segments and the other for unvoiced segments.
Voiced: The LPC residual is represented by the harmonic magnitudes of its power spectrum, so phase information is discarded. Frequency-domain analysis / synthesis.
Unvoiced: Crisp consonants are obtained by CELP coding. Time-domain analysis / synthesis.
14 Encoder
[Figure: block diagram of the HVXC encoder. Common blocks: input, LPC analysis, LSP VQ, LPC inverse filter, V/UV/MV. Voiced path: FFT, pitch detection, harmonic magnitudes estimation, dimension conversion, weighted VQ. Unvoiced path: CELP coding with stochastic codebook. Outputs: LSP, V/UV/MV, pitch, spectral shape & gain, stochastic codebook shape & gain.]
19 Harmonic spectral magnitudes and fine pitch estimation
[Figure: harmonic spectral envelope. Axes: magnitude versus frequency. Annotation: pitch frequency.]
21 Dimension conversion of harmonic spectral magnitudes
[Figure: dimension conversion of harmonic spectral magnitudes. Panels plot magnitude against frequency.]
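The number of harmonics in the band depends on the pitch, so the magnitude vector must be resampled before a fixed-dimension quantizer can be applied. The sketch below shows one way to do that. The slides do not state the interpolation method, so plain linear interpolation is used here as an assumption. The function name and the target dimension in the example are hypothetical.

```python
def convert_dimension(magnitudes, target_dim):
    """Resample a harmonic magnitude vector to target_dim points.

    Linear interpolation is an assumption made for this sketch; the
    slides only show that a variable-length vector becomes fixed-length.
    """
    n = len(magnitudes)
    if n == 0 or target_dim < 1:
        raise ValueError("need at least one magnitude and target_dim >= 1")
    if n == 1 or target_dim == 1:
        return [magnitudes[0]] * target_dim
    out = []
    for k in range(target_dim):
        # Map output index k onto the input grid [0, n - 1].
        pos = k * (n - 1) / (target_dim - 1)
        i = min(int(pos), n - 2)
        frac = pos - i
        out.append(magnitudes[i] * (1.0 - frac) + magnitudes[i + 1] * frac)
    return out


# Hypothetical example: five harmonics resampled to an eight-point vector.
fixed = convert_dimension([1.0, 0.8, 0.5, 0.3, 0.1], 8)
print(len(fixed))  # 8
```

The same routine can map the fixed-dimension vector back to the original number of harmonics at the decoder, since it only depends on the input and output lengths.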
23 Vector quantization of harmonic spectral envelope - base layer -
[Figure: base-layer vector quantization of the harmonic spectral envelope. Labels: fixed dimension harmonic spectrum, shape codebook 0, shape codebook 1, gain, weighting, energy estimation.]
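A weighted shape-gain search over two shape codebooks and a gain codebook can be sketched as follows. Summing one codevector from each shape codebook is an assumption read from the "Shape Codebook 0 / 1" labels in the figure. The exhaustive joint search is a simplification for clarity rather than the standard's procedure, and all names are hypothetical.

```python
def weighted_sq_error(x, y, w):
    """Weighted squared error between vectors x and y."""
    return sum(wi * (xi - yi) ** 2 for xi, yi, wi in zip(x, y, w))


def shape_gain_search(x, shapes0, shapes1, gains, w):
    """Exhaustive weighted shape-gain VQ search.

    Returns (i, j, k, distortion), where the reconstruction is
    gains[k] * (shapes0[i] + shapes1[j]). The additive combination of
    the two shape codevectors is an assumption of this sketch.
    """
    best = None
    for i, s0 in enumerate(shapes0):
        for j, s1 in enumerate(shapes1):
            shape = [a + b for a, b in zip(s0, s1)]
            for k, g in enumerate(gains):
                candidate = [g * s for s in shape]
                d = weighted_sq_error(x, candidate, w)
                if best is None or d < best[3]:
                    best = (i, j, k, d)
    return best


# Toy codebooks, invented for illustration only.
target = [2.0, 0.0]
result = shape_gain_search(
    target,
    shapes0=[[1.0, 0.0], [0.0, 1.0]],
    shapes1=[[0.0, 0.0]],
    gains=[1.0, 2.0],
    w=[1.0, 1.0],
)
print(result)  # (0, 0, 1, 0.0)
```

The weights `w` let perceptually important harmonics dominate the distortion measure, which is the role of the weighting block in the figure.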
24 Scalable vector quantization of spectral envelope - base & enhancement layer -
[Figure: scalable vector quantization of the spectral envelope (SE). Labels: input, VQ of SE shape0, VQ of SE shape1, SE gain, weighted distortion, dimension conversion, VQ of SE shape2, VQ of SE shape3, ..., VQ of SE shape5. Each VQ outputs an index.]
26 Scalable CELP encoder for unvoiced segments - base and enhancement layer -
[Figure: scalable CELP encoder for unvoiced segments. Common blocks: input speech, LPC analysis, VQ of LSP, perceptual weighting filter W(z) and subtraction of the zero-input response of H(z). Base layer: stochastic codebook (6 bits), gain codebook (4 bits), perceptually weighted LPC synthesis filter H(z), calculation of error. Enhancement layer, driven by the quantization error: stochastic codebook (5 bits), gain codebook (3 bits), perceptually weighted LPC synthesis filter H(z), calculation of error.]
27 Decoder
[Figure: block diagram of the HVXC decoder. Inputs: LSP, V/UV/MV, pitch, spectral shape & gain, stochastic shape, stochastic gain. Blocks: LSP inverse VQ, inverse VQ of spectral shape & gain, dimension conversion, parameter interpolation for speed control, harmonic synthesis, noise generation, stochastic codebook, windowing, LPC synthesis filter, postfilter. Output: decoded speech.]
32 Harmonic synthesis for voiced excitation
f(t) = \sum_m A_m(t) \cos(\theta_m(t))
\theta_m(t) = \int_0^t \omega_m(\tau) \, d\tau + \phi_m
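In discrete time, the phase integral becomes a running sum of the instantaneous frequency. The sketch below assumes per-sample amplitude and frequency tracks are already available, for example from interpolating frame parameters. It approximates the integral with a rectangle rule. The function and argument names are hypothetical.

```python
import math


def synthesize_harmonics(amps, omegas, phases, num_samples):
    """Sum of harmonics with time-varying amplitude and frequency.

    amps[m][n]   : amplitude A_m at sample n
    omegas[m][n] : angular frequency of harmonic m at sample n (radians/sample)
    phases[m]    : initial phase phi_m
    The phase theta_m is accumulated sample by sample, a rectangle-rule
    approximation of the integral of omega_m.
    """
    out = [0.0] * num_samples
    for a_m, w_m, phi_m in zip(amps, omegas, phases):
        theta = phi_m
        for n in range(num_samples):
            out[n] += a_m[n] * math.cos(theta)
            theta += w_m[n]
    return out


# Hypothetical example: two harmonics of a constant fundamental w0.
num = 8
w0 = 2.0 * math.pi / num
amps = [[1.0] * num, [0.5] * num]
omegas = [[w0] * num, [2.0 * w0] * num]
signal = synthesize_harmonics(amps, omegas, [0.0, 0.0], num)
```

Accumulating the phase, rather than recomputing it from the current frequency, keeps the waveform continuous when the pitch changes between frames.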
34 Parameter interpolation for speed control
Arrays of original parameters: param[n]
Arrays of interpolated parameters: mdf_param[m]
Time index before the time scale modification: n
Time index after the time scale modification: m
Ratio of speed change: spd (spd > 1: speed up, spd < 1: speed down)
Define:
fr_0 = \lfloor m \cdot spd \rfloor, \quad l = m \cdot spd - fr_0
fr_1 = fr_0 + 1, \quad r = fr_1 - m \cdot spd
Time scale modified parameters are approximated as:
mdf\_param[m] = param[fr_0] \cdot r + param[fr_1] \cdot l
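The interpolation rule can be sketched for a scalar parameter track as follows. The weights follow the slide's definitions of fr0, fr1, l, and r. Stopping at the last input frame and clamping fr1 there are assumptions of this sketch, since the slide does not say how the end of the sequence is handled. The function name is hypothetical.

```python
import math


def time_scale_params(param, spd):
    """Linearly interpolate a parameter track for speed ratio spd.

    spd > 1 speeds up (fewer output frames); spd < 1 slows down.
    """
    if spd <= 0:
        raise ValueError("spd must be positive")
    last = len(param) - 1
    mdf_param = []
    m = 0
    while m * spd <= last:
        pos = m * spd
        fr0 = math.floor(pos)
        fr1 = fr0 + 1
        l = pos - fr0   # weight of the right-hand frame
        r = fr1 - pos   # weight of the left-hand frame
        # Clamp the index at the end of the track (assumption); l is 0 there.
        mdf_param.append(param[fr0] * r + param[min(fr1, last)] * l)
        m += 1
    return mdf_param


track = [0.0, 10.0, 20.0]
print(time_scale_params(track, 0.5))  # [0.0, 5.0, 10.0, 15.0, 20.0]
print(time_scale_params(track, 2.0))  # [0.0, 20.0]
```

Vector parameters such as LSPs or spectral magnitudes would be interpolated element by element with the same weights.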
35 Harmonics spectra interpolation for speed control
[Figure: harmonic spectra interpolation. Panels: normal speed, speed up.]
36 Harmonics spectra interpolation for speed control
[Figure: harmonic spectra interpolation. Panel: speed down.]
37 Performance
The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.
15 Japanese items were evaluated by 16 Japanese listeners in the Japanese Lab.
15 English/German/Swedish items were evaluated by 18 German and 16 Finnish listeners in the European Labs.
* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
38 [Figure: MOS with 95% CI, HVXC, Japanese. MOS axis from 1 to 5. Conditions: HVXC at 2.0 and 4.0 kbps, FS1016 at 4.8 kbps, and MNRU references at 10, 20, 30, and 40 dB.]
ISO/IEC JTC1/SC29/WG11 MPEG98/N2424 “Report on the MPEG-4 speech codec verification tests,” Oct. 1998
39 MPEG-4 HVXC Demonstration
FS1016 4.8kbps CELP
2kbps HVXC
4kbps HVXC
40 Demonstration
Real-time software decoding on a PC
4kbps HVXC pitch change
4kbps HVXC speed change
41 Summary
• HVXC at 2.0kbps and 4.0kbps > FS1016 CELP at 4.8 kbps.
• NB CELP ≈ existing standards in the same bit-rate ranges, while providing flexible bit-rate controllability and scalability.
• WB CELP at 18kbps ≈ G.722 at 48 to 56 kbps.
• MPEG-4 speech coding provides new functionalities
  - speed and pitch change
  - bit-rate / bandwidth scalability
  - bit-rate controllability
• International Standard in November 1999
42 References
[1] ISO/IEC JTC1/SC29/WG11 N2503, "Final Draft International Standard of ISO/IEC 14496-3," Dec. 1998
[2] M. Nishiguchi, K. Iijima, J. Matsumoto, "Harmonic Vector Excitation Coding of Speech at 2.0 kbps," IEEE Workshop on Speech Coding, Sep. 1997
[3] T. Nomura, M. Iwadare, M. Serizawa, K. Ozawa, "A Bit rate and Bandwidth Scalable CELP coder," Proc. ICASSP-98, pp. I-341-344, May 1998
[4] T. Nomura, M. Iwadare, N. Tanaka, "MPEG-4/CELP speech coding Algorithm," Tech. Report of IEICE, SP98-89, Nov. 1998
[5] M. Nishiguchi, A. Inoue, Y. Maeda, J. Matsumoto, "Parametric Speech Coding – HVXC at 2.0-4.0 kbps," IEEE Workshop on Speech Coding, June 1999
[6] N. Tanaka, et al., "A Multi-mode Variable Rate Speech Coder for CDMA Cellular Systems," Proc. IEEE VTC, pp. 198-202, Apr. 1996
[7] D. W. Griffin and J. S. Lim, "Multiband Excitation Vocoder," IEEE Trans. ASSP, Vol. 36, pp. 1223-1235, Aug. 1988
[8] M. Nishiguchi, J. Matsumoto, S. Ono, R. Wakatsuki, "Vector Quantized MBE with Simplified V/UV Division at 3.0Kbps," Proc. ICASSP-93, pp. II-151-154, Apr. 1993
[9] M. Nishiguchi, J. Matsumoto, "Harmonic and Noise Coding of LPC Residuals with Classified Vector Quantization," Proc. ICASSP-95, pp. I-484-487, May 1995
[10] M. Nishiguchi, K. Iijima, J. Matsumoto, "Low bit rate speech coding by Harmonic Vector Excitation Coding," Proc. ASJ 1-2-4, Sep. 1997
[11] ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, "Report on the MPEG-4 speech codec verification tests," Oct. 1998
43 END