MPEG-4 Speech Coding
Masayuki Nishiguchi
Audio and Speech Group, HomeNet Processing Lab, HomeNet Laboratories, Sony Corporation

Outline
• Background
• MPEG-4 speech coding: features
• MPEG-4 CELP: algorithm, performance, demonstration
• MPEG-4 HVXC: algorithm, performance, demonstration
• Summary

Background
Most existing speech coding standards support only a single functionality: compression. Speech coding algorithms that combine high coding efficiency with multiple functionalities play an important role in using bandwidth efficiently and in enabling new multimedia applications.

MPEG-4 Speech Coding: Features
• Two basic algorithms
  - HVXC (Harmonic Vector eXcitation Coding)
  - CELP (Code Excited Linear Prediction)
• Multiple bit rates: 1.5 to 24 kbps
• Narrowband and wideband: CELP
• Lowest bit rate among international speech coding standards: HVXC, at 2.0 kbps (fixed rate) or 1.5 kbps on average (variable rate)
• New functionalities
  - Speed and pitch change: HVXC
  - Bit-rate scalability: HVXC, CELP
  - Bandwidth scalability: CELP

[Figure: Quality (cellular phone, telephone, AM, FM, CD) vs. bit rate (2 to 64 kbps) of the MPEG-4 version 1 natural audio coders: HVXC, CELP (NB-CELP, WB-CELP) and GA (AAC, TwinVQ).]

MPEG-4 CELP
• Narrowband: 3.85 to 12.2 kbps, 10 to 40 ms frames
• Wideband: 10.9 to 23.8 kbps, 10 to 20 ms frames
• Multi-rate with fine rate control: 200 to 800 bps steps
• Bit-rate scalability: 2.0 kbps steps (NB), 4.0 kbps steps (WB)
• Bandwidth scalability
• Excitation
  - Regular pulse excitation (RPE), WB: low complexity
  - Multi-pulse excitation (MPE), WB and NB: high coding efficiency

Block diagram of the CELP encoder
[Figure: CELP encoder with LPC analysis, LSP VQ and LPC coefficient interpolation; excitation from a long-term codebook and an MPE/RPE codebook, with bit-rate control; LPC synthesis filters; and weighted error calculation driving the excitation search.]
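The encoder above follows the usual CELP analysis-by-synthesis principle: each candidate excitation is passed through the LPC synthesis filter, and the one that best matches the input is kept. A minimal sketch of that codebook search, under simplifying assumptions that are not taken from the slides: a plain (unweighted) squared error, zero filter memory, a single random codebook and an unquantized gain. The MPEG-4 coder additionally uses perceptual weighting, a long-term codebook and MPE/RPE excitation. The function names `lpc_synthesis` and `search_codebook` are hypothetical.

```python
import random


def lpc_synthesis(excitation, a):
    """Filter `excitation` through the all-pole LPC synthesis filter 1/A(z),
    with A(z) = 1 + a[0] z^-1 + ... + a[p-1] z^-p and zero initial state."""
    out = []
    for n, e in enumerate(excitation):
        y = e
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                y -= ak * out[n - k]
        out.append(y)
    return out


def search_codebook(target, codebook, a):
    """Analysis-by-synthesis search: synthesize every codevector, fit the
    least-squares gain, and keep the (index, gain, error) with the smallest
    squared error against the target."""
    best = None
    for idx, codevector in enumerate(codebook):
        y = lpc_synthesis(codevector, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue  # an all-zero codevector cannot match anything
        gain = sum(t * v for t, v in zip(target, y)) / energy
        err = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or err < best[2]:
            best = (idx, gain, err)
    return best


rng = random.Random(0)
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(16)]
a = [-0.9]  # hypothetical first-order A(z) = 1 - 0.9 z^-1
target = [0.5 * v for v in lpc_synthesis(codebook[3], a)]
idx, gain, err = search_codebook(target, codebook, a)
print(idx, gain)  # 3 0.5
```

Because the optimal gain for a fixed codevector has the closed form corr/energy, the search needs no separate gain loop; a real coder would then quantize that gain.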
Structure of the bit-rate scalable coding
[Figure: One encoder produces a 6 kbps core layer, three 2 kbps enhancement layers and a 10 kbps wideband layer. Decoder-1 uses 6 kbps (basic speech), Decoder-2 uses 8 kbps and Decoder-3 uses 12 kbps (high-quality speech), and Decoder-4 uses 22 kbps (wideband speech).]

Performance
The speech quality of the MPEG-4 speech codecs was evaluated in the official MPEG-4 verification tests in August 1998 at two European labs and one Japanese lab*.
• 15 Japanese items were evaluated by 16 Japanese listeners at the Japanese lab.
• 15 English, German and Swedish items were evaluated by 18 German and 16 Finnish listeners at the European labs.
* ISO/IEC JTC1/SC29/WG11 MPEG98/N2424, "Report on the MPEG-4 speech codec verification tests," Oct. 1998.

[Figure: MOS with 95% confidence intervals, narrowband CELP, Japanese test. Conditions: CELP and scalable CELP at 6.0, 8.0, 8.3 and 12.0 kbps; GSM-EFR (12.2 kbps), G.729 (8.0 kbps) and G.723.1 (6.3 kbps); MNRU references at 10, 20, 30 and 40 dB. Source: MPEG98/N2424.]

[Figure: MOS with 95% confidence intervals, wideband CELP, Japanese test. Conditions: wideband CELP (including RPE and bandwidth-scalable configurations), G.722, MPEG Layer III, and MNRU references at 10, 20, 30 and 40 dB. Bit rates shown: 16.0, 17.9, 18.1, 24, 48.0 and 56.0 kbps. Source: MPEG98/N2424.]

MPEG-4 CELP demonstration
• 6 kbps NB CELP
• 12 kbps NB CELP
• 22 kbps WB CELP (bandwidth-scalable)
CELP demo samples were generated by NEC.

MPEG-4 HVXC
• Low bit rate with good quality
  - 2.0 / 4.0 kbps (fixed rate), 1.5 / 3.0 kbps on average (variable rate)
  - HVXC at 2.0 kbps has higher quality than FS1016 CELP at 4.8 kbps
• Bit-rate scalability
  - A 4.0 kbps bitstream can also be decoded at 2.0 kbps
• Speed change and pitch change
  - Attractive for fast speech database search and browsing

Approach
Two different coding schemes are combined: one suited to voiced segments and one suited to unvoiced segments.
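Since HVXC hands each segment to one of two coders, everything downstream depends on a voiced/unvoiced decision. The slides do not describe how HVXC makes that decision (its V/UV/MV decision is more elaborate), so the sketch below substitutes a common textbook heuristic, frame energy plus zero-crossing rate, with hypothetical thresholds; `classify_frame` is an illustrative name, not part of the standard.

```python
import math
import random


def classify_frame(frame, energy_threshold=1e-3, zcr_threshold=0.3):
    """Toy voiced/unvoiced decision (hypothetical thresholds, not HVXC's rule):
    voiced frames have noticeable energy and few zero crossings, while
    noise-like unvoiced frames change sign about every other sample."""
    n = len(frame)
    energy = sum(x * x for x in frame) / n
    crossings = sum(1 for x0, x1 in zip(frame, frame[1:]) if (x0 >= 0.0) != (x1 >= 0.0))
    zcr = crossings / (n - 1)
    return "V" if energy > energy_threshold and zcr < zcr_threshold else "UV"


fs = 8000
voiced = [0.5 * math.sin(2.0 * math.pi * 100.0 * n / fs) for n in range(160)]  # 20 ms at 100 Hz
rng = random.Random(1)
noise = [rng.uniform(-0.1, 0.1) for _ in range(160)]  # fricative-like noise
print(classify_frame(voiced), classify_frame(noise))  # V UV
```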
• Voiced segments: the power spectrum of the LPC residual is represented by its harmonic magnitudes, so phase information is discarded. Analysis and synthesis are done in the frequency domain.
• Unvoiced segments: CELP coding reproduces crisp consonants. Analysis and synthesis are done in the time domain.

Encoder
[Figure: HVXC encoder block diagram with LPC analysis, LSP VQ, LPC inverse filter, pitch detection and the V/UV/MV decision; a voiced path (FFT, harmonic magnitude and fine pitch estimation, dimension conversion, weighted VQ of spectral shape and gain); and an unvoiced path (CELP coding with stochastic codebook shape and gain).]

Harmonic spectral magnitudes and fine pitch estimation
[Figure: Magnitude vs. frequency, showing the harmonic spectral envelope sampled at multiples of the pitch frequency.]
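The slide above shows the residual spectrum sampled at multiples of the pitch frequency. A minimal sketch of both steps, assuming a Hann-windowed frame, a coarse pitch already known from pitch detection, and a simple refinement rule (keep the candidate whose harmonics carry the largest mean magnitude). This is an illustrative technique, not the HVXC algorithm, and the test frame is a synthetic harmonic signal standing in for the LPC residual. The names `hann`, `harmonic_magnitudes` and `fine_pitch` are hypothetical.

```python
import cmath
import math


def hann(length):
    """Periodic Hann window; tapering the frame limits leakage between harmonics."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / length) for n in range(length)]


def harmonic_magnitudes(frame, f0, fs):
    """Estimated amplitude of each harmonic k*f0 (k = 1, 2, ...) up to fs/2.
    The windowed DTFT is evaluated directly at k*f0, so f0 need not fall on an FFT bin."""
    window = hann(len(frame))
    scale = 2.0 / sum(window)  # maps a spectral peak back to a cosine amplitude
    mags = []
    for k in range(1, int((fs / 2) // f0) + 1):
        omega = 2.0 * math.pi * k * f0 / fs  # rad/sample
        acc = sum(x * w * cmath.exp(-1j * omega * n)
                  for n, (x, w) in enumerate(zip(frame, window)))
        mags.append(scale * abs(acc))
    return mags


def fine_pitch(frame, coarse_f0, fs, rel_range=0.05, step=0.25):
    """Refine a coarse pitch: try candidates every `step` Hz within +/- rel_range
    of coarse_f0 and keep the one whose harmonics have the largest mean magnitude."""
    lo = coarse_f0 * (1.0 - rel_range)
    n_cand = int(round(2.0 * rel_range * coarse_f0 / step)) + 1
    best_f0, best_score = coarse_f0, -1.0
    for i in range(n_cand):
        f0 = lo + i * step
        mags = harmonic_magnitudes(frame, f0, fs)
        score = sum(mags) / len(mags)
        if score > best_score:
            best_f0, best_score = f0, score
    return best_f0


fs, true_f0 = 8000, 123.0
# Synthetic stand-in for a voiced LPC residual: 32 harmonics with amplitude 1/k.
frame = [sum(math.cos(2.0 * math.pi * k * true_f0 * n / fs) / k for k in range(1, 33))
         for n in range(256)]
f0 = fine_pitch(frame, coarse_f0=120.0, fs=fs)
amps = harmonic_magnitudes(frame, f0, fs)
```

Here `f0` should come out close to 123 Hz and `amps` should start close to 1.0, 0.5, 0.33. Evaluating the DTFT at each k·f0 keeps the example short; an FFT-based implementation would interpolate between bins instead.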
Dimension conversion of harmonic spectral magnitudes
[Figure: Magnitude vs. frequency before and after conversion: the number of harmonics varies with the pitch, and dimension conversion maps them to a fixed number of points.]

Vector quantization of the harmonic spectral envelope: base layer
[Figure: The fixed-dimension harmonic spectrum is quantized with two shape codebooks (shape codebook 0 and shape codebook 1) and a gain codebook, with spectral weighting and energy estimation.]

Scalable vector quantization of the spectral envelope: base and enhancement layers
[Figure: The base layer quantizes the spectral envelope (SE) with the SE Shape0, SE Shape1 and SE Gain codebooks under a weighted distortion measure. The enhancement layer takes the difference between the input and the dimension-converted base-layer output and quantizes it with further SE shape codebooks (Shape2, Shape3, ..., Shape5). Each VQ stage outputs an index.]

Scalable CELP encoder for unvoiced segments: base and enhancement layers
[Figure: After LPC analysis and VQ of the LSPs, the input speech passes through the perceptual weighting filter W(z) and the zero-input response of H(z) is subtracted. Base layer: a 6-bit stochastic codebook and a 4-bit gain codebook, with each codevector filtered by the perceptually weighted LPC synthesis filter H(z) and selected by error calculation. Enhancement layer: the base-layer quantization error is coded the same way with a 5-bit stochastic codebook and a 3-bit gain codebook.]
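Both the harmonic-envelope VQ and the unvoiced CELP coder are layered the same way: the base layer quantizes the input, and the enhancement layer quantizes what the base layer left over. This layering is what allows 2.0 kbps decoding from a 4.0 kbps bitstream, since a base-only decoder simply ignores the enhancement indices. A minimal sketch of two-stage weighted VQ with tiny hypothetical codebooks; it leaves out the shape/gain split and the dimension conversion between layers, and all function names are illustrative.

```python
def weighted_error(target, codevector, weights):
    """Weighted squared error sum_i w_i * (x_i - c_i)^2."""
    return sum(w * (t - c) ** 2 for w, t, c in zip(weights, target, codevector))


def nearest(codebook, target, weights):
    """Index of the codevector closest to `target` under the weighted error."""
    return min(range(len(codebook)),
               key=lambda i: weighted_error(target, codebook[i], weights))


def encode_two_layer(x, base_cb, enh_cb, weights):
    """Base layer quantizes x; enhancement layer quantizes the base-layer error."""
    i0 = nearest(base_cb, x, weights)
    residual = [xi - ci for xi, ci in zip(x, base_cb[i0])]
    i1 = nearest(enh_cb, residual, weights)
    return i0, i1


def decode(indices, base_cb, enh_cb):
    """Decode from (i0,) for the base layer only, or (i0, i1) for both layers."""
    y = list(base_cb[indices[0]])
    if len(indices) > 1:
        y = [yi + ei for yi, ei in zip(y, enh_cb[indices[1]])]
    return y


base_cb = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]                # hypothetical base codebook
enh_cb = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.2, 0.0]]   # hypothetical enhancement codebook
weights = [1.0, 1.0]
x = [1.1, 0.9]
i0, i1 = encode_two_layer(x, base_cb, enh_cb, weights)
base_only = decode((i0,), base_cb, enh_cb)   # what a base-layer-only decoder reconstructs
both = decode((i0, i1), base_cb, enh_cb)     # base + enhancement
```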
Decoder
[Figure: HVXC decoder block diagram with LSP inverse VQ, V/UV/MV switching, pitch, inverse VQ of spectral shape and gain, dimension conversion, harmonic synthesis, noise generation, stochastic shape codebook and stochastic gain with windowing, parameter interpolation for speed control, LPC synthesis filter, postfilter and output.]

Harmonic synthesis for voiced excitation
The voiced excitation is synthesized as a sum of harmonics:

$$ f(t) = \sum_{m} A_m(t) \cos \theta_m(t) $$

$$ \theta_m(t) = \int_0^t \omega_m(\tau)\, d\tau + \phi_m(0) $$

where $A_m(t)$ and $\omega_m(t)$ are the amplitude and angular frequency of the $m$-th harmonic and $\phi_m(0)$ is its initial phase.
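The two equations above translate directly into a sample-by-sample synthesizer. A minimal sketch, assuming that $A_m(t)$ and the pitch are linearly interpolated across a frame, that $\omega_m = m\,\omega_0$, and that the phase integral is approximated by a running sum; the slides do not say how HVXC interpolates, and `harmonic_synthesis` is a hypothetical name. Returning the final phases lets the next frame continue without phase jumps.

```python
import math


def harmonic_synthesis(amps_start, amps_end, f0_start, f0_end, fs, n_samples, phases=None):
    """One frame of voiced excitation f[n] = sum_m A_m[n] * cos(theta_m[n]).

    A_m and the pitch move linearly from start-of-frame to end-of-frame values
    (an assumption). Harmonic m (1-based) has angular frequency m * w0[n], and
    theta_m is the running sum of that frequency (a discrete integral) starting
    from the initial phase. Returns the samples and the phases for the next frame."""
    n_harm = min(len(amps_start), len(amps_end))
    theta = list(phases) if phases is not None else []
    theta += [0.0] * (n_harm - len(theta))  # harmonics without a history start at phase 0
    out = []
    for n in range(n_samples):
        t = n / n_samples  # fraction of the frame elapsed
        w0 = 2.0 * math.pi * (f0_start + (f0_end - f0_start) * t) / fs  # rad/sample
        sample = 0.0
        for m in range(n_harm):
            amp = amps_start[m] + (amps_end[m] - amps_start[m]) * t
            sample += amp * math.cos(theta[m])
            theta[m] += (m + 1) * w0
        out.append(sample)
    return out, theta


# Single harmonic, constant 200 Hz at 8 kHz: 40 samples are exactly one period.
out, phases = harmonic_synthesis([1.0], [1.0], 200.0, 200.0, 8000, 40)
# The next frame continues from `phases`, adds a second harmonic and glides to 210 Hz.
out2, phases2 = harmonic_synthesis([1.0, 0.5], [0.8, 0.4], 200.0, 210.0, 8000, 40, phases)
```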
Parameter interpolation for speed control
• Arrays of original parameters: param[n]
• Arrays of interpolated parameters: mdf_param[m]
• Time index before time-scale modification: n
• Time index after time-scale modification: m
• Ratio of speed change: spd (spd > 1: speed up; spd < 1: speed down)

Define:

$$ fr_0 = \lfloor m \cdot \mathit{spd} \rfloor, \qquad fr_1 = fr_0 + 1 $$

$$ l = m \cdot \mathit{spd} - fr_0, \qquad r = fr_1 - m \cdot \mathit{spd} $$

so $l$ and $r$ are the distances from $m \cdot \mathit{spd}$ to the neighbouring original frames $fr_0$ and $fr_1$, with $l + r = 1$.
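The slide defines fr0, fr1, l and r, but the text ends before the interpolation formula itself. A minimal sketch assuming the natural reading, linear interpolation mdf_param[m] = r·param[fr0] + l·param[fr1], applied to one parameter track (for example one LSP or the pitch, frame by frame). Clamping at the end of the track is an added assumption, and `change_speed` is a hypothetical name.

```python
import math


def change_speed(param, spd):
    """Resample one per-frame parameter track for a speed ratio spd.

    Output frame m reads the original track at position m*spd, using
    fr0 = floor(m*spd), fr1 = fr0 + 1, l = m*spd - fr0, r = fr1 - m*spd,
    and (assumed) linear interpolation mdf_param[m] = r*param[fr0] + l*param[fr1].
    spd > 1 speeds up (fewer frames); spd < 1 slows down (more frames)."""
    if spd <= 0:
        raise ValueError("spd must be positive")
    if not param:
        return []
    last = len(param) - 1
    mdf_param = []
    for m in range(int(last / spd) + 1):
        pos = m * spd
        fr0 = math.floor(pos)
        fr1 = min(fr0 + 1, last)  # assumed clamp at the end of the track
        l = pos - fr0
        r = 1.0 - l  # equals fr1 - m*spd whenever fr1 = fr0 + 1
        mdf_param.append(r * param[fr0] + l * param[fr1])
    return mdf_param


print(change_speed([0.0, 10.0, 20.0, 30.0, 40.0], 2.0))  # [0.0, 20.0, 40.0]
print(change_speed([0.0, 10.0, 20.0, 30.0, 40.0], 0.5))  # [0.0, 5.0, 10.0, 15.0, 20.0, 25.0, 30.0, 35.0, 40.0]
```

Only the frame index is remapped, so the parameter values themselves are preserved: slowing down with spd = 0.5 doubles the number of frames while, for instance, a pitch track keeps the same values.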