(12) United States Patent (10) Patent No.: US 7.254.533 B1 Jabri Et Al
Total Page:16
File Type:pdf, Size:1020Kb
US00725.4533B1 (12) United States Patent (10) Patent No.: US 7.254.533 B1 Jabri et al. (45) Date of Patent: Aug. 7, 2007 (54) METHOD AND APPARATUS FOR A THIN 3GPP TS 26.090, “Adaptive Multi-Rate (AMR) speech codec: CELP VOICE CODEC Transcoding fuctions”, Release 5.0.0 (Jun. 2002), 3' Generation Partnership Project (3GPP), http://www.3gpp2.org/. (75) Inventors: Marwan A. Jabri, Broadway (AU); 3GPP TS 26.104 “ANSI-C code for the floating-point AMR speech Nicola Chong-White, Chatswood (AU): codec”. Release 5.00, (Jun. 2002) 3" Gerneration Partnership Jianwei Wang, Killarney Heights (AU) Project (3GPP), http://www.3gpp2.org/. 3GPP TS 26.173 “ANSI-C code for the Adaptive Multi-Rate (73) Assignee: Dilithium Networks Pty Ltd., Sydney, Wideband speech codec, (Mar. 2002) 3" Generation Partnership NSW (AU) Project (3GPP), http://www.3gpp2.org/. *) Notice: Subject to anyy disclaimer, the term of this 3GPP TS 26.190 “AMR Wideband speech codec; Transcoding patent is extended or adjusted under 35 Functions (Release 5), 3 Generation Partnership Project (3GPP); U.S.C. 154(b) by 872 days. Dec. 2001, http://www.3gpp2.org/. (21) Appl. No.: 10/688,857 (Continued) Primary Examiner V. Paul Harper (22) Filed: Oct. 17, 2003 (74) Attorney, Agent, or Firm Townsend and Townsend Related U.S. Application Data and Crew LLP (60) Provisional application No. 60/439.366, filed on Jan. (57) ABSTRACT 9, 2003, provisional application No. 60/419,776, filed on Oct. 17, 2002. An apparatus and method for encoding and decoding a voice (51) Int. Cl. signal. The apparatus includes an encoder configured to GOL 9/00 (2006.01) generate an output bitstream signal from an input voice (52) U.S. Cl. ...................... 704/219; 704/230; 704/221; signal. The output bitstream signal is associated with at least 455/413 a first standard of a first plurality of CELP voice compres (58) Field of Classification Search ................. 704/219 sion standards. Additionally, the apparatus includes a See application file for complete search history. decoder configured to generate an output voice signal from (56) References Cited an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality U.S. PATENT DOCUMENTS of CELP voice compression standards. The CELP encoder 6,314.393 B1* 11/2001 Zheng et al. ............... TO4,223 includes a plurality of codec-specific encoder modules. 6,717.955 B1 * 4/2004 Holler ........................ 370/474 Additionally, the CELP encoder includes a plurality of 2002/0028670 A1* 3/2002 Ohsuge ...................... 455,413 generic encoder modules. The CELP decoder includes a 2003/0103524 A1* 6/2003 Hasegawa ................... 370/465 plurality of codec-specific decoder modules. Additionally, OTHER PUBLICATIONS the CELP decoder includes a plurality of generic decoder modules. 3GPP TS 26.073 “ANSI-C code for the Adaptive Multi Rate (AMR) speech codec”. Release 5.00, (Mar. 2002) 3' Generation Partner ship Project (3GPP), http://www.3gpp2.org/. 33 Claims, 21 Drawing Sheets GENERC PARAMETERS Linear prediction parameters (unquantized) Pitch tag (unduantized) Pitch gain (unauantized) Speech CEP samples code SPECFC PARAMETERS linear prediction parameters (quantized) Adaptive codebook lag (quantized) Adaptive codebook gain (auantized) Fixed codebook indices (quantized) Fixed codebook gain (quantized) Other parameters (quantized) INTERMEDAE GENERC PARAMEERS Open-loop pitch lag Excitation signal US 7.254.533 B1 Page 2 OTHER PUBLICATIONS ITU-T G.723.1 Annex B “Dual rate speech coder for multimedia communications transmitting at 5.3 and 6.3 kbit/s, Annex B: Alter 3GPP TS 26.204 “ANSI-C code for the floating-point Adaptive native specification based on floating point arithmetic'. ITU-T Multi-Rate Wideband (AMR-WB) speech codec”. Release 5.0.0. (Mar. 2002)3" Generation Partnership Project (3GPP): http://www. Recommendation G.723.1—Annex B. http://www.itu.org/. Nov. 3gpp2.org/. 1996. UPA. 3GPP2 C.S0030-0 “Selectable Mode Vocoder Service Option for ITU-T G.728 "Coding of speech at 16 kbit/s using low-delay code Wideband Spread Spectrum Communication Systems”. 3 Genera excited linear prediction”. ITU-Recommendation G.728 (1992), tion Partnership Project (3GPP2), Dec. 2001, http://www.3gpp2. Geneva, http://www.itu.org/. org/. ITU-T G.729 "Coding of speech at 8 kbit/s using conjugate ANSI/TIA/EIA-136-Rev.C., part 410. “TDMA Cellular/PCS -Ra Structure Algebraiccode-excited linear-prediction (CS-ACELP)”. dio Interface, Enhance Full Rate Voice Codec (ACELP).” Formerly ITU-T Recommendation G.729 (1996), Geneva, http://www.itu. IS-641. TIA published standard, Jun. 1, 2001, http://www.tiaoline. org/. Org. ITU-T G.729A "Coding of Speech at 8 kbit/s using conjugate Cox, R.V. “Speech Coding Standards,” Speech Coding and Synthe structure algebraiccode-excited linear-prediction (CS-ACELP) sis, W.B.Kleijn and K.K Paliwal, eds., pp. 49-78, Elsevier Science, Annex A: Reduced complexity 8kbit/s. CS-ACELP speech codec'. (1995), The Netherlands. ITU-T Recommendation G.729—Annex A. Nov. 1996, http://www. ETSI, GSM 6.10 "Recommendation GSM 6.10 Full-Rate Speech itu.org/. Transcoding, verison 8.02 (Nov. 2000), European Telecommuni ITU-T G.729C "Annex C: Reference floating-point implementation cations Standards Institute (ETSI), http://www.etsi.org/. for G.729 CSAELP 8 kbit/s speech coding”, ITU-T Recommenda ETSI GSM 06.60, “Enhanced Full Rate (EFR) Speech transcoding” tion G.729—Annex C. Sep. 1998, http://www.itu.org/. version 8.0.1 (Nov. 2000), European Telecommunications Stan dards Institute (ETSI), http://www.etsi.org/. Spanias, A.S. "Speech Coding: A Tutorial Review'. Proc. IEEE, vol. ETSI GSM 06.20, “Half rate speech: Half rate speech transcoding”. 82, No. 10, pp. 1541-1582, Oct. 1994. veriosn 8.01 (Nov. 2000). European Telecommunications Standards TIA/EIA/IS-127-2 “Enhanced Variable Rate Codec, Speech Service Institute (ETSI), http://www.esti.org/. Option 3 for WidebandSpread Spectrum Digital Systems' Telecom ISO/IEC 14496-3 MPEG4-CELP Coder, “Information munications Industry Association 1999. Technology—coding of Audiovisual Objects, Part3: Audio, Subpart TIA/EIAIIS-733, “High Rate Speech Service Option 17 for 3. CELP, ISO/JTC 1/SC 29 N2203CELP. May 1998. Wideband Spread Spectrum Communication Systems', TIA pub ITU-T G.723.1 "Speech Coders: Dual rate speech coder for multi lished standard, Nov. 17, 1997. media communications transmission at 5.3 and 6.3 kbit/s', ITU-T Recommendation G.723.1 (1996), Geneva, http://www.itu.org/. * cited by examiner U.S. Patent Aug. 7, 2007 Sheet 1 of 21 US 7.254.533 B1 CODEC 1 Encoder Encoded bitstream Speech Samples according to codec (e.g. PCM) standard 1 Encoded bitstream Speech Samples CODEC 2 Encoder according to codec (e.g. PCM) standard 2 Encoded bitstream CODEC K Encoder Speech Samples according to codec (e.g. PCM) standard K Figure 1(A) U.S. Patent Aug. 7, 2007 Sheet 2 of 21 US 7.254.533 B1 CODEC 1 Decoder Encoded bitstream Speech Samples according to codec (e.g. PCM) standard 1 Encoded bitstream CODEC 2 Decoder Speech Samples according to codec (e.g. PCM) standard 2 Encoded bitstream CODEC K Decoder Speech Samples according to codec (e.g. PCM) standard K Figure 1 (B) U.S. Patent Aug. 7, 2007 Sheet 3 of 21 US 7.254.533 B1 EnCoded bitStream Speech samples according to one of (e.g. PCM) Thin Codec the Standards (encoder part) {Codec 1, Codec 2, ..., Codec K} Encoded bitstream according to one of Speech samples s ( . Thin Codec the Standards (e.g. PCM) (decoder part) (Codec 1, Codec 2, ..., Codec K} 200 Figure 2 U.S. Patent Aug. 7, 2007 Sheet 4 of 21 US 7.254.533 B1 GENERC PARAMETERS Linear prediction parameters (unduantized) Pitch tag (unquantized) Pitch gain (unquantized) Speech CEP samples coded SPECIFIC PARAMETERS Linear prediction parameters (quantized) Adaptive Codebook lag (quantized) Adaptive codebook gain (quantized) Fixed codebook indices (quantized) Fixed Codebook gain (quantized) Other parameters (quantized) INTERMEDIATE GENERC PARAMETERS Open-loop pitch lag Excitation signal Figure 3 U.S. Patent Aug. 7, 2007 Sheet S of 21 US 7.254.533 B1 codebook Fixed Excitation Output s Fixed codebook signal speech 40 codebook gain Synthesis filter Post processing index 2 1. y : Adaptive codebook X predictionLine ar 2. coefficients Adaptive Adaptive codebook 420" codebook gain lag Figure 4 U.S. Patent Aug. 7, 2007 Sheet 6 of 21 US 7.254.533 B1 input speech sample (PCM) 52d Pre-processing LP analysis and quantiaation Open-loop pitch tag analysis Adpative codebook lag analysis and quantization Adaptive codebook gain analysis and quantization Fixed codebook index analysis and quantization Fixed codebook gain analysis and quantization Codec bitstream Bitstream packing Figure 5 U.S. Patent Aug. 7, 2007 Sheet 7 of 21 US 7.254.533 B1 Codec bitstream Bitstream Unpacking 420 Excitation 430 reconstruction Synthesis filtering A49 Output Speech sample) (PCM) Post processing 450 Figure 6 U.S. Patent Aug. 7, 2007 Sheet 8 of 21 US 7.254.533 B1 71° — Codec 1 (C1) C1 Pre-processing 2. O C1 Parameter 1 Encoding (Q1) C1 Parameter 2 Generic Pre-processing Encoding (Q1) Specific Pre-processing Generic Parameter 1 C1 Bitstream Packing Encoding Specific Param. 1 Encoding Generic Parameter 2 Encoding Combine all CodeCS Codec K(CK) into a single Specific Param. 2 architecture Encoding CK Preprocessing CK Parameter 1 us Encoding (QN) 8. CK Parameter 2 Encoding (QN) Generic Bitstream Packing Specific Bitstream Packing CK Bitstream Packing Universa. Thin Codec (Encoder) Figure 7 U.S. Patent Aug. 7, 2007 Sheet 9 of 21 US 7.254.533 B1 Generic Bitstral Unpacking Specific Bitstream Unpacking Generic Excitation ?econ. C1 Post-processing Specific Excitation recon. Codec K (CK) CK Bitstream Unpacking Generic Post-processing CK Excitation recon.