US007254533B1

(12) United States Patent
     Jabri et al.
(10) Patent No.: US 7,254,533 B1
(45) Date of Patent: Aug. 7, 2007

(54) METHOD AND APPARATUS FOR A THIN CELP VOICE CODEC

(75) Inventors: Marwan A. Jabri, Broadway (AU); Nicola Chong-White, Chatswood (AU); Jianwei Wang, Killarney Heights (AU)

(73) Assignee: Dilithium Networks Pty Ltd., Sydney, NSW (AU)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 872 days.

(21) Appl. No.: 10/688,857

(22) Filed: Oct. 17, 2003

Related U.S. Application Data

(60) Provisional application No. 60/439,366, filed on Jan. 9, 2003, provisional application No. 60/419,776, filed on Oct. 17, 2002.

(51) Int. Cl. G10L 19/00 (2006.01)

(52) U.S. Cl.: 704/219; 704/230; 704/221; 455/413

(58) Field of Classification Search: 704/219
     See application file for complete search history.

Primary Examiner: V. Paul Harper

(74) Attorney, Agent, or Firm: Townsend and Townsend and Crew LLP

(57) ABSTRACT

An apparatus and method for encoding and decoding a voice signal. The apparatus includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The CELP encoder includes a plurality of codec-specific encoder modules. Additionally, the CELP encoder includes a plurality of generic encoder modules. The CELP decoder includes a plurality of codec-specific decoder modules. Additionally, the CELP decoder includes a plurality of generic decoder modules.

33 Claims, 21 Drawing Sheets


U.S. Patent    Aug. 7, 2007    Sheet 1 of 21    US 7,254,533 B1

Figure 1(A) (Sheet 1 of 21): Separate encoders for Codec 1, Codec 2, ..., Codec K. Each encoder takes speech samples (e.g. PCM) and produces an encoded bitstream according to its own codec standard (1, 2, ..., K).

Figure 1(B) (Sheet 2 of 21): Separate decoders for Codec 1, Codec 2, ..., Codec K. Each decoder takes an encoded bitstream according to its own codec standard (1, 2, ..., K) and produces speech samples (e.g. PCM).

Figure 2 (Sheet 3 of 21): Thin codec 200. The encoder part converts speech samples (e.g. PCM) into an encoded bitstream according to one of the standards {Codec 1, Codec 2, ..., Codec K}. The decoder part converts an encoded bitstream according to one of the standards {Codec 1, Codec 2, ..., Codec K} into speech samples (e.g. PCM).

Figure 3 (Sheet 4 of 21): Speech samples pass through a CELP codec, whose parameters fall into three groups.

GENERIC PARAMETERS
  Linear prediction parameters (unquantized)
  Pitch lag (unquantized)
  Pitch gain (unquantized)

SPECIFIC PARAMETERS
  Linear prediction parameters (quantized)
  Adaptive codebook lag (quantized)
  Adaptive codebook gain (quantized)
  Fixed codebook indices (quantized)
  Fixed codebook gain (quantized)
  Other parameters (quantized)

INTERMEDIATE GENERIC PARAMETERS
  Open-loop pitch lag
  Excitation signal

Figure 4 (Sheet 5 of 21): CELP decoder. Block labels: fixed codebook index, fixed codebook, fixed codebook gain, adaptive codebook lag, adaptive codebook, adaptive codebook gain, excitation signal, linear prediction coefficients, synthesis filter, post processing, output speech.

Figure 5 (Sheet 6 of 21): Processing modules of a CELP encoder, in order:
  Input speech sample (PCM)
  Pre-processing
  LP analysis and quantization
  Open-loop pitch lag analysis
  Adaptive codebook lag analysis and quantization
  Adaptive codebook gain analysis and quantization
  Fixed codebook index analysis and quantization
  Fixed codebook gain analysis and quantization
  Bitstream packing
  Codec bitstream

Figure 6 (Sheet 7 of 21): Processing modules of a CELP decoder, in order:
  Codec bitstream
  Bitstream unpacking
  Excitation reconstruction
  Synthesis filtering
  Post processing
  Output speech sample (PCM)

Figure 7 (Sheet 8 of 21): Individual encoders combined into a single architecture.
  Codec 1 (C1): C1 Pre-processing; C1 Parameter 1 Encoding (Q1); C1 Parameter 2 Encoding (Q1); C1 Bitstream Packing
  Codec K (CK): CK Pre-processing; CK Parameter 1 Encoding (QN); CK Parameter 2 Encoding (QN); CK Bitstream Packing
  Universal Thin Codec (Encoder): Generic Pre-processing and Specific Pre-processing; Generic Parameter 1 Encoding and Specific Param. 1 Encoding; Generic Parameter 2 Encoding and Specific Param. 2 Encoding; Generic Bitstream Packing and Specific Bitstream Packing

Figure 8 (Sheet 9 of 21): Individual decoders combined into a single architecture.
  Individual decoders, Codec 1 (C1) through Codec K (CK): legible labels are C1 Post-processing, CK Bitstream Unpacking, CK Excitation recon., CK Post-processing
  Universal Thin Codec (Decoder): Generic Bitstream Unpacking and Specific Bitstream Unpacking; Generic Excitation recon. and Specific Excitation recon.; Generic Post-processing and Specific Post-processing


US 7,254,533 B1

METHOD AND APPARATUS FOR A THIN CELP VOICE CODEC

CROSS-REFERENCES TO RELATED APPLICATIONS

This application claims priority to U.S. Provisional Nos. 60/419,776 filed Oct. 17, 2002 and 60/439,366 filed Jan. 9, 2003, which are incorporated by reference herein.

STATEMENT AS TO RIGHTS TO INVENTIONS MADE UNDER FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

NOT APPLICABLE

BACKGROUND OF THE INVENTION

The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability.

Code Excited Linear Prediction (CELP) speech coding techniques are widely used in mobile telephony, voice trunking and routing, and Voice-over-IP (VoIP). Such coders/decoders (codecs) model voice signals as a source-filter model. The source/excitation signal is generated via adaptive and fixed codebooks, and the filter is modeled by a short-term linear predictive coder (LPC). The encoded speech is then represented by a set of parameters which specify the filter coefficients and the type of excitation.

Industry standard codecs using CELP techniques include Global System for Mobile (GSM) Communications Enhanced Full Rate (EFR) codec, Adaptive Multi-Rate Narrowband (AMR-NB) codec, Adaptive Multi-Rate Wideband (AMR-WB), G.723.1, G.729, Enhanced Variable Rate Codec (EVRC), Selectable Mode Vocoder (SMV), QCELP, and MPEG-4. These standard codecs apply substantially the same generic algorithms in extracting CELP parameters with modifications to frame and subframe sizes, filtering procedures, interpolation resolutions, code-book structures and code-book search intervals.

For example, the GSM standards AMR-NB and AMR-WB usually operate with a 20 ms frame size divided into 4 subframes of 5 ms. One difference between the wideband and narrowband coder is the sampling rate, which is 8 kHz for AMR-NB and 16 kHz downsampled to 12.8 kHz for analysis for AMR-WB. The linear prediction (LP) techniques used in both AMR-NB and AMR-WB are substantially identical, but AMR-WB performs adaptive tilt filtering, linear prediction (LP) analysis to 16th order over an extended bandwidth of 6.4 kHz, conversion of LP coefficients to/from Immittance Spectral Pairs (ISP), and quantization of the ISPs using split-multi-stage vector quantization (SMSVQ). The pitch search routines and computation of the target signal are similar. Both codecs follow an ACELP fixed codebook structure using a depth-first tree search to reduce computations. The adaptive and fixed codebook gains are quantized in both codecs using joint vector quantization (VQ) with 4th order moving average (MA) prediction. AMR-WB also contains additional functions to deal with the higher frequency band up to 7 kHz.

In another example, the Code Division Multiple Access (CDMA) standards SMV and EVRC share certain math functions at the basic operations level. At the algorithm level, the noise suppression and rate selection routines of EVRC are substantially identical to SMV modules. The LP analysis follows substantially the same algorithm in both codecs and both modify the target signal to match an interpolated delay contour. At Rate 1/8, both codecs produce a pseudo-random noise excitation to represent the signal. SMV incorporates the full range of post-processing operations including tilt compensation, formant postfilter, long term postfilter, gain normalization, and highpass filtering, whereas EVRC uses a subset of these operations.

As discussed above, a large number of industry standard codecs use CELP techniques. These codecs are usually supported by mobile and telephony handsets in order to interoperate with emerging and legacy network infrastructure. With the deployment of media rich handsets and the increasing complexity of user applications on these handsets, the large number of codecs is putting increasing pressure on handset resources in terms of program memory and DSP resources.

Hence it is desirable to improve codec techniques.

BRIEF SUMMARY OF THE INVENTION

The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability.

According to an embodiment, the present invention provides a method and apparatus for encoding and decoding a speech signal using a multiple codec architecture concept that supports several CELP voice coding standards. The individual codecs are combined into an integrated framework to reduce the program size. This integrated framework is referred to as a thin CELP codec. The apparatus includes a CELP encoder that generates a bitstream from the input voice signal in a format specific to the desired CELP codec, and a CELP decoding module that decodes a received CELP bitstream and generates a voice signal. The CELP encoder includes one or more codec-specific CELP encoding modules, a common functions library, a common math operations library, a common tables library, and a bitstream packing module. The common libraries are shared between more than one voice coding standard. The output bitstream may be bit-exact to the standard codec implementation or produce quality equivalent to the standard codec implementation. The CELP decoder includes a bitstream unpacking module, one or more codec-specific CELP decoding modules, a common functions library, a common math operations library and a library of common tables. The output voice signal may be bit-exact to the standard codec implementation or produce quality equivalent to the standard codec implementation.

According to another embodiment, the method for encoding a voice signal includes generating CELP parameters from the input voice signal in a format specific to the desired CELP codec and packing the codec-specific CELP parameters to the output bitstream. The method for decoding a voice signal includes unpacking the bitstream into codec-specific CELP parameters, and decoding the parameters to generate output speech.

According to yet another embodiment of the present invention, an apparatus for encoding and decoding a voice signal includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The CELP encoder includes a plurality of codec-specific encoder modules. At least one of the plurality of codec-specific encoder modules includes at least a first table, at least a first function or at least a first operation. The first table, the first function or the first operation is associated with only a second standard of the first plurality of CELP voice compression standards. Additionally, the CELP encoder includes a plurality of generic encoder modules. At least one of the plurality of generic encoder modules includes at least a second table, a second function or a second operation. The second table, the second function or the second operation is associated with at least a third standard and a fourth standard of the first plurality of CELP voice compression standards. The third standard and the fourth standard of the first plurality of CELP voice compression standards are different. The CELP decoder includes a plurality of codec-specific decoder modules. At least one of the plurality of codec-specific decoder modules includes at least a third table, at least a third function or at least a third operation. The third table, the third function or the third operation is associated with only a second standard of the second plurality of CELP voice compression standards. Additionally, the CELP decoder includes a plurality of generic decoder modules. At least one of the plurality of generic decoder modules includes at least a fourth table, a fourth function or a fourth operation. The fourth table, the fourth function or the fourth operation is associated with at least a third standard and a fourth standard of the second plurality of CELP voice compression standards. The third standard and the fourth standard of the second plurality of CELP voice compression standards are different.

According to yet another embodiment of the present invention, a method for encoding and decoding a voice signal includes receiving an input voice signal, processing the input voice signal, and generating an output bitstream signal based on at least information associated with the input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the method includes receiving an input bitstream signal, processing the input bitstream signal, and generating an output voice signal based on at least information associated with the input bitstream signal. The output voice signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The processing the input voice signal uses at least a first common functions library, at least a first common math operations library, and at least a first common tables library. The first common functions library includes a first function, the first common math operations library includes a first operation, and the first common tables library includes a first table. The first function, the first operation and the first table are associated with at least a second standard and a third standard of the first plurality of CELP voice compression standards. The second standard and the third standard of the first plurality of CELP voice compression standards are different. The generating an output bitstream signal includes generating a first plurality of codec-specific CELP parameters based on at least information associated with the input voice signal, and packing the first plurality of codec-specific CELP parameters to the output bitstream signal. The processing the input bitstream signal uses at least a second common functions library, at least a second common math operations library, and a second common tables library. The second common functions library includes a second function, the second common math operations library includes a second operation, and the second common tables library includes a second table. The second function, the second operation and the second table are associated with at least a second standard and a third standard of the second plurality of CELP voice compression standards. The second standard and the third standard of the second plurality of CELP voice compression standards are different. The generating an output voice signal includes unpacking the input bitstream signal and decoding a second plurality of codec-specific CELP parameters to produce an output voice signal.

An example of the invention is provided, specifically a thin CELP codec which combines the voice coding standards of GSM-EFR, GSM AMR-NB and GSM AMR-WB. Another example illustrates the combination of the EVRC and SMV voice coding standards for CDMA. Many variations of voice coding standard combinations are applicable.

Numerous benefits are achieved using the present invention over conventional techniques. Certain embodiments of the present invention can be used to reduce the program size of the encoder and decoder modules to be significantly less than the combined program size of the individual voice compression modules. Some embodiments of the present invention can be used to produce improved voice quality output than the standard codec implementation. Certain embodiments of the present invention can be used to produce lower computational complexity than the standard codec implementation. Some embodiments of the present invention provide efficient embedding of a number of standard codecs and facilitate interoperability of handsets with diverse networks.

Depending upon the embodiment under consideration, one or more of these benefits may be achieved. These benefits and various additional objects, features and advantages of the present invention can be fully appreciated with reference to the detailed description and accompanying drawings that follow.

BRIEF DESCRIPTION OF THE DRAWINGS

FIGS. 1A and 1B are simplified illustrations of the encoder and decoder modules for voice coding to encode to and decode from multiple voice coding standards;

FIG. 2 is a simplified diagram for a thin codec according to one embodiment of the present invention;

FIG. 3 is a simplified diagram for certain parameters common to some CELP codec standards according to an embodiment of the present invention;

FIG. 4 is a simplified block diagram of a CELP decoder;

FIG. 5 is a simplified diagram for processing modules of a CELP encoder;

FIG. 6 is a simplified diagram for processing modules of a CELP decoder;

FIG. 7 is a simplified diagram comparing the structure of multiple individual encoders and the encoder part of a thin codec architecture according to one embodiment of the present invention;

FIG. 8 is a simplified diagram comparing the structure of multiple individual decoders and the decoder part of a thin codec architecture according to one embodiment of the present invention;

FIG. 9 is a simplified block diagram for an encoder of a thin CELP codec according to an embodiment of the present invention;

FIG. 10 is a simplified block diagram for a decoder of a thin CELP codec according to an embodiment of the present invention;

FIG. 11A is a simplified diagram showing generic modules between codec 1, codec 2 and codec 3 for bit-exact implementation according to an embodiment of the present invention;

FIG. 11B is a simplified diagram showing generic modules between codec 1, codec 2 and codec 3 for equivalent performance implementation according to an embodiment of the present invention;

FIG. 12 is a simplified block diagram of an encoder for GSM-EFR and AMR-NB;

FIG. 13 is a simplified block diagram of an encoder for GSM AMR-WB;

FIG. 14 is a simplified block diagram for an encoder of a thin codec for GSM-EFR, AMR-NB and AMR-WB according to an embodiment of the present invention;

FIG. 15 is a simplified block diagram for a decoder of a thin codec for GSM-EFR, AMR-NB and AMR-WB according to an embodiment of the present invention;

FIG. 16 is a simplified block diagram for an encoder for EVRC;

FIG. 17 is a simplified block diagram of the encoder for SMV;

FIG. 18 is a simplified block diagram of an embodiment of an encoder of a thin codec for SMV and EVRC according to an embodiment of the present invention;

FIG. 19 is a simplified block diagram of an embodiment of a decoder of a thin codec for SMV and EVRC according to an embodiment of the present invention.

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates generally to telecommunication techniques. More particularly, the invention provides an encoding and decoding system and method that support a plurality of compression standards and share computational resources. Merely by way of example, the invention has been applied to Code Excited Linear Prediction (CELP) techniques, but it would be recognized that the invention has a much broader range of applicability.

An illustration of the encoder and decoder modules for voice coding to encode to and decode from multiple voice coding standards is shown in FIG. 1A and FIG. 1B. A separate encoder and decoder may be used for each coding standard, which may lead to large combined program memory requirements. Since many voice coding standards presently used are based on the Code Excited Linear Prediction (CELP) algorithm, there are many similarities in the processing functions across different coding standards.

FIG. 2 is a simplified diagram for a thin codec according to one embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. The thin codec 200 can encode voice samples into one of several voice compression formats, and decode bitstreams in one of several voice compression formats back to voice samples. The thin codec 200 includes an encoder system 210 and a decoder system 220. The encoder system 210 can encode the input voice samples into one of several CELP voice compression formats and the decoder system 220 can decode a bitstream in one of several CELP voice compression formats back to speech samples using an integrated codec architecture.

FIG. 3 is a simplified diagram for certain parameters common to some CELP codec standards according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. The intermediate parameters of open-loop pitch lag and excitation signal are usually generic to CELP codecs. The unquantized values for linear prediction parameters, pitch lags, and pitch gains are also usually generic CELP parameters. The quantized values for linear prediction parameters, adaptive codebook lags, adaptive codebook gains, fixed codebook indices, fixed codebook gains and other parameters are usually considered codec-specific parameters. For example, the quantized values for linear prediction parameters include line spectral frequencies obtained from a vector-quantization codebook.

FIG. 4 is a simplified block diagram of a CELP decoder. A fixed codebook index 410 and an adaptive codebook lag 420 are used to extract vectors from a fixed codebook 412 and an adaptive codebook 422 respectively. The selected fixed codebook vector and adaptive codebook vector are gain-scaled using a decoded fixed codebook gain 414 and an adaptive codebook gain 424 respectively, and then added together to form an excitation signal 430. The excitation signal 430 is filtered by a linear prediction synthesis filter 440 to provide the spectral shape, and the resulting signal is post-processed by a post processing unit 450 to form an output speech 460.

FIG. 5 is a simplified diagram for processing modules of a CELP encoder. An input speech sample 510 is first pre-processed by a pre-processing module 520. The output of the pre-processing module 520 is further processed by a linear prediction analysis and quantization module 530. The open-loop pitch lag, adaptive codebook lag, and adaptive codebook gain are then determined and quantized by modules 540, 550, and 560 respectively. The fixed codebook indices and fixed codebook gain are then determined and quantized by modules 570 and 580 respectively. Lastly, the bitstream is packed in a desired format by a module 590.

FIG. 6 is a simplified diagram for processing modules of a CELP decoder. A codec bitstream 610 is first unpacked to yield the CELP parameters by a module 620, and the excitation is reconstructed using the adaptive codebook parameters and fixed codebook parameters by a module 630. The excitation is then filtered by a linear prediction synthesis filter 640, and finally post-processing operations are applied by a module 650 to produce an output speech sample 660.

FIG. 7 is a simplified diagram comparing the structure of multiple individual encoders and the encoder part of a thin codec architecture according to one embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In the thin codec architecture, individual encoders 710 are integrated into a combined codec architecture 720. Each processing module of the encoders 710 is factorized into a generic part and a specific part in the combined codec architecture 720. The program memory for the generic coding part can be shared between several voice coding standards, resulting in smaller overall program size. Depending on the bitstream constraints, the number of codecs combined, and the similarity between the codecs combined, the encoder part 720 of the thin codec may achieve significant program size reductions. The bitstream constraints may include bit-exactness and minimum performance requirements.

FIG. 8 is a simplified diagram comparing the structure of multiple individual decoders and the decoder part of a thin codec architecture according to one embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. In the thin codec architecture, individual decoders are integrated into a combined codec architecture 820. Each processing module of the decoders 810 is factorized into a generic part and a specific part. The program memory for the generic decoding part can be shared between several voice coding standards, resulting in smaller overall program size. Depending on the bitstream constraints, the number of codecs combined, and the similarity between the codecs combined, the decoder part 820 of the thin codec may achieve significant program size reductions. The bitstream constraints may include bit-exactness and minimum performance requirements.

FIG. 9 is a simplified block diagram for an encoder of a thin CELP codec according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.

An encoder 900 of a thin CELP codec includes specific modules 990 and generic modules 992. The specific modules 990 include CELP encoding modules 920 and bitstream packing modules 940. The generic modules 992 include generic tables 960, generic math operations 970, and generic subfunctions 980. Input speech samples 910 are input to the codec-specific CELP encoding modules 920 and codec-specific CELP parameters 930 are produced. These parameters are then packed to a bitstream 950 in a desired coding standard format using the codec-specific bitstream packing modules 940. The codec-specific CELP encoding modules 920 contain encoding modules for each supported voice coding standard. However, the tables 960, math operations 970 and subfunctions 980 that are common or generic to two or more of the supported encoders are factored out of the individual encoding modules by a codec algorithm factorization module, and included only once in a shared library in the thin codec 900. This sharing of common code reduces the combined program memory requirements. Algorithm

1050, generic math operations 1060, and generic subfunctions 1070. A codec-specific bitstream 1010 is unpacked by the bitstream unpacking modules 1020, which contain a bitstream unpacking routine for each supported voice coding standard, and codec-specific CELP parameters 1030 are output to the CELP decoding modules 1040. The tables 1050, math operations 1060 and subfunctions 1070 that are common or generic to more than two of the supported decoders are factored out of the codec-specific CELP decoding modules and included in a shared library.

The algorithm factorization module can operate at a number of levels depending on the codec requirements. If a bit-exact implementation is required to the individual standard codecs, only functions, tables, and math operations that maintain bit-exactness between more than two codecs are factored out into the generic modules. FIG. 11A is a simplified diagram showing generic modules between codec 1, codec 2 and codec 3 for bit-exact implementation according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. An area 1110 represents generic bit-exact modules of codecs 1, 2, and 3. Areas 1120, 1130, and 1140 represent generic bit-exact modules of codecs 1 and 3, codecs 1 and 2, and codecs 2 and 3 respectively.

If the bit-exact constraint is relaxed, then functions, tables and math operations that produce equivalent quality or provide equivalent functionality can be factored out into the generic modules. Alternatively, new generic processing modules can be derived and called by one or more codecs. This has the benefit of providing bit-compliant codec implementation. Using this approach, the program size can be reduced even further by having an increased number of generic modules. FIG. 11B is a simplified diagram showing generic modules between codec 1, codec 2 and codec 3 for equivalent performance implementation according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. An area 1160 represents generic bit-exact modules of codecs 1, 2, and 3. Areas 1170, 1180, and 1190 represent generic bit-exact modules of codecs 1 and 3, codecs 1 and 2, and codecs 2 and 3 respectively. For example, the area 1160 is larger than the area 1110, so more generic modules can be used in equivalent performance than in bit-exact implementation.

It is beneficial to maintain a modular, generalized framework so that modules for additional coders can be easily integrated.
The use of generic modules may provide output factorization is performed only once during the implemen Voice quality higher than the standard codec implementation tation stage for each combination of codecs in the thin without an increase in program complexity, for example, by codec. Efficient factorizing of Subfunctions may require applying more advanced perceptual weighting filters. The splitting the processing modules into more than one stage. 55 use of generic modules may also provide lower complexity Some stages may share commonality with other codecs, than the standard codec, for example, by applying faster while other stages may be distinct to a particular codec. searching techniques. These benefits may be combined. FIG. 10 is a simplified block diagram for a decoder of a The greater the similarity between Voice coding stan thin CELP codec according to an embodiment of the present dards, the greater the program size savings that can be invention. This diagram is merely an example, which should 60 achieved by a thin codec according to an embodiment of the not unduly limit the scope of the present invention. One of present invention. As an example for illustration of the ordinary skill in the art would recognize many variations, bit-compliant specific embodiment of a thin CELP codec, alternatives, and modifications. A decoder 1000 of a thin the speech codecs integrated are GSM-EFR, AMR-NB and CELP codec includes specific modules 1080 and generic AMR-WB, although others can be used. GSM-EFR is modules 1090. The specific modules 1080 include bitstream 65 algorithmically the same as the highest rate of AMR-NB, unpacking modules 1020 and CELP decoding modules thus no additional program code is required for AMR-NB to 1040. The generic modules 1090 includes generic tables gain GSM-EFR bit-compliant functionality. 
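Purely as an illustrative sketch, and not as a description of the algorithm factorization module itself, the bit-exact grouping described for FIG. 11A can be modeled as set operations over module names. The codec labels, the module names (autocorr, levinson, and so on) and the function name factorize are hypothetical. The sketch assumes that two modules carrying the same name are bit-exact equivalents, so that a name shared by two or more codecs belongs in a generic group.

```python
def factorize(codec_modules):
    """Split modules into generic groups (shared by two or more codecs)
    and specific groups (used by exactly one codec).

    codec_modules maps a codec label to the set of module names it needs.
    Assumption: identical names denote bit-exact identical modules.
    """
    # Record which codecs use each module name.
    owners = {}
    for codec, modules in codec_modules.items():
        for name in modules:
            owners.setdefault(name, set()).add(codec)

    # Group module names by the exact set of codecs that share them.
    generic, specific = {}, {}
    for name, codecs in owners.items():
        group = generic if len(codecs) >= 2 else specific
        group.setdefault(frozenset(codecs), set()).add(name)
    return generic, specific


# Hypothetical module lists for three codecs (names are illustrative only).
CODECS = {
    "codec1": {"autocorr", "levinson", "lsf_interp", "pack_1"},
    "codec2": {"autocorr", "levinson", "postfilter", "pack_2"},
    "codec3": {"autocorr", "lsf_interp", "postfilter", "pack_3"},
}

generic, specific = factorize(CODECS)

# Module count if each codec is built separately versus as one thin codec.
separate_count = sum(len(modules) for modules in CODECS.values())
thin_count = (sum(len(group) for group in generic.values())
              + sum(len(group) for group in specific.values()))
```

In this toy input the group shared by all three codecs plays the role of area 1110, and the three pairwise groups play the roles of areas 1120, 1130 and 1140. The separate builds carry 12 modules in total, while the thin build carries 7.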
The GSM standards AMR-NB, which has eight modes ranging from 4.75 kbps to 12.2 kbps, and AMR-WB, which has eight modes ranging from 6.60 kbps to 23.85 kbps, share a high degree of similarity in the encoder/decoder flow and in the general algorithms of many procedures.

According to one embodiment of the present invention, an apparatus for encoding and decoding a voice signal includes an encoder configured to generate an output bitstream signal from an input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the apparatus includes a decoder configured to generate an output voice signal from an input bitstream signal. The input bitstream signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The output bitstream signal is bit exact or equivalent in quality for the first standard of the first plurality of CELP voice compression standards.

The CELP encoder includes a plurality of codec-specific encoder modules. At least one of the plurality of codec-specific encoder modules includes at least a first table, at least a first function or at least a first operation. The first table, the first function or the first operation is associated with only a second standard of the first plurality of CELP voice compression standards. Additionally, the CELP encoder includes a plurality of generic encoder modules. At least one of the plurality of generic encoder modules includes at least a second table, a second function or a second operation. The second table, the second function or the second operation is associated with at least a third standard and a fourth standard of the first plurality of CELP voice compression standards. The third standard and the fourth standard of the first plurality of CELP voice compression standards are different.

The plurality of codec-specific encoder modules includes a pre-processing module configured to process the speech for encoding, a linear prediction analysis module configured to generate linear prediction parameters, an excitation generation module configured to generate an excitation signal by filtering the input speech signal by the short-term prediction filter, and a long-term prediction module configured to generate open-loop pitch lag parameters. Additionally, the plurality of codec-specific encoder modules includes an adaptive codebook module configured to determine an adaptive codebook lag and an adaptive codebook gain, a fixed codebook module configured to determine fixed codebook vectors and a fixed codebook gain, and a bitstream packing module. The bitstream packing module includes at least one bitstream packing routine and is configured to generate the output bitstream signal based on at least codec-specific CELP parameters associated with at least the first standard of the first plurality of CELP voice compression standards.

The plurality of generic encoder modules comprises a first common functions library including at least the second function, a first common math operations library including at least the second operation, and a first common tables library including at least the second table. The first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to remove a first plurality of generic functions, a first plurality of generic operations and a first plurality of generic tables from the plurality of codec-specific encoder modules and store the first plurality of generic functions, the first plurality of generic operations and the first plurality of generic tables in the first common functions library, the first common math operations library and the first common tables library.

The first common functions library, the first common math operations library and the first common tables library are associated with at least the third standard and the fourth standard of the first plurality of CELP voice compression standards and configured to substantially remove all duplications between a first program code associated with the third standard of the first plurality of CELP voice compression standards and a second program code associated with the fourth standard of the first plurality of CELP voice compression standards.

For example, the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables configured to maintain bit exactness for the third standard and the fourth standard of the first plurality of CELP voice compression standards. For another example, the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables algorithmically identical to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards, and functions, math operations and tables algorithmically similar to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards.

The CELP decoder includes a plurality of codec-specific decoder modules. At least one of the plurality of codec-specific decoder modules includes at least a third table, at least a third function or at least a third operation. The third table, the third function or the third operation is associated with only a second standard of the second plurality of CELP voice compression standards. Additionally, the CELP decoder includes a plurality of generic decoder modules. At least one of the plurality of generic decoder modules includes at least a fourth table, a fourth function or a fourth operation. The fourth table, the fourth function or the fourth operation is associated with at least a third standard and a fourth standard of the second plurality of CELP voice compression standards. The third standard and the fourth standard of the second plurality of CELP voice compression standards are different.

The plurality of codec-specific decoder modules include a bitstream unpacking module. The bitstream unpacking module includes at least one bitstream unpacking routine and is configured to decode the input bitstream signal and generate codec-specific CELP parameters. Additionally, the plurality of codec-specific decoder modules include an excitation reconstruction module configured to reconstruct an excitation signal based on at least information associated with adaptive codebook lags, adaptive codebook gains, fixed codebook indices and fixed codebook gains. Moreover, the plurality of codec-specific decoder modules include a synthesis module configured to filter the excitation signal and generate a reconstructed speech. Also, the plurality of codec-specific decoder modules include a post-processing module configured to improve a perceptual quality of the reconstructed speech.

The generic decoder modules comprise a second common functions library including at least the fourth function, a second common math operations library including at least the fourth operation, and a second common tables library including at least the fourth table. The second common functions library, the second common math operations library and the second common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to remove a second plurality of generic functions, a second plurality of operations and a second plurality of tables from the plurality of codec-specific decoder modules and store the second plurality of generic functions, the second plurality of operations and the second plurality of tables in the second common functions library, the second common math operations library and the second common tables library.

The second common functions library, the second common math operations library and the second common tables library are associated with at least the third standard and the fourth standard of the second plurality of CELP voice compression standards and configured to substantially remove all duplications between a third program code associated with the third standard of the second plurality of CELP voice compression standards and a fourth program code associated with the fourth standard of the second plurality of CELP voice compression standards.

For example, the second common functions library, the second common math operations library and the second common tables library include only functions, math operations and tables configured to maintain bit exactness for the third standard and the fourth standard of the second plurality of CELP voice compression standards. For another example, the second common functions library, the second common math operations library and the second common tables library include only functions, math operations and tables algorithmically identical to ones of the third standard and the fourth standard of the second plurality of CELP voice compression standards, and functions, math operations and tables algorithmically similar to ones of the third standard and the fourth standard of the second plurality of CELP voice compression standards.

As discussed above and further emphasized here, one of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, the first plurality of CELP voice compression standards may be different from or the same as the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the first standard of the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the second standard of the first plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the third standard or the fourth standard of the first plurality of CELP voice compression standards. The first standard of the second plurality of CELP voice compression standards may be different from or the same as the second standard of the second plurality of CELP voice compression standards. The apparatus of claim 1 wherein the first standard of the second plurality of CELP voice compression standards is the same as the third standard or the fourth standard of the second plurality of CELP voice compression standards.

According to another embodiment of the present invention, a method for encoding and decoding a voice signal includes receiving an input voice signal, processing the input voice signal, and generating an output bitstream signal based on at least information associated with the input voice signal. The output bitstream signal is associated with at least a first standard of a first plurality of CELP voice compression standards. Additionally, the method includes receiving an input bitstream signal, processing the input bitstream signal, and generating an output voice signal based on at least information associated with the input bitstream signal. The output voice signal is associated with at least a first standard of a second plurality of CELP voice compression standards. The output bitstream signal is bit exact or equivalent in quality for the first standard of the first plurality of CELP voice compression standards. The output voice signal is bit exact or equivalent in quality for the first standard of the second plurality of CELP voice compression standards. For example, the first plurality of CELP voice compression standards include GSM-EFR, GSM-AMR Narrowband, and GSM-AMR Wideband. As another example, the first plurality of CELP voice compression standards includes EVRC and SMV.

The processing the input voice signal uses at least a first common functions library, at least a first common math operations library, and at least a first common tables library. The first common functions library includes a first function, the first common math operations library includes a first operation, and the first common tables library includes a first table. The first function, the first operation and the first table are associated with at least a second standard and a third standard of the first plurality of CELP voice compression standards. The second standard and the third standard of the first plurality of CELP voice compression standards are different. The first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module. The algorithm factorization module is configured to store a first plurality of generic functions, a first plurality of operations and a first plurality of tables in the first common functions library, the first common math operations library and the first common tables library.

The generating an output bitstream signal includes generating a first plurality of codec-specific CELP parameters based on at least information associated with the input voice signal, and packing the first plurality of codec-specific CELP parameters to the output bitstream signal. The first plurality of codec-specific CELP parameters include a linear prediction parameter, an adaptive codebook lag, an adaptive codebook gain, a fixed codebook index, and a fixed codebook gain. For example, the linear prediction parameter includes a line spectral frequency. The generating a first plurality of codec-specific CELP parameters includes performing a linear prediction analysis, generating linear prediction parameters, and filtering the input speech signal by a short-term prediction filter. Additionally, the generating a first plurality of codec-specific CELP parameters includes generating an excitation signal, determining an adaptive codebook pitch lag parameter, and determining an adaptive codebook gain parameter. Moreover, the generating a first plurality of codec-specific CELP parameters includes determining an index of a fixed codebook vector associated with a fixed codebook target signal, and determining a gain of the fixed codebook vector.

The processing the input bitstream signal uses at least a second common functions library, at least a second common math operations library, and a second common tables library. The second common functions library includes a second function, the second common math operations library includes a second operation, and the second common tables library includes a second table. The second function, the second operation and the second table are associated with at least a second standard and a third standard of the second plurality of CELP voice compression standards. The second standard and the third standard of the second plurality of CELP voice compression standards are different.

The generating an output voice signal includes unpacking the input bitstream signal and decoding a second plurality of codec-specific CELP parameters to produce an output voice signal. The decoding a second plurality of codec-specific CELP parameters includes reconstructing an excitation signal, synthesizing the excitation signal, and generating an intermediate speech signal. Additionally, the decoding a second plurality of codec-specific CELP parameters includes processing the intermediate speech signal to improve a perceptual quality.

As discussed above and further emphasized here, one of ordinary skill in the art would recognize many variations, alternatives, and modifications. For example, the first plurality of CELP voice compression standards may be different from or the same as the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards is different from or the same as the first standard of the second plurality of CELP voice compression standards. The first standard of the first plurality of CELP voice compression standards may be different from or the same as the second standard or the third standard of the first plurality of CELP voice compression standards. The first standard of the second plurality of CELP voice compression standards may be different from or the same as the second standard or the third standard of the second plurality of CELP voice compression standards.

FIG. 12 is a simplified block diagram of an encoder for GSM-EFR and AMR-NB. GSM-EFR is algorithmically substantially the same as the highest rate of AMR-NB. Input speech samples 1210 are first preprocessed by a pre-processing module 1212, and 10th-order linear prediction coefficients are determined once per frame, or twice per frame for 12.2 kbps mode, by an LP windowing and autocorrelation module 1214 and a Levinson-Durbin module 1216. The Levinson-Durbin module 1216 uses the Levinson-Durbin algorithm. These 10th-order linear prediction coefficients are converted to line spectral frequencies (LSFs) by an LPC to LSF conversion module 1218. The converted frequencies are quantized by an LSF quantization module 1220. The unquantized LSFs are interpolated by an LSF interpolation module 1222, and the quantized LSFs are interpolated by an LSF interpolation module 1224. These interpolated outputs are used in the computation of the weighted speech, impulse response and adaptive codebook target by modules 1226, 1228 and 1230 respectively. The open-loop pitch is determined from the weighted speech by a module 1232 and then refined during the adaptive codebook search by a module 1234. The impulse response is computed and used in both the adaptive and fixed codebook searches. Once the adaptive lag is found, the adaptive codebook gain is determined, followed by the fixed codebook target, fixed codebook indices and fixed codebook gain. An ACELP fixed codebook structure is applied for all modes. The codebook vectors are chosen by minimizing the error between the original signal and the synthesized speech using a perceptually weighted distortion measure.

FIG. 13 is a simplified block diagram of an encoder for GSM AMR-WB. The encoder structure has a high degree of similarity to the AMR-NB structure. Input speech samples 1310 are first preprocessed in a pre-processing module 1312. The 16th-order linear prediction coefficients (LPCs) are determined once per frame using the Levinson-Durbin algorithm by an LP windowing and autocorrelation module 1314 and a Levinson-Durbin module 1316. The LPCs are converted to immittance spectral frequencies (ISFs) by an LPC to ISF conversion module 1318. The converted frequencies are quantized by an ISF quantization module 1320. The unquantized ISFs are interpolated by an ISF interpolation module 1322, and the quantized ISFs are interpolated by an ISF interpolation module 1324. These interpolated outputs are used in the computation of the weighting filter, impulse response and adaptive codebook target by modules 1326, 1328 and 1330. The open-loop pitch is determined from the weighted speech by a module 1332 and then refined during the adaptive codebook search by a module 1334. The impulse response is computed and used in both the adaptive and fixed codebook searches. One of two interpolation filters is selected for the fractional adaptive codebook search. Once the adaptive lag is found, the adaptive codebook gain is determined, followed by the fixed codebook target, fixed codebook indices and fixed codebook gain. An ACELP fixed codebook structure is applied for all modes. The codebook vectors are chosen by minimizing the error between the original signal and the synthesized speech using a perceptually weighted distortion measure. For a high rate, the gain of the high frequency range is determined and a gain index is transmitted.

A comparison of certain features and processing functions of AMR-NB and AMR-WB according to an embodiment of the present invention is shown in Table 1. This table is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.

TABLE 1

Feature | AMR-NB | AMR-WB
Frame size | 20 ms | 20 ms
Subframes per frame | 4 | 4
Sampling rate | 8 kHz | 16 kHz
Pre-processing | Highpass filtering (80 Hz) | Upsample by 4, LPF 6.4 kHz, Downsample by 5; Highpass filtering (50 Hz); Pre-emphasis $H(z) = 1 - 0.68 z^{-1}$
LP analysis | 10th order LP analysis; LPC to LSP conversion | 16th order LP analysis; LPC to ISP conversion
LP param. quant. | Quantize LSFs; Split matrix quantization (SMQ) or Split Vector Quantization (SVQ) | Quantize ISPs; Split Multi-stage vector quantization, 2
Weighting filter | $W(z) = A(z/\gamma_1)/A(z/\gamma_2)$ |
Open-loop pitch | Pitch lag range 18-143; Use 3 ranges or weighting function | Pitch lag range 17-115; Use a weighting function
Closed-loop pitch | Adaptive codebook; Range 17, 19-143; 1/6, 1/3 sample resolution | Adaptive codebook; Range 34-231; 1/2, 1/4 sample resolution
Fixed codebook structure and search | ACELP, 40 samples subframe; Different tracks and no. of pulses for each mode; adaptive prefilter $F(z) = 1/(1 - g z^{-T})$ | ACELP, 64 samples subframe; Different no. of pulses for each mode; adaptive prefilter $F(z) = \frac{1}{1 - 0.85 z^{-T}}(1 - b z^{-1})$
Gain quantization | Joint VQ with 4th order MA prediction or Separate quantization of gc, gp | Joint VQ with 4th order MA prediction
High band frequency | n/a | Transmit high-band gain for highest rate; Generate 6.4-7 kHz with scaled white noise, convert to speech domain
Post-processing | Adaptive tilt compensation filter; Formant postfilter; Highpass filtering | Highpass filtering; De-emphasis filter; Upsample by 5, Downsample by 4
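As an illustrative sketch only, the LP analysis row of Table 1 suggests how one generic routine might serve both codecs: the Levinson-Durbin recursion is the same and only the prediction order (10 or 16) changes. The function name levinson_durbin and the table LP_ORDER are hypothetical. The floating-point arithmetic is an assumption made for brevity, so this sketch would not by itself satisfy a bit-exact requirement.

```python
# Codec-specific table: LP order per codec, taken from Table 1.
LP_ORDER = {"AMR-NB": 10, "AMR-WB": 16}


def levinson_durbin(r, order):
    """Generic Levinson-Durbin recursion.

    r holds autocorrelation values r[0..order] with r[0] > 0.
    Returns (a, err), where a[0] = 1 and A(z) = sum of a[k] * z^-k,
    and err is the final prediction error energy.
    """
    a = [1.0] + [0.0] * order
    err = float(r[0])
    for i in range(1, order + 1):
        # Correlation of the current predictor with the next lag.
        acc = r[i]
        for j in range(1, i):
            acc += a[j] * r[i - j]
        k = -acc / err  # reflection coefficient for stage i

        # Update coefficients 1..i-1 from the previous stage, then add a[i].
        prev = a[:]
        for j in range(1, i):
            a[j] = prev[j] + k * prev[i - j]
        a[i] = k
        err *= 1.0 - k * k
    return a, err


# Toy autocorrelation of a first-order process, r[k] = 0.5 ** k.
r = [0.5 ** k for k in range(17)]
a_nb, err_nb = levinson_durbin(r, LP_ORDER["AMR-NB"])
a_wb, err_wb = levinson_durbin(r, LP_ORDER["AMR-WB"])
```

Both calls run the same code path. Under this sketch, the codec-specific part reduces to the order looked up in LP_ORDER, which is the kind of split between a generic subfunction and a codec-specific table that the thin codec relies on.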

As shown in Table 1, both AMR-NB and AMR-WB operate with a 20 ms frame size divided into 4 subframes of 5 ms. A difference between the wideband and narrowband coder is the sampling rate, which is 8 kHz for AMR-NB and 16 kHz downsampled to 12.8 kHz for analysis for AMR-WB. AMR wideband contains additional pre-processing functions for decimation and pre-emphasis. The linear prediction (LP) techniques used in both AMR-NB and AMR-WB are substantially identical, but AMR-WB performs linear prediction (LP) analysis to 16th order over an extended bandwidth of 6.4 kHz and converts the LP coefficients to/from Immittance Spectral Pairs (ISP). Quantization of the ISPs is performed using split-multi-stage vector quantization (SMSVQ), as opposed to split matrix quantization and split vector quantization for quantization of the LSFs in AMR-NB. The pitch search routines and computation of the target signal are similar, although the sample resolution for pitches differs. Both codecs follow an ACELP fixed codebook structure using a depth-first tree search to reduce computations. The adaptive and fixed codebook gains are quantized in both codecs using joint vector quantization (VQ) with 4th order moving average (MA) prediction. AMR-NB also uses scalar gain quantization for some modes. AMR-WB contains additional functions to deal with the higher frequency band up to 7 kHz. The post-processing for both coders includes high-pass filtering, with AMR-NB including specific functions for adaptive tilt-compensation and formant postfiltering, and AMR-WB including specific functions for de-emphasis and up-sampling.

FIG. 14 is a simplified block diagram for an encoder of a thin codec for GSM-EFR, AMR-NB and AMR-WB according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Modules 1410 and 1412 for LP analysis, modules 1414 and 1416 for interpolation, a module 1418 for open-loop pitch search, modules 1420 and 1422 for adaptive and fixed target computation respectively, and a module 1424 for impulse response computation have a high degree of similarity and can be generic without substantial loss of quality. The modules 1410 and 1412 for LP analysis may include a module 1410 for autocorrelation and a module 1412 for Levinson-Durbin. The modules of computing weighted speech, closed-loop pitch search, ACELP codebook search, search and construct excitation also contain similarity in the processing, although conditions and parameters may vary. For example, the search methods for the ACELP fixed codebook can be shared, but the algebraic structures differ. The quantization modules are mostly codec-specific and the high-band processing functions are usually used only by AMR-WB.

FIG. 15 is a simplified block diagram for a decoder of a thin codec for GSM-EFR, AMR-NB and AMR-WB according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Modules 1524, 1510, 1512, and 1514 for interpolation, excitation reconstruction, synthesis and post-processing respectively have a high degree of similarity and can be generic without substantial loss of quality. Bitstream decoding modules 1516 and 1518 are codec-specific. The adaptive codebook filter 1520 and high-band processing functions 1522 are usually used only for AMR-WB. At least some generic modules are shared between the codecs. Additionally, common tables, subfunctions and operations of codec-specific modules are also factorized out into a shared library to further reduce the program size.

As another example for illustration of the bit-compliant specific embodiment, a thin CELP codec is applied to integrate the Code Division Multiple Access (CDMA) standards SMV and EVRC, although others can be used. SMV has 4 bit rates including Rate 1, Rate 1/2, Rate 1/4 and Rate 1/8, and EVRC has 3 bit rates including Rate 1, Rate 1/2 and Rate 1/8.

FIG. 16 is a simplified block diagram for an encoder for EVRC. A signal 1610 is passed to a pre-processing module 1612 which performs highpass filtering to suppress very low frequencies and noise reduction to lessen background noise. Linear prediction analysis is performed by a module 1614 once per frame using the Levinson-Durbin recursion, producing autocorrelation coefficients and linear prediction coefficients (LPCs). The LPCs are converted to LSPs by a module 1616 and interpolated by a module 1618. The excitation is generated by a module 1620 that performs inverse filtering of the pre-processed speech by the inverse linear prediction filter. The open-loop pitch lag and pitch gain are then estimated. Using the autocorrelation coefficients, the pitch gain, and an external rate command, the bit rate for the current frame is determined by a module 1622. The rate determination module 1622 applies voice activity detection (VAD) and logic operations to determine the rate. Depending on the bit rate, a different processing path is selected. For Rate 1/8, the parameters transmitted are the LSPs, quantized to 8 bits, and the frame energy. For Rate 1/2 and Rate 1, the LSPs, pitch lag, adaptive codebook gain, fixed codebook indices and fixed codebook gains are computed. Rate 1 has the additional parameters of spectral transition indicator and delay difference. The LSFs are quantized first and RCELP processing is performed, whereby the signal is modified by time-warping so that the signal has a smooth pitch contour. The adaptive and fixed codebook vectors are selected to match the modified speech signal.

or RCELP (Type 1) processing is performed, whereby the signal is modified by time-warping so that the signal has a smooth pitch contour.

A comparison of certain features and processing functions of SMV and EVRC according to an embodiment of the present invention is shown in Table 2. This table is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications.

TABLE 2

Frame size
    SMV:  20 ms
    EVRC: 20 ms

Subframes per frame
    SMV:  4, 3, or 2 depending on Rate and Frame type
    EVRC: 3 (53, 53, 54 samples)

Sampling rate
    SMV:  8 kHz
    EVRC: 8 kHz

Pre-processing
    SMV:  Silence enhancement; high-pass filtering (80 Hz, 2nd order); noise pre-processing (2 options); adaptive tilt filter
    EVRC: High-pass filtering (120 Hz, 6th order); noise pre-processing (same as SMV option A)

LP analysis
    SMV:  10th order LP analysis; LPC to LSP conversion
    EVRC: 10th order LP analysis; LPC to LSP conversion

Rate selection
    SMV:  Rate based on input characteristics; 2 VAD options
    EVRC: Rate based on input characteristics (rate determination identical to one of SMV VAD options)

LSP quantization
    SMV:  Switched MA prediction, 2 predictors; weighted multi-stage VQ (MSVQ)
    EVRC: Weighted split vector quantization (SVQ)

Pitch search
    SMV:  Integer and fractional delay search on weighted speech
    EVRC: Integer pitch search on residual; no closed-loop search

Target signal
    SMV:  RCELP signal modification; warp, shift weighted speech to match pitch contour
    EVRC: RCELP signal modification; shift residual to match pitch contour

Fixed codebook
    SMV:  ACELP and Gaussian codebooks; iterative depth-first tree search or exhaustive search
    EVRC: ACELP codebooks; iterative depth-first search

Gain quantization
    SMV:  Joint quantization of adaptive and fixed gains
    EVRC: Separate quantization of adaptive and fixed gains

Low rates
    SMV:  NELP processing for Rate 1/4; Gaussian excitation for Rate 1/8
    EVRC: Gaussian excitation for Rate 1/8

Post-processing
    SMV:  Tilt compensation; formant postfilter; long-term postfilter; highpass filtering
    EVRC: Formant postfilter; highpass filtering
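
As a rough illustration of how the rows of Table 2 point to candidates for generic modules, the following minimal Python sketch stores a few rows as data, checks the frame arithmetic stated in the table (20 ms at 8 kHz, and the EVRC subframe lengths), and lists the rows whose SMV and EVRC entries coincide. The names `TABLE_2_ROWS`, `samples_per_frame` and `identical_rows` are hypothetical and are not part of the described embodiment; matching table entries only suggest, and do not establish, that the underlying routines can be shared.

```python
# Illustrative sketch only; identifiers here are assumptions, not taken from the patent.

SAMPLE_RATE_HZ = 8000                  # Table 2: sampling rate for both codecs
FRAME_MS = 20                          # Table 2: frame size for both codecs
EVRC_SUBFRAME_SAMPLES = (53, 53, 54)   # Table 2: EVRC subframe lengths

# A few rows of Table 2, keyed by feature, one entry per codec.
TABLE_2_ROWS = {
    "frame_size_ms": {"SMV": 20, "EVRC": 20},
    "sampling_rate_hz": {"SMV": 8000, "EVRC": 8000},
    "lp_order": {"SMV": 10, "EVRC": 10},
    "lsp_quantization": {"SMV": "switched MA prediction + MSVQ",
                         "EVRC": "weighted split VQ"},
    "gain_quantization": {"SMV": "joint", "EVRC": "separate"},
}


def samples_per_frame(sample_rate_hz: int, frame_ms: int) -> int:
    """Number of samples in one frame of the given duration."""
    return sample_rate_hz * frame_ms // 1000


def identical_rows(rows: dict) -> list:
    """Names of rows whose SMV and EVRC entries are equal, sorted by name."""
    return sorted(name for name, cell in rows.items()
                  if cell["SMV"] == cell["EVRC"])


if __name__ == "__main__":
    print(samples_per_frame(SAMPLE_RATE_HZ, FRAME_MS))  # samples in one frame
    print(sum(EVRC_SUBFRAME_SAMPLES))                   # EVRC subframes cover the same frame
    print(identical_rows(TABLE_2_ROWS))                 # rows that match between codecs
```

Rows that differ, such as LSP quantization and gain quantization, correspond to the codec-specific side of the split described in the text.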

FIG. 17 is a simplified block diagram of the encoder for SMV. A signal 1710 is passed to a pre-processing module 1712 which performs silence enhancement, highpass filtering, noise reduction and adaptive tilt filtering. Linear prediction analysis is performed by a module 1714 three times per frame, centered at different locations, using the Levinson-Durbin recursion producing autocorrelation coefficients and linear prediction coefficients (LPCs). The LPCs are converted to LSPs by a module 1716. The pre-processed speech is perceptually weighted, and the open-loop pitch lag and frame class/type are estimated. The lag is used to modify the pre-processed speech by time-warping and the frame class may be updated. Using numerous analysis parameters, including the frame class, the bit rate for the current frame is determined. Depending on the bit rate and frame type, a different processing path is selected. For Rate 1/8, the parameters transmitted are the LSPs, quantized to 11 bits, and the subframe gains. For Rate 1/4, noise excited linear prediction (NELP) processing is performed. For Rate 1/2 and Rate 1, two processing paths are available for each rate, Type 1 and Type 0. In each case, the LSPs, LSP predictor switch, adaptive codebook lags, adaptive codebook gain, fixed codebook indices and fixed codebook gains are computed. Rate 1, Type 0 has the additional parameter of LSP interpolation path. The LSFs are quantized first and either CELP (Type 0) or RCELP (Type 1) processing is performed, whereby the signal is modified by time-warping so that the signal has a smooth pitch contour.

As shown in Table 2, SMV and EVRC share a high degree of similarity. At the basic operations level, SMV math functions are based on EVRC libraries. At the algorithm level, both codecs have a frame size of 20 ms and determine the bit rate for each frame based on the input signal characteristics. In each case, a different coding scheme is used depending on the bit rate. SMV has an additional rate, Rate 1/4, which uses NELP encoding. The noise suppression and rate selection routines of EVRC are identical to SMV modules. SMV contains additional preprocessing functions of silence enhancement and adaptive tilt filtering. The 10th order LP analysis is common to both codecs, as is the RCELP processing for the higher rates which modifies the target signal to match an interpolated delay contour. Both codecs use an ACELP fixed codebook structure and iterative depth-first tree search. SMV also uses Gaussian fixed codebooks. At Rate 1/8, both codecs produce a pseudo-random noise excitation to represent the signal. SMV incorporates the full range of post-processing operations including tilt compensation, formant postfilter, long term postfilter, gain normalization, and highpass filtering, whereas EVRC uses a subset of these operations.

FIG. 18 is a simplified block diagram of an embodiment of an encoder of a thin codec for SMV and EVRC according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. A module 1810 for LP analysis, a module 1812 for LPC to LSP conversion, a module 1814 for perceptual weighting, a module 1816 for open-loop pitch search, a module 1818 for RCELP modification, and module 1820 for generating random excitation have a high degree of similarity and can be generic. The module 1810 may perform autocorrelation and Levinson-Durbin processing. Additionally, modules for interpolation, adaptive and fixed target computation, and impulse response computation also have a high degree of similarity and can be generic. The Rate 1/8 processing is similar for both SMV and EVRC codecs, while the Rate 1 and Rate 1/2 processing of EVRC is similar to Type 1 SMV processing. SMV requires additional classification processing to accurately classify the input, and additional processing paths to accommodate both Type 1 and Type 0 processing. Many of the fixed codebook search functions are generic as both codecs include ACELP codebooks. Since SMV is considerably more algorithmically complex than EVRC, a possible approach for one or more of the thin codec encoding modules, for example the rate determination module, is to embed EVRC functionality within the SMV processing modules. These modules may be split into stages, with some stages generic to each codec. Other modules containing some generic stages include module 1822 for pre-processing, and module 1824 for rate determination.

FIG. 19 is a simplified block diagram of an embodiment of a decoder of a thin codec for SMV and EVRC according to an embodiment of the present invention. This diagram is merely an example, which should not unduly limit the scope of the present invention. One of ordinary skill in the art would recognize many variations, alternatives, and modifications. Similar to the encoder as shown in FIG. 18, there are different processing paths, depending on the bit rate. The bitstream decoding modules are codec-specific and the post-processing operations for EVRC can be embedded within the SMV post-processing module. Module 1910 for Rate 1/8 decoding has a high degree of similarity and can be generic. In addition to shared decoding modules, common tables, subfunctions and operations of codec-specific modules are also factorized out into a shared library to further reduce the program size.

As discussed above and further emphasized here, FIGS. 18 and 19 are merely examples. The apparatus and method for a thin CELP voice codec is applicable to numerous combinations of various voice codecs. For example, these voice codecs include G.723.1, GSM-AMR, EVRC, G.728, G.729, G.729A, QCELP, MPEG-4 CELP, SMV, AMR-WB, and VMR. Usually, the more similar the codec algorithms, the greater the potential achievable program size savings.

Numerous benefits are achieved using the present invention over conventional techniques. Certain embodiments of the present invention can be used to reduce the program size of the encoder and decoder modules to be significantly less than the combined program size of the individual voice compression modules. Some embodiments of the present invention can be used to produce improved voice quality output than the standard codec implementation. Certain embodiments of the present invention can be used to produce lower computational complexity than the standard codec implementation. Some embodiments of the present invention provide efficient embedding of a number of standard codecs and facilitate interoperability of handsets with diverse networks.

Although specific embodiments of the present invention have been described, it will be understood by those of skill in the art that there are other embodiments that are equivalent to the described embodiments. Accordingly, it is to be understood that the invention is not to be limited by the specific illustrated embodiments, but only by the scope of the appended claims.

What is claimed is:

1. An apparatus for encoding and decoding a voice signal, the apparatus comprising:
    an encoder configured to generate an output bitstream signal from an input voice signal, the output bitstream signal associated with at least a first standard of a first plurality of CELP voice compression standards;
    a decoder configured to generate an output voice signal from an input bitstream signal, the input bitstream signal associated with at least a first standard of a second plurality of CELP voice compression standards;
    wherein the CELP encoder comprises:
        a plurality of codec-specific encoder modules, at least one of the plurality of codec-specific encoder modules including at least a first table, at least a first function or at least a first operation, the first table, the first function or the first operation associated with only a second standard of the first plurality of CELP voice compression standards;
        a plurality of generic encoder modules, at least one of the plurality of generic encoder modules including at least a second table, a second function or a second operation, the second table, the second function or the second operation associated with at least a third standard and a fourth standard of the first plurality of CELP voice compression standards, the third standard and the fourth standard of the first plurality of CELP voice compression standards being different;
    wherein the CELP decoder comprises:
        a plurality of codec-specific decoder modules, at least one of the plurality of codec-specific decoder modules including at least a third table, at least a third function or at least a third operation, the third table, the third function or the third operation associated with only a second standard of the second plurality of CELP voice compression standards;
        a plurality of generic decoder modules, at least one of the plurality of generic decoder modules including at least a fourth table, a fourth function or a fourth operation, the fourth table, the fourth function or the fourth operation associated with at least a third standard and a fourth standard of the second plurality of CELP voice compression standards, the third standard and the fourth standard of the second plurality of CELP voice compression standards being different.

2. The apparatus of claim 1 wherein the output bitstream signal is bit exact for the first standard of the first plurality of CELP voice compression standards.

3. The apparatus of claim 1 wherein the output bitstream signal is equivalent in quality for the first standard of the first plurality of CELP voice compression standards.

4. The apparatus of claim 1 wherein the plurality of generic encoder modules comprises:
    a first common functions library, the first common functions library including at least the second function;
    a first common math operations library, the first common math operations library including at least the second operation;
    a first common tables library, the first common tables library including at least the second table.

5. The apparatus of claim 4, wherein the generic decoder modules comprise:
    a second common functions library, the second common functions library including at least the fourth function;
    a second common math operations library, the second common math operations library including at least the fourth operation;
    a second common tables library, the second common tables library including at least the fourth table.

6. The apparatus of claim 5 wherein the first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module, the algorithm factorization module configured to remove a first plurality of generic functions, a first plurality of generic operations and a first plurality of generic tables from the plurality of codec-specific encoder modules and store the first plurality of generic functions, the first plurality of generic operations and the first plurality of generic tables in the first common functions library, the first common math operations library and the first common tables library.

7. The apparatus of claim 6 wherein the first common functions library, the first common math operations library and the first common tables library are associated with at least the third standard and the fourth standard of the first plurality of CELP voice compression standards and configured to substantially remove all duplications between a first program code associated with the third standard of the first plurality of CELP voice compression standards and a second program code associated with the fourth standard of the first plurality of CELP voice compression standards.

8. The apparatus of claim 5 wherein the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables configured to maintain bit exactness for the third standard and the fourth standard of the first plurality of CELP voice compression standards.

9. The apparatus of claim 4 wherein the first common functions library, the first common math operations library and the first common tables library include only functions, math operations and tables algorithmically identical to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards, and functions, math operations and tables algorithmically similar to ones of the third standard and the fourth standard of the first plurality of CELP voice compression standards.

10. The apparatus of claim 1 wherein the plurality of codec-specific encoder modules comprise:
    a pre-processing module configured to process the speech for encoding;
    a linear prediction analysis module configured to generate linear prediction parameters;
    an excitation generation module configured to generate an excitation signal by filtering the input speech signal by the short-term prediction filter;
    a long-term prediction module configured to generate open-loop pitch lag parameters;
    an adaptive codebook module configured to determine an adaptive codebook lag and an adaptive codebook gain;
    a fixed codebook module configured to determine fixed codebook vectors and a fixed codebook gain;
    a bitstream packing module including at least one bitstream packing routine and configured to generate the output bitstream signal based on at least codec-specific CELP parameters associated with at least the first standard of the first plurality of CELP voice compression standards.

11. The apparatus of claim 1 wherein the plurality of codec-specific decoder modules comprise:
    a bitstream unpacking module including at least one bitstream unpacking routine and configured to decode the input bitstream signal and generate codec-specific CELP parameters;
    an excitation reconstruction module configured to reconstruct an excitation signal based on at least information associated with adaptive codebook lags, adaptive codebook gains, fixed codebook indices and fixed codebook gains;
    a synthesis module configured to filter the excitation signal and generate a reconstructed speech;
    a post-processing module configured to improve a perceptual quality of the reconstructed speech.

12. The apparatus of claim 1 wherein the first plurality of CELP voice compression standards are the same as the second plurality of CELP voice compression standards.

13. The apparatus of claim 1 wherein the first standard of the first plurality of CELP voice compression standards is the same as the first standard of the second plurality of CELP voice compression standards.

14. The apparatus of claim 1 wherein the first standard of the first plurality of CELP voice compression standards is the same as the second standard of the first plurality of CELP voice compression standards.

15. The apparatus of claim 1 wherein the first standard of the first plurality of CELP voice compression standards is the same as the third standard or the fourth standard of the first plurality of CELP voice compression standards.

16. The apparatus of claim 1 wherein the first standard of the second plurality of CELP voice compression standards is the same as the second standard of the second plurality of CELP voice compression standards.

17. The apparatus of claim 1 wherein the first standard of the second plurality of CELP voice compression standards is the same as the third standard or the fourth standard of the second plurality of CELP voice compression standards.

18. A method for encoding and decoding a voice signal, the method comprising:
    receiving an input voice signal;
    processing the input voice signal;
    generating an output bitstream signal based on at least information associated with the input voice signal, the output bitstream signal associated with at least a first standard of a first plurality of CELP voice compression standards;
    receiving an input bitstream signal;
    processing the input bitstream signal;
    generating an output voice signal based on at least information associated with the input bitstream signal, the output voice signal associated with at least a first standard of a second plurality of CELP voice compression standards;
    wherein the processing the input voice signal uses at least a first common functions library, at least a first common math operations library, and at least a first common tables library, the first common functions library including a first function, the first common math operations library including a first operation, the first common tables library including a first table;
    wherein the first function, the first operation and the first table are associated with at least a second standard and a third standard of the first plurality of CELP voice compression standards, the second standard and the third standard of the first plurality of CELP voice compression standards being different;
    wherein the generating an output bitstream signal comprises:
        generating a first plurality of codec-specific CELP parameters based on at least information associated with the input voice signal;
        packing the first plurality of codec-specific CELP parameters to the output bitstream signal;
    wherein the processing the input bitstream signal uses at least a second common functions library, at least a second common math operations library, and a second common tables library, the second common functions library including a second function, the second common math operations library including a second operation, the second common tables library including a second table;
    wherein the second function, the second operation and the second table are associated with at least a second standard and a third standard of the second plurality of CELP voice compression standards, the second standard and the third standard of the second plurality of CELP voice compression standards being different;
    wherein the generating an output voice signal comprises:
        unpacking the input bitstream signal;
        decoding a second plurality of codec-specific CELP parameters to produce an output voice signal.

19. The method of claim 18 wherein the first common functions library, the first common math operations library and the first common tables library are made by at least an algorithm factorization module, the algorithm factorization module configured to store a first plurality of generic functions, a first plurality of operations and a first plurality of tables in the first common functions library, the first common math operations library and the first common tables library.

20. The method of claim 18 wherein the output bitstream signal is bit exact for the first standard of the first plurality of CELP voice compression standards.

21. The method of claim 18 wherein the output bitstream signal is equivalent in quality for the first standard of the first plurality of CELP voice compression standards.

22. The method of claim 18 wherein the output voice signal is bit exact for the first standard of the second plurality of CELP voice compression standards.

23. The method of claim 18 wherein the output voice signal is equivalent in quality for the first standard of the second plurality of CELP voice compression standards.

24. The method of claim 18 wherein the first plurality of codec-specific CELP parameters comprise a linear prediction parameter, an adaptive codebook lag, an adaptive codebook gain, a fixed codebook index, and a fixed codebook gain.

25. The method of claim 24 wherein the linear prediction parameter comprises a line spectral frequency.

26. The method of claim 18 wherein the generating a first plurality of codec-specific CELP parameters comprises:
    performing a linear prediction analysis;
    generating linear prediction parameters;
    filtering the input speech signal by a short-term prediction filter;
    generating an excitation signal;
    determining an adaptive codebook pitch lag parameter;
    determining an adaptive codebook gain parameter;
    determining an index of a fixed codebook vector associated with a fixed codebook target signal;
    determining a gain of the fixed codebook vector.

27. The method of claim 18 wherein the decoding a second plurality of codec-specific CELP parameters comprises:
    reconstructing an excitation signal;
    synthesizing the excitation signal;
    generating an intermediate speech signal;
    processing the intermediate speech signal to improve a perceptual quality.

28. The method of claim 18, wherein the first plurality of CELP voice compression standards comprises GSM-EFR, GSM-AMR Narrowband, and GSM-AMR Wideband.

29. The method of claim 18, wherein the first plurality of CELP voice compression standards comprises EVRC and SMV.

30. The method of claim 18 wherein the first plurality of CELP voice compression standards are the same as the second plurality of CELP voice compression standards.

31. The method of claim 18 wherein the first standard of the first plurality of CELP voice compression standards is the same as the first standard of the second plurality of CELP voice compression standards.

32. The method of claim 18 wherein the first standard of the first plurality of CELP voice compression standards is the same as the second standard or the third standard of the first plurality of CELP voice compression standards.

33. The method of claim 18 wherein the first standard of the second plurality of CELP voice compression standards is the same as the second standard or the third standard of the second plurality of CELP voice compression standards.