US007788091B2

(12) United States Patent (10) Patent N0.: US 7,788,091 B2 Goudar et a]. (45) Date of Patent: Aug. 31, 2010

(54) METHODS, DEVICES AND SYSTEMS FOR 6,766,289 B2 7/2004 Kandhadai et a1. IMPROVED PITCH ENHANCEMENT AND 6,789,059 B2 9/2004 Kandhadai et a1. AUTOCORRELATION IN VOICE CODECS 2002/0007269 A1 V2002 G30 2002/0103638 A1 8/2002 Gao (75) Inventors: ChanaveeragoudaV. Goudar, 2003/0078771 A1 4/2003 Jung et 31' Bangalore (IN); Murali M. Deshpande, ghyssen _ ee et al. Bangalore (IN); Pankal Rabha’ 2004/0181400 A1 * 9/2004 Kannan et al...... 704/223 Bangalore (IN) OTHER PUBLICATIONS (73) Asslgneez Texas Instruments Incorporated’ 3GPP2, “Selectable Mode Vocoder Service Option for Wideband Dallas’ TX (Us) Spread Spectrum Communication Systems,” Dec. 2001, C.S0030-0,

_ _ _ _ _ Version 2.0.* ( * ) Nonce: sublect to any dlsclalmer: the term Ofthls 3GPP2, “Selectable Mode Vocoder Service Option for Wideband Pawnt is extended or adjusted under 35 Spread Spectrum Communication Systems,” Dec. 2001, C.S0030-0, U.S.C. 154(1)) by 1315 days. Version 2.0, pp. 3-4, 13-14, 61, 143-145, 159-178, Figs. 5.3-1 and 5.6-1,2. (21) Appl. No.: 11/231,686 Baron, “Five Chips From Tl4or, Is It Six?” Microprocessor Report, Instat-MDR, Mar. 17, 2003, Figs. 1, 3, 4. (22) Filed: Sep. 21, 2005 * Cited by examiner (65) Prior Publication Data Primary ExamineriAngela A Armstrong Us 6’ (74-)AZZOI’I’I6J), Agent, OrFirmiWade Brady, Frederick J. Telecky, Jr. Related US. Application Data (57) ABSTRACT (60) Provisional application No. 60/612,494, ?led on Sep. 22, 2004, provisional application NO_ 60/612,497, An electronic circuit includes a storage circuit and a micro ?led on Sep 22, 2004' processor operable together With the storage circuit as a speech coder. The speech coder has a backward pitch (51) Int. Cl. enhancement in frames or subframes having a length and at G10L 11/04 (2006.01) least one main pulse and at least one backward pitch enhance G10L 19/12 (2006.01) ment pulse preceding the main pulse by a portion of the length (52) US. Cl...... 704/207; 704/223; 704/221 Called a pitch lag, and is operable to limit in number any such (58) Field of Classi?cation Search ...... None backward Pitch enhancement Pulse Or pulses to a predeter See application ?le for Complete Search history mined maximum number more than none upon an occurrence When the length divided by the pitch lag is at least one more (56) References Cited than that maximum number. Other forms of the invention involve systems, circuits, devices, processes and methods of U.S. PATENT DOCUMENTS operation. 6,470,309 B1 10/2002 McCree 6,493,665 B1 12/2002 Su et al. 40 Claims, 9 Drawing Sheets

320 330 340 \ \ / SPEECH, PITCH PERCEPTUAL WEIGHTED INFORMATION WEIGHTING SPEECH / \ FILTER MODIFICATION 350 3/65 /360 SELECT RATE PI'JKQON AND FRAME RATE ssof ESTI TYPE DEPENDENT. TYPE- —> . DEPENDENT SPEECH PROCESSING CTRL _’ VAD CLASSIFY LPC / \ 395/ ANALYZE 385 370 L- LSFSMOOTH LSF QUANTIZATION / \ 390 375 US. Patent Aug. 31, 2010 Sheet 1 019 US 7,788,091 B2

1000

CELILAR WLAN AP —DSL INTERNETO— _ BASE (ACCESS CABLE / ——ETHERNET 1040 \ 1060

CELL PHONE PCB

1020 O / 1010 1080 ‘F / VOICE <-> DSL WLAN <—> CABLE @ @ GATEWAY A_ © @D 1085 C) Q Q USB 0 O G Y G O 6 PERSONAL _DSL _ 1055 o O O :lCOMPUTER ——-—CAB|_E @ —I:_I'HERNET :4 \ 1050

FIG. 1

US. Patent Aug. 31, 2010 Sheet 5 of9 US 7,788,091 B2

FIG. 5 320 \ SPEECH PRE-PROCESSING I I 330\ SPEECH RATE /520 WEIGHTING DECISION I WEIGHTED SPEECH 340/ MODIFICATION I v CODEBOOK PROCESSING 550/ ACB/FCB1/FCB2/FCB3

ERROR FUNCTION: TARGET FILTER SIGNAL MATRIX \ /J 2 8 = llTg-glHlcll / \ GAIN EXCITATION vECToR _:_ _0_ ~ 0 FIG. 6 < ~ 0

Ei=c_+pi= - + - 5 0 ~ 1 <— PULSE —

~ 0 —**_ _0_

elNTP V PITCH ENHANCEMENT: | LSF US. Patent Aug. 31, 2010 Sheet 6 of9 US 7,788,091 B2

FIG. 7 710 \ CALCULATION OF IMPULSE RESPONSE OF WEIGHTED SYNTHESIS FILTER I PITCH ENHANCEMENT OF IMPULSE 720 RESPONSE USING FORWARD AND \ BACKWARD ENHANCEMENTS NUMBER OF BACKWARD ENHANCEMENT Pmax=(|nt)Subframe Size/Pitch lag I PITCH ENHANCED IMPULSE RESPONSE 730 / FOR FIXED CODEBOOK SEARCH

FIG. 8 310 \ CALCULATION OF IMPULSE RESPONSE OF WEIGHTED SYNTHESIS FILTER I 820 \ CALCULATE NUMBER OF BACKWARD ENHANCEMENT Pmax=(|nt)Subframe Size/Pitch lag 830 IS SUBFRAME SIZE =53 OR 54?

Pmax=min (2,Pmax) 840 f I? PITCH ENHANCEMENT OF IMPULSE / RESPONSE USING FORWARD AND 850 BACKWARD ENHANCEMENTS I CALCULATE AUTOCORRELATION (Phi) 860 / MATRIX OF IMPULSE RESPONSES I CODEBOOK SEARCH USING Phi 870 / MATRIX BASED ENERGY CALCULATION US. Patent Aug. 31, 2010 Sheet 7 of9 US 7,788,091 B2

LAG=17 FIG. 9

20% 25-3 30-: 35-: 40-5 45-3 50-:

D 510 15 2025 30354045 50

LAG=17 FIG. 10A US. Patent Aug. 31, 2010 Sheet 8 of9 US 7,788,091 B2

LAG=25 FIG. ]0B

10-: 15-:

25: \ 30-: 35—:

LAG=40 FIG. ]0C

0 51015 20 25 30 35 39 US. Patent Aug. 31, 2010 Sheet 9 0f 9 US 7,788,091 B2

1100 FIG. 11 / @1105 IDENTIFY REGIONS OF EQUAL /1110 IMPULSE RESPONSE VECTORS

V imax =MAX(i) [1120 imi" : N (D IN REGION

imax =MAX 7' 1130

SET i‘=imax AND I=ImaX /

1180 COMPUTE (i,j) WHERE i=i' AND I=r AND STORE <1>(i,j) \1150

I INCREMENTALLY COMPUTE-(i,j) (i min ,j'-i'+imin )} AND STORE @(i,

V 1170

imin =1‘min imax =jmax '1 \118O I imin=imin+1 imax =imax \1190 _—______l US 7,788,091 B2 1 2 METHODS, DEVICES AND SYSTEMS FOR communications. An example is Voice over Internet Protocol IMPROVED PITCH ENHANCEMENT AND (VoIP) enabling phone calls over the Internet. AUTOCORRELATION IN VOICE CODECS Wireless and Wireline data communications using Wireless local area netWorks (WLAN), such as IEEE 802.11 compli CROSS-REFERENCE TO RELATED ant, have become especially popular in a Wide range of instal APPLICATIONS lations, ranging from home netWorks to commercial estab lishments. Other Wireless netWorks such as IEEE 802.16 This application is related to provisional US. Patent Appli (WiMax) are emerging. Short-range Wireless data communi cation Ser. No. 60/612,494, ?led Sep. 22, 2004, titled “Meth cation according to the “Bluetooth” and other IEEE 802.15 ods, Devices and Systems for Improved Pitch Enhancement technology permits computer peripherals to communicate in Voice Codecs,” for Which priority under 35 U.S.C. 119(e) With a personal computer or Workstation Within the same (1) is hereby claimed and Which is hereby incorporated herein room. by reference. Security is important in both Wireline and Wireless com This application is related to provisional US. Patent Appli munications for improved security of retail and other business cation Ser. No. 60/612,497, ?led Sep. 22, 2004, titled “Meth commercial transactions in electronic commerce and Wher ods, Devices and Systems for Improved Codebook Search for ever personal and/or commercial privacy is desirable. Added Voice Codecs,” for Which priority under 35 U.S.C. 1 19(e)(1) features and security add further processing tasks to the com is hereby claimed and Which is hereby incorporated herein by munications system. These portend added softWare and hard reference. Ware in systems Where affordability and poWer dissipation are This application is co-?led so that the present U.S. non 20 already important concerns. provisional patent application “Methods, Devices and Sys In very general terms, a speech coder or voice coder is tems for Improved Codebook Search for Voice Codecs” Ser. based on the idea that the vocal chords and vocal tract are No. 11/231,643 and the present U.S. non-provisional patent analogous to a ?lter. The vocal chords and vocal tract gener application “Methods, Devices and Systems for Improved ally make a variety of sounds. Some sounds are voiced and Pitch Enhancement and Autocorrelation in Voice Codecs” 25 generally have a pitch level or levels at a given time. Other Ser. No. 11/231,686 each have the same application ?ling sounds are unvoiced and have a rushing or Whispering or date, and each of said patent applications hereby incorporates sudden consonantal sound to them. To facilitate the voice the other by reference. coding process, voice sounds are converted into an electrical Waveform by a microphone and analog to digital converter. STATEMENT REGARDING FEDERALLY 30 The electrical Waveform is conceptually cut up into succes SPONSORED RESEARCH OR DEVELOPMENT sive frames of a feW milliseconds in duration called a target signal. The frames are individually approximated by the voice Not applicable. coder electronics. In speech or voice coder electronics, pulses can be pro BACKGROUND OF THE INVENTION 35 vided at different times to excite a ?lter. Each pulse has a very Wide spectrum of frequencies Which are comprised in the pulse. The ?lter selects some of the frequencies such as by This invention is in the ?eld of information and communi passing only a band of frequencies, thus the term bandpass cations, and is more speci?cally directed to improved pro ?lter. Circuits and/or processes that provide various pulses, cesses, circuits, devices, and systems for information and 40 more or less ?ltered, excite the ?lter to supply as its output an communication processing, and processes of operating and approximation to the voice sounds of a target signal. Finding making them. Without limitation, the background is further the appropriate pulses to use for the excitation pulses for the described in connection With Wireless and Wireline commu voice coder approximation purposes is involved in the subject nications processing. of codebook search herein. Wireless and Wireline communications of many types have 45 The ?lter(s) are characterized by a set of numbers called gained increasing popularity in recent years. The mobile coef?cients that, for example, may represent the impulse Wireless (or “cellular”) telephone has become ubiquitous response over time When a ?lter is excited With a single pulse. around the World. Mobile telephony has recently begun to Information identifying the appropriate pulses, and the values communicate video and digital data, in addition to voice. of the ?lter coef?cients, and such other information as is Wireless devices, for communicating computer data over a 50 desired, together compactly represent the speech in a given Wide area netWork, using mobile Wireless telephone channels frame. The information is generated as bits of data by a and techniques are also available. Wireline communications processor chip that runs softWare or otherWise operates such as DSL and cable modems and Wireline and Wireless according to a procedure. Generally speaking, gateWays to other netWorks are proliferating. the output of a voice coder is this very compact representation The market for portable devices such as cell phones and 55 Which advantageously substitutes in communication for the PDAs (personal digital assistants) is expanding With many vastly larger number of bits that Would be needed to directly more features and applications. More features and applica send over a communications netWork the voice signal con tions call for microprocessors to have high performance but ver‘ted into digital form at the output of the analog to digital With loW poWer consumption. Thus, keeping the poWer con converter Were there no speech coding. sumption for the microprocessor and related cores and chips 60 A speech or voice decoder is a coder in reverse in the sense to a minimum, given a set of performance requirements, is that the decoder responds to the compact information sent very important. In both the Wireless and Wireline areas, high over a netWork from a coder and produces a digital signal ef?ciency of performance and in operational processes is representing speech that can be converted by a digital-to essential to make affordable products available to a Wider analog converter into an analog signal to produce actual public. 65 sound in a loudspeaker or earphone. Voice over Packet (VoP) communications are further Voice coders and decoders (codecs) run on RISC (Reduced expanding the options and user convenience in telephonic Instruction Set Computing) processors and digital signal pro US 7,788,091 B2 3 4 cessing (DSP) chips and/or other integrated circuit devices and to supply coded speech that depends on different values that are vital to these systems and applications. Reducing the of autocorrelation incrementally generated. computer burden of voice codecs and increasing the e?i Other forms of the invention involve systems, circuits, ciency of executing the software applications on these micro devices, wireline and wireless communication devices, pro processors generally are very important to achieve system cesses and methods of operation, as disclosed and claimed. performance and affordability goals and operate within power dissipation and battery life limits. These goals become BRIEF DESCRIPTION OF THE DRAWINGS even more important in hand held and mobile applications where small siZe is so important, to control the real-estate, FIG. 1 is a pictorial diagram of a communications system memory space and the power consumed. including a cellular base station, two cellular telephone hand sets, a WLAN AP (wireless local area network access point), SUMMARY OF THE INVENTION a WLAN gateway with VoP phone, a personal computer (PC) with VoP phone, a WLAN station on the PC, and any one, Generally, a form of the invention involves a process of some or all of the foregoing improved according to the inven backward pitch enhancement for a speech coding method of tion. processing speech in frames or subframes having a length by FIG. 2 is a block diagram of an inventive integrated circuit supplying at least one main pulse and at least sometime asso chip device with any subset or all of the chip circuits for use ciating with the main pulse at least one backward pitch in the blocks of the communications system of FIG. 1 and enhancement pulse preceding the main pulse by a portion of improved according to the invention. the length called a pitch lag. The process involves limiting in 20 number any such backward pitch enhancement pulse or FIG. 3 is a process block diagram of SMV (Selectable Mode Vocoder) as example platform for inventive improve pulses to a predetermined maximum number more than none upon an occurrence when the length divided by the pitch lag ments to blocks as taught herein resulting in an inventive vocoder for the systems and devices of FIGS. 1 and 2. is at least one more than that maximum number. FIG. 4 is a more detailed process block diagram of a Rate Generally, another form of the invention involves a method 25 of pitch enhancement including determining whether sub and Type Dependent Processing block in FIG. 3, and having frame siZe is in a predetermined range and when subframe codebooks searched according to inventive improvements siZe is in the predetermined range, limiting backward herein for exciting ?lter operation to approximate a target enhanced pulses to a maximum of two, and computing a signal Tg. pitch-enhanced ?lter impulse response based on the back 30 FIG. 5 is a process block diagram of SMV as example ward enhanced pulses. platform for inventive improvements to codebook searching Generally, still another form of the invention involves an as taught herein resulting in an inventive vocoder for the electronic circuit including a storage circuit and a micropro systems, devices and processes of FIGS. 1-4. cessor operable together with the storage circuit as a speech FIG. 6 is an illustration of a symbolic representation of data coder. The speech coder has a backward pitch enhancement in 35 structures in which a target signal, ?lter, excitation, and frames or subframes having a length and at least one main pulses are used in the inventive improvements to the pro pulse and at least one backward pitch enhancement pulse cesses ofFIGS. 3-6. preceding the main pulse by a portion of the length called a FIG. 7 is a ?ow diagram of an SMV method for SMV pitch pitch lag, and operable to limit in number any such backward enhancement. pitch enhancement pulse or pulses to a predetermined maxi 40 FIG. 8 is a ?ow diagram of an inventive method for Pitch mum number more than none upon an occurrence when the Enhancement for inventive improvements to codebook length divided by the pitch lag is at least one more than that searching as taught herein resulting in an inventive vocoder maximum number. for the systems, devices and processes of FIGS. 1-5. Generally, a further form of the invention involves a pro FIG. 9 is a data structure diagram of an autocorrelation cess of backward pitch enhancement for a speech coding 45 matrix of impulse responses, or Phi Matrix, 53x53 for Pitch method of processing speech in frames or subframes having a Lag equal to 17, for use in the inventive method for Pitch length by supplying at least one main pulse and at least Enhancement of FIG. 8. sometime associating with the main pulse at least one back FIG. 10A is a data structure diagram of another autocorre ward pitch enhancement pulse preceding the main pulse by a lation matrix of impulse responses, or Phi Matrix, 39x39 for portion of the length called a pitch lag. The process involves 50 Pitch Lag equal to 17, for use in the inventive method for Pitch incrementally generating different values of autocorrelation Enhancement of FIG. 8. of ?lter impulse response within a region of the autocorrela tion where the number of backward pitch enhancement pulses FIG. 10B is a data structure diagram of another autocorre is the same in the region; and supplying coded speech that lation matrix of impulse responses, or Phi Matrix, 39x39 for Pitch Lag equal to 25, for use in the inventive method for Pitch depends on different values of autocorrelation incrementally 55 generated. Enhancement of FIG. 8. Generally, another further form of the invention involves FIG. 10C is a data structure diagram of another autocorre an electronic circuit including a storage circuit and a micro lation matrix of impulse responses, or Phi Matrix, 39x39 for processor operable together with the storage circuit as a Pitch Lag greater than or equal to 40, for use in the inventive speech coder. The speech coder has a backward pitch 60 method for Pitch Enhancement of FIG. 8. enhancement in frames or subframes having a length and at FIG. 1 1 is a ?ow chart representing an inventive method for least one main pulse and at least one backward pitch enhance operating a processor to generate each of several regions of ment pulse preceding the main pulse by a portion of the length the Phi matrix data structure of FIGS. 9, 10A, 10B, 10C for called a pitch lag, and operable for incremental generation of use in the inventive method for Pitch Enhancement of FIG. 8. different values of autocorrelation of ?lter impulse response 65 Corresponding numerals ordinarily identify corresponding within a region of the autocorrelation where the number of parts in the various Figures of the drawing except where the backward pitch enhancement pulses is the same in the region, context indicates otherwise. US 7,788,091 B2 5 6 DETAILED DESCRIPTION circuit chip, in a manner optimally combined or partitioned betWeen the chips, to the extent needed by any of the appli In FIG. 1, an improved communications system 1000 has cations supported by the cellular telephone base station 1040, system blocks as described next. Any or all of the system personal computer(s) 1050 equipped With WLAN, WLAN blocks, such as cellular mobile telephone and data handsets access point 1060 andVoice WLAN gateWay 1080, as Well as 1010 and 1010', a cellular (telephony and data) base station cellular telephones, radios and televisions, ?xed and portable 1040, a WLAN AP (Wireless local area netWork access point, entertainment units, routers, pagers, personal digital assis IEEE 802.11 or otherWise) 1060, a Voice WLAN gateway tants (PDA), organiZers, scanners, faxes, copiers, household 1080 With user voice over packet telephone 1085, and a voice appliances, o?ice appliances, combinations thereof, and enabledpersonal computer (PC) 1050 With anotheruser voice other application products noW knoWn or hereafter devised in over packet telephone 1055, communicate With each other in Which there is desired increased, partitioned or selectively communications system 1000. Each of the system blocks determinable advantages next described. 1010,1010‘,1040,1050,1060,1080 are provided With one or In FIG. 2, an integrated circuit 1100 includes a digital more PHY physical layer blocks and interfaces as selected by baseband (DBB) block 1110 that has a RISC processor (such the skilled Worker in various products, for DSL (digital sub as MIPS core, ARM processor, or other suitable processor) scriber line broadband over tWisted pair copper infrastruc and a digital signal processor (or DSP core) 1110, communi ture), cable (DOCSIS and other forms of coaxial cable broad cations softWare and security softWare for any such processor band communications), premises poWer Wiring, ?ber (?ber or core, security accelerators 1140, and a memory controller. optic cable to premises), and Ethernet Wideband netWork. The memory controller interfaces the RISC core and the DSP Cellular base station 1040 tWo-Way communicates With the 20 core to Flash memory and SDRAM (synchronous dynamic handsets 1010, 1010', With the Internet, With cellular commu random access memory). The memories are improved by any nications netWorks and With PSTN (public sWitched tele one or more of the processes herein. On chip RAM 1120 and phone netWork). on-chip ROM 1130 also are accessible to the processors 1110 In this Way, advanced networking capability for services, for providing sequences of softWare instructions and data softWare, and content, such as cellular telephony and data, 25 thereto. audio, music, voice, video, e-mail, gaming, security, e-com Digital circuitry 1150 on integrated circuit 1100 supports merce, ?le transfer and other data services, internet, World and provides Wireless interfaces for any one or more of GSM, Wide Web broWsing, TCP/IP (transmission control protocol/ GPRS, EDGE, UMTS, and OF DMA/MIMO (Global System Internet protocol), voice over packet and voice over Internet for Mobile communications, General Packet Radio Service, protocol (VoP/VoIP), and other services accommodates and 30 Enhanced Data Rates for Global Evolution, Universal Mobile provides security for secure utiliZation and entertainment Telecommunications System, Orthogonal Frequency Divi appropriate to the just-listed and other particular applications. sion Multiple Access and Multiple Input Multiple Output The embodiments, applications and system blocks dis Antennas) Wireless, With or Without high speed digital data closed herein are suitably implemented in ?xed, portable, service, via an analog baseband chip 1200 and GSM transmit/ mobile, automotive, seaborne, and airborne, communica 35 receive chip 1300. Digital circuitry 1150 includes ciphering tions, control, set top box, and other apparatus. The personal processor CRYPT for GSM ciphering and/or other encryp computer (PC) 1050 is suitably implemented in any form tion/ decryption purposes. Blocks TPU (Time Processing Unit factor such as desktop, laptop, palmtop, organiZer, mobile real-time sequencer), TSP (Time Serial Port), GEA (GPRS phone handset, PDA personal digital assistant, intemet appli EncryptionAlgorithm block for ciphering at LLC logical link ance, Wearable computer, personal area netWork, or other 40 layer), RIF (Radio Interface), and SPI (Serial Port Interface) type are included in digital circuitry 1150. For example, handset 1010 is improved and remains Digital circuitry 1160 provides codec for CDMA (Code interoperable and able to communicate With all other simi Division Multiple Access), CDMA2000, and/or WCDMA larly improved and unimproved system blocks of communi (Wideband CDMA or UMTS) Wireless With or Without an cations system 1000. On a cell phone printed circuit board 45 HSDPA/HSUPA (High Speed DoWnlink PacketAccess, High (PCB) 1020 in handset 1010, FIGS. 1 and 2 shoW a processor Speed Uplink Packet Access) (or lxEV-DV, lxEV-DO or integrated circuit and a serial interface such as a USB inter 3xEV-DV) data feature via the analog baseband chip 1200 face connected by a USB line to the personal computer 1050. and an RF GSM/CDMA chip 1300. Digital circuitry 1160 Reception of softWare, intercommunication and updating of includes blocks MRC (maximal ratio combiner for multipath information are provided betWeen the personal computer 50 symbol combining), ENC (encryption/decryption), RX 1050 (or other originating sources external to the handset (doWnlink receive channel decoding, de-interleaving, viterbi 1010) and the handset 1010. Such intercommunication and decoding and turbo decoding) and TX (uplink transmit con updating also occur automatically and/or on request via volutional encoding, turbo encoding, interleaving and chan WLAN, Bluetooth, or other Wireless circuitry. neliZing.). Block ENC has blocks for uplink and doWnlink FIG. 2 illustrates inventive integrated circuit chips includ 55 supporting con?dentiality processes of WCDMA. ing chips 1100, 1200, 1300, 1400, 1500 for use in the blocks Audio/voice block 1170 supports audio and voice func of the communications system 1000 of FIG. 1. The skilled tions and interfacing. Speech/voice codec(s) are suitably pro Worker uses and adapts the integrated circuits to the particular vided in memory space in audio/voice block 1170 for pro parts of the communications system 1000 as appropriate to cessing by processor(s) 1110. Applications interface block the functions intended. For conciseness of description, the 60 1180 couples the digital baseband chip 1100 to an applica integrated circuits are described With particular reference to tions processor 1400. Also, a serial interface in block 1180 use of all of them in the cellular telephone handsets 1010 and interfaces from parallel digital busses on chip 1100 to USB 1010' by Way of example. (Universal Serial Bus) of PC (personal computer) 1050. The It is contemplated that the skilled Worker uses each of the serial interface includes UARTs (universal asynchronous integrated circuits shoWn in FIG. 2, or such selection from the 65 receiver/transmitter circuit) for performing the conversion of complement of blocks therein provided into appropriate other data betWeen parallel and serial lines. Chip 1100 is coupled to integrated circuit chips, or provided into one single integrated location-determining circuitry 1190 for GPS (Global Posi US 7,788,091 B2 7 8 tioning System). Chip 1100 is also coupled to a USIM 1900 MHZ and other frequency bands as appropriate. (UMTS Subscriber Identity Module) 1195 or other SIM for Depending on the band in use, the output of LNAs couples to user insertion of an identifying plastic card, or other storage GSM/GPRS/EDGE/UMTS/CDMA demodulator 1370 to element, or for sensing biometric information to identify the produce the I/O or other outputs thereof (in-phase, quadra user and activate features. ture) to the GSM/GPRS/EDGE/UMTS/CDMA baseband In FIG. 2, a mixed-signal integrated circuit 1200 includes block 1210. an analog baseband (ABB) block 1210 for GSM/GPRS/ Further in FIG. 2, an integrated circuit chip or core 1400 is EDGE/UMTS/HSDPA Which includes SPI (Serial Port Inter provided for applications processing and more off-chip face), digital-to-analog/analog-to-digital conversion DAC/ peripherals. Chip (or core) 1400 has interface circuit 1410 ADC block, and RF (radio frequency) Control pertaining to including a high-speed WLAN 802.1 la/b/ g interface coupled GSM/GPRS/EDGE/UMTS and coupled to RF (GSM etc.) to a WLAN chip 1500. Further provided on chip 1400 is an chip 1300. Block 1210 suitably provides an analogous ABB applications processing section 1420 Which includes a RISC for CDMA Wireless and any associated lxEV-DV, lxEV-DO processor (such as MIPS core, ARM processor, or other suit or 3xEV-DV data and/or voice With its respective SPI (Serial able processor), a digital signal processor (DSP), and a shared Port Interface), digital-to-analog conversion DAC/ADC memory controller MEM CTRL With DMA (direct memory block, and RF Control pertaining to CDMA and coupled to access), and a 2D (tWo-dimensional display) graphic accel RF (CDMA) chip 1300. erator. Speech/voice codec functionality is suitably processed An audio block 1220 has audio I/O (input/output) circuits in chip 1400, in chip 1100, or both chips 1400 and 1100. to a speaker 1222, a microphone 1224, and headphones (not The RISC processor and the DSP in section 1420 have shoWn). Audio block 1220 has an analog-to-digital converter 20 access via an on-chip extended memory interface (EMIF/CF) (ADC) coupled to the voice codec and a stereo DAC (digital to off-chip memory resources 1435 including as appropriate, to analog converter) for a signal path to the baseband block mobile DDR (double data rate) DRAM, and ?ash memory of 1210 including audio/voice block 1170, and With suitable any of NAND Flash, NOR Flash, and Compact Flash. On chip encryption/ decryption activated or not. 1400, the shared memory controller in circuitry 1420 inter A control interface 1230 has a primary host interface (I/F) 25 faces the RISC processor and the DSP via an on-chip bus to and a secondary host interface to DBB-related integrated on-chip memory 1440 With RAM and ROM. A 2D graphic circuit 1100 of FIG. 2 for the respective GSM and CDMA accelerator is coupled to frame buffer internal SRAM (static paths. The integrated circuit 1200 is also interfaced to an I2C random access memory) in block 1440.A security block 1450 port of applications processor chip 1400 of FIG. 2. Control includes secure hardWare accelerators having security fea interface 1230 is also coupled via access arbitration circuitry 30 tures and provided for accelerating encryption and decryption to the interfaces in circuits 1250 and the baseband 1210. of any one or more types knoWn in the art or hereafter devised. A poWer conversion block 1240 includes buck voltage conversion circuitry for DC-to-DC conversion, and loW On-chip peripherals and additional interfaces 1410 include dropout (LDO) voltage regulators for poWer management/ UART data interface and MCSI (Multi-Channel Serial Inter face) voice Wireless interface for an off-chip IEEE 802.15 sleep mode of respective parts of the chip regulated by the 35 LDOs. PoWer conversion block 1240 provides information to (“Bluetooth” and high and loW rate piconet and personal and is responsive to a poWer control state machine shoWn netWork communications) Wireless circuit 1430. Debug mes betWeen the poWer conversion block 1240 and circuits 1250. saging and serial interfacing are also available through the Circuits 1250 provide oscillator circuitry for clocking chip UART. A JTAG emulation interface couples to an off-chip emulator Debugger for test and debug. Further in peripherals 1200. The oscillators have frequencies determined by one or 40 more crystals. Circuits 1250 include a RTC real time clock 1410 are an I2C interface to analog basebandABB chip 1200, (time/date functions), general purpose I/O, a vibrator drive and an interface to applications interface 1180 of integrated (supplement to cell phone ringing features), and a USB On circuit chip 1100 having digital baseband DBB. The-Go (OTG) transceiver. A touch screen interface 1260 is Interface 1410 includes a MCSI voice interface, a UART coupled to a touch screen XY 1266 off-chip. 45 interface for controls, and a multi-channel buffered serial port Batteries such as a lithium-ion battery 1280 and backup (McBSP) for data. Timers, interrupt controller, and RTC (real battery provide poWer to the system and battery data to circuit time clock) circuitry are provided in chip 1400. Further in 1250 on suitably provided separate lines from the battery peripherals 1410 are a MicroWire (u-Wire 4 channel serial pack. When needed, the battery 1280 also receives charging port) and multi-channel buffered serial port (McBSP) to off current from a Battery Charge Controller in analog circuit 50 chip Audio codec, a touch-screen controller, and audio ampli 1250 Which includes MADC (Monitoring ADC and analog ?er 1480 to stereo speakers. External audio content and touch input multiplexer such as for on-chip charging voltage and screen (in/out) and LCD (liquid crystal display) are suitably current, and battery voltage lines, and off-chip battery volt provided. Additionally, an on-chip USB OTG interface age, current, temperature) under control of the poWer control couples to off-chip Host and Client devices. These USB com state machine. 55 munications are suitably directed outside handset 1010 such In FIG. 2 an RF integrated circuit 1300 includes a GSM/ as to PC 1050 (personal computer) and/or from PC 1050 to GPRS/EDGE/UMTS/CDMA RF transmitter block 1310 update the handset 1010. supported by oscillator circuitry With off-chip crystal (not An on-chip UART/IrDA (infrared data) interface in inter shoWn). Transmitter block 1310 is fed by baseband block faces 1410 couples to off-chip GPS (global positioning sys 1210 of chip 1200. Transmitter block 1310 drives a dual band 60 tem) and Fast IrDA infrared Wireless communications device. RF poWer ampli?er (PA) 1330. On-chip voltage regulators An interface provides EMT9 and Camera interfacing to one maintain appropriate voltage under conditions of varying or more off-chip still cameras or video cameras 1490, and/or poWer usage. Off-chip sWitchplexer 1350 couples Wireless to a CMOS sensor of radiant energy. Such cameras and other antenna and sWitch circuitry to both the transmit portion apparatus all have additional processing performed With 1310, 1330 and the receive portion next described. SWitch 65 greater speed and ef?ciency in the cameras and apparatus and plexer 1350 is coupled via band-pass ?lters 1360 to receiving in mobile devices coupled to them With improvements as LNAs (loW noise ampli?ers) for 850/900 MHZ, 1800 MHZ, described herein. Further in FIG. 2, an on-chip LCD control US 7,788,091 B2 9 1 0 ler and associated PWL (Pulse-Width Light) block in inter a ?oating point reference. SMV speech quality is believed to faces 1410 are coupled to a color LCD display and its LCD be better than EVRC (Enhanced Variable Rate Codec)(TIA light controller off-chip. IS-127) at the same average data rate (mode 0) and equivalent Further, on-chip interfaces 1410 are respectively provided to EVRC at a loWer data rate (mode 1). The complexity of for off-chip keypad and GPIO (general purpose input/ output). SMV in MIPS (millions of instructions per second) is the On-chip LPG (LED Pulse Generator) and PWT (Pulse-Width highest among CDMA speech codecs. Tone) interfaces are respectively provided for off-chip LED SMV processing involves frame processing and rate-de and buZZer peripherals. On-chip MMC/ SD multimedia and pendent excitation coding. The frame processing includes ?ash interfaces are provided for off-chip MMC Flash card, speech pre-processing, computation of spectral Envelope SD ?ash card and SDIO peripherals. Parameters, signal modi?cation, and rate selection. The SMV In FIG. 2, a WLAN integrated circuit 1500 includes MAC encoder frame processing Which includes speech pre-pro (media access controller) 1510, PHY (physical layer) 1520 cessing, LPC analysis, signal modi?cation and LSF quanti and AFE (analog front end) 1530 for use in various WLAN Zation has complexity of about 50% or half the complexity of and UMA (Unlicensed Mobile Access) modem applications. the SMV encoder. The rate-dependent excitation coding PHY 1520 includes blocks for BARKER coding, CCK, and involves an adaptive codebook search, a ?xed codebook OFDM. PHY 1520 receives PHY Clocks from a clock gen search With complexity of about 40% that of the encoder in eration block supplied With suitable off-chip host clock, such the Worst case, and gain quantiZation. Overall, the SMV as at 13, 16.8, 19.2, 26, or 38.4 MHZ. These clocks are encoder rate-dependent excitation coding is about 50% or compatible With cell phone systems and the host application half of the complexity of the SMV encoder. is suitably a cell phone or any other end-application. AFE 20 The computational complexity of the SMV speech codec is 1530 is coupled by receive (Rx), transmit (Tx) and CON higher than other CDMA speech codecs. A signi?cant portion TROL lines to WLAN RF circuitry 1540. WLAN RF 1540 of the computational complexity in the SMV speech codec includes a 2.4 GHZ (and/or 5 GHZ) direct conversion trans can be attributed to a ?xed codebook search that is done using ceiver, or otherWise, and poWer amplifer and has loW noise multiple codebooks. Some embodiments of ?xed codebook ampli?er LNA in the receive path. Bandpass ?ltering couples 25 search procedure for improving SMV and other voice coding WLAN RF 1540 to a WLAN antenna. In MAC 1510, Security processes are based on a special approach called Selective circuitry supports any one or more of various encryption/ Joint Search herein. decryption processes such as WEP (Wired Equivalent Pri SMV encodes each 20 millisecond speech frame at one of vacy), RC4, TKIP, CKIP, WPA, AES (advanced encryption four different bit rates: full-rate (1 ), half-rate (1/2), quarter-rate standard), 802.11i and others. Further in WLAN 1500, a 30 (1/4) and one-eighth-rate (Vs). The bit rate chosen depends on processor comprised of an embedded CPU (central process the mode of operation and the type of speech signal. ing unit) is connected to internal RAM and ROM and coupled Frames assigned to full-rate (Rate 1) are further classi?ed to provide QoS (Quality of Service) IEEE 802.1 1e operations as Voiced-Stationary (Type 1) and Voiced-Non-Stationary WME, WSM, and PCF (packet control function). A security (Type 0). Each of these tWo classes is associated With one or block in WLAN 1500 has busing for data in, data out, and 35 more “?xed codebooks” (FCB). Each ?xed codebook con controls interconnected With the CPU. Interface hardWare sists of a list of pulse positions or a set of pulse combinations. and internal RAM in WLAN 1500 couples the CPU With One important step in the process of encoding speech is interface 1410 of applications processor integrated circuit choosing the best pulse position(s) or combination from a 1400 thereby providing an additional Wireless interface for codebook. The best pulse combination in the one that results the system of FIG. 2. Still other additional Wireless interfaces 40 in the loWest value of an error function and the highest value such as for Wideband Wireless such as IEEE 802.16 for a Cost function (herein referring to a data structure or “WiMAX” mesh netWorking and other standards are suitably function having a value that goes up as the error function goes provided and coupled to the applications processor integrated doWn) among the pulse combinations that are searched. The circuit 1400 and other processors in the system. Cost function increases With the goodness of ?t, or goodness Further described next are improved voice codecs, struc 45 of approximation of the coded speech to the real speech being tures and processes and improving the systems and devices of coded. Thus, the Cost function is high When an error function, FIGS. 1 and 2 With them. In the subsequent Figures, Select such as the difference betWeen the coded speech and the real able ModeVocoder (SMV standard of 3GPP2 organization) is speech being coded, is small. used Without limitation as an example platform for improve In the codebook search, the Cost function is maximiZed so ments. It is emphasiZed that the improvements are generally 50 that the error function is minimized. For example, suppose applicable in voice codec search procedures and all other ?rst and second tracks (lists of pulse positions in a codebook) search procedures to Which the advantages of the improve contribute respective amounts X andY to the Cost function ments herein commend their use. ACELP-based FCB and provide a combined contribution to the Cost function. searches (Algebraic Code Excited Linear Prediction Fixed Further suppose X exceeds or is greater thanY, Q(>Y). Hence CodeBook search procedures) and other procedures With 55 the second track contributes less to the Cost function, and the pitch enhancement and otherWise are suitably improved by second track is probably underperforming and hence it is to be the inventive structures and processes taught herein. re?ned. The process re?nes the underperforming tracks SMV is an ACELP based speech codec. The quality of the because that is Where re?nement can contribute the greatest speech attained by SMV and its multimodal operation capa improvement or increase to the Cost function. Note that the bility makes it quite suitable for Wireless mobile communi 60 term “track” is sometimes used herein slightly differently cation. The multi-mode feature of SMV varies the Rate and than may be the case in the SMV spec. Herein, “track” can trades off channel bandWidth and voice quality as the Rate is refer to the list or set of pulse positions available to a respec changed. Applications include Wireline and Wireless voice tive pulse, even When another pulse may have an identical list gateWays and 3G third generation and higher generation cell or set of pulse positions available to it. In case a choice needs phone Wireless handsets as Well as other products shoWn in 65 to be made about re?nement as betWeen pulses having an FIG. 1. Minimum performance speci?cations are de?ned for identical list, the pulse having a pulse position in a previous SMV by subjective and objective comparison With respect to search that contributed less to the Cost function ranks higher US 7,788,091 B2 11 12 or more in need of re?nement than a second pulse having the Rate 1/2 (). Note in particular a Fixed-Codebook identical list of pulse positions available to it. based analysis-by-synthesis feedback circuit 410. This circuit In the voiced-stationary case (Type 1) of SMV 1, 410 is related to the subject of the improvements discussed a single codebook of eight (8) pulse tracks is used. In the case herein. Circuit 410 receives a “target signal” T8 at a subtractor of eight tracks, after the re?nement is over, the result is that 420. Target signal Tg represents the speech (remaining after the target T8 is noW approximated by all eight (8) pulse posi adaptive codebook operations in a block 480 near block 410) tion in eight tracks, i.e., one pulse position from each of the to be optimally coded by block 410. The ?xed codebook eight tracks, namely the tWo (2) highest-contributing tracks block 410 includes a Fixed Codebook operations block 430 plus six (6) underperforrning tracks that got re?ned and put folloWed by a synthesis ?lter 440. A perceptual Weighting through ?lter H. The tWo highest tracks are included because ?lter 450 couples synthesis ?lter 440 to subtractor 420. An they Were the original best tWo performers out of the eight. error signal line 460 and Minimization block 470 couple Usually, not all the track candidates are underperformers. In subtractor 420 to ?xed codebook block 430 to complete a this example, six (6) underperforming tracks are chosen as a feedback loop. Minimization block 470 is fed With control trade-offbetWeen computational complexity versus best pos parameters CTRL from Speech Classify block 370 of FIG. 3. sible track choice pulse position quality. Embodiments suit Synthesis Filter 440 is fed With LSF Quantization informa ably vary for different applications, and different implemen tion from block 375. Fixed Codebook 430 has an output that tations of the same application, in the numbers of tracks that is multiplied by optimal ?xed codebook gain. are selected for re?nement. In FIG. 4, an Adaptive Codebook ?lter block 480 is orga In the voiced-non-stationary case (Type 0) of SMV Full nized similarly to Fixed Codebook ?lter block 410 and has a Rate 1, any one of three codebooks are used, and this choice 20 similar loop of Adaptive Codebook, multiplier, Synthesis is based on secondary excitation characteristics maximizing Filter, Perceptual Weighting Filter, subtractor, and minimiza the Cost function. tion looping back to Adaptive Codebook. Block 480 has a In the description herein, the term “Cost function” is used subtractor input for Modi?ed Weighted Speech from block to refer to a degree of approximation for improving and 340. Block 480 has a multiplier input for pitch gain multipli increasing voice coding quality. The term “Cost function” is 25 cation of Adaptive Codebook output. LSF Quantization from not herein referring to ?nancial or monetary expense nor to block 375 is provided to the Synthesis Filter in block 480. technological complexity, any of Which can be reduced by the Completion of the block 480 loop With a minimization block improvements herein even though the Cost function is applies to voiced non-stationary (Type 0) frames. Minimiza increased. tion is omitted from the block 480 loop for processing voiced FIG. 3 shoWs a method 310 for frame processing Which 30 stationary (Type 1) frames. provides the context for improvements over Selectable Mode Further in FIG. 4, an Energy block 495 is fed With Modi?ed Vocoder (SMV). Reference is made to “Selectable Mode Weighted Speech from block 340 of FIG. 3, and With respec Vocoder Service Option for Wideband Spread Spectrum tive outputs from Adaptive Codebook ACB and Fixed Code Communication Systems,” 3GPP2 C.S0030-0, Version 2.0, book FCB of FIG. 4. December, 2001 for background, Which is hereby incorpo 35 A Vector Quantization Gain Codebook ?lter block 490 is rated herein by reference. organized someWhat similarly to Fixed Codebook ?lter block A Speech Pre-processor 320 provides pre-processed 410 and has a similar loop, except the Vector Quantization speech as input to a Perceptual Weighting Filter 330 that Gain Codebook feeds multipliers respectively fed by Adap produces Weighted speech as input to Signal Modi?cation tive Codebook and Fixed Codebook 430. In block 490 a block 340. Block 340 in turn supplies modi?ed Weighted 40 Synthesis Filter receives a sum of the multiplier outputs, speech to a line 350 to Rate and Type Dependent Processing responds to LSF Quantization input, and is folloWed by Per 360. Further blocks 365, 370, 375 supply inputs to Rate and ceptual Weighting Filter, subtractor, and minimization loop Type Dependent Processing 360. Block 365 provides Rate ing back to Vector Quantization Gain Codebook. Block 490 and Frame Type Selection. Also, blocks 365 and 370 each has a subtractor input fed by the Energy block 495. interact bi-directionally With Weighted Speech Modi?cation 45 FIG. 5 summarizes an aspect of the process of ?nding the block 340. Block 370 provides controls CTRL pertaining to right pulses to excite a ?lter to approximate the target signal speech classi?cation. Block 375 supplies LSF (Line Spectral Tg. Pre-processed speech from block 320 is Weighted by Frequency) Quantization information. Line Spectral Fre block 330 and is modi?ed by block 340 and sent to codebook quencies (LSFs) represent the digital ?lter coef?cients in a processing 550. A ?xed codebook has predetermined infor pseudo-frequency domain for application in the Synthesis 50 mation that designates time positions for each of a predeter Filter 440. mined number of pulses that are alloWed to excite the ?lter(s) A Pitch Estimation block 380 is fed by Perceptual Weight for a given type of voice frame. Rate and Type decision ing Filter 330, and in turn supplies pitch estimation informa signals from block 520 are coupled to the Codebook Process tion to Weighted Speech Modi?cation 340, to Select Rate and ing block 550 in response to processed speech frames origi Frame Type block 365 and to Speech Classify block 370. 55 nated at block 320. Codebook Processing block 550 has adap Speech Classify block 370 is fed With pre-processed speech tive codebook ACB and ?xed codebook FCB. For instance, from Speech Pre-processing block 320, and With controls for analyzing Rate 1 frames, a ?xed codebook is provided for from a (VAD) block 385. VAD 385 analyzing Type 1 frames. Multiple sub-codebooks FCB1, also feeds an output to an LSF Smoothing block 390. LSF FCB2, FCB3 are provided for analyzing Type 0 frames. Smoothing block 390 in turn is coupled to an input of LSF 60 Each of multiple excitation pulses for use in speech exci Quantization block 375. An LPC () tation approximation is allocated a “track” in the codebook Analyze block 395 is responsive to Speech Pre-processing (or sub-codebook). The track for a respective pulse has a list 320 to supply LPC analysis information to VAD 385 and to of numbers that designates the set of alternative time posi LSF Smoothing 390. tions, i.e., pulse positions that the codebook alloWs that pulse FIG. 4 shoWs greater detail of Rate and Type Dependent 65 to occupy. “Codebook searching” involves ?nding the best Processing 360 of FIG. 3. FIG. 4, among other things, illus number in a given track, and the best combination of pulses trates a method for excitation coding for Rate 1 (full-rate) and With Which to de?ne the set or subset of pulses Which are US 7,788,091 B2 13 14 identi?ed and selected to excite the ?lter(s) of the analysis The sub-codebook that minimiZes the error criterion by-synthesis feedback circuit 410. In this Way, the process (maximiZes the Cost function) is selected. The chosen sub homes in on the approximation to a target signal T8, for codebook is re?ned further using three turns of sequential instance. joint search procedure. Various embodiments herein pertain to and improve ?xed Method Embodiment: In a further embodiment, one of the codebook search in fall-rate SMV and other codebook three sub-codebooks is chosen using a single pulse search. Further re?nement of the selected best sub-codebook is done searching applications in voice codecs and otherWise. The using Selective Joint Search instead of sequential joint search existing and inventive methodologies are described beloW. procedure. The same Selective Joint Search procedure as Certain aspects of the search method are also described and described in Voiced-Stationary (Type 1) case is used for illustrated in the co-?led and incorporated patent application selecting the tracks for further re?nement. In the Selective TI-38348. Joint Search procedure the search is restricted to six tracks in “Re?nement” means search each of the pairs With joint the codebook. These six tracks correspond to the pulses that search (except Where the context speci?cally refers to single contribute least to a Cost function that is maximiZed When the pulse search) and, in the search process, pick the pulses Which error function is minimiZed. The error function is based on a maximiZe the Cost function. “Search,” “re?ne” and “re?ne mean squared error criterion. ment” are often used synonymously herein. Searching Second Method Embodiment: Fast-select one sub-code includes accessing codebook tracks and picking the pulses book, single-pulse search it, then Selective Joint Search is Which maximiZe the Cost function, Which thereby improves used to search that sub-codebook. The procedure of selecting the approximation that is the goal of the procedure. 20 one among three sub-codebooks is eliminated. This elimi nates the complexity of searching additional tWo more sub Rate 1 Voiced-Stationary (Type 1): codebooks. The sub-codebook chosen is a priori decided, or Standard SMV Methodology: The FCB for SMV Full Rate dynamically predetermined prior to the single-pulse search, 1 consists of a combination of eight (8) pulses. The FCB based on input parameters to the sub-codebook search. search procedure consists of a sequence of repeated re?ne 25 The just-described Method Embodiments reduce the com ments referred to as “turns”. putational complexity of the ?xed codebook search by 66% Each turn consists of several iterations. In each iteration for Without affecting the perceptual quality With respect to stan a given “tum,” the process searches for a best pulse position of dard SMV decoded speech. each pulse or a pair of pulses, While keeping all the other Selective Joint Search is used to improve the voice coding pulses at their previously determined positions. 30 by restricting the search procedure to a reduced number of The eight (8) pulse codebook is searched in tWo (2) turns tracks in the codebook. The tracks associated With the pulses using a standard “sequential joint search” procedure. A that contribute least to a Cost function criterion are selected as sequential joint search ?nds out best tWo (2) pulses position they are more likely to be modi?ed in further re?nements. from the given set of candidate pulse positions speci?ed by Among other advantages, the method embodiment is com tWo adjacent “tracks” in the FCB. Here each track consists of 35 putationally more ef?cient as it reduces the computational candidate pulse positions. This is folloWed by tWo (2) turns of complexity up to 66% With respect to the standard ?xed iterative single pulse search. This described search procedure codebook search in SMV Without affecting the perceptual is computationally very demanding. An e?icient alternative quality of speech. The speech quality for the described to this search procedure is described beloW. method embodiment is perceptually the same With respect to b) Method Embodiment: In an embodiment, single pulse 40 standard SMV. Hence, this procedure can make the imple search is done in the ?rst turn unlike the tWo (2) turns of mentation of SMV computationally more e?icient than the sequential joint search in the standard SMV methodology. standard SMV. This gives the initial estimation of the pulse positions. This is A high density code upgrade embodiment reduces the folloWed by a special process herein called Selective Joint computational complexity substantially. Greater channel Search unlike the tWo (2) turns of iterative single pulse search density in channels per DSP core (9 vs. 7 for SMV) is pro in the standard methodology. In the Selective Joint Search vided by the embodiment at the same speech quality as SMV. procedure the search is restricted to six tracks in the code Moreover, the embodiment provides higher speech quality at book. These six tracks correspond to the pulses that contribute the same channel density as EVRC. least to a Cost function that is maximiZed When the error Reduced complexity ?xed codebook search is based on function is minimiZed. The error function is based on a mean 50 Selective Joint Search as taught herein, compared to the squared error criterion. higher complexity of ?xed codebook search in SMV. In the SMV standard approach, high-complexity searches for best Using this search method embodiment reduces the compu sub-codebook and best pulse positions are used. In an tational complexity of the ?xed codebook search by around embodiment, a loW complexity intelligent search best 50% Without affecting the perceptual quality With respect to 55 guesses the pulse tracks for re?nement. Also, the remarkable standard SMV decoded speech. Selective Joint Search provides a simpler procedure to ?nd Rate 1 Voiced-Non-Stationary (Type 0): the best pulse position. Standard SMV Methodology: SMV Full Rate 1 uses three FIG. 6 shoWs an error function epsilon as a composite data (3) sub-codebooks in this case. One of the three sub-code structure or function of target signal Tg, gain g, ?lter matrix H, books that best models the present secondary excitation is 60 and excitation vector c. The error function is the mean square chosen. “Secondary excitation” herein refers to excitation of the difference signal 460 (recall subtractor 420 of FIG. 4) pulses Which Would be a best selection to drive the ?lter in produced as the subtraction difference betWeen the target block 410 to approximate the target signal Tg. “Secondary” signal T8 and the approximation of the codebook pulses refers to block 410 being coupled second electronically after excited ?lter(s). (The error function someWhat resembles block 480 in FIG. 4. In order to determine the best sub 65 error variance, also knoWn as mean square of residuals, as codebook, a single pulse search procedure is adopted for all used in the terminology of regression analysis in statistics, but the three sub-codebooks. here a very rapidly occurring time series of data comprised in US 7,788,091 B2 15 16 the frame is involved.) That approximation is represented by matrix multiplication product “g H c” in FIG. 6, Where c is the excitation vector including the several the pulses pi, H is an Ne2 Nil (3B) impulse response matrix representing the ?lter(s), and g is a gain or multiplier. For purposes of FIG. 6, codebook search involves proper selection of the pulses pl. that, summed together, compose the In Words, Equation (3B) represents a process of squaring column vector c. (The impulse response ?lter matrix H is many quantities identi?ed in the output of Filter matrix H loWer-triangular When backward pitch enhancements are When excited With a sum of pulses at pulse positions pl folded into the code-vector. The impulse response matrix is selected from a codebook and making up code-vector c of not necessarily loWer-triangular When backward pitch FIG. 6. Note that Equation (3B) uses the symbol “p,” to enhancements are folded into the impulse response matrix.) represents the numerical position of the singleton one (1) Here the approach is to break up vector c into a single pulse pl surrounded by Zeroes in a corresponding pulse vector pl- of (loWer right one “1” in column of Zeroes) added to a vector of FIG. 6. Since FIG. 6 illustrates a pulse vector, and Equation everything else (“c-”) that may have so far resulted from (3B) uses the scalar numerical position of the singleton one codebook search to determine vector c. The “c-” vector cor (1) in that pulse vector to index into the Phi Matrix, so that the respondingly has a Zero in the roW entry Where single pulse pl use of the same symbol pl. facilitates description of this pro has a one (1). The roWs of vector c correspond to pulse cess. positions. This squaring process in Equation (3B) produces a sum of Much of this discussion is devoted to improving the pro 20 many squared terms represented by the ?rst summation (at cess of searching to ?nd hoW many “ones” (or pulses) should left) over Phi on various values in its main diagonal. Added to be entered into Which roWs (estimated pulse positions) of the left sum, there folloWs on the right in Equation (3B) a vector c. double summation of many cross-product terms betWeen the To reduce the computational complexity, some embodi linear ?lter H impulse responses to the various pulses. In other ments perform the search using the Co st function epsilon tilde 25 Words, the double summation sums up various off-diagonal as a goodness of ?t metric. Instead, of squaring many differ values in the Phi Matrix. Since the autocorrelation compactly ences, the processor is operated to generate a bit-representa provides the various terms, the Phi Matrix is quite useful tion of a number and then square it to obtain a numerator, and herein. then computes a bit-representation of a denominator number In Equation (3B), the letter N represents the number of and then performs a division of the numerator by the denomi 30 pulse vectors in code-vector c of Equation 1. nator. A goal in Fixed Codebook search is to minimiZe the epsilon The pulses can have either a positive (+) or negative (—) sign S. Such pulse signs are included in the pulse combination (error function) in the equation (1) represented by vector c. Thus, vector c contains the sign information. The sign information is unnecessary to the com 35 Alternatively this is equivalent to maximiZing epsilon tilde putation of Phi Matrix. The sign information is included as folloWs. Epsilon tilde is an example of What is called a during computation of denominator ||y2|| Which is described in “Cost function” herein. Equation (3B). Since SMV also adds pitch enhancements, the symbols S1- and Sj used in SMV are suitably used as a multi

40 plier inside the double-summation of Equation (3B). In such case, letter S represents the Sign value of plus one (+1) or minus one (—I) corresponding to the plus or minus value of a pre-computed product (bTgpi) of the target signal Tg by ?lter matrix H by a particular pulse pi. Substituting symbols bTg (HTTg)T and yIHc, also yields 45 In some embodiments as described herein, the process of the form: generating the autocorrelation matrix Phi Matrix 4) via Equa tion (3B) for use in obtaining the Cost function via Equation (3A), and using Phi Matrix anyWhere else that Phi Matrix is suitably used, is greatly simpli?ed and thereby processing is 50 made sWifter and more ef?cient. In this Way, generating and maximiZing the Cost function epsilon tilde is greatly facili tated. The advantages are even more critical When a voice In some of the ?xed codebook search embodiments herein, coding feature called Pitch Enhancement is used, as described the Cost function epsilon tilde g is maximiZed. Maximizing elseWhere herein. Still further improvements are also herein that Cost function is computationally simpler than and 55 described for processes generating data structures When Pitch equivalent to minimiZing the error function e itself. In the Enhancement is used. description herein, the term “Cost function” is used to refer to The improvements taught herein have a domino effect of a degree of approximation for improving and increasing voice making processing sWifter and more ef?cient for the voice coding quality. The term “Cost function” is not herein refer coder as a Whole. An ultimate result is that cell telephones and ring to ?nancial or monetary expense nor to technological 60 other Wireless telecommunications devices using the embodi complexity, any of Which can be reduced by the improve ments operate With comparable voice quality, and save poWer ments herein even though the Cost function is increased. consumption due to voice coding and voice codec operation, Maximizing Cost function epsilon tilde is described next burden the processor less, increase channel density, and make and elseWhere herein. Note that generating the denominator processor time available for other applications. ||y2|| is an important part of the processing. The process of 65 Before describing Pitch Enhancement and generating the generating the denominator ||y2|| involves an autocorrelation autocorrelation matrix Phi Matrix 4), this description ?rst matrix called Phi Matrix 4). describes, Without limitation methods by Which the Cost US 7,788,091 B2 17 18 function epsilon tilde is maximized after it is generated. More on some selection of loWer-contribution track(s) suitably detail on these methods is provided in the co-?led U.S. non guides the processor operations. provisional patent application TI-38348 “Methods, Devices Accordingly, applying the important selection method of and Systems for Improved Codebook Search for Voice “least contribution” as taught herein comprehends a variety of Codecs” Ser. No. 11/231,643, Which is hereby incorporated alternative embodiments of operational methods Which may herein by reference. involve selecting a highest or loWest value of a function With In ?xed codebook FCB search, ?nding the best combina track omitted, or a highest or loWest value of a difference tion of pulse positions in tracks Which maximize the Cost related function betWeen values With none, feWer and more function g is more important than, ?nding the combination of subset(s) of track(s) omitted. individual best pulses from each track T. In the Selective Joint Search approach herein, the contribution C(Tx) from a par Pitch Enhancement and Autocorrelation ticular track Tx is de?ned, for one example and one type of SMV uses pitch enhancement for the ?xed codebook FCB method embodiment, as the difference in Cost function g after in order to increase the speech quality. Some SMV-based eliminating the candidate pulse position from the initial state terms are described next. A “main pulse” is a pulse at a before Selective Joint Search. For example, let x,y,Z,W be position selected from a list in a pulse codebook. “Pitch candidate pulse positions from different tracks Tx, Ty, TZ, TW enhancement” refers to insertion of one or more additional before the start of selective joint search. The overall Cost pulses before or after the main pulse in a subframe in a function is €(X,y,Z,W). The contribution C of position x to the manner repeating the main pulse and spaced from the main Cost function is de?ned as pulse or nearest one of the additional pulses by an interval 20 equal to an integer number called the “pitch lag” of the sub frame. The integer (INT) Pitch (P) lag (loWer case ell “l”) is symboliZed 1PINT. “Forward pitch enhancement” inserts the one or more additional pulses after the main pulse. “Back Ward pitch enhancement” inserts the one or more additional 25 pulses before the main pulse. CELP (Code Excited Linear Prediction) based codecs can use some form of pitch enhancement for the ?xed codebook excitation. In some CELP codecs, forWard pitch enhance NoW if Cx is highest among Cx, Cy, CZ, CW, then elimi ment is used and not backWard pitch enhancement. SMV uses nating candidate pulse position x Will result in high error. In 30 both forWard pitch enhancement and backward pitch other Words, the candidate pulse position x is already Well enhancement to increase the speech quality. The computa ?tted With other selected pulse positions to minimize the tional complexity increases signi?cantly With increased error, that is, deliver a highest possible value of the Cost backWard pitch enhancements. The improved methods herein function g. Hence, this track Tx containing candidate pulse cut doWn this higher computational complexity by position x need not be re?ned. If, for another instance, con 35 approaches Which do not need to adversely affect the percep tribution CZ is least, then re?ning the track TZ containing tual speech quality. pulse position Z is expected to improve the Cost function g in The Selectable Mode Vocoder (SMV) uses a subframe a manner Which best combines or gels With other candidate strategy to encode the pitch and secondary excitation. SMV pulse positions to give high Cost function measure €(x,y,Z',W) uses variable subframe length (also called subframe siZe), Where Z' is candidate pulse position re?ned from the track 40 based on the speech classi?cation Type. Subframe length is same as Z. (Symbol prime (') on a pulse letter here represents symboliZed L SF (Which is not to be confused With the symbol re?nement.) LSF for line spectral frequency). Note that any selecting the “least contribution” can be A particular embodiment described herein is associated With the encoder When the analysis subframe siZe L SP is 53 or accomplished using any data structure or function that either 45 increases as the differences of Equations (4) increase or, 54 samples. The SMV speech codec chooses sub-frame siZes alternatively, decreases as the differences of Equations (4) 53 or 54 for Rate 1/2 Type 1 (voiced stationary) speech frames. increase. The choice of this sub-frame siZe increases the computational Still another example recogniZes that the Cost function complexity of the search algorithm. value €(X,y,Z,W) is the same in all the difference Equations (4). 50 When subframe length L SP is 53/54 and the pitch lag 1PINT Accordingly, in this example, operations in the processor is small (17 or 18), SMV inserts up to a maximum of three suitably select ?rst for re?nement the track T (or track pair as backWard enhanced pulses With exponentially decaying the case may be) that corresponds to the highest value of in a amplitudes. It is noted herein that under these circumstances set of Cost function values {€(x,y,Z), €(W,y,Z), €(W,X,Z), the contribution of the last enhancement pulse is very mini €(W,x,y)} When the pulse having the pulse position from that 55 mal. Hence, this pulse contribution can be advantageously track is omitted. and effectively removed for sub-frame siZe 53/54 With loW pitch lag values. An improvementAspect 1 herein called Conditional Elimi (5) nation BackWard Pitch Enhancement, for Which an example The selection of Equation (5) is made because the track Ts, 60 is just given, reduces the computational complexity in calcu When omitted, is revealed to have been making the least lation of energy correlations (compare Phi Matrix (I) for gen contribution because the Cost function value With that track erating the denominator ||y2|| for Cost function epsilon tilde) Ts omitted is the highest of any of the Cost function values for impulse response used in ?xed codebook search. The even though that track Ts is omitted. Also, in some embodi improvement is different and advantageous, among other rea ments the re?nement of tracks occurs in rigorous order of 65 sons, because conditional elimination of backWard pitch least contribution, and in other embodiments as simulation enhancement for certain speci?c cases of speech simpli?es tests may suggest, another approximately-related order based backWard pitch enhancement processing substantially. US 7,788,091 B2 19 20 The Conditional Elimination Backward Pitch Enhance FIG. 8 depicts the How of an improved method embodi ment method described herein remarkably achieves fully ment here. Operations in a step 810 calculate the impulse comparable voice quality by an advantageously approximate response of the Weighted synthesis ?lter of the ?xed code approach for backward pulse enhancement using only up to book loop 410 (FIG. 4.) tWo pitch enhancement pulses. Ef?cient pre-computation Next, pitch enhancement of the ?lter impulse response is With overlaid memory usage hence effectively and further performed in a step 820 using forWard and backWard pitch reduces computer burden Without any memory penalty. enhancements, providing a number of backWard enhance The complexity of the search procedure in standard half ments that is an integer given as greatest integer less than or rate SMV for Type 1 frames is very high, because it involves equal (integer function “INT( )”) to the ratio of Subframe SiZe complex conditional logic in the search procedure. An divided by integer Pitch lag delivered to block 360 of FIG. 3 improved method embodiment described herein uses pre among the control parameters CTRL. computed correlations of the impulse response and an PmaXIINT (LSP/ZPINT) (6) improvement called Incremental Generation. This Pre-com puted Correlations and Incremental Generation, or Aspect 2, Then in FIG. 8, a decision step 830 determines Whether the improvement is used in various pitch enhancement embodi subframe siZe LSF has been selected to be 53 or 54 (i.e., a 160 ments independently of Whether Aspect 1 or Conditional sample subframe is divided in thirds of 53, 53, and 54 Elimination BackWard Pitch enhancement is used or not. samples). This Pre-computed Correlations and Incremental Genera If yes, then operations branch to a step 840 and there limit tion improvement advantageously reduces the number of 20 the number of backWard pitch enhancements to the lesser of Multiply Accumulates (MACs) up to 25% in the computation tWo (2) or Pmax from Equation (6). This is What is meant in of impulse response energy correlations Phi Matrix. The FIG. 8 by the notation Pmax:min(2, Pmax). (“min” stands for usage of Pre-computed Correlations and Incremental Genera the minimum.) In this Way, the case of three pitch enhance tion contributes up to 10%, in the computation of impulse ments otherWise permitted by standard SMV is prevented response energy correlations Phi Matrix (I), (3 MIPS in one 25 from occurring When the pitch lag is a third or less of the application and currently-typical clock frequency) for addi subframe siZe. tional process simpli?cation and computational savings. In general, various embodiments of this Conditional Elimi Among its other advantages, the improved method reduces nation BackWard Pitch Enhancement method establish a the computational complexity of impulse response energy maximum number (e.g., 2) backWard pitch enhancements Q correlations by around 25% Without affecting the quality 30 When the ratio of subframe siZe to integer pitch lag is equal to compared to the standard SMv. The improvements provide or greater than (Q+1), i.e., the ratio equals at least one more greater channel density at the same voice quality as SMV, and than the maximum number of backWard pitch enhancements. moreover provide at least as much channel density as another After step 840 When subframe siZe is 53/54, (or also after standard called EVRC but at higher voice quality. step 830 When subframe siZe is not 53/54), operations pro Summarizing some of the improved method aspects 35 ceed to a step 850. herein: Step 850 performs pitch enhancement using the forWard Limit the backWard pitch enhancement to a maximum of and backWard enhancements. In this improved Way, there only tWo exponentially decaying amplitudes When the sub results the pitch-enhanced ?lter impulse response HPP'" of frame length is 53/ 54 or otherWise more than three times the 40 hereinbeloW Equation (7A) for use in ?xed codebook FCB pitch lag. search. Pre-compute the impulse response correlations Phi Matrix Moreover, the Conditional Elimination Pitch Enhance to eliminate redundant computation. ment improvements are advantageously combined With the Reduce the number of Multiply Accumulates up to 25% by improvements to codebook searching disclosed in co-?led Incremental Generation of impulse response correlations by 45 application 11/231,643 to yield still further improved meth dividing Phi Matrix into special regions Where double nested ods, devices and systems for pitch enhancement and code loop processing is applicable and then executing the double book search for voice codecs.Another embodiment combines nested loop processing. embodiments in 11/231,643 for Rate 1 With an embodiment Obviate and eliminate signi?cant amounts of control code herein for Rate 1/2 stationary voiced (Type 1) frames. Thus, by the improved process of supplying values of Phi Matrix in 50 the inventive embodiments are applied in tWo different Rate regions by Incremental Generation. paths of a combined process. Advantageously, this provides a complexity reduction. In other Words, the 11/231,643 Selec FIG. 7 depicts a How of conventional SMV pitch enhance tive Joint Search improvements to codebook searching, and ment. Conventional SMV Pitch Enhancement is described at the Conditional Elimination Pitch Enhancement improve the 3GPP2 C.20030-0 Version 2.0 “Selectable Mode Vocoder 55 ments are allocated to different paths and this allocated struc Service Option for Wideband Spread Spectrum Communica ture provides improvements that are balanced and allocated tion Systems” document in sections 5.6.11.4 and 5.6.11.5 over plural rates in a voice codec. Advantageously, the Pre Which sections are incorporated herein by reference. Computed correlations and Incremental Generation improve In FIG. 7, at step 710, calculation of the impulse response ment is applied over both the Full Rate 1 and Half Rate (1/2) of the Weighted synthesis ?lter of the ?xed codebook loop 410 60 modes. (FIG. 4) occurs. Complexity of codebook searches in general is reduced. Next, in a step 720 pitch enhancement of the ?lter impulse Having a conditionally-limited number of backWard pitch response is performed using forWard and backWard pitch enhancements results in feWer non-causal impulse response enhancements. The number of backWard enhancements is an vectors used in the computation of the impulse response integer given by Pmax:(Int) Subframe SiZe/Pitch Lag. 65 correlations matrix (autocorrelation Phi Matrix). The com Then in a step 730, there results the pitch-enhanced ?lter plexity of computing impulse response correlation increases impulse response for use in ?xed codebook search. exponentially With number of backWard pitch enhancements.