(12) United States Patent
Jelinek
(10) Patent No.: US 7,657,427 B2
(45) Date of Patent: Feb. 2, 2010

(54) METHODS AND DEVICES FOR SOURCE CONTROLLED VARIABLE BITRATE WIDEBAND SPEECH CODING
(75) Inventor: Milan Jelinek, Sherbrooke (CA)
(73) Assignee: Nokia Corporation, Espoo (FI)
(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 768 days.
(21) Appl. No.: 11/039,539
(22) Filed: Jan. 19, 2005
(65) Prior Publication Data: US 2005/0177364 A1, Aug. 11, 2005
Related U.S. Application Data
(63) Continuation of application No. PCT/CA03/01571, filed on Oct. 9, 2003.
(51) Int. Cl.: G10L 11/06 (2006.01); G10L 19/02 (2006.01); G10L 19/12 (2006.01)
(52) U.S. Cl.: 704/208; 704/214; 704/221; 704/229
(58) Field of Classification Search: None. See application file for complete search history.
Primary Examiner: Matthew J Sked
(74) Attorney, Agent, or Firm: Harrington & Smith, PC
12 Claims, 12 Drawing Sheets

(57) ABSTRACT
Speech signal classification and encoding systems and methods are disclosed herein. The signal classification is done in three steps, each of them discriminating a specific signal class. First, a voice activity detector (VAD) discriminates between active and inactive speech frames. If an inactive speech frame is detected (background noise signal), then the classification chain ends and the frame is encoded with comfort noise generation (CNG). If an active speech frame is detected, the frame is subjected to a second classifier dedicated to discriminating unvoiced frames. If the classifier classifies the frame as an unvoiced speech signal, the classification chain ends, and the frame is encoded using a coding method optimized for unvoiced signals. Otherwise, the speech frame is passed through to the “stable voiced” classification module. If the frame is classified as a stable voiced frame, then the frame is encoded using a coding method optimized for stable voiced signals. Otherwise, the frame is likely to contain a non-stationary speech segment such as a voiced onset or a rapidly evolving voiced speech signal. In this case a general-purpose speech coder is used at a high bit rate to sustain good subjective quality.
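The three-step classification chain in the abstract can be sketched as a short decision cascade. This is a minimal illustration under stated assumptions, not the patented method. The enum `CodingType`, the container `Classifiers`, and the function `select_coding_type` are hypothetical names. The toy energy and zero-crossing tests, their thresholds, and the 16 kHz test frames exist only to exercise each branch. This section does not specify how the voice activity detector or the unvoiced and stable-voiced classifiers actually make their decisions.

```python
import math
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Sequence

Frame = Sequence[float]


class CodingType(Enum):
    """Encoding paths named in the abstract."""
    CNG = "comfort noise generation"
    UNVOICED = "coding optimized for unvoiced signals"
    STABLE_VOICED = "coding optimized for stable voiced signals"
    GENERIC = "general-purpose coding at a high bit rate"


@dataclass
class Classifiers:
    """Hypothetical stand-ins for the three detectors in the chain."""
    is_active: Callable[[Frame], bool]         # voice activity detector (VAD)
    is_unvoiced: Callable[[Frame], bool]       # unvoiced-frame classifier
    is_stable_voiced: Callable[[Frame], bool]  # stable-voiced classifier


def select_coding_type(frame: Frame, clf: Classifiers) -> CodingType:
    """Run the three-step chain and stop at the first class that matches."""
    if not clf.is_active(frame):      # step 1: inactive frame (background noise)
        return CodingType.CNG
    if clf.is_unvoiced(frame):        # step 2: unvoiced speech
        return CodingType.UNVOICED
    if clf.is_stable_voiced(frame):   # step 3: stable voiced speech
        return CodingType.STABLE_VOICED
    # Remaining frames are likely non-stationary (voiced onsets,
    # rapidly evolving voiced speech): fall back to the generic coder.
    return CodingType.GENERIC


# --- Toy detectors: crude illustrative tests, NOT the patent's classifiers ---
def mean_energy(frame: Frame) -> float:
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame: Frame) -> float:
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))
    return crossings / (len(frame) - 1)


toy = Classifiers(
    is_active=lambda f: mean_energy(f) > 1e-4,          # hypothetical threshold
    is_unvoiced=lambda f: zero_crossing_rate(f) > 0.3,  # hypothetical threshold
    is_stable_voiced=lambda f: True,                    # placeholder decision
)

silence = [0.0] * 256
noise_like = [0.5 if n % 2 else -0.5 for n in range(256)]
# A 200 Hz tone sampled at 16 kHz (hypothetical wideband test frame).
voiced_like = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(256)]

for name, frame in [("silence", silence), ("noise_like", noise_like),
                    ("voiced_like", voiced_like)]:
    print(name, "->", select_coding_type(frame, toy).name)
# silence -> CNG
# noise_like -> UNVOICED
# voiced_like -> STABLE_VOICED
```

Each detector sits behind a callable. As a result, the cascade order stays fixed: the VAD runs first, then the unvoiced test, then the stable-voiced test, with generic coding as the fallback. The individual decisions can be replaced without touching that order.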
[Drawing sheets 1-12 (figures not reproduced). Recoverable labels: a classification flowchart (Voice activity detected? If no: CNG encoding or DTX. Unvoiced frame? If yes: unvoiced speech optimized encoding. Stable voiced? If yes: voiced speech optimized encoding. Otherwise: generic speech encoding); a flowchart of pitch cycle search, delay contour selection and pitch-synchronous modification, each followed by an "Operation successful?" test, leading either to stable voiced low bit rate coding or to full-rate generic coding; and rate-selection flowcharts, including "Low energy frame?" and "V/UV transition?" tests, that route frames to Generic Full-Rate, Generic Half-Rate, Voiced HR, Unvoiced HR, Unvoiced QR, or CNG ER coding and quantization.]

METHODS AND DEVICES FOR SOURCE CONTROLLED VARIABLE BITRATE WIDEBAND SPEECH CODING

CROSS REFERENCE TO RELATED APPLICATION

This application is a continuation of International Patent Application No. PCT/CA2003/001571 filed on Oct. 9, 2003.

FIELD OF THE INVENTION

The present invention relates to digital encoding of sound signals, in particular but not exclusively a speech signal, with a view to transmitting and synthesizing this sound signal. In particular, the present invention relates to signal classification and rate selection methods for variable bit-rate (VBR) speech coding.

BACKGROUND OF THE INVENTION

Demand for efficient digital narrowband and wideband speech coding techniques with a good trade-off between subjective quality and bit rate is increasing in various application areas such as teleconferencing, multimedia, and wireless communications. Until recently, telephone bandwidth, constrained to a range of 200-3400 Hz, has mainly been used in speech coding applications.

In wireless systems using code division multiple access (CDMA) technology, the use of source-controlled variable bit rate (VBR) speech coding significantly improves the system capacity. In source-controlled VBR coding, the codec operates at several bit rates, and a rate selection module determines the bit rate used for encoding each speech frame based on the nature of the frame (e.g. voiced, unvoiced, transient, background noise). The goal is to attain the best speech quality at a given average bitrate, also referred to as the average data rate (ADR). The codec can operate in different modes, obtained by tuning the rate selection module to attain a different ADR in each mode. Codec performance improves as the ADR increases. The mode of operation is imposed by the system depending on channel conditions. This gives the codec a mechanism for trading off speech quality against system capacity.

Typically, in VBR coding for CDMA systems, eighth-rate is used for encoding frames without speech activity (silence or noise-only frames). When the frame is stationary voiced or stationary unvoiced, half-rate or quarter-rate is used depending on the operating mode. If half-rate can be used, a CELP model without the pitch codebook is used in the unvoiced case. In the voiced case, a signal modification is used to enhance the periodicity and reduce the number of bits for the pitch indices. If the operating mode imposes quarter-rate, no waveform matching is usually possible as the number of bits is insufficient, and some parametric coding is generally