(12) United States Patent (10) Patent No.: US 8,190,425 B2
Mehrotra et al. (45) Date of Patent: May 29, 2012

(54) COMPLEX CROSS-CORRELATION PARAMETERS FOR MULTI-CHANNEL AUDIO

(75) Inventors: Sanjeev Mehrotra, Kirkland, WA (US); Wei-Ge Chen, Sammamish, WA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 1040 days.

(21) Appl. No.: 11/336,403

(22) Filed: Jan. 20, 2006

(65) Prior Publication Data: US 2007/0172071 A1, Jul. 26, 2007

(51) Int. Cl.

(52) U.S. Cl. ...... 704/203

(58) Field of Classification Search ...... 704/203, 704/500-504
See application file for complete search history.

Primary Examiner: Michael N Opsasnick

(74) Attorney, Agent, or Firm: [...] Sparkman, [...]

(57) ABSTRACT

An audio encoder encodes a combined channel (e.g., a sum channel) for a group of plural physical audio channels. The encoder determines plural parameters for representing individual physical channels of the group as modified versions of the encoded combined channel. The plural parameters comprise ratios of power in each individual channel to power in the combined channel (e.g., a ratio of the power of a right channel to the power of the combined channel, and a ratio of the power of the left channel to the power of the combined channel). The plural parameters can include a complex parameter. The combined channel and the plural parameters facilitate reconstruction at the audio decoder of source channels. An audio decoder performs a forward complex transform on the multi-channel audio data and reconstructs plural channels from the multi-channel audio data. The decoder can maintain second-order statistics for the source channels.

28 Claims, 21 Drawing Sheets
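The power-ratio parameters described in the abstract can be illustrated with a minimal Python sketch. The helper names `power` and `encode_sum_channel` are hypothetical, and measuring power as the sum of squared magnitudes over a short list of complex coefficients is an assumption made for illustration; the abstract does not specify either.

```python
def power(coeffs):
    """Sum of squared magnitudes of (possibly complex) coefficients."""
    return sum(abs(c) ** 2 for c in coeffs)


def encode_sum_channel(left, right):
    """Hypothetical helper: form a sum channel and the two power ratios."""
    combined = [l + r for l, r in zip(left, right)]
    p_combined = power(combined)
    if p_combined == 0:
        raise ValueError("combined channel has zero power")
    # Ratio of each individual channel's power to the combined channel's power.
    ratio_left = power(left) / p_combined
    ratio_right = power(right) / p_combined
    return combined, ratio_left, ratio_right


# Illustrative data only.
left = [1 + 0j, 0 + 2j]
right = [1 + 0j, 0 + 0j]
combined, ratio_left, ratio_right = encode_sum_channel(left, right)
```

In this sketch only the combined channel and the two ratios would need to be transmitted, which is the reduction the abstract describes.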



Figure 1: Processing unit(s) 110; storage 140; input device(s) 150; output device(s) 160; communication connection(s) 170; software 180 implementing audio encoder and/or decoder.

Figure 2: Audio encoder 200. Blocks: input audio samples 205; frequency transformer 210; multi-channel transformer 220; perception modeler 230; weighter 240; quantizer 250; entropy encoder (numeral illegible); rate/quality controller 270; bitstream MUX 280; output bitstream 295.

Figure 3: Audio decoder 300. Blocks: input bitstream 305; bitstream DEMUX 310; entropy decoder 320; inverse quantizer 330; noise generator 340; inverse weighter 350; inverse M/C transformer 360; inverse frequency transformer 370; reconstructed audio 395.

Figure 4: Audio encoder 400. Blocks: input audio samples 405; M/C pre-processor 410; windowing 420; tile configurer 422; frequency transformer 430; perception modeler 440; weighter 442; M/C transformer 450; quantizer 460; entropy encoder 470; mixed/pure lossless coder 472; entropy encoder 474; rate/quality controller 480; MUX 490; output bitstream 495.

Figure 5: Audio decoder 500. Blocks: input bitstream 505; DEMUX 510; entropy decoder 520; mixed/pure lossless decoder 522; tile configuration decoder 530; inverse M/C transformer 540; inv. quantizer/inv. weighter 550; inv. frequency transformer 560; overlapper/adder 570; M/C post-processor 580; reconstructed audio 595.


Figure 7: Technique 700. 710: Perform multi-channel pre-processing. Next step (numeral illegible): Encode multi-channel audio data.

Figure 8: Technique 800. 810: Decode multi-channel audio data. 820: Perform multi-channel post-processing.

Figure 9: Technique 900. 910: Form combined channel(s). 920: Derive parameter(s) for combined channel.

Figure 10: Technique 1000. 1010: Receive combined channel and parameter(s). 1020: Scale combined channel coefficients using parameter(s).
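The scaling step of Figure 10 can be sketched as follows. This is a minimal Python illustration under stated assumptions: the received parameter is taken to be a power ratio (individual channel power over combined channel power), and the gain applied to each coefficient is taken to be its square root. The function name `scale_combined` is hypothetical.

```python
import math


def scale_combined(combined, power_ratio):
    """Scale combined-channel coefficients so that the output power equals
    power_ratio times the combined channel's power (assumed convention)."""
    gain = math.sqrt(power_ratio)
    return [gain * c for c in combined]


# Illustrative data: a combined channel with power 8.
combined = [2 + 0j, 0 + 2j]
left_hat = scale_combined(combined, 0.625)   # target power 5
right_hat = scale_combined(combined, 0.125)  # target power 1
```

A real-valued gain of this kind restores each channel's power but leaves both reconstructed channels in phase with the combined channel, which is one reason a complex parameter, as mentioned in the abstract, can be useful.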

Figure 11: Combined channel 1120; scaling of complex coefficients; left channel 1130; right channel 1140.

Figure 12: Reference numerals 1200, 1202, and 1210; no other content is recoverable.

Figure 13

Figure 14

Figure 15

Figure 16

Figure 17

[Equation residue, not recoverable: an expression of the form "... = atan[...]".]

Figure 20

Figure 21: $W_0,\ W_{0F},\ W_1,\ W_{1F}$

Figure 22

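$$
\begin{bmatrix} S_0 \\ S_1 \end{bmatrix}
=
\begin{bmatrix} a & b & 0 & 0 \\ 0 & 0 & c & d \end{bmatrix}
\begin{bmatrix} W_0 \\ W_{0F} \\ W_1 \\ W_{1F} \end{bmatrix}
$$

Figure 23: [Matrix equation, not recoverable; legible terms include $S_0$, $Z_0$, $aC_0$, $bC_0$, and $b/a$.]

Figure 24:

$$
R_{XX} =
\begin{bmatrix}
X_0 X_0^{*} & X_0 X_1^{*} \\
X_1 X_0^{*} & X_1 X_1^{*}
\end{bmatrix}
$$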
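The cross-correlation matrix of Figure 24 can be computed with a short Python sketch. Accumulating the products over a list of complex coefficients is an assumption made for illustration (the figure residue shows only the per-entry products), and the name `cross_correlation_matrix` is hypothetical.

```python
def cross_correlation_matrix(x0, x1):
    """2x2 matrix whose (i, j) entry is the sum of X_i * conj(X_j)."""
    def corr(a, b):
        return sum(p * q.conjugate() for p, q in zip(a, b))

    return [[corr(x0, x0), corr(x0, x1)],
            [corr(x1, x0), corr(x1, x1)]]


# Illustrative complex coefficients for two channels.
x0 = [1 + 1j, 2 + 0j]
x1 = [0 + 1j, 1 - 1j]
r = cross_correlation_matrix(x0, x1)
```

The diagonal entries are real channel powers, and the off-diagonal entries are complex conjugates of each other, so the matrix is Hermitian.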


Figure 27

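Figure 28

Figure 29

[Leading fragment not recoverable; legible symbols: 1/2, U, A, V.]

$$
\begin{bmatrix} u_{00} & u_{01} \\ u_{10} & u_{11} \end{bmatrix}
\begin{bmatrix} \cos\omega & \sin\omega \\ -\sin\omega & \cos\omega \end{bmatrix}
=
\begin{bmatrix}
u_{00}\cos\omega - u_{01}\sin\omega & u_{00}\sin\omega + u_{01}\cos\omega \\
u_{10}\cos\omega - u_{11}\sin\omega & u_{10}\sin\omega + u_{11}\cos\omega
\end{bmatrix}
$$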

Figure 30

$$
u_{00}\sin\omega + u_{01}\cos\omega = -\left(u_{10}\sin\omega + u_{11}\cos\omega\right)
$$

$$
\omega = \operatorname{atan2}\left(-u_{11} - u_{01},\ u_{00} + u_{10}\right)
$$
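The angle formula of Figure 30 can be checked numerically. The sketch below assumes real-valued matrix entries and uses hypothetical helper names (`rotation_angle`, `rotate`); it applies the rotation of Figure 29 and confirms that the two right-column entries of the product are negatives of each other.

```python
import math


def rotation_angle(u):
    """Angle w from the Figure 30 formula, for a real 2x2 matrix u."""
    (u00, u01), (u10, u11) = u
    return math.atan2(-u11 - u01, u00 + u10)


def rotate(u, w):
    """Product of u with [[cos w, sin w], [-sin w, cos w]], as in Figure 29."""
    (u00, u01), (u10, u11) = u
    c, s = math.cos(w), math.sin(w)
    return [[u00 * c - u01 * s, u00 * s + u01 * c],
            [u10 * c - u11 * s, u10 * s + u11 * c]]


# Illustrative matrix only.
u = [[0.8, -0.6], [0.6, 0.8]]
w = rotation_angle(u)
v = rotate(u, w)
```

Rearranging the condition gives $(u_{00} + u_{10})\sin\omega = -(u_{01} + u_{11})\cos\omega$, which is why the two arguments of atan2 take the form shown.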

Figure 31

[Matrix expression, only partly legible; entries include $aC_0$, $b/a$, $0$, $cC_1$, $0$, and $d/c$.]

Figure 32

Figure 33

for some constant T.

Figure 34: 3400. Spectral coefficients 3415; base-band / extended-band partitioning 3420; base-band / extended-band coefficients and side information 3425; coding 3430; coded coefficients and side information 3435.
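The partitioning stage 3420 of Figure 34 can be sketched in Python as below. The cutoff index, the fixed extended-band size, and the contents of the side information are assumptions chosen for illustration; the figure does not state how the partition is chosen, and `partition_spectrum` is a hypothetical name.

```python
def partition_spectrum(coeffs, baseband_end, band_size):
    """Split spectral coefficients into a baseband and fixed-size extended bands."""
    if not 0 <= baseband_end <= len(coeffs) or band_size <= 0:
        raise ValueError("invalid partition parameters")
    baseband = coeffs[:baseband_end]
    # Remaining coefficients are grouped into consecutive extended bands;
    # the last band may be shorter than band_size.
    extended = [coeffs[i:i + band_size]
                for i in range(baseband_end, len(coeffs), band_size)]
    side_info = {"baseband_end": baseband_end, "band_size": band_size}
    return baseband, extended, side_info


# Illustrative spectrum only.
coeffs = [0.9, 0.7, 0.5, 0.4, 0.2, 0.1, 0.05, 0.02, 0.01]
baseband, extended, side_info = partition_spectrum(coeffs, baseband_end=4, band_size=2)
```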

Figure 35: Flowchart 3500.
3510: Calculate scale parameter of current extended band.
3520: Calculate shape parameter of current extended band.
Search for closest matching band in baseband portion.
3532: Close? If yes, 3534: determine vector pointing to closest matching band.
If no, 3540: search for matching band in fixed codebook. At a second decision (label illegible), the Yes branch determines the vector as an index to the matching band of the codebook; otherwise, 3542: determine the vector to be a normalized random noise vector.
Next extended band.

Figure 36: 3600. Bitstream 3605; baseband decoder 3640; baseband spectral coefficients; extended band decoder 3650; extended band spectral coefficients; inverse transform 3680; reconstructed audio blocks.
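The flow of Figure 35 and the extended band decoder of Figure 36 can be sketched together in Python. Several details here are assumptions, not taken from the figures: the scale parameter is taken to be the band's RMS value, the shape is the band divided by its RMS, "close" means a mean squared error at or below a threshold, the same threshold decides whether a codebook match is accepted, and the decoder rebuilds a band by copying the referenced band and rescaling it. All function names are hypothetical.

```python
import math


def rms(v):
    """Root-mean-square value of a band (assumed scale parameter)."""
    return math.sqrt(sum(x * x for x in v) / len(v))


def normalize(v):
    """Band divided by its RMS (assumed shape); zero bands pass through."""
    r = rms(v)
    return [x / r for x in v] if r > 0 else list(v)


def best_match(shape, candidates):
    """Index and mean squared error of the normalized candidate closest to shape."""
    best_index, best_error = None, math.inf
    for i, cand in enumerate(candidates):
        err = sum((s - c) ** 2 for s, c in zip(shape, normalize(cand))) / len(shape)
        if err < best_error:
            best_index, best_error = i, err
    return best_index, best_error


def code_extended_band(band, baseband, codebook, threshold=0.1):
    """Encoder side (Figure 35): return (scale, (kind, index))."""
    scale = rms(band)                             # 3510: scale parameter
    shape = normalize(band)                       # 3520: shape parameter
    n = len(band)
    windows = [baseband[i:i + n] for i in range(len(baseband) - n + 1)]
    index, err = best_match(shape, windows)       # search the baseband
    if index is not None and err <= threshold:    # 3532: close?
        return scale, ("baseband", index)         # 3534
    index, err = best_match(shape, codebook)      # 3540: fixed codebook
    if index is not None and err <= threshold:
        return scale, ("codebook", index)
    return scale, ("noise", None)                 # 3542: random noise


def decode_extended_band(scale, vector, size, baseband, codebook):
    """Decoder side (Figure 36): rebuild one extended band from its parameters."""
    kind, index = vector
    if kind == "baseband":
        source = baseband[index:index + size]
    elif kind == "codebook":
        source = codebook[index]
    else:
        raise ValueError("noise bands need a random source; omitted in this sketch")
    return [scale * x for x in normalize(source)]


# Illustrative data only.
baseband = [1.0, 2.0, -1.0, 0.5, 3.0, -3.0]
codebook = [[1.0, 0.0], [2.0, 2.0]]
scale, vector = code_extended_band([4.0, -2.0], baseband, codebook)
rebuilt = decode_extended_band(scale, vector, 2, baseband, codebook)
```

In this example the extended band is a scaled copy of the baseband window starting at offset 1, so the encoder emits a baseband vector and the decoder recovers the band from the scale and that offset alone.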