
US007295971B2

(12) United States Patent; Chen et al.
(10) Patent No.: US 7,295,971 B2
(45) Date of Patent: *Nov. 13, 2007

(54) ACCOUNTING FOR NON-MONOTONICITY OF QUALITY AS A FUNCTION OF QUANTIZATION IN QUALITY AND RATE CONTROL FOR DIGITAL AUDIO

(75) Inventors: Wei-Ge Chen, Sammamish, WA (US); Naveen Thumpudi, Sammamish, WA (US); Ming-Chieh Lee, Bellevue, WA (US)

(73) Assignee: Microsoft Corporation, Redmond, WA (US)

(*) Notice: Subject to any disclaimer, the term of this patent is extended or adjusted under 35 U.S.C. 154(b) by 0 days. This patent is subject to a terminal disclaimer.

(21) Appl. No.: 11/599,686

(22) Filed: Nov. 14, 2006

(65) Prior Publication Data: US 2007/0061138 A1, Mar. 15, 2007

Related U.S. Application Data
(60) Continuation of application No. 11/066,898, filed on Feb. 24, 2005, which is a division of application No. 10/017,694, filed on Dec. 14, 2001, now Pat. No. 7,027,982.

(51) Int. Cl.: G10L 19/12 (2006.01)
(52) U.S. Cl.: 704/222; 704/230; 375/240.05
(58) Field of Classification Search: None. See application file for complete search history.

(56) References Cited

U.S. PATENT DOCUMENTS
4,051,470 A 9/1977 Esteban et al.
4,454,546 A 6/1984 Mori et al.
4,493,091 A 1/1985 Gundry
4,706,260 A 11/1987 Fedele et al.
4,802,224 A 1/1989 Shiraki et al.
4,954,892 A 9/1990 Asai et al.
5,043,919 A 8/1991 Callaway et al.
5,089,889 A 2/1992 Sugiyama
5,136,377 A 8/1992 Johnston et al.
(Continued)

OTHER PUBLICATIONS
Advanced Television Systems Committee, “ATSC Standard: Digital Audio Compression (AC-3), Revision A,” pp. 1-140 (Aug. 2001).
Baron et al., “Coding the Audio Signal,” Digital Image and Audio Communications, pp. 101-128 (1998).
Beerends, “Audio Quality Determination Based on Perceptual Measurement Techniques,” Applications of Digital Signal Processing to Audio and Acoustics, Chapter 1, Ed. Mark Kahrs, Karlheinz Brandenburg, Kluwer Acad. Publ., pp. 1-38 (1998).
(Continued)

Primary Examiner: Donald L. Storm
(74) Attorney, Agent, or Firm: Klarquist Sparkman, LLP

(57) ABSTRACT
An audio encoder regulates quality and bitrate with a control strategy. The strategy includes several features. First, an encoder regulates quantization using quality, minimum bit count, and maximum bit count parameters. Second, an encoder regulates quantization using a noise measure that indicates reliability of a complexity measure. Third, an encoder normalizes a control parameter value according to block size for a variable-size block. Fourth, an encoder uses a bit-count control loop de-linked from a quality control loop. Fifth, an encoder addresses non-monotonicity of quality measurement as a function of quantization level when selecting a quantization level. Sixth, an encoder uses particular interpolation rules to find a quantization level in a quality or bit-count control loop. Seventh, an encoder filters a control parameter value to smooth quality. Eighth, an encoder corrects model bias by adjusting a control parameter value in view of current buffer fullness.

20 Claims, 13 Drawing Sheets

Representative drawing: chart 1100, achieved quality (NER) versus quantization step size (0 to 300).

U.S. PATENT DOCUMENTS
5,235,618 A 8/1993 Sakai et al.
5,266,941 A 11/1993 Akeley et al.
5,317,672 A 5/1994 Crossman et al.
5,394,170 A 2/1995 Akeley et al.
5,398,069 A 3/1995 Huang et al.
5,400,371 A 3/1995 Natarajan
5,414,796 A 5/1995 Jacobs et al.
5,448,297 A 9/1995 Alattar et al.
5,457,495 A 10/1995 Hartung
5,570,363 A 10/1996 Holm
5,579,430 A 11/1996 Grill et al.
5,586,200 A 12/1996 Devaney et al.
5,602,959 A 2/1997 Bergstrom et al.
5,623,424 A 4/1997 Azadegan et al.
5,650,860 A 7/1997 Uz
5,654,760 A * 8/1997 Ohtsuki ...... 375/240.04
5,661,755 A 8/1997 Van De Kerkhof et al.
5,666,161 A 9/1997 Kohiyama et al.
5,686,964 A 11/1997 Tabatabai et al.
5,724,453 A * 3/1998 Ratnakar et al. ...... 382/251
5,742,735 A 4/1998 Eberlein et al.
5,754,974 A 5/1998 Griffin et al.
5,787,203 A 7/1998 Lee et al.
5,802,213 A 9/1998 Gardos
5,819,215 A 10/1998 Dobson et al.
5,825,310 A 10/1998 Tsutsui
5,835,149 A 11/1998 Astle
5,884,039 A 3/1999 Ludwig et al.
5,886,276 A 3/1999 Levine et al.
5,926,226 A 7/1999 Proctor et al.
5,933,451 A 8/1999 Ozkan et al.
5,952,943 A 9/1999 Walsh et al.
5,982,305 A 11/1999 Taylor
5,986,712 A 11/1999 Peterson et al.
5,990,945 A 11/1999 Sinha et al.
5,995,151 A 11/1999 Naveen et al.
6,002,439 A 12/1999 Murakami et al.
6,029,126 A 2/2000 Malvar
6,049,630 A 4/2000 Wang et al.
6,058,362 A 5/2000 Malvar
6,072,831 A 6/2000 Chen
6,073,153 A 6/2000 Malvar
6,075,768 A 6/2000 Mishra
6,088,392 A 7/2000 Rosenberg
6,111,914 A 8/2000 Bist
6,115,689 A 9/2000 Malvar
6,160,846 A 12/2000 Chiang et al.
6,182,034 B1 1/2001 Malvar
6,212,232 B1 4/2001 Reed et al.
6,215,820 B1 4/2001 Bagni et al.
6,223,162 B1 4/2001 Chen et al.
6,226,407 B1 5/2001 Zabih et al.
6,240,380 B1 5/2001 Malvar
6,243,497 B1 6/2001 Chiang et al.
6,278,735 B1 8/2001 Mohsenian
6,320,825 B1 11/2001 Bruekers et al.
6,351,226 B1 2/2002 Saunders et al.
6,370,502 B1 4/2002 Wu et al.
6,421,738 B1 7/2002 Ratan et al.
6,421,739 B1 7/2002 Holiday
6,441,754 B1 * 8/2002 Wang et al. ...... 341/50
6,473,409 B1 10/2002 Malvar
6,490,554 B2 12/2002 Endo et al.
6,501,798 B1 12/2002 Sivan
6,522,693 B1 2/2003 Lu et al.
6,573,915 B1 6/2003 Sivan et al.
6,574,593 B1 6/2003 Gao et al.
6,654,417 B1 11/2003 Hui
6,654,419 B1 11/2003 Sriram et al.
6,728,317 B1 4/2004 Demos
6,732,071 B2 5/2004 Lopez-Estrada et al.
6,760,598 B1 7/2004 Kurjenniemi
6,810,083 B2 10/2004 Chen et al.
6,876,703 B2 4/2005 Ismaeil et al.
6,895,050 B2 5/2005 Lee
6,934,677 B2 8/2005 Chen et al.
7,027,982 B2 4/2006 Chen et al.
7,143,030 B2 11/2006 Chen et al.
7,146,313 B2 12/2006 Chen et al.
2002/0143556 A1 10/2002 Kadatch
2002/0154693 A1 10/2002 Demos
2002/0176624 A1 11/2002 Kostrzewski et al.
2003/0110236 A1 6/2003 Yang et al.
2003/0115041 A1 6/2003 Chen
2003/0115042 A1 6/2003 Chen
2003/0115050 A1 6/2003 Chen
2003/0115051 A1 6/2003 Chen
2003/0115052 A1 6/2003 Chen
2003/0125932 A1 7/2003 Wang et al.
2005/0015528 A1 1/2005 Du
2005/0084166 A1 4/2005 Boneh et al.
2005/0135484 A1 6/2005 Lee
2005/0157784 A1 7/2005 Tanizawa et al.
2006/0062302 A1 3/2006 Yin et al.

OTHER PUBLICATIONS
Caetano et al., “Rate Control Strategy for Embedded Wavelet Video Coders,” Electronic Letters, pp. 1815-1817 (Oct. 14, 1999).
Cheung et al., “A Comparison of Scalar Quantization Strategies for Noisy Data Channel Data Transmission,” IEEE Transactions on Communications, vol. 43, No. 2/3/4, pp. 738-742 (Apr. 1995).
Crisafulli et al., “Adaptive Quantization: Solution via Nonadaptive Linear Control,” IEEE Transactions on Communications, vol. 41, pp. 741-748 (May 1993).
Dalgic et al., “Characterization of Quality and Traffic for Various Video Encoding Schemes and Various Encoder Control Schemes,” Technical Report No. CSL-TR-96-701 (Aug. 1996).
De Luca, “AN1090 Application Note: STA013 MPEG 2.5 Layer III Source Decoder,” STMicroelectronics, 17 pp. (1999).
de Queiroz et al., “Time-Varying Lapped Transforms and Wavelet Packets,” IEEE Transactions on Signal Processing, vol. 41, pp. 3293-3305 (1993).
“DivX Multi Standard Video Encoder,” 2 pp. (Downloaded from the World Wide Web on Jan. 24, 2006).
Dolby Laboratories, “AAC Technology,” 4 pp. [Downloaded from the web site aac-audio.com on World Wide Web on Nov. 21, 2001].
Fraunhofer-Gesellschaft, “MPEG Audio Layer-3,” 4 pp. [Downloaded from the World Wide Web on Oct. 24, 2001].
Fraunhofer-Gesellschaft, “MPEG-2 AAC,” 3 pp. [Downloaded from the World Wide Web on Oct. 24, 2001].
Gibson et al., Digital Compression for Multimedia, Chapter 4, “Quantization,” pp. 113-138 (1998).
Gibson et al., Digital Compression for Multimedia, Chapter 7, “Frequency Domain Coding,” Morgan Kaufman Publishers, Inc., pp. iii, v-xi, and 227-262 (1998).
Gibson et al., Digital Compression for Multimedia, Chapter 8, “Frequency Domain Speech and Audio Coding Standards,” Morgan Kaufman Publishers, Inc., pp. 263-290 (1998).
Gibson et al., Digital Compression for Multimedia, Chapter 11.4, “MPEG Audio,” Morgan Kaufman Publishers, Inc., pp. 398-402 (1998).
Gibson et al., Digital Compression for Multimedia, Chapter 11.6.2-11.6.4, “More MPEG,” Morgan Kaufman Publishers, Inc., pp. 415-416 (1998).
Gill et al., “Creating High-Quality Content with Microsoft Windows Media Encoder 7,” 4 pp. (2000). [Downloaded from the World Wide Web on May 1, 2002.]

Herley et al., “Tilings of the Time-Frequency Plane: Construction of Arbitrary Orthogonal Bases and Fast Tiling Algorithms,” IEEE Transactions on Signal Processing, vol. 41, No. 12, pp. 3341-3359 (1993).
ISO, “MPEG-4 Video Verification Model version 18.0,” ISO/IEC JTC1/SC29/WG11 N3908, Pisa, pp. 1-10, 299-311 (Jan. 2001).
ISO/IEC 11172-3, Information Technology-Coding of Moving Pictures and Associated Audio for Digital Storage Media at Up to About 1.5 Mbit/s-Part 3: Audio, 154 pp. (1993).
ISO/IEC 13818-7, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information,” Part 7: Advanced Audio Coding (AAC), pp. i-iv, 1-145 (1997).
ISO/IEC 13818-7, Technical Corrigendum 1, “Information Technology-Generic Coding of Moving Pictures and Associated Audio Information,” Part 7: Advanced Audio Coding (AAC), Technical Corrigendum, pp. 1-22 (1997).
ITU, Recommendation ITU-R BS 1115, Low Bit-Rate Audio Coding, 9 pp. (1994).
ITU, Recommendation ITU-R BS 1387, Method for Objective Measurements of Perceived Audio Quality, 89 pp. (1998).
Jafarkhani et al., “Entropy-Constrained Successively Refinable Scalar Quantization,” IEEE Data Compression Conference, pp. 337-346 (1997).
Jayant et al., “Digital Coding of Waveforms, Principles and Applications to Speech and Video,” Prentice Hall, pp. 428-445 (1984).
Jesteadt et al., “Forward Masking as a Function of Frequency, Masker Level, and Signal Delay,” Journal of Acoustical Society of America, vol. 71, pp. 950-962 (1982).
Kondoz, Digital Speech: Coding for Low Bit Rate Communications Systems, “Chapter 3.3: Linear Predictive Modeling of Speech Signals,” and “Chapter 4: LPC Parameter Quantisation Using LSFs,” John Wiley & Sons, pp. 42-53 and 79-97 (1994).
Li et al., “Optimal Linear Interpolation Coding for Server-Based Computing,” Proc. IEEE Int’l Conf. on Communications, 5 pp. (2002).
Lutfi, “Additivity of Simultaneous Masking,” Journal of Acoustic Society of America, vol. 73, pp. 262-267 (1983).
Malvar, “Biorthogonal and Nonuniform Lapped Transforms for Transform Coding with Reduced Blocking and Ringing Artifacts,” IEEE Transactions on Signal Processing, Special Issue on Multirate Systems, Filter Banks, Wavelets, and Applications, vol. 46, 29 pp. (1998).
Malvar, “Lapped Transforms for Efficient Transform/Subband Coding,” IEEE Transactions on Acoustics, Speech and Signal Processing, vol. 38, No. 6, pp. 969-978 (1990).
Malvar, Signal Processing with Lapped Transforms, Artech House, Norwood, MA, pp. iv, vii-xi, 175-218, and 353-357 (1992).
Naveen et al., “Subband Finite State Scalar Quantization,” IEEE Transactions on Image Processing, vol. 5, No. 1, pp. 150-155 (Jan. 1996).
OPTICOM GmbH, “Objective Perceptual Measurement,” 14 pp. [Downloaded from the World Wide Web on Oct. 24, 2001].
Ortega et al., “Adaptive Scalar Quantization Without Side Information,” IEEE Transactions on Image Processing, vol. 6, No. 5, pp. 665-676 (May 1997).
Ortega et al., “Optimal Buffer-Constrained Source Quantization and Fast Approximation,” IEEE, pp. 192-195 (1992).
Phamdo, “Speech Compression,” 13 pp. [Downloaded from the World Wide Web on Nov. 25, 2001].
Ramchandran et al., “Bit Allocation for Dependent Quantization with Applications to MPEG Video Coders,” IEEE, pp. v-381-v-384 (1993).
Ratnakar et al., “RD-OPT: An Efficient Algorithm for Optimizing DCT Quantization Tables,” 11 pp.
Ribas Corbera et al., “Rate Control in DCT Video Coding for Low-Delay Communications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 9, No. 1, pp. 172-185 (Feb. 1999).
Ronda et al., “Rate Control and Bit Allocation for MPEG-4,” IEEE Transactions on Circuits and Systems for Video Technology, pp. 1243-1258 (1999).
Schaar-Mitrea et al., “Hybrid Compression of Video with Graphics in DTV Communications Systems,” IEEE Trans. on Consumer Electronics, pp. 1007-1017 (2000).
Shlien, “The Modulated Lapped Transform, Its Time-Varying Forms, and Its Applications to Audio Coding Standards,” IEEE Transactions on Speech and Audio Processing, vol. 5, No. 4, pp. 359-366 (Jul. 1997).
Schuster et al., “A Theory for the Optimal Bit Allocation Between Displacement Vector Field and Displaced Frame Difference,” IEEE J. on Selected Areas in Comm., vol. 15, No. 9, pp. 1739-1751 (Dec. 1997).
Sidiropoulos, “Optimal Adaptive Scalar Quantization and Image Compression,” ICIP, pp. 574-578 (1998).
Solari, Digital Video and Audio Compression, Title Page, Contents, “Chapter 8: Sound and Audio,” McGraw-Hill, Inc., pp. iii, v-vi, and 187-211 (1997).
Srinivasan et al., “High-Quality Audio Compression Using an Adaptive Wavelet Packet Decomposition and Psychoacoustic Modeling,” IEEE Transactions on Signal Processing, vol. 46, No. 4, pp. 1085-1093 (Apr. 1998).
Sullivan, “Optimal Entropy Constrained Scalar Quantization for Exponential and Laplacian Random Variables,” ICASSP, pp. V-265-V-268 (1994).
Sullivan et al., “Rate-Distortion Optimization for Video Compression,” IEEE Signal Processing Magazine, pp. 74-90 (Nov. 1998).
Tao et al., “Adaptive Model-driven Bit Allocation for MPEG Video Coding,” IEEE Transactions on Circuits and Systems for Video Tech., vol. 10, No. 1, pp. 147-157 (Feb. 2000).
Terhardt, “Calculating Virtual Pitch,” Hearing Research, vol. 1, pp. 155-182 (1979).
Trushkin, “On the Design of an Optimal Quantizer,” IEEE Transactions on Information Theory, vol. 39, No. 4, pp. 1180-1194 (Jul. 1993).
Tsang et al., “Fuzzy based rate control for real-time MPEG video,” 12 pp.
Vetro et al., “An Overview of MPEG-4 Object-Based Encoding Algorithms,” IEEE International Symposium on Information Technology, pp. 366-369 (2001).
Westerink et al., “Two-pass MPEG-2 Variable-bit-rate Encoding,” IBM J. Res. Develop., vol. 43, No. 4, pp. 471-488 (1999).
Wong, “Progressively Adaptive Scalar Quantization,” ICIP, pp. 357-360 (1996).
Wragg et al., “An Optimised Software Solution for an ARM Powered™ MP3 Decoder,” 9 pp. [Downloaded from the World Wide Web on Oct. 27, 2001].
Wu et al., “Entropy-Constrained Scalar Quantization and Minimum Entropy with Error Bound by Discrete Wavelet Transforms in Image Compression,” IEEE Transactions on Image Processing, vol. 48, No. 4, pp. 1133-1143 (Apr. 2000).
Wu et al., “Quantizer Monotonicities and Globally Optimally Scalar Quantizer Design,” IEEE Transactions on Information Theory, vol. 39, No. 3, pp. 1049-1053 (May 1993).
Yang et al., “Rate Control for Videophone Using Local Perceptual Cues,” IEEE Transactions on Circuits and Systems for Video Tech., vol. 15, No. 4, pp. 496-507 (Apr. 2005).
Zwicker et al., Das Ohr als Nachrichtenempfänger, Title Page, Table of Contents, “I: Schallschwingungen,” Index, Hirzel-Verlag, Stuttgart, pp. iii, ix-xi, 1-26 and 231-232 (1967).
Zwicker, Psychoakustik, Title Page, Table of Contents, “Teil I: Einführung,” Index, Springer-Verlag, Berlin Heidelberg, New York, pp. ii, ix-xi, 1-30 and 157-162 (1982).
Reed et al., “Constrained Bit-Rate Control for Very Low Bit-Rate Streaming-Video Applications,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 11, No. 7, pp. 882-889 (Jul. 2001).
Sheu et al., “A Buffer Allocation Mechanism for VBR Video Playback,” Communication Tech. Proc. 2000, WCC-ICCT 2000, vol. 2, pp. 1641-1644 (2000).
Walpole et al., “A Player for Adaptive MPEG Video Streaming over the Internet,” Proc. SPIE, vol. 3240, pp. 270-281 (1998).

* cited by examiner

U.S. Patent Nov. 13, 2007 Sheet 2 of 13 US 7,295,971 B2

Figure 2: block diagram of audio encoder 200, with input audio samples 205, frequency transformer 210, multi-channel transformer 220, perception modeler 230, weighter 240, quantizer 250, entropy encoder 260, rate/quality controller 270, bitstream MUX 280, and output bitstream 295.

Figure 3: block diagram of audio decoder 300, with bitstream 305, bitstream DEMUX, entropy decoder 320, inverse quantizer 330, noise generator 340, inverse weighter 350, inverse multi-channel transformer 360, inverse frequency transformer 370, and reconstructed audio samples 395.
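Quantizer 250 (Figure 2) and inverse quantizer 330 (Figure 3) can be illustrated with the uniform, scalar quantization example given in the Background, where a factor of 3.0 maps values from -1.5 to 1.499 to 0, values from 1.5 to 4.499 to 1, and reconstruction multiplies by the factor. The following is a minimal Python sketch, assuming round-half-up at bin edges (which reproduces those ranges); the function names quantize and dequantize are hypothetical, not taken from the patent.

```python
import math


def quantize(x: float, step: float) -> int:
    """Uniform scalar quantization: index of the nearest multiple of step.

    Values exactly halfway between multiples round up, so with step=3.0
    the interval [-1.5, 1.5) maps to 0 and [1.5, 4.5) maps to 1.
    """
    return math.floor(x / step + 0.5)


def dequantize(index: int, step: float) -> float:
    """Reconstruction: multiply the quantized index by the step size."""
    return index * step


if __name__ == "__main__":
    step = 3.0
    for x in (-1.5, 1.499, 1.5, 4.499):
        q = quantize(x, step)
        print(f"{x:7.3f} -> index {q} -> reconstructed {dequantize(q, step)}")
```

Both 1.5 and 4.499 reconstruct to 3.0, which is the loss of fidelity the Background describes.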


Figures 8a and 8b: formulas for computing bias_correction.

Figure 9: flowchart 900: get first block (910); determine current block size (920); compute value for control parameter (930); decision 940 leads ("yes") to get next block (950).
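Figure 9 determines each block's size before computing its control parameter value, which fits the Abstract's third feature (normalizing a control parameter value according to block size for a variable-size block). A minimal sketch, assuming the control parameter scales linearly with the number of samples in a block; the linear rule and the names per_block_values and normalize are hypothetical, not taken from the patent.

```python
from typing import Iterable, List


def per_block_values(block_sizes: Iterable[int], value_per_sample: float) -> List[float]:
    """Hypothetical Figure 9 loop: one control value per variable-size block.

    value_per_sample is a block-size-independent setting (for example a
    bit budget per sample); each block's value is scaled by its size.
    """
    values: List[float] = []
    for size in block_sizes:                    # get first block, then each next block
        if size <= 0:
            raise ValueError("block size must be positive")
        values.append(value_per_sample * size)  # value for this block size
    return values


def normalize(value: float, block_size: int) -> float:
    """Express a per-block quantity (e.g. bits spent) per sample."""
    return value / block_size


print(per_block_values([256, 2048], 1.5))  # [384.0, 3072.0]
print(normalize(3072.0, 2048))             # 1.5
```

With a per-sample setting, a 256-sample block and a 2048-sample block are held to the same standard even though their absolute budgets differ.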


Figure 11: chart 1100 of achieved quality (NER) versus quantization step size (0 to 300).

Figure 12: chart 1200 versus quantization step size (0 to 300), with y-axis values from 1,000 to 7,000.

Figure 14: flowchart 1400: clear [s, NER] array (1410); select initial step size s_t (1412); select initial bracket [s_l, s_h] (1414); quantize with step size s_t (1420); reconstruct block (1430); measure NER_t for block (1440); record [s_t, NER_t] pair; check for non-monotonicity in recorded pairs; update bracket [s_l, s_h] (1470); check for non-monotonicity in updated bracket; adjust step size s_t; s_NER = s_t (1490). Reference numerals 1460, 1462, 1472, and 1480 also appear.
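The steps of Figure 14 (record each [s_t, NER_t] pair, update the bracket [s_l, s_h], check for non-monotonicity, then set s_NER) can be illustrated with a small bisection sketch. It is a hypothetical illustration, not the patented procedure: the measure_ner callback, the midpoint update, the iteration cap, and the rule of returning the largest recorded step size that met the target are assumptions, and lower NER is assumed to mean better quality. Because the title and Abstract stress that quality need not be monotonic in the quantization level, the sketch keeps every measured pair, reports when the pairs are out of order, and picks its answer from the records instead of trusting the last probe.

```python
from typing import Callable, List, Tuple


def search_step_for_quality(
    measure_ner: Callable[[int], float],
    target_ner: float,
    s_init: int,
    bracket: Tuple[int, int],
    max_iters: int = 12,
) -> Tuple[int, bool]:
    """Hypothetical bracketed search for the largest step size meeting a target NER.

    measure_ner(s) stands in for "quantize with step size s, reconstruct the
    block, and measure NER"; lower NER is treated as better quality.
    Returns (s_ner, non_monotonic), where non_monotonic is True if the
    recorded pairs show NER falling somewhere as step size grows.
    """
    pairs: List[Tuple[int, float]] = []   # cleared [s, NER] record for this block
    s_l, s_h = bracket
    s_t = s_init
    for _ in range(max_iters):
        ner_t = measure_ner(s_t)
        pairs.append((s_t, ner_t))        # record [s_t, NER_t] pair
        if ner_t <= target_ner:
            s_l = max(s_l, s_t)           # target met: a coarser step may also work
        else:
            s_h = min(s_h, s_t)           # target missed: need a finer step
        if s_h - s_l <= 1:
            break
        s_t = (s_l + s_h) // 2            # adjust step size inside the bracket

    ordered = sorted(pairs)
    non_monotonic = any(n1 > n2 for (_, n1), (_, n2) in zip(ordered, ordered[1:]))
    # Choose from the records rather than the last probe, which may have failed.
    passing = [s for s, n in pairs if n <= target_ner]
    s_ner = max(passing) if passing else min(s for s, _ in pairs)
    return s_ner, non_monotonic
```

With max_iters=3 and an NER that simply equals the step size, the last probe (112) misses a target of 100, yet the function still returns the passing step 50 from its records.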

Figure 15: flowchart 1500: clear [s, bit count] array (1510); select initial step size s_t = s_NER (1512); select initial bracket [s_l, s_h] (1514); quantize with step size s_t (1520); entropy code block (1530); measure bit count b_t for block; record [s_t, b_t] pair (1560); update bracket [s_l, s_h]; adjust step size s_t (1580). Reference numeral 1590 also appears.
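Figure 15 starts its loop from s_NER, the step size the quality loop settled on, and then searches on bit count. The sketch below is a hypothetical reading that ties this to the Abstract's first and fourth features (minimum and maximum bit count parameters, and a bit-count loop de-linked from the quality loop): s_NER is kept if its bit count already falls in [min_bits, max_bits], and otherwise a bracket is narrowed. The callback count_bits, the linear interpolation toward the middle of the allowed range, and the fallback rule are assumptions, not the interpolation rules the patent describes.

```python
from typing import Callable, Dict, Tuple


def search_step_for_bits(
    count_bits: Callable[[int], int],
    s_ner: int,
    min_bits: int,
    max_bits: int,
    bracket: Tuple[int, int],
    max_iters: int = 12,
) -> int:
    """Hypothetical bit-count loop started from the quality loop's s_ner.

    count_bits(s) stands in for "quantize with step size s, entropy code the
    block, and measure its bit count"; bit count is assumed to fall as the
    step size grows. Returns the first probed step whose bit count lies in
    [min_bits, max_bits], or a fallback chosen from the recorded pairs.
    """
    record: Dict[int, int] = {}           # cleared [s, bit count] record
    s_l, s_h = bracket                    # assumes s_l <= s_ner <= s_h
    s_t = s_ner
    for _ in range(max_iters):
        if s_t not in record:
            record[s_t] = count_bits(s_t)
        b_t = record[s_t]
        if min_bits <= b_t <= max_bits:
            return s_t
        if b_t > max_bits:
            s_l = s_t                     # too many bits: try a coarser step
        else:
            s_h = s_t                     # too few bits: try a finer step
        if s_h - s_l <= 1:
            break
        if s_l in record and s_h in record:
            # Linear interpolation toward the middle of the allowed range.
            target = (min_bits + max_bits) / 2
            frac = (record[s_l] - target) / (record[s_l] - record[s_h])
            s_t = s_l + round(frac * (s_h - s_l))
            s_t = min(max(s_t, s_l + 1), s_h - 1)  # stay strictly inside
        else:
            s_t = (s_l + s_h) // 2        # an end not yet measured: bisect
    # Fallback: the finest recorded step that does not exceed max_bits.
    within_max = [s for s, b in record.items() if b <= max_bits]
    return min(within_max) if within_max else max(record)
```

For a bit count of 30000 // s with limits 200 and 300, starting from s_NER = 120 (250 bits) returns 120 unchanged, while starting from s_NER = 50 (600 bits) converges to 139 (215 bits) after three more probes.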

Figure 17: flowchart: get first block (1710); compute value for control parameter (1720); check buffer (1730); correct bias (1740); decision 1750 leads ("yes") to get next block (1760).
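Figure 17 checks the buffer and corrects bias after computing each block's control parameter value, in line with the Abstract's eighth feature (correcting model bias by adjusting a control parameter value in view of current buffer fullness). This sketch assumes the control parameter is a per-block bit target and applies a simple proportional correction toward a desired fullness; it does not reproduce the bias_correction formulas of Figures 8a and 8b, and VirtualBuffer, correct_bias, desired, and gain are hypothetical. The buffer model echoes the virtual buffer described for WMA7 in the Background.

```python
from dataclasses import dataclass


@dataclass
class VirtualBuffer:
    """Minimal virtual buffer: bits produced by the encoder flow in and are
    drained at the constant output bitrate."""
    capacity_bits: int
    fullness_bits: int = 0

    def fullness(self) -> float:
        return self.fullness_bits / self.capacity_bits

    def update(self, produced_bits: int, drained_bits: int) -> None:
        level = self.fullness_bits + produced_bits - drained_bits
        self.fullness_bits = min(self.capacity_bits, max(0, level))  # clamp


def correct_bias(target_bits: float, buf: VirtualBuffer,
                 desired: float = 0.5, gain: float = 0.5) -> float:
    """Hypothetical bias correction: shrink the block's bit target when the
    buffer is fuller than desired and grow it when the buffer is emptier."""
    error = buf.fullness() - desired      # > 0 means fuller than desired
    return max(0.0, target_bits * (1.0 - gain * error))


buf = VirtualBuffer(capacity_bits=10_000, fullness_bits=7_500)
print(correct_bias(1000.0, buf))   # 875.0
buf.update(produced_bits=500, drained_bits=3_000)
print(buf.fullness())              # 0.5
print(correct_bias(1000.0, buf))   # 1000.0
```

A proportional rule is only one choice; the point is that the per-block value is nudged by current buffer fullness before the block is coded.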

Figure 18: flowchart: get first block (1810); compute value for control parameter (1820); adjust lowpass filter (1830); lowpass filter value (1840); decision 1850 leads ("yes") to get next block (1860).
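Figure 18 lowpass filters each block's control parameter value after adjusting the filter, which matches the Abstract's seventh feature (filtering a control parameter value to smooth quality). A first-order exponential smoother is one common lowpass filter; choosing it here, along with the alpha parameter and the adjust method, is an assumption made for illustration.

```python
from typing import Iterable, List, Optional


class LowpassFilter:
    """First-order IIR lowpass (exponential smoothing) for a control value."""

    def __init__(self, alpha: float) -> None:
        self.alpha = 0.0
        self.adjust(alpha)
        self.state: Optional[float] = None

    def adjust(self, alpha: float) -> None:
        # Hypothetical "adjust lowpass filter" step: change how much weight
        # the newest value gets (smaller alpha = heavier smoothing).
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha

    def filter(self, value: float) -> float:
        if self.state is None:
            self.state = value            # first block: no history yet
        else:
            self.state += self.alpha * (value - self.state)
        return self.state


def smooth(values: Iterable[float], alpha: float) -> List[float]:
    """Apply the filter block by block, as in the Figure 18 loop."""
    lp = LowpassFilter(alpha)
    return [lp.filter(v) for v in values]


print(smooth([10.0, 20.0, 20.0], 0.5))  # [10.0, 15.0, 17.5]
```

A jump from 10 to 20 in the raw value reaches the output gradually (10.0, 15.0, 17.5), the kind of smooth quality transition the Background says CBR encoders often fail to provide.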

ACCOUNTING FOR NON-MONOTONICITY OF QUALITY AS A FUNCTION OF QUANTIZATION IN QUALITY AND RATE CONTROL FOR DIGITAL AUDIO

RELATED APPLICATION INFORMATION

The present application is a continuation of U.S. patent application Ser. No. 11/066,893, filed Feb. 24, 2005, entitled “Accounting for Non-monotonicity of Quality as a Function of Quantization in Quality and Rate Control For Digital Audio,” which is a divisional of U.S. patent application Ser. No. 10/017,694, filed Dec. 14, 2001, entitled “Quality and Rate Control Strategy For Digital Audio,” now U.S. Pat. No. 7,027,982, the disclosure of which is hereby incorporated by reference. The following U.S. patent applications relate to the present application: 1) U.S. patent application Ser. No. 10/020,703, entitled, “Adaptive Window-Size Selection in Transform Coding,” filed Dec. 14, 2001, the disclosure of which is hereby incorporated by reference; 2) U.S. patent application Ser. No. 10/016,918, entitled, “Quality Improvement Techniques in an Audio Encoder,” filed Dec. 14, 2001, now U.S. Pat. No. 7,240,001, the disclosure of which is hereby incorporated by reference; 3) U.S. patent application Ser. No. 10/017,702, entitled, “Quantization Matrices Based on Critical Band Pattern Information for Digital Audio Wherein Quantization Bands Differ from Critical Bands,” filed Dec. 14, 2001, now U.S. Pat. No. 6,934,677, the disclosure of which is hereby incorporated by reference; and 4) U.S. patent application Ser. No. 10/017,861, entitled, “Techniques for Measurement of Perceptual Audio Quality,” filed Dec. 14, 2001, now U.S. Pat. No. 7,146,313, the disclosure of which is hereby incorporated by reference.

TECHNICAL FIELD

The present invention relates to a quality and rate control strategy for digital audio. In one embodiment, an audio encoder controls quality and bitrate by adjusting quantization of audio information.

BACKGROUND

With the introduction of compact disks, digital wireless telephone networks, and audio delivery over the Internet, digital audio has become commonplace. Engineers use a variety of techniques to control the quality and bitrate of digital audio. To understand these techniques, it helps to understand how audio information is represented in a computer and how humans perceive audio.

I. Representation of Audio Information in a Computer

A computer processes audio information as a series of numbers representing the audio information. For example, a single number can represent an audio sample, which is an amplitude (i.e., loudness) at a particular time. Several factors affect the quality of the audio information, including sample depth, sampling rate, and channel mode.

Sample depth (or precision) indicates the range of numbers used to represent a sample. The more values possible for the sample, the higher the quality because the number can capture more subtle variations in amplitude. For example, an 8-bit sample has 256 possible values, while a 16-bit sample has 65,536 possible values.

The sampling rate (usually measured as the number of samples per second) also affects quality. The higher the sampling rate, the higher the quality because more frequencies of sound can be represented. Some common sampling rates are 8,000, 11,025, 22,050, 32,000, 44,100, 48,000, and 96,000 samples/second.

Mono and stereo are two common channel modes for audio. In mono mode, audio information is present in one channel. In stereo mode, audio information is present in two channels usually labeled the left and right channels. Other modes with more channels, such as 5-channel surround sound, are also possible. Table 1 shows several formats of audio with different quality levels, along with corresponding raw bitrate costs.

TABLE 1
Bitrates for different quality audio information

Quality              Sample Depth    Sampling Rate       Mode     Raw Bitrate
                     (bits/sample)   (samples/second)             (bits/second)
Internet telephony   8               8,000               mono     64,000
telephone            8               11,025              mono     88,200
CD audio             16              44,100              stereo   1,411,200
high quality audio   16              48,000              stereo   1,536,000

Each raw bitrate is the product of sample depth, sampling rate, and number of channels; for CD audio, 16 × 44,100 × 2 = 1,411,200 bits/second.

As Table 1 shows, the cost of high quality audio information such as CD audio is high bitrate. High quality audio information consumes large amounts of computer storage and transmission capacity.

Compression (also called encoding or coding) decreases the cost of storing and transmitting audio information by converting the information into a lower bitrate form. Compression can be lossless (in which quality does not suffer) or lossy (in which quality suffers). Decompression (also called decoding) extracts a reconstructed version of the original information from the compressed form.

Quantization is a conventional lossy compression technique. There are many different kinds of quantization including uniform and non-uniform quantization, scalar and vector quantization, and adaptive and non-adaptive quantization. Quantization maps ranges of input values to single values. For example, with uniform, scalar quantization by a factor of 3.0, a sample with a value anywhere between -1.5 and 1.499 is mapped to 0, a sample with a value anywhere between 1.5 and 4.499 is mapped to 1, etc. To reconstruct the sample, the quantized value is multiplied by the quantization factor, but the reconstruction is imprecise. Continuing the example started above, the quantized value 1 reconstructs to 1×3=3; it is impossible to determine where the original sample value was in the range 1.5 to 4.499. Quantization causes a loss in fidelity of the reconstructed value compared to the original value. Quantization can dramatically improve the effectiveness of subsequent lossless compression, however, thereby reducing bitrate.

An audio encoder can use various techniques to provide the best possible quality for a given bitrate, including transform coding, modeling human perception of audio, and rate control. As a result of these techniques, an audio signal can be more heavily quantized at selected frequencies or times to decrease bitrate, yet the increased quantization will not significantly degrade perceived quality for a listener.

Transform coding techniques convert information into a form that makes it easier to separate perceptually important information from perceptually unimportant information. The less important information can then be quantized heavily, while the more important information is preserved, so as to provide the best perceived quality for a given bitrate. Transform coding techniques typically convert information into the frequency (or spectral) domain. For example, a transform coder converts a time series of audio samples into frequency coefficients. Transform coding techniques include Discrete Cosine Transform [“DCT”], Modulated Lapped Transform [“MLT”], and Fast Fourier Transform [“FFT”]. In practice, the input to a transform coder is partitioned into blocks, and each block is transform coded. Blocks may have varying or fixed sizes, and may or may not overlap with an adjacent block. After transform coding, a frequency range of coefficients may be grouped for the purpose of quantization, in which case each coefficient is quantized like the others in the group, and the frequency range is called a quantization band. For more information about transform coding and MLT in particular, see Gibson et al., Digital Compression for Multimedia, “Chapter 7: Frequency Domain Coding,” Morgan Kaufman Publishers, Inc., pp. 227-262 (1998); U.S. Pat. No. 6,115,689 to Malvar; H. S. Malvar, Signal Processing with Lapped Transforms, Artech House, Norwood, Mass., 1992; or Seymour Shlien, “The Modulated Lapped Transform, Its Time-Varying Forms, and Its Application to Audio Coding Standards,” IEEE Transactions on Speech and Audio Processing, Vol. 5, No. 4, pp. 359-66, July 1997.

In addition to the factors that determine objective audio quality, perceived audio quality also depends on how the human body processes audio information. For this reason, audio processing tools often process audio information according to an auditory model of human perception.

Typically, an auditory model considers the range of human hearing and critical bands. Humans can hear sounds ranging from roughly 20 Hz to 20 kHz, and are most sensitive to sounds in the 2-4 kHz range. The human nervous system integrates sub-ranges of frequencies. For this reason, an auditory model may organize and process audio information by critical bands. Aside from range and critical bands, interactions between audio signals can dramatically affect perception. An audio signal that is clearly audible if presented alone can be completely inaudible in the presence of another audio signal, called the masker or the masking signal. The human ear is relatively insensitive to distortion or other loss in fidelity (i.e., noise) in the masked signal, so the masked signal can include more distortion without degrading perceived audio quality. An auditory model typically incorporates other factors relating to physical or neural aspects of human perception of sound.

Using an auditory model, an audio encoder can determine which parts of an audio signal can be heavily quantized without introducing audible distortion, and which parts should be quantized lightly or not at all. Thus, the encoder can spread distortion across the signal so as to decrease the audibility of the distortion.

II. Controlling Rate and Quality of Audio Information

Different audio applications have different quality and bitrate requirements. Certain applications require constant quality over time for compressed audio information. Other applications require variable quality and bitrate. Still other applications require constant or relatively constant bitrate [collectively, “constant bitrate” or “CBR”]. One such CBR application is encoding audio for streaming over the Internet.

A CBR encoder outputs compressed audio information at a constant bitrate despite changes in the complexity of the audio information. Complex audio information is typically less compressible than simple audio information. For the CBR encoder to meet bitrate requirements, the CBR encoder can adjust how the audio information is quantized. The quality of the compressed audio information then varies, with lower quality for periods of complex audio information due to increased quantization and higher quality for periods of simple audio information due to decreased quantization.

While adjustment of quantization and audio quality is necessary at times to satisfy constant bitrate requirements, current CBR encoders can cause unnecessary changes in quality, which can result in thrashing between high quality and low quality around the appropriate, middle quality. Moreover, when changes in audio quality are necessary, current CBR encoders often cause abrupt changes, which are more noticeable and objectionable than smooth changes.

Microsoft Corporation’s Windows Media Audio version 7.0 [“WMA7”] includes an audio encoder that can be used to compress audio information for streaming at a constant bitrate. The WMA7 encoder uses a virtual buffer and rate control to handle variations in bitrate due to changes in the complexity of audio information.

To handle short-term fluctuations around the constant bitrate (such as those due to brief variations in complexity), the WMA7 encoder uses a virtual buffer that stores some duration of compressed audio information. For example, the virtual buffer stores compressed audio information for 5 seconds of audio playback. The virtual buffer outputs the compressed audio information at the constant bitrate, so long as the virtual buffer does not underflow or overflow. Using the virtual buffer, the encoder can compress audio information at relatively constant quality despite variations in complexity, so long as the virtual buffer is long enough to smooth out the variations. In practice, virtual buffers must be limited in duration in order to limit system delay, however, and buffer underflow or overflow can occur unless the encoder intervenes.

To handle longer-term deviations from the constant bitrate (such as those due to extended periods of complexity or silence), the WMA7 encoder adjusts the quantization step size of a uniform, scalar quantizer in a rate control loop. The relation between quantization step size and bitrate is complex and hard to predict in advance, so the encoder tries one or more different quantization step sizes until the encoder finds one that results in compressed audio information with a bitrate sufficiently close to a target bitrate. The encoder sets the target bitrate to reach a desired buffer fullness, preventing buffer underflow and overflow. Based upon the complexity of the audio information, the encoder can also allocate additional bits for a block or deallocate bits when setting the target bitrate for the rate control loop.

The WMA7 encoder measures the quality of the reconstructed audio information for certain operations (e.g., deciding which bands to truncate). The WMA7 encoder does not use the quality measurement in conjunction with adjustment of the quantization step size in a quantization loop, however.

The WMA7 encoder controls bitrate and provides good quality for a given bitrate, but can cause unnecessary quality changes. Moreover, with the WMA7 encoder, necessary changes in audio quality are not as smooth as they could be in transitions from one level of quality to another.

Numerous other audio encoders use rate control strategies; for example, see U.S. Pat. No. 5,845,243 to Smart et al. Such rate control strategies potentially consider information other than or in addition to current buffer fullness, for example, the complexity of the audio information.

Several international standards describe audio encoders that incorporate distortion and rate control. The Motion Picture Experts Group, Audio Layer 3 [“MP3”] and Motion Picture Experts Group 2, Advanced Audio Coding [“AAC”] standards each describe techniques for controlling distortion and bitrate of compressed audio information.

In MP3, the encoder uses nested quantization loops to control distortion and bitrate for a block of audio information called a granule. Within an outer quantization loop for controlling distortion, the MP3 encoder calls an inner quantization loop for controlling bitrate.

In the outer quantization loop, the MP3 encoder compares distortions for scale factor bands to allowed distortion thresholds for the scale factor bands. A scale factor band is a range of frequency coefficients for which the encoder calculates a weight called a scale factor. Each scale factor starts with a minimum weight for a scale factor band. After an iteration of the inner quantization loop, the encoder amplifies the scale factors until the distortion in each scale factor band is less than the allowed distortion threshold for that scale factor band, with the encoder calling the inner quantization loop for each set of scale factors. In special cases, the encoder exits the outer quantization loop even if distortion exceeds the allowed distortion threshold for a scale factor band (e.g., if all scale factors have been amplified or if a scale factor has reached a maximum amplification).

In the inner quantization loop, the MP3 encoder finds a satisfactory quantization step size for a given set of scale factors. The encoder starts with a quantization step size expected to yield more than the number of available bits for the granule. The encoder then gradually increases the quantization step size until it finds one that yields fewer than the number of available bits.

The MP3 encoder calculates the number of available bits for the granule based upon the average number of bits per granule, the number of bits in a bit reservoir, and an estimate of complexity of the granule called perceptual entropy. The bit reservoir counts unused bits from previous granules.

… formats band information such that information for less perceptually important bands can be incrementally removed from a bitstream, if necessary, while preserving the most information possible for a given bitrate. For more information about zero tree coding, see Srinivasan et al., “High Quality Audio Compression Using an Adaptive Wavelet Packet Decomposition and Psychoacoustic Modeling,” IEEE Transactions on Signal Processing, Vol. 46, No. 4, pp. (April 1998).

While this strategy works for high quality, high complexity applications, it does not work as well for very low to mid-bitrate applications. Moreover, the strategy assumes predictable rate-distortion characteristics in the audio, and does not address situations in which distortion increases with the number of bits allocated.

Outside of the field of audio encoding, various joint quality and bitrate control strategies for video encoding have been published. For example, see U.S. Pat. No. 5,686,964 to Naveen et al.; U.S. Pat. No. 5,995,151 to Naveen et al.; Caetano et al., “Rate Control Strategy for Embedded Wavelet Video Coders,” IEEE Electronics Letters, pp 1815-17 (Oct. 14, 1999); and Ribas-Corbera et al., “Rate Control in DCT Video Coding for Low-Delay Communications,” IEEE Trans Circuits and Systems for Video Technology, Vol. 9, No 1, (February 1999).

As one might expect given the importance of quality and rate control to encoder performance, the fields of quality and rate control for audio and video applications are well developed. Whatever the advantages of previous quality and rate control strategies, however, they do not offer the performance advantages of the present invention.

SUMMARY
If a 35 The present invention relates to a strategy for jointly granule uses less than the number of available bits, the MP3 controlling the quality and bitrate of audio information. The encoder adds the unused bits to the bit reservoir. When the control strategy regulates the bitrate of audio information bit reservoir gets too full, the MP3 encoder preemptively While also reducing quality changes and smoothing quality allocates more bits to granules or adds padding bits to the changes over time. The joint quality and bitrate control compressed audio information. The MP3 encoder uses a 40 strategy includes various techniques and tools, Which can be psychoacoustic model to calculate the perceptual entropy of used in combination or independently. the granule based upon the energy, distortion thresholds, and According to a ?rst aspect of the control strategy, quan Widths for frequency ranges called threshold calculation tization of audio information in an audio encoder is based at partitions. Based upon the perceptual entropy, the encoder least in part upon values of a target-quality parameter, a can allocate more than the average number of bits to a 45 target minimum-bits parameter, and a target maximum-bits granule. parameter. For example, the target minimum- and maxi For additional information about MP3 and AAC, see the mum-bits parameters de?ne a range of acceptable numbers MP3 standard (“ISO/IEC 11172-3, Information Technol of produced bits Within Which the audio encoder has free ogyiCoding of Moving Pictures and Associated Audio for dom to satisfy the target quality parameter. Digital Storage Media at Up to About 1.5 Mbit/siPart 3: 50 According to a second aspect of the control strategy, an Audio”) and the AAC standard. 
audio encoder regulates quantization of audio information Although MP3 encoding has achieved Widespread adop based at least in part upon the value of a complexity estimate tion, it is unsuitable for some applications (for example, reliability measure. For example, the complexity estimate real-time audio streaming at very loW to mid bitrates) for reliability measure indicates hoW much Weight the audio several reasons. First, the nested quantization loops can be 55 encoder should give to a measure of past or future com too time-consuming. Second, the nested quantization loops plexity When regulating quantization of the audio informa are designed for high quality applications, and do not Work tion. as Well for loWer bitrates Which require the introduction of According to a third aspect of the control strategy, an some audible distortion. Third, the MP3 control strategy audio encoder normalizes according to block size When assumes predictable rate-distortion characteristics in the 60 computing the value of a control parameter for a variable audio (in Which distortion decreases With the number of bits size block. For example, the audio encoder multiplies the allocated), and does not address situations in Which distor value by the ratio of the maximum block size to the current tion increases With the number of bits allocated. block size, Which provides continuity in the values for the Other audio encoders use a combination of ?ltering and control parameter from block to block despite changes in zero tree coding to jointly control quality and bitrate. An 65 block size. audio encoder decomposes an audio signal into bands at According to a fourth aspect of the control strategy, an different frequencies and temporal resolutions. The encoder audio encoder adjusts quantization of audio information US 7,295 ,971 B2 7 8 using a bitrate control quantization loop following and FIG. 11 is a chart showing a trace of noise to excitation outside of a quality control quantization loop. 
The de-linked ratio as a function of quantization step size for a block quantization loops help the encoder quickly adjust quanti according to the illustrative embodiment. zation in view of quality and bitrate goals. For example, the FIG. 12 is a chart showing a trace of number of bits audio encoder ?nds a quantization step size that satis?es produced as a function of quantization step size for a block quality criteria in the quality control loop. The audio encoder according to the illustrative embodiment. then ?nds a quantization step size that satis?es bitrate FIG. 13 is a ?owchart showing a technique for controlling criteria in the bit-count control loop, starting the testing with quality and bitrate in de-linked quantization loops according the step size found in the quality control loop. to the illustrative embodiment. According to a ?fth aspect of the control strategy, an FIG. 14 is a ?owchart showing a technique for computing audio encoder selects a quantization level (e.g., a quantiza a quantization step size in a quality control quantization loop tion step size) in a way that accounts for non-monotonicity according to the illustrative embodiment. of quality measure as a function of quantization level. This FIG. 15 is a ?owchart showing a technique for computing helps the encoder avoid selection of inferior quantization a quantization step size in a bit-count control quantization levels. loop according to the illustrative embodiment. According to a sixth aspect of the control strategy, an FIG. 16 is a table showing a non-linear function used in audio encoder uses interpolation rules for a quantization computing a value for a bias-corrected bit-count parameter control loop or bit-count control loop to ?nd a quantization according to the illustrative embodiment. level in the loop. The particular interpolation rules help the FIG. 17 is a ?owchart showing a technique for correcting encoder quickly ?nd a satisfactory quantization level. 
20 model bias by adjusting a value of a control parameter According to a seventh aspect of the control strategy, an according to the illustrative embodiment. audio encoder ?lters a value of a control parameter. For FIG. 18 is a ?owchart showing a technique for lowpass example, the audio encoder lowpass ?lters the value as part ?ltering a value of a control parameter according to the of a sequence of previously computed values for the control illustrative embodiment. parameter, which smoothes the sequence of values, thereby 25 smoothing quality in the encoder. DETAILED DESCRIPTION According to a eighth aspect of the control strategy, an audio encoder corrects bias in a model by adjusting the value The illustrative embodiment of the present invention is of a control parameter based at least in part upon current directed to an audio encoder that jointly controls the quality buffer fullness. This can help the audio encoder compensate 30 and bitrate of audio information. The audio encoder adjusts for systematic mismatches between the model and this audio quantization of the audio information to satisfy constant or information being compressed. relatively constant bitrate [collectively, “constant bitrate”] Additional features and advantages of the invention will requirements, while reducing unnecessary variations in be made apparent from the following detailed description of quality and ensuring that any necessary variations in quality an illustrative embodiment that proceeds with reference to are smooth over time. the accompanying drawings. The audio encoder uses several techniques to control the quality and bitrate of audio information. While the tech BRIEF DESCRIPTION OF THE DRAWINGS niques are typically described herein as part of a single, integrated system, the techniques can be applied separately 40 FIG. 
1 is a block diagram of a suitable computing in quality and/or rate control, potentially in combination environment in which the illustrative embodiment may be with other rate control strategies. implemented. In the illustrative embodiment, an audio encoder imple ments the various techniques of the joint quality and rate FIG. 2 is a block diagram of a generalized audio encoder according to the illustrative embodiment. control strategy. In alternative embodiments, another type of 45 audio processing tool implements one or more of the tech FIG. 3 is a block diagram of a generalized audio decoder niques to control the quality and/or bitrate of audio infor according to the illustrative embodiment. mation. FIG. 4 is a block diagram of a joint rate/ quality controller The illustrative embodiment relates to a quality and according to the illustrative embodiment. bitrate control strategy for audio compression. In alternative FIGS. 5a and 5b are tables showing a non-linear function 50 embodiments, a video encoder applies one or more of the used in computing a value for a target maximum-bits param control strategy techniques to control the quality and bitrate eter according to the illustrative embodiment. of video information FIG. 6 is a table showing a non-linear function used in I. Computing Environment computing a value for a target minimum-bits parameter according to the illustrative embodiment. 55 FIG. 1 illustrates a generalized example of a suitable computing environment (100) in which the illustrative FIGS. 7a and 7b are tables showing a non-linear function embodiment may be implemented. The computing environ used in computing a value for a desired buffer fullness ment (100) is not intended to suggest any limitation as to parameter according to the illustrative embodiment. scope of use or functionality of the invention, as the present FIGS. 
8a and 8b are tables showing a non-linear function 60 invention may be implemented in diverse general-purpose or used in computing a value for a desired transition time special-purpose computing environments. parameter according to the illustrative embodiment. With reference to FIG. 1, the computing environment FIG. 9 is a ?owchart showing a technique for normalizing (100) includes at least one processing unit (110) and block size when computing values for a control parameter memory (120). In FIG. 1, this most basic con?guration (130) for a block according to the illustrative embodiment. 65 is included within a dashed line. The processing unit (110) FIG. 10 is a block diagram of a quantization loop accord executes computer-executable instructions and may be a real ing to the illustrative embodiment. or a virtual processor. In a multi-processing system, multiple