<<

USOO889245OB2

(12) United States Patent (10) Patent No.: US 8,892.450 B2 Schildbach et al. (45) Date of Patent: Nov. 18, 2014

(54) SIGNAL CLIPPING PROTECTION USING 381/17:381/27; 381/107: 381/106; 381/98: PRE-EXISTING AUDIO GAIN 700/94; 379/406.01 (75) Inventors: Wolfgang A. Schildbach, Nuremberg (58) Field of Classification Search (DE); Alexander Groeschel Nurember USPC ...... 704/500, 210, 225, 503: 381/17, 107, (DES s 9. 381/106, 98, 27; 379/406.01: 700/94 See application file for complete search history. (73) Assignee: poly International AB, Amsterdam (56) References Cited (*) Notice: Subject to any disclaimer, the term of this U.S. PATENT DOCUMENTS patent is extended or adjusted under 35 U.S.C. 154(b) by 845 days. 5,821,889 A * top N ...... 341,139 (21) Appl. No.: 13/125,846 (Continued)t (22) PCT Filed: Oct. 26, 2009 FOREIGN PATENT DOCUMENTS (86). PCT No.: PCT/US2O09/062004 JP 2002-151975 5, 2002 JP 4726898 2, 2008 S371 (c)(1), (Continued) (2), (4) Date: Apr. 25, 2011 OTHER PUBLICATIONS (87) PCT Pub. No.: WO2010/053728 Schildbach, W., et al., “Transcoding of dynamic range control coef PCT Pub. Date: May 14, 2010 ficients and other Metadata into MPEG-4 HE AAC’ 123rd Conven (65) Prior Publication Data tion of AES, New York, NY. Paper 7217, Oct. 5-8, 2007, pp. 1-8.

US 2011/0208528A1 Aug. 25, 2011 ssistantrimary facil Examiner — Neerat ShahSharma Related U.S. Application Data (57) ABSTRACT (60) Provisional application No. 61/109.433, filed on Oct. The application describes a method and an apparatus to pre 29, 2008. vent clipping of an audio signal when protection against (51) Int. Cl. signal clipping by received audio metadata is not guaranteed. GIOL 2L/00 (2013.01) The method may be used to prevent clipping for the case of GIOL 9/6 (2013.01) downmixing a multichannel signal to a stereo audio signal. GOL 9/008 (2013.01) According to the method, it is determined whether first gain GO6F 7/OO (2006.01) values (4) based O received audio metadata are sufficient for HO4M 9/08 (2006.01) protection against clipping of the audio signal. The audio HO4R5/OO (2006.01) metadata is embedded in a first audio stream (1). In case a first HO3G 3/00 (200 6,015 gain value (4) is not sufficient for protection, the respective HO3G 7/OO (200 6. 01) first gain value (4) is replaced with again value Sufficient for protection against clipping of the audio signal. Preferably, in (52) {Sci.OO (2006.01) case no metadata related to dynamic range control is present AV e. we in the first audio stream (1), the method may add gain values CPC ...... G10L 19/173 (2013.01); GI o: I s Sufficient for protection against signal clipping. 2013.O1 USPC ...... 7041500; 704/210; 704/503; 704/225; 18 Claims, 4 Drawing Sheets

gair waiie presert? f S: yes

US 8,892.450 B2 Page 2

(56) References Cited 2005/O120870 A1* 6, 2005 Ludwig ...... 84f661 2005, 0147262 A1* 7/2005 Breebaart 381 (106 U.S. PATENT DOCUMENTS 2005. O157883 A1* 7/2005 Herre et al. .. 381.17 2005/0228648 A1 10, 2005 Heikkinen 6,704,704 3, 2004 Newson et al...... 704,225 2010, 0083344 A1 4, 2010 Schildbach 7,305,346 12, 2007 Oyama et al. .. 704,503 2011/OO 13783 A1* 1, 2011 SugaWara ...... 381.98 7.464,029 12, 2008 Visser et al...... TO4/210 7,760,886 T/2010 Hellmuth et al...... 381,27 8,094.809 1, 2012 Whikehartet al. 379,406.01 FOREIGN PATENT DOCUMENTS 8, 116,485 2, 2012 Escott et al...... 381 (107 8,208.659 6, 2012 Chalil ...... 381 (106 RU 2214048 10, 2003 8,290, 181 10, 2012 Stokes et al. ... 381 (107 RU 2323551 4/2008 8,488,811 T/2013 Smithers et al. 381 (107 RU 2325046 5, 2008 8,756,066 6, 2014 Kim et al. ... 704,500 WO 2006084916 8, 2006 2002/0172376 11, 2002 Bizjak ...... 381 (94.3 WO 20081.00098 8, 2008 2004/O138769 T/2004 Akiho ...... TOO/94 2005/O105442 5/2005 Melchior * cited by examiner U.S. Patent US 8,892.450 B2

Ssput

output steaf

FIG. 2 U.S. Patent Nov. 18, 2014 Sheet 2 of 4 US 8,892.450 B2

5

peak value (samples) Canel 1

Chanel 2 7 E:

Charrel 5

FIG. 3

5 re. r --- w wr. . . rar. war w w rrrr r. vir r , s r. v. rar Incoming L1--- gain deterrine min. input 3. gait for out lock

46. determine highest outgoing peak Waiue H 1 X Hi gain eak value of out block f p 6

myo qq s

FIG. 4 U.S. Patent Nov. 18, 2014 Sheet 3 of 4 US 8,892.450 B2

gain Walte resent?

determine DRC incoising Ri . Elsar emir. irsplit & : ERC gairs fos ost block Quiggig

cete; its sighes SnOOthe t peak Nakie for i :lock. eak value :::::::::::::::::::::::::::::::::::::::::::::::::::::: 8 : kieterrine Coirpf Tibiosk (, .5 incoining utgoing compr : di:errie is : cosprairs fe: ) compr 3. 5. FIG. 6

FIG. 7 U.S. Patent Nov. 18, 2014 Sheet 4 of 4 US 8,892.450 B2

S. --

------*:: is rear reverers resis. & | fix. 98 8. i-Fi He 1- 8 ”, i. - sax 2 if g : I 9. & ; 87 92 S3 irrus...m.

FIG. 8

{ jean

risis- Estock triax H- six

US 8,892.450 B2 1. 2 SIGNAL CLIPPING PROTECTION USING The downmix step bears the risk of occasional overload PRE-EXISTING AUDIO GAIN METADATA of the digital Stereo signal, thereby generating undes ired clipping artifacts. Such clipping may occur when CROSS-REFERENCE TO RELATED the amplitude of a downmixed that APPLICATIONS would exceed the maximum (or minimum) represent able value is limited to the maximum (or minimum) This application claims priority to U.S. Patent Provisional representable value. E.g. in case of a simple unsigned Application No. 61/109,433, filed 29 Oct. 2008, hereby incor fixed point binary representation, clipping occurs porated by reference in its entirety. when the computed downmixed amplitude is limited 10 to the maximum value word where all bits correspond FIELD OF THE INVENTION to 1. In case of a signed representation in 16 bit, the maximum value may e.g. correspond to the word The patent application relates to clipping protection of an “O 111111111111111. audio signal using pre-existing audio metadata embedded in As the downmix matrices for the various downmixing 15 schemes are known at the headend, sender or content a steam. In particular, the application relates to generation side, for signals that may result in clipping clipping protection when downmixing a multichannel audio when downmixed, dynamic range control metadata signal to fewer channels. that instructs a receiver to attenuate the signals to-be downmixed prior to mixing can be added to the audio BACKGROUND OF THE INVENTION stream to dynamically prevent clipping. (3) Prevent clipping in case of boosted output: For retrans It is a common concept to embed audio metadata into a mission over dynamically very limited channels (e.g. digital audio stream, e.g. in digital broadcast environments. from a set-top-box via an analog RF link to the RF input Such metadata is “data about data”, i.e. data about the digital of a TV), the signal is boosted, typically by 11 dB, to audio in the stream. The metadata can provide information to 25 achieve a better signal-to-noise-ratio on this path. In an audio decoder about how to reproduce the audio. One type Such applications, for signals that may result in clipping of metadata is dynamic range control information which rep when amplified by 11 dB, dynamic range control meta resents a time-varying gain envelope. Such dynamic range data that instructs a receiver to attenuate signals prior to control metadata can serve multiple purposes: applying the 11 dB amplification can be added to the (1) Control the dynamic range of reproduced audio: Digital 30 audio stream to dynamically prevent clipping. transmission allows for a high dynamic range, but lis From the perspective of the device receiving the audio tening conditions do not always permit taking advantage stream, it is not clear if the incoming dynamic range control of that. Although high dynamic range is desirable in metadata serves the purpose underpoint (1), i.e. control of the quiet living room conditions, it may not be appropriate dynamic range, the purpose under point (2), i.e. downmix for other conditions e.g. for a car because of the 35 clipping protection, or the purposes under both points (1) and high background noise level. To accommodate a wide (2). Often, the metadata accomplishes both tasks, but this is variety of listening conditions, metadata instructing a not always the case, so in some cases the metadata may not receiver how to reduce the dynamic range of the repro include downmix clipping protection. In addition, in case the duced audio can be inserted in the digital audio stream metadata (typically, a different gain parameter is used for RF instead of reducing the dynamic range of the audio prior 40 mode) is associated with the RF mode under point (3), the to transmission. The latter approach is not preferable as metadata may be used to prevent clipping in case of an extra it makes it impossible for a receiver to reproduce the amplification (both in case of downmixing and in case of not audio with full dynamic range. Instead, the former downmixing). approach is preferred as it allows the listener to decide if Moreover, the incoming audio stream may not include dynamic range control shall be applied or not depending 45 dynamic range control metadata at all, due to the fact that for on the listening environment. Such dynamic range con Some audio encoding formats the metadata is optional. trol metadata makes high-quality artistic dynamic range If the dynamic range control metadata is not included with compression of a decoded signal available to listeners at the compressed audio stream or is included but does not their discretion. include downmix clipping protection, undesirable clipping (2) Prevent clipping in case of a downmix operation: When 50 artifacts may be present in the decoded signal if a multi a multichannel signal (e.g. a 5.1-channel audio signal) is channel signal is downmixed into to fewer channels. downmixed, the number of channels is reduced, typi cally to two channels. In case of reproducing a multi SUMMARY OF THE INVENTION channel audio signal comprising more than two chan nels (e.g. a 5.1-channel audio signal having 5 main 55 The present invention describes a method and an apparatus channels and 1 low frequency effect channel) via Stereo to prevent clipping of an audio signal when clipping protec speakers, typically a receiverside downmix operation is tion by audio metadata is not guaranteed. performed, where the multichannel signal is mixed into A first aspect of the application relates to a method of two channels. The mixing operation can be described by providing protection against signal clipping of an audio sig a downmix matrix, e.g. a 2.5 matrix having two rows and 60 nal, e.g. a downmixed digital audio signal, which is derived 5 columns in case of downmixing a 5-channel signal into from digital audio data. According to the method, it is deter a 2-channel (stereo) signal (the low frequency effect mined whether first gain values based on received audio meta channel is typically not considered during downmix) data are sufficient for protection against clipping of the audio Different downmix schemes for mixing the 5 main chan signal. The audio metadata is embedded in a first audio nels of a 5.1-channel signal into two channels are 65 stream. E.g. it is determined whether or not the time-varying known, e.g. Lo/Ro (left only, right only) or Lt/Rt (left gain envelope metadata included with a compressed audio total, right total). stream is sufficient to prevent downmix clipping. In case a US 8,892.450 B2 3 4 first gain value is not sufficient for protection, the respective Smaller or equal to 1, the signal could be amplified and still first gain value is replaced with a gain value Sufficient for would not clip. Nevertheless, the incoming audio stream protection against clipping of the audio signal. Preferably, in requests attenuation, e.g. to fulfill dynamic range limiting case no metadata related to dynamic range control is present purposes, and thus it is preserved. in the first audio stream, the method may add gain values 5 In case the first gain value provides again larger than 1 and Sufficient for protection against signal clipping. E.g. in the the second gain value provides again Smaller or equal to 1, the case where the time-varying gain envelope metadata does not incoming first gain value would violate clipping protection, provide Sufficient downmix clip protection, or is not present at all, the time-varying gain envelope metadata is modified or and so the second gain value is taken. added, so that it does provide sufficient downmix clip protec- 10 In case both the first gain value and the second gain value tion. provide again larger than 1, the input shall be amplified. This The method allows clipping protection, in particular clip amplification is permitted as long as still no clipping happens, ping protection in case of downmix, irrespective whether gain and thus the Smaller of the first gain value and the second gain values sufficient for clipping protection are received or not. value is used. According to the method, received audio gain words (if 15 An alternative approach for determining whether the first provided) may be applied as truthfully as possible but may be gain values are sufficient for protection is to apply the first overridden when the incoming gain words do not provide gain values to audio data and to determine whether the result enough attenuation to prevent clipping, e.g. in a downmix. ing digital audio signal (e.g. the downmixed signal) clips. AS dynamic range control data serving the purpose under In case the first gain values are not sufficient for protection, point (1) bears artistic aspects, it is typically not in the duty of 20 one may iteratively determine gain values which are sufficient the receiving device (e.g. a set-top-box) to introduce this in for clipping protection starting from the first gain values as case the incoming metadata does not provide it. Properties as initial gain values. E.g., one may determine whether the audio of (2) though can and therefore should be provided by the signal clips with a gain value which is the closest gain value receiving instance. This means that the receiving device shall Smaller than the first gain value according to the resolution of try to preserve dynamic range control data intended for 25 the gain values (e.g. in case the first gain value is 0.8 and the dynamic range control under point (1) as much as possible gain value resolution is 0.1, the closest Smaller gain value while at the same time adding clipping protection. would be 0.7). If the signal still clips, one may determine There are various ways to determine whether first gain whether the audio signal clips with the next smaller gain value values based on received audio metadata are sufficient for (e.g. again value of 0.6). This is repeated until again value is protection against signal clipping. 30 found which does not result in signal clipping. According to a preferred approach, second gain values are Preferably, the method is performed as part of a transcod computed based on the digital audio data, where the second ing process, where the first audio stream in a first audio gain values are Sufficient for clipping protection of the audio coding format (e.g. the AAC format or the High Efficiency signal. The second gain values may be the maximum allow AAC (HE-AAC) format, also known as aacPlus) is able gain values which do not result in clipping. 35 transcoded into a second audio stream coded in a second Preferably, the method determines whether the first gain audio coding format (e.g. the format or the values are sufficient in Such a way that it compares the first Dolby Digital Plus format). The second audio stream com gain values based on the received audio metadata and the prises the replaced gain values Sufficient for clipping or has computed second gain values. The method may compare one gain values derived therefrom. first value associated with a segment of the audio data with the 40 Often audio transcoding is necessary, since the digital com respective second gain value associated with the same seg pression format for carrying the audio data cannot be kept ment of audio data. throughout the whole transmission chain until the final audio In dependency thereon, a clipping protection compliant decoderin the transmission chain (e.g. until the decoder of the stream of gain values may be generated from the first and AVR—audio/video receiver). In case of broadcast, this is second gain values. Preferably, Such gain values are selected 45 because, e.g., different coding schemes may be used for the from the first gain values and the computed second gain over-the-air broadcast (or broadcast to the consumer via values in dependency on the comparison operations. By cable) and the transmission of the audio between the receiv selecting a second computed gain value instead of the first ing device (e.g. a set-top-box—STB) and the final decoder in gain value, the first gain value is replaced with the selected the transmission chain (e.g. the decoder in the AVR or the second gain value. 50 audio decoder in the TV set). E.g., the audio data may be Preferably, the minimum of a pair of first and second gain broadcast over-the-air via the AAC format or the HE-AAC values is selected. If the first gain value is larger than the format, and then the audio data may be transcoded into the computed second gain value Sufficient for protection, this Dolby Digital format or the Dolby Digital Plus format for indicates that there is a risk that the first gain value is not transmission from the STB to the AVR. In consequence, a sufficient for clipping protection and thus should be replaced 55 transcoding step may be performed, e.g. in the STB, to get with the respective second gain value. Otherwise, if the first from one format to the other. Such transcoding step comprises gain value is Smaller than the computed second gain value the transcoding of the audio data itself, but ideally also sufficient for protection, this indicates that there is no risk of transcoding of the accompanying metadata as well, in par signal clipping and the first gain value should be preserved. ticular the dynamic range control data. According to a pre The selection of gain values from the first and second gain 60 ferred embodiment, the method provides transcoded audio values may be carried out as explained below: gain metadata in the second audio stream, with the gain In case both the first gain value and the second gain value metadata sufficient for protection against signal clipping. provide again Smaller or equal to 1, the minimum of both is The method may be very useful in any device that taken. This means that either the first gain value already transcodes a signal from one compressed audio stream format guarantees clipping protection, or if not, it will be replaced by 65 to another, where it is not known ahead of time whether the the second gain value. In case the gain of the second gain time-varying gain control metadata, if any, carried by the first value is larger than 1 and the first gain value provides again format includes downmix clipping protection (e.g. in an US 8,892.450 B2 5 6 AAC/HE-AAC to Dolby Digital transcoder, a Dolby E to outgoing data segment is determined According to a different AAC/HE-AAC transcoder, or a Dolby Digital to AAC/HE example, first each maximum for each downmixed channel AAC transcoder). signal is computed per outgoing data segment (downsam Preferably, for determining whether the first gain values pling) and then the peak value of these maxima is determined. are sufficient for protection, the digital audio data is down Based on the determined maximum, a gain value may be mixed according to at least one downmixing scheme, e.g. computed by inverting the determined maximum. If 1 is the according to a Lt/Rt downmixing scheme. The downmixing maximum signal value which can be represented, inverting results in one or more signals, e.g. in one signal associated the determined maximum directly yields again factor. When with the right channel and one signal associated with the left the gain factor is applied to the maximum of the (filtered) channel. In addition, a plurality of downmixing schemes may 10 peak values, the resulting value equals 1, i.e. the maximum be considered and the digital audio data is downmixed signal value. This means that each audio sample to which the according to more than one downmixing scheme. gain is applied is kept below 1 or equals 1, thus avoiding Preferably, an actual peak value of various signals derived clipping for this data segment. In case 1 is the maximum from the audio signal is continuously determined, i.e. at a signal level, 1 corresponds to 0 dBFS decibels relative to given time it is determined which of the various signals has 15 full scale; generally 0 dBFS is assigned to the maximum the highest signal value. For computing a peak value, the possible level. method may determine the maximum of the absolute values Instead of simply inverting the determined maximum, a of two or more signals at a given time. The two or more gain value may be computed by dividing a maximum signal signals may include one or more signals after downmixing value (which corresponds to 0 dBFS) by the determined according to a first downmixing scheme, e.g. the absolute maximum associated with a data segment. However, the com value of a sample of the downmixed right channel signal and putational costs are higher compared to a simple inversion. the absolute value of a simultaneous sample of the down In case of transcoding, the data segment (e.g. block or mixed left channel signal. In addition, for computing the peak frame) lengths are often different for the first audio coding value, the method may also consider the absolute value of one format (format of input stream) and the second audio coding or more signals after downmixing according to a second (and 25 format (format of output stream). E.g. in AAC a block typi even third) downmixing scheme. Moreover, the peak value cally contains 128 samples (in HE-AAC: 256 samples per determination may consider the absolute value of one or more block), whereas in Dolby Digital a block typically contains audio signals before downmixing, e.g. the absolute value of 256 samples. Thus, the number of samples per block each of the 5 main channels of a 5.1-channel signal at the increases when transcoding from AAC to Dolby Digital. In same time. It should be noted that in case of transcoding it is 30 AAC a frame comprises typically 1024 samples (in HE-AAC: typically not known whether the multichannel signal is later 2048 samples per frame), wherein in Dolby Digital a frame played back over discrete channels or if downmixing accord typically comprises 1536 samples (6 blocks). Thus, the num ing to a downmixing scheme is performed. ber of samples per frame also increases when transcoding A peak value corresponds to the maximum of these simul from AAC to Dolby Digital. The granularity of the dynamic taneous signal sample values, thereby indicating the maxi 35 range control data is mostly either the block size or the frame mum amplitude the signal can have for all possible cases at a size. E.g. the granularity of the dynamic range control meta particular time instance, and this is the worst case the clipping data “DRC in MPEG for the HE-AAC stream and of the gain protection algorithm should take into account. metadata “dynrng' in Dolby Digital is the block size. In The dynamic range control data is typically time-varying in contrast, the granularity of the gain metadata “compr” in a certain granularity that generally relates to the length of the 40 Dolby Digital and of the gain metadata "heavy compression' data segment (e.g. block) of the respective audio coding for in DVB (digital video ) for the HE-AAC stream mat or integer parts of it. Thus, also a second gain value is is the frame size. preferably computed per data segment. In addition, the sampling rates may be different for the Therefore, the sampling rate of the peak values or consecu input stream (e.g. 32 KHZ, or 44.1 KHZ) and the output stream tive peak values is preferably reduced (downsampling). This 45 (e.g. 48 KHZ), i.e. the audio is resampled. This also alters the may be done by determining the maximum of a plurality of length relations between the incoming data segments and the consecutive peak values or consecutive filtered peak values. outgoing data segments. Moreover, the incoming and outgo In particular, the method may determine the maximum of a ing data segments may not be aligned. In addition, it should be plurality of consecutive (filtered) peak values associated with noted that metadata transmitted in an input data segment (e.g. a data segment, e.g. a block or frame. In case of transcoding, 50 block or frame) has an area of dynamic range control impact the method may determine the highest peak value of a plural (i.e. a range in the stream where the application of the gain ity of consecutive (filtered) peak values associated with a data value has effect) that is often not exactly as large as the data segment of the second (outgoing) data stream. It should be segment but larger. This is due to the overlap-add character noted that preferably not only the consecutive peak values istics of the used transform and to the fact that the dynamic based on signal samples in an outgoing segment are consid 55 range control is often applied in the spectral domain. The ered for determining the maximum but also additional (prior same often holds true for the dynamic range control data of and later) peak values which would influence the decoding of the outgoing audio stream. Therefore, for determining which the data segment, i.e. peak values which relate to signal input gain values influence a given output data segment one samples at the beginning and end of a decoding window. may look at the overlap of input and output impact lengths These peak values are also associated with the data segment. 60 (instead of considering the overlap of the input and the output Instead of choosing the highest peak value, one may com data segments) as will be explained in detail later on. pute a different value per data segment for reducing the sam Due to the reasons discussed above, transcoding of the pling rate. dynamic range control data should take into account that an It should be noted that samples derived from the audio data outgoing dynamic range control value may be influenced by other than peak values may be downsampled. E.g. the audio 65 more than one incoming dynamic range control value. In this data may be downmixed to a single channel (mono) and only case, a resampling (reframing) of the dynamic range control the maximum of the downmixed consecutive samples per data may be performed when transcoding the data stream. US 8,892.450 B2 7 8 Therefore, the method may comprise the step of resam stream may be read from a digital data storage medium, e.g. pling gain values derived from the received audio metadata of a DVD (Digital Versatile Disc) or a Blu-ray disc. the first audio stream. When a data segment of the first audio The above remarks related to the first and second aspects of stream covers a shorter length of time than a data segment of the application are also applicable to the third aspect of the the second audio stream, the gain values are downsampled. 5 application. A resampled gain value may be determined by computing the minimum of a plurality of consecutive gain values. In BRIEF DESCRIPTION OF THE DRAWINGS other words: from a number of input dynamic range control gains (which are relevant for an outgoing data segment), the The invention is explained below in an exemplary manner smallest one is chosen. The motivation for this is to preserve 10 with reference to the accompanying drawings, wherein FIG. 1 illustrates an embodiment of a transcoder providing the incoming values as much as possible (in case the values do clipping protection; not result in signal clipping). However, this often is not pos FIG. 2 illustrates a preferred approach for reframing of sible since the gain values have to be resampled. Therefore, metadata; the Smallest gain value is chosen, which tends to reduce the 15 FIG. 3 illustrates an embodiment for determining peak signal amplitude. However, this reduction of the signal ampli values based on received audio data; tude is regarded as less noticeable or annoying. Preferably, FIG. 4 illustrates an embodiment for merging incoming Such minimum is determined per output data segment. dynamic range control data with computed gain values suffi In case no gain metadata related to dynamic range control cient for clipping protection; is present in the first audio stream, the method preferably adds FIG. 5 illustrates the selection of the outgoing gain values: gain values Sufficient for protection against clipping in the FIG. 6 illustrates an alternative embodiment for merging second audio stream (outgoing stream). These gain values incoming dynamic range control data with computed gain should be preferably limited so that they do not exceed again values sufficient for clipping protection; of 1. The reason for preventing the gain values from exceed FIG. 7 illustrates an embodiment of a smoothing filter ing 1 is that the signal should not be unnecessarily amplified 25 Stage, to get close to the clipping border. FIG. 8 illustrates another embodiment for providing clip Thus, in case a respective computed second gain value has ping protection; again below 1, the respective added gain value corresponds to FIG. 9 illustrates still another embodiment for providing the computed second gain value. In case a respective com clipping protection; and puted second gain value is above 1, the respective added gain 30 FIG. 10 illustrates a receiving device receiving the value is set to again of 1. transcoded audio stream. A second aspect of the application relates to an apparatus for providing protection against signal clipping of an audio DETAILED DESCRIPTION signal derived from digital audio data. The apparatus is con figured to carry out the method as discussed above. The 35 AAC/HE-AAC and Dolby Digital/Dolby Digital Plus Sup features of the apparatus correspond to the features of the port the concept of metadata, more specifically gain words method as discussed above. Accordingly, the apparatus com that carry a time varying gain to be optionally applied to the prises means for determining whether first gain values based audio data upon decoding. For the purpose of reducing the on received audio metadata are sufficient for protection data, these gain words are typically only sent once per data against clipping of the audio signal. Further, the apparatus 40 segment, e.g. per block or frame. In said audio formats these comprises means for replacing a first gain value with a gain gain words are optional, i.e. it is technically possible to not value Sufficient for protection against clipping of the audio send the data. Dolby Digital and Dolby Digital Plus encoders signal in case the first gain value is not sufficient. typically send the gain words, whereas AAC and HE-AAC Preferably, the determining means comprise means for encoders often do not send the gain words. However, the computing second gain values based on the digital audio data, 45 numbers of AAC and HE-AAC encoders which send the gain where the second gain values are sufficient for clipping pro words is increasing. The application allows decoders or tection of the audio signal. More preferably, the determining transcoders receiving an audio stream to do “the right thing means also comprise comparing means for comparing the in both situations. If audio gain words are provided, “the right first gain values based on the received audio metadata and the thing” would be to process the received audio gain words as computed second gain values. In dependency thereon, gain 50 truthfully as possible, but override them when the incoming values are selected from the first gain values and the com gain words do not provide enough attenuation to prevent puted second gain values. signal clipping, e.g. in case of a downmix. If no gain values The above remarks related to the first aspect of the appli are provided, “the right thing would be to calculate and cation are also applicable to the second aspect of the applica provide gain values which prevent signal clipping. tion. 55 FIG. 1 shows an embodiment of a transcoder, with the A third aspect of the application relates to a transcoder, transcoder providing protection against signal clipping, in where the transcoder is configured to transcode an audio particular protection against clipping in case of downmixing stream from a first audio coding format into a second audio (e.g. downmixing from a 5.1-channel signal to a 2-channel coding format. The transcoder comprises the apparatus signal). The transcoder receives a digital audio stream 1 com according to the second aspect of the application. Preferably, 60 prising audio metadata. E.g., the digital audio stream is an the transcoder is part of a receiving device receiving the first AAC or HE-AAC (HE-AAC version 1 or HE-AAC version 2) audio stream, where the first audio stream is a digital broad digital audio stream. The digital audio stream may be part of cast signal, e.g. an audio stream of a digital signal a DVB video/audio stream, e.g. a DVB-T, DVB-S or DVB-C (e.g. DVB-T, DVB-S, DVB-C) or a signal (e.g. a stream. The transcoder transcodes the received audio stream DAB signal). E.g. the receiving device is a set-top-box. The 65 1 into an output audio stream 14 which is encoded in a audio stream may be also broadcast via the (e.g. different format, e.g. Dolby Digital or Dolby Digital Plus. Internet TV or ). Alternatively, the first audio Typically, Dolby Digital decoders Support downmixing of US 8,892.450 B2 9 10 multichannel signals and assume that the time-varying gain data segment of the outgoing audio stream are not aligned envelopes included in received Dolby Digital metadata (even if the data segment lengths are identical). Thus, a map include downmix clip protection. Unfortunately, bit stream 1 ping fromingoing metadata to outgoing metadata is typically (e.g. an AAC/HE-AAC bitstream) does not necessarily con necessary. taintime-varying gain envelope metadata, and even in case of 5 FIG. 2 illustrates a preferred approach for mapping incom carrying Such data it is not clear whether the data includes ing metadata to outgoing metadata. As discussed earlier, typi clipping protection. The transcoder prevents a decoder (e.g. a cally each data segment (e.g. block or frame) has one gain Dolby Digital decoder) in a receiving device (downstream of value of dynamic range control data (or a plurality of gain the transcoder) from producing output signals that contain values, e.g. 8 gain values). However, metadata transmitted clipping artifacts when downmixing the signal. The 10 alongside an input data segment (e.g. block or frame) has an transcoder ensures that output audio stream 14 contains time area of dynamic range control impact (i.e. a range in the varying gain envelope metadata including downmix clipping stream where the application of the gain value has effect) that protection. is often not exactly as large as the data segment but larger. In FIG. 1, unit 2 reads out dynamic range control gain This is due to the overlap-add characteristics of the used values 3 contained in the audio metadata of audio stream 1. 15 transform (i.e. windows are used which are larger than the Optionally, gain values 3 are further processed in unit 5, e.g. data segment and the windows overlap) and to the fact that the the gain values 3 are resampled and transcoded according to dynamic range control is often applied in the spectral domain. the data segment timing of the transcoded output audio stream The same often holds true for the dynamic range control data 14. The resampling and transcoding of metadata gain values of the outgoing audio bit stream. In FIG. 2 the solid lines mark is discussed in the document “Transcoding of dynamic range the beginning and the end of a data segment 20-23 in the input control coefficients and other metadata into MPEG-4 HE stream, and the beginning and end of a data segment 24-26 in AAC, Wolfgang Schildbach et al., Audio Engineering Soci the output stream. In FIG. 2 each area of dynamic range ety Convention Paper, presented at the 123" Convention Oct. control impact 30-33 and 34-36 of a gain value extends 5-8, 2007, New York. The disclosure of this paper, in particu beyond the end and the beginning of the respective data lar the concepts for resampling and transcoding of metadata 25 segment. Each area of impact30-33 and 34-36 is indicated by gain values, is hereby incorporated by reference. In addition, the dashed-dotted lines. on Sep. 30, 2008 the Applicant filed U.S. provisional appli E.g. in HE-AAC, the block size is 256 samples, whereas a cation 61/101,497 having the title “Transcoding of Audio window for decoding has 512 samples. The whole window of Metadata”, with the US provisional application relating to 512 samples may be regarded as an area of impact; however, resampling and transcoding of metadata gain values. The 30 the impact of the gain value at the outer edges of the windows disclosure of this application, in particular the concepts for is smaller compared to impact at the middle of the window. resampling and transcoding of metadata gain values, is Thus, the area of impact may be also regarded as a portion of hereby incorporated by reference. the window. The area of impact may be a number of samples In parallel to resampling, audio data in audio stream 1 is selected from the block/frame size (here: 256 samples) up to decoded by a decoder 6, typically to PCM (pulse code modu 35 the window size (here: 512 samples). Preferably, the used lation) audio data. The decoded audio data 7 comprises a area of impact is larger than the size of the data segment plurality of parallel signal channels, e.g. 6 signal channels in (block or frame). case of a 5.1-channel signal, or 8 signal channels in case of a For determining which input dynamic range control values 7.1-channel signal. influence a given output data segment, it is preferred to look A computing unit 8 determines computed gain values 9 40 at the overlap of input and output impact areas (instead of based on audio data 7. The computed gain values 9 are suffi looking at the overlap of the input and the output data seg cient for protection against signal clipping in a receiving ments). In FIG. 2, it is determined which areas of impact device downstream of the transcoder which receives the 30-33 in the input stream overlap with an area of impact34-36 transcoded audio stream, in particular when downmixing the of a given output data segment 24-26. E.g., the area of impact signal in the receiving device. Such device may be an AVR or 45 34 of data segment 24 in the output stream overlaps with the a TV set. The computed gain values should guarantee that the areas 30, 31, 32 and 33. Therefore, preferably, gain values downmixed signal maximally reaches 0 dBFS or less. Gain associated with four data segments 20, 21, 22 and 23 are values 4 derived from the metadata in audio stream 1 and considered when determining the gain value of the first data computed gain values 9 are compared to each other in unit 10. segment 24 in the illustrated output stream. The first data Unit 10 outputs gain values 11, where a gain value of gain 50 segment 24 is influenced by the 4 input data segments 20-23. value stream 4 is replaced by a gain value derived from gain Alternatively, the method may look at the overlap of the input value stream 9 in case the respective gain value of gain value impact areas and the output signal segment, or at the overlap stream 4 is not sufficient to prevent signal clipping in the of the input data segments and the output data segment. receiving device. In parallel, audio data 7 is encoded by Such mapping or resampling process may be carried out in encoder 12 to an output audio encoding format, e.g. to Dolby 55 unit 5 of FIG. 1, which receives gain values 3 of the input Digital or Dolby Digital Plus. The encoded audio data and steam 1 and maps one or more of the gain values 3 to again gain values 11 are combined in unit 13. The resulting audio value 4. stream provides audio gain metadata which prevents signal FIG.3 illustrates an embodiment of block 50 for determin clipping, in particular for the case of signal downmix. ing peak values based on received audio data. Such peak Generally, ingoing audio gain metadata should be pre 60 determining block 50 may be part of block 8 in FIG.1. Based served as much as possible as long as the gain metadata on the decoded multichannel audio data 7 comprising a plu provides protection against signal clipping. In most cases, the rality of channels (here 5 channels of a 5.1-channel signal, the length of a data segment (e.g. block or frame) of the input low frequency effect channel is not considered), downmixing audio stream (see 1 in FIG.1) and the length of a data segment is performed according to one or more downmix schemes (i.e. (e.g. block or frame) of the output audio stream (see 14 in 65 according to one or more downmixing matrices). It should be FIG. 1) are different. Moreover, typically the beginning of a noted that the transcoder does not know whether downmixing data segment of the input audio stream and the beginning of a is performed in the receiving device at all and which down US 8,892.450 B2 11 12 mixing scheme is then used in the receiving device. Thus, it is that guarantees that each audio sample of the data segment unknown if a multichannel signal is played back over discrete (e.g. block) is below or equal to the maximum signal level 1 channels or if downmixing according to one of several (corresponding to 0 dBFS) when the gain is applied to the schemes is performed. The transcoder simulates all cases and respective audio sample. This avoids clipping for this data determines the worst case. segment. It should be noted that the maximum signal level In the example in FIG. 3, downmixing according to the means the maximum signal level of a signal in the receiver of Lo/Ro downmixing scheme is performed in block 41, down the transcoded audio stream; thus, at the output of block 60 mixing according to the Pro Logic (PL) downmixing scheme the amplitude may be higher than 1 (when C<1). is performed in block 42, and downmixing according to the The computed gain C is the maximum allowable gain that Pro Logic II (PL II) downmixing scheme is performed in 10 prevents clipping; a smaller gain value than the computed block 43. The PL downmixing scheme and the PL II down gain C may be also used (in this case the resulting signal is mixing scheme are two variants of the Lt/Rt downmixing even smaller). It should be noted that in case the gain C is scheme as discussed before. Each downmixing scheme out below 1, the gain C (or a smaller gain) has to be applied, puts a right channel signal and a left channel signal. Then, the otherwise the signal would clip at least in the worst-case absolute values of the signals after downmixing are computed 15 scenario. (see blocks 44 in FIG.3). Preferably, also the absolute sample In block 5, the incoming gain values 3 from the metadata values of the various channels of the multichannel audio undergo a resampling as well. From a number of incoming signal 7 are computed (see blocks 40 for determining the gains relevant for an output data segment, the Smallest gain is absolute values). Considering also the absolute values of the chosen and used for further processing. Preferably, the resa channels (without downmixing) is helpful to prevent signal mpling is performed as discussed in connection with FIG. 2: clipping in other cases than downmixing, e.g. in case the For determining which incoming gain values are relevant for signal is later amplified by an extra gain (e.g. 11 dB gain in an output data segment, the overlap of the input and output case of the RF mode as discussed later on). impact areas is considered. If the impact area of an incoming The maximum (peak value) of the absolute values at a data segment overlaps with the impact area of a given output time is computed in block 45. Computing the maximum is 25 data segment, the incoming data segment is considered (and continuously performed, thereby generating a stream of peak thus its gain value) when determining the Smallest gain value. values 46. It may be possible that the various samples have Instead, also the two alternative approaches as discussed in different signal delay due to different signal processing. Such connection with FIG.2 may be used. different signal delays may be aligned (not shown). The maxi The motivation for this is to preserve the incoming values. mum of the sample values indicates the maximum amplitude 30 However, this is not possible since the gain values have to be a signal can have for all cases, and so this is the worst case the resampled according to the timing of the output stream. Using clipping protection algorithm takes into account. The the Smallest gain value from a plurality of consecutive gain transcoder thus simulates the worst-case amplitude of the values tends to reduce the signal amplitude which is regarded signal in the receiving device at a time. A dynamic range in tendency as less noticeable or annoying. control value that achieves protection against clipping should 35 In case relevant dynamic range control data is present in the attenuate (or amplify) the signal in a fashion that it reaches 0 incoming data stream 1, a comparison between this gain dBFS maximally. (preferably after resampling in block 5) and the computed It should be noted that block 50 may determine a peak gain values 9 sufficient for clipping protection is done in value based on fewer absolute values than illustrated in FIG. block 10. Block 62 determines the minimum between a resa 3 (e.g. without considering the absolute values of the non 40 mpled gain value 4 and a computed gain value 9, with the downmixed channels) or based on additional absolute values Smaller gain value being used as the outgoing gain value not shown in FIG. 3 (e.g. absolute values of other downmix (block 62 forms a minimum selector). ing schemes). Alternatively, it is possible to downmix the In case no incoming gain values are present, Switch 63 in channels 7 without determining a peak value: E.g., the two FIG. 4 will switch to the upper position, with block 62 then resulting channels may be combined and the combined signal 45 determining the minimum between again of 1 and the com is processed further (instead of using peak values 46 as out puted gain value, with the Smallergain value being used as the putted by block 45). outgoing gain value. Thus, in case no incoming gain value is The further processing of peak values 46 is indicated in present, the outgoing gain value is limited to a maximum gain FIG. 4. Figurative elements in FIGS. 1 and 4 denoted by the of 1. same reference signs are basically the same. Peak values 46 50 The following table illustrates the operation of comparison undergo a step of blocking and maximum building in unit 60. block 10. Here, the term “I” denotes the incoming dynamic Here, the highest peak value is determined for a given output range control gain 4 (after resampling), and the term 'C' data segment (e.g. a block). In other words: the peak values are downsampled by selecting the highest peak value (which denotes the computed gain 9. is the most critical one) for an output data segment from a 55 plurality of peak values. It should be noted that preferably not only consecutive peak values corresponding to the signal Is 1 I> 1 I not present samples in an output segment are considered for determining Cs 1 min(I, C) min(I, C) = C C the maximum. Rather, also additional (prior and later) peak C > 1 min(I, C) = I min(I, C) 1 values which would influence a given data segment are con 60 sidered, i.e. peak values which relate to signal samples at the In case both I and C are smaller or equal to 1, the minimum beginning and end of a decoding window. Preferably, all is taken. This means that either I already guarantees clip samples of the window are considered. protection, or if not, it will be replaced by C. The result of this sampling is inverted in block 61 accord In case C-1 and Is 1, the signal could be amplified and still ing to the formula C=1/X, where C refers to a computed gain 65 would not clip. The incoming stream though requests attenu value 9 and X refers to the respective highest peak for the ation, e.g. to fulfill dynamic range limiting purposes, and thus block of the output stream 14. The result C is a factor (gain) I is preserved (I is the minimum of I and C in this case). US 8,892.450 B2 13 14 In case ID1 and Cs1, the incoming value would violate PRL indicates a reference loudness of the audio content (e.g. clipping protection, and So C is taken (C is the minimum of I in HE-AAC, the PRL can vary between 0 dB and -31.75 dB). and C in this case). Application of the PRL lowers the loudness of the audio to a In case both I and C are larger than 1, the input shall be defined target reference level. In dependency of the audio amplified. This amplification is permitted as long as still no encoding format other terms for the reference are common, clipping happens, and thus the Smaller of I and C is used. e.g. dialogue level, dialogue normalization or dialnorm. In case no incoming dynamic range value is present, clip In FIG. 6 the highest peak value for a data block (as gen ping protection is ensured by using C as long as Cs1. In case erated by unit 60) is level adjusted in unit 70 independency on C>1, the signal shall not be modified (i.e. the signal should the received PRL (normally, the level is reduced by the PRL). not be unnecessarily amplified to get close to the clipping 10 For computing gain values associated with the line mode, the border). So unity is taken as the outgoing gain. In both cases level adjusted samples are inverted in block 61, thereby gen when no incoming gain values are present, the minimum of 1 erating computed gain values which guarantee that each and C is used (instead of the minimum between I and C). audio sample of the block is below or equal to the maximum FIG. 5 illustrates the selection of the outgoing gain values signal level 1 in case the audio signal is adjusted in the 11 in form of a flowchart. It is determined whether a gain 15 receiver by the PRL. The resampling of the incoming DRC value I is present (see reference 130 in FIG. 5). If a gain value data 3 in block 5, and the comparison of the resampled gain I is currently present, the outgoing gain value depends on the values 4 and the computed gain values are identical to FIG. 4. values of the incoming gain value I and the computed gain For computing gain values associated with the RF mode, value C. If Island Cs1, the selected gain value corresponds the level adjusted samples are amplified by 11 dB in block 71 to the minimum of I and C (see reference 131). If Island C-1, since in the receiver the signal is also amplified by 11 dB in the selected gain value corresponds to I (see reference 132). If case of using the RF mode. The transcoder thus simulates the D1 and Cs1, the selected gain value corresponds to C (see worst-case amplitude of the signal in the receiving device. reference 133). If D1 and C-1, the selected gain value cor The boosted samples are inverted in block 61", thereby gen responds to the minimum of I and C (see reference 134). It erating computed gain values for the RF mode which guar should be noted that in all these four cases, the outgoing value 25 antee that each audio sample of the block is below or equal to still corresponds to the minimum of I and C. Thus, it is not 1 (maximum signal amplitude) in case the audio signal is necessary to determine whether I and C are sl or not. adjusted in the receiver by the PRL and boosted by 11 dB. If no gain value I is currently present, the outgoing gain The embodiment in FIG. 6 is preferably used for a value depends on the value of the computed gain value C. If transcoder outputting a Dolby Digital audio stream (e.g. an Cs1, the outgoing gain value corresponds to C (see reference 30 HE-AAC to Dolby Digital transcoder or an AAC to Dolby 135). If C1, the outgoing gain value corresponds to 1 (see Digital transcoder). According to Dolby Digital, in the line reference 136). It should be noted that in both cases, the mode each coding block has a “DRC (dynamic range con outgoing value still corresponds to the minimum of 1 and C. trol) gain value, whereas in the RF mode each frame (which Thus, it is not necessary to determine whether C is s1 or not. comprises 6 blocks) has a “compr” gain value. Nevertheless, The embodiment as discussed above achieves that incom 35 both types of gain values relate to dynamic range control. The ing dynamics are preserved and only in case clipping would computed gain value for the RF mode is downsampled from occur, the dynamics are modified to prevent clipping. In case the block rate to the frame rate in block 73. Block 73 deter no dynamic range control values are present, Sufficient mines the minimum of the computed gain values for a total dynamic range control values are added to the stream to number of 6 consecutive blocks, with each minimum prevent clipping. The Switching between the modes works 40 assigned to the computed gain value 72 for the whole frame. instantaneously and Smoothly, thereby mitigating any arti Theresampling of the incoming comprgain values 3' in block facts. 5' differs from the resampling in block 5 in such away that the FIG. 6 illustrates an alternative to the embodiment in FIG. minimum for an output frame is determined. The comparison 4. Figurative elements in FIGS. 4 and 6 denoted by the same of the resampled gain values 4' and the computed frame-based reference signs are basically the same. In FIG. 6, separate 45 gain values 72 is the same as discussed before. gain metadata for two different modes, the line mode and the The embodiment in FIG. 6 provides protection not only RF mode, are received and transcoded. In the embodiment in against clipping in case of downmixing, but also against FIG. 6 different gain words for the RF mode and the line mode signal clipping when applying an extra gain of 11 dB in the RF are computed because they use two different types of meta mode (otherwise the 11 dB boosted signal may clip even data. The line mode metadata covers a smaller range of values 50 when not using signal downmixing). Therefore, it is advan and is sent more often (typically once per block), whereas the tageous to consider in block 50 also the absolute values of the RF mode metadata covers a larger range of values and is sent channels without downmix. less often (typically once per frame). In the RF mode the It should be noted that in case no PRL is received, prefer signal is boosted by an extra gain of 11 dB, which allows a ably, the PRL is set to a default value. higher signal-to-noise ratio when transmitting the signal over 55 For computing gain values, a Smoothing stage may be used. a dynamically very limited channel (e.g. from a set-top-box to FIG. 7 shows an embodiment of a smoothing stage 80 which the RF input of a TV via an analog RF link). More may be placed anywhere in the path between the output of over, since the RF mode gain metadata covers a wider range block 50 and the input of blocks 61 and 61'. Preferably, of values than the gain metadata of the line mode, the RF smoothing stage 80 is placed at the output of block 50. mode allows higher dynamic range compression. The gain 60 thereby generating Smoothed peak values 46' based on the metadata for the line mode is denoted as “DRC (see refer peak values 46. Smoothing stage 80 implements a low pass ence sign 3), whereas the gain metadata for the RF mode is filter for the input signal of the Smoothing stage, e.g. the peak denoted as “compr” (see reference sign 3). Please note that in value signal. Its purpose is to improve the audible impression DVB the gain metadata for the RF mode is denoted as “com after the clipping protection kicks in: an immediate release of pression” or “heavy compression'. Moreover, the embodi 65 a ducking gain after a period of clipping protection will sound ment in FIG. 6 also considers a program reference level annoying. Thus, as is widely done in limiterimplementations, (PRL), which may be transmitted as part of the metadata. The the peak value signal (and by that the derived gain signal; see US 8,892.450 B2 15 16 below) is filtered with a 1 order low-pass filter, which pref minimum selector 97. The gain values 98 at the output of erably operates at a time constant t of 200 m.sec. In case a new minimum selector 97 provide downmix protection and may input value demands clipping protection to a higher degree be embedded in a transcoded audio stream as discussed than the Smoothed signal would achieve (since the new input before. value is higher than the Smoothed signal), it bypasses the It should be noted that the embodiment in FIG. 8 is not Smoothing stage and gets into effect immediately. In this case necessarily part of an audio transcoder. The output gain val the upper input is larger than the lower input of the maximum ues may be directly used for adjusting the level of the received computing block 81 in FIG. 7. audio stream. In this case the apparatus of FIG.8 may be part Preferably, the embodiment in FIGS. 3-7 are part of an of an AVR or TV set. audio transcoder, e.g. from AAC and/or HE-AAC to Dolby 10 Moreover, the embodiment in FIG.8 may be used to pre Digital, or from Dolby E or Dolby Digital to AAC and/or vent signal clipping without considering downmixing. E.g. HE-AAC. However, it should be noted that the embodiments the embodiment in FIG. 8 may receive conventional PCM in FIGS. 3-7 are not necessarily part of an audio transcoder. audio samples 91 without further pre-processing in block 50. These embodiments may be part of the device receiving the In this case the embodiment in FIG. 8 prevents clipping when incoming audio stream 1 and applying the modified gain 15 PCM samples 91 are amplified by the output gain values. values (without transcoding). The modified gain values may FIG. 9 illustrates another alternative embodiment. Figura be directly used for adjusting the gain of the received audio tive elements in FIGS. 8 and 9 denoted by the same reference stream. E.g., the embodiments in FIGS. 3-7 may be part of an signs are basically the same. In contrast to the embodiment in AVR or a TV Set. FIG. 8, the embodiment in FIG. 9 is a block-wise operating FIG. 8 illustrates an alternative embodiment for providing version like the embodiments in FIGS. 4 and 6, where only downmix protection. The apparatus receives incoming gain one division is performed per signal block (or any other data words 90 contained in or derived from audio metadata. to segment like frame). This reduces the number of divisions per Gain words 90 may correspond to the gain values 3 or 4 in time. As discussed already in connection with FIG. 8, audio FIGS. 1 and 4. Further, the apparatus receives audio samples samples 91 may be generated by block 50 of FIG. 3. If the 91 (e.g. PCM audio samples). E.g., the audio samples 91 may 25 audio samples 91 are not absolute values, the absolute values be peak values as generated by block 50 in FIG.3. If the audio of the audio samples 91 may be determined before (not shown samples 91 are not absolute values, the absolute value of the in FIG.9). The audio samples 91 are then fed to a smoothing audio samples 91 may be determined before. In block 92 filter stage 80 which corresponds to smoothing filter stage 80 maximum allowed gain values gain (t) are computed by a in FIG. 7. In contrast to FIG. 8, smoothing filter stage 80 division according to the following equation: 30 processes audio samples instead of gain samples. Thus, Smoothing filter stage 80 uses a maximum selector 81 instead of a minimum selector 95. After smoothing, the maximum of signal.nasallowed the samples per audio block is determined in unit 100. Then, gain, (t) = signal(t) the maximum value is inverted in block 101, thereby com 35 puting the maximum allowable gain per block. This gain value is compared to the current gain value 90 in minimum Here, the term signal, denotes the maximum selector 97, with the minimum of both values being passed to allowed signal amplitude, e.g. signal?-1. The term the output of minimum selector 97. The gain values 98 at the signal(t) denotes the current audio sample 91. output of minimum selector 97 provide downmix clipping In block 93, the maximum allowed gain values gain (t) 40 protection and may be embedded in a transcoded audio are limited to a maximum gain of 1: If a value gain, (t) is stream as discussed before. The embodiment in FIG.9 may be above 1, then gain, (t) will be set to 1. However, if a value modified to generate again value 98 in a similar way when no gain, (t) is below 1 or equals 1, the value will be not modi incoming gain value 90 is present: If no incoming gain value fied. 90 is present and the computed gain is Smaller or equal to 1, The output of block 93 is fed to a smoothing filter stage 94. 45 the computed gain value is outputted. In case the computed Smoothing filter stage 94 contains a low pass filter and a gain value is larger than 1 (and no incoming gain value 90 is minimum selector 95 which selects the minimum of its two present), again value having again of 1 is outputted. This may inputs. The operation is similar to the Smoothing filter stage be realized by the additional switch 63 of FIG. 6, with the 80 in FIG. 7. However, here a minimum selector 95 instead of Switch Switching between the incoming gain value 90 and a a maximum selector 81 is used since the filter stage 94 50 gain of 1 in dependency of the presence of the incoming gain Smoothes gain values instead of audio samples (the gain value 90. values are derived by inverting audio samples). A smoothing It should be noted that the embodiments as discussed filter stage 80 may be used instead when being placed before correspond to a limiter that respects gain values com upstream of block 92 (which determines gain values by inver ing from a different compressor instance. sion). Analogously, Smoothing filter stage 94 may be used in 55 FIG. 10 illustrates a receiving device receiving the FIGS. 4 and 5 when being placed downstream of blocks 61 transcoded audio stream 14 as generated by the transcoder of and/or 61' (since downsteam of blocks 61 and/or 61' gain FIG.1. Block 121 separates the gain values 11 from the audio signals are processed). Smoothing filter stage 94 Smoothes stream 14. The receiving device further comprises a decoder the signal slope in case of an abrupt increase of the gain value 110 which generates a decoded audio signal 120. The ampli at block 93 (otherwise the audio may sound annoying). In 60 tude of the decoded audio signal 120 is adjusted in block 112 contrast, Smoothing filter stage 94 lets the gain signal pass by the gain values 11 as derived in FIG.1. In case an optional without Smoothing in case of an abrupt decrease of the gain downmix is performed in block 113, the output signal 114 value (otherwise the signal would clip). The computed gain does not clip since the gain values 11 are sufficient to prevent signal 96 at the output of smoothing filter stage 95 is com signal clipping in case of a downmix. The amplitude of the pared with the incoming gain words 90 in minimum selector 65 decoded audio signal 120 may be further adjusted by the PRL 97. The minimum of the actual computed gain value 96 and (not shown). In case the gain values 11 also consideran 11 dB the actual incoming gain word 90 is passed to the output of boost in the RF mode as discussed in connection with FIG. 6, US 8,892.450 B2 17 18 the audio signal 120 may be also boosted by 11 dB without least two audio signals at a time, the at least two audio clipping (both in case of a signal downmix and in case of no signals selected from the following group of signal downmix). at least two audio signals after downmixing according to a first downmixing scheme, The invention claimed is: at least two audio signals before downmixing, and 1. A method of providing protection against signal clipping at least two audio signals after downmixing according to a of an audio signal derived from digital audio data, the method second downmixing scheme, and comprising: wherein the plurality of consecutive signal values corre determining whether first gain values based on received spond to consecutive peak values or consecutive filtered audio metadata are sufficient for protection against clip 10 peak values. ping of the audio signal, the received audio metadata 9. The method of claim 7, embedded in a first digital audio stream; and wherein the method is performed in the course of transcod in case a first gain value is not sufficient, replacing the ing respective first gain value with again value Sufficient for the first audio stream coded in a first audio coding format protection against clipping of the audio signal, wherein 15 into the step of determining comprises the steps of a second audio stream coded in a second audio coding computing second gain values based on the digital audio format different from the first audio coding format, data, the second gain values Sufficient for clipping pro the second audio stream comprising audio metadata tection of the audio signal; having the replaced gain values Sufficient for protec comparing the first gain values based on the received audio tion against clipping of the audio signal or having gain metadata and the computed second gain values; and values derived therefrom, and Selecting a minimum of a first gain value and a correspond wherein ing second gain value based on comparing the first gain the second audio stream is organized in data segments, values and the computed second gain values. and 2. The method of claim 1, wherein the step of computing 25 the maximum of a plurality of signal values associated second gain values comprises: with a segment of the second audio stream is deter determining maximum allowable gain values. mined. 3. The method of claim 1, wherein the method is performed 10. The method of claim 7, wherein in the course of transcoding a maximum signal value is divided by the determined the first audio stream coded in a first audio coding format 30 maximum. into 11. The method of claim 7, wherein a second audio stream coded in a second audio coding the determined maximum is inverted. format different from the first audio coding format, the 12. The method of claim 1, second audio stream comprising audio metadata having wherein the method is performed in the course of transcod the replaced gain values sufficient for protection against 35 ing clipping of the audio signal or having gain values the first audio stream coded in a first audio coding format derived therefrom. into 4. The method of claim 1, wherein the audio signal is a a second audio stream coded in a second audio coding downmixed audio signal and the method provides protection format different from the first audio coding format, against signal clipping of the downmixed signal. 40 the second audio stream comprising audio metadata 5. The method of claim 1, wherein the step of determining having the replaced gain values Sufficient for protec whether first gain values are Sufficient for protection com tion against clipping of the audio signal or having gain prises the step of values derived therefrom, and downmixing the digital audio data according to at least a wherein first downmixing scheme. 45 the first audio stream is organized in data segments, at 6. The method of claim 5, wherein the step of determining least one gain value being received per data segment whether first gain values are Sufficient for protection com of the first audio stream, prises the step of the second audio stream is organized in data segments, computing peak values, wherein a peak value is computed and by determining the maximum of the absolute values of at 50 the method further comprises the step of: least two audio signals at a time, the at least two audio resampling gain values of the first audio stream. signals selected from the following group of 13. The method of claim 1, comprising the step of: at least two audio signals after downmixing according to wherein the method is performed in the course of transcod the first downmixing scheme, ing at least two audio signals before downmixing, and 55 the first audio stream coded in a first audio coding format at least two audio signals after downmixing according to a into second downmixing scheme. a second audio stream coded in a second audio coding 7. The method of claim 1, wherein the step of determining format different from the first audio coding format, whether first gain values are Sufficient for protection com the second audio stream comprising audio metadata prises the step of 60 having the replaced gain values Sufficient for protec determining the maximum of a plurality of consecutive tion against clipping of the audio signal or having gain signal values derived from the digital audio data. values derived therefrom, and 8. The method of claim 7, wherein the step of determining wherein whether first gain values are Sufficient for protection com the first audio stream is organized in data segments, at prises the step of 65 least one gain value being received per data segment computing peak values, wherein a peak value is computed of the first audio stream, by determining the maximum of the absolute values of at the second audio stream is organized in data segments, US 8,892.450 B2 19 20 the method further comprises the step of: 17. A transcoder configured to transcode determining the minimum of a plurality of consecutive a first audio stream coded in a first audio coding format into gain values of the first audio stream. a second audio stream coded in a second audio coding 14. An apparatus for providing protection against signal format, the transcoder comprising the apparatus of claim clipping of an audio signal derived from digital audio data, 14. compr1S1ng: determining means for determining whether first gain val 18. A transcoding apparatus for transcoding a first audio ues based on received audio metadata are sufficient for stream coded in a first audio coding format into a second audio stream coded in a second audio coding format, com protection against clipping of the audio signal, the prising: received audio metadata embedded in a first digital 10 audio stream; and determining means for determining whether metadata replacing means for replacing a first gain value with again related to dynamic range control is present in the first value sufficient for protection against clipping of the audio stream and, if so, whether first gain values based audio signal in case the first gain value is not sufficient on received audio metadata are sufficient for protection for protection, wherein the determining means com against clipping of the audio signal; and prise: 15 gain value adding means for adding gain values to the computing means for computing second gain values based second audio stream; wherein the determining means on the digital audio data, the second gain values suffi comprises: cient for clipping protection of the audio signal; computing means for computing second gain values based comparing means for comparing the first gain values based on the digital audio data, the second gain values suffi on the received audio metadata and the computed second cient for clipping protection of the audio signal; gain values, and for selecting a minimum of a first gain comparing means for comparing the first gain values based value and a corresponding second gain value, based on on the received audio metadata and the computed second comparing the first gain values and the computed second gain values if metadata related to dynamic range control gain values. is present in the first audio stream, and for selecting a 15. The apparatus of claim 14, wherein the apparatus is part 25 minimum of a first gain value and a corresponding sec of a transcoder, the transcoder configured to transcode the ond gain value, based on comparing the first gain values first audio stream coded in a first audio coding format into a and the computed second gain values; and Second audio stream coded in a second audio coding format wherein the gain value adding means comprises: different from the first audio coding format, the second audio means for adding the selected gain value to the second stream comprising audio metadata having the replaced gain 30 audio stream if metadata related to dynamic range con Values sufficient for protection against clipping of the audio trol is present in the first audio stream; and signal or having gain values derived therefrom. means for adding the second gain values to the second 16. The apparatus of claim 14, wherein the audio signal is audio stream if metadata related to dynamic range con a downmixed audio signal and the apparatus provides protec trol is not present in the first audio stream. tion against signal clipping of the downmixed signal. ck ck ck ck ck