Loudness standards in broadcasting. Case study of EBU R-128 implementation at SWR

Carbonell Tena, Damià

Academic year 2015-2016

Supervisor: Enric Giné Guix

BACHELOR'S DEGREE IN AUDIOVISUAL SYSTEMS ENGINEERING

Bachelor's Thesis

BACHELOR'S THESIS, AUDIOVISUAL SYSTEMS ENGINEERING, ESCOLA SUPERIOR POLITÈCNICA UPF, 2016

Dedication

For the Schaupp family. With you I feel at home, and I know that I will always have a second family in Germany. Without you this work would not have been possible. Thank you very much!

Thanks

I would like to thank the SWR for being so understanding with me and for letting me have this wonderful experience with them, as well as for all the help, experience and time given to me. Thanks to all the engineers and technicians in the house, Jürgen Schwarz, Armin Büchele, Reiner Liebrecht, Katrin Koners, Oliver Seiler, Frauke von Mueller-Rick, Patrick Kirsammer, Christian Eickhoff, Detlef Büttner, Andreas Lemke, Klaus Nowacki and Jochen Reß, who helped and advised me, and special thanks to Manfred Schwegler, who was always ready to help me, and to Dieter Gehrlicher for his understanding. Thanks also to my teacher and adviser Enric Giné for his patience and dedication, and to the team of the Secretaria ESUP, who answered all the questions I asked during the process. And of course to my Catalan and German families for the moral (and financial) support, and to Ema Madeira for all the corrections, revisions and love given during my stay far from home.

Abstract

During the 90s and the first decade of the 2000s, the loudness levels of popular music increased drastically, as no real loudness standard was used for mixing and mastering. The same happened in the broadcasting industry, where constant loudness level changes between programs or channels annoyed the consumers. Different organizations around the world decided to stop this phenomenon by creating standards like ATSC A/85 (USA, Canada) or EBU R128 (Europe). A public German broadcaster, the SWR (Südwestrundfunk), where I had the opportunity to do my internship semester, has already implemented EBU R128, and it is currently fully functional. I studied and explained the European standards in order to increase awareness of the topic and to provide useful information for their full understanding. In this work I explain how the implementation was done at SWR, along with the insights from the engineers who work there, providing a review with useful information for other broadcasters.

Summary

During the 90s and the first decade of the 2000s, the loudness level of pop music increased drastically, since at that time no loudness standard was used when mixing or mastering. The same happened in the telecommunications industry, where constant loudness level changes between channels or programs annoyed consumers. Several organizations around the world decided to put an end to this phenomenon by creating standards such as ATSC A/85 (USA, Canada) or EBU R128 (Europe). A German public broadcaster, the SWR (Südwestrundfunk), where I had the opportunity to do my internship semester, has already implemented EBU R128, and it is currently fully operational. I have studied and explained the European standards in order to increase awareness of this topic and to provide useful information for their full understanding. In this work I explain how the implementation was carried out at SWR, together with the views of the engineers who work there, thus providing relevant information for other television broadcasters.


Prologue

With the digital revolution we entered a new era for the audiovisual world and acquired a technology that opened up new opportunities. These new technologies improved the dynamic range capabilities and reduced ground noise and the physical constraints inherent to analog recording. It seems, however, that we were not able to fully understand the advantages of these new technologies. The overwhelming increase in dynamic range capabilities was squandered by producing louder and increasingly over-compressed material in order to be louder than the rest, in an attempt to stand out (or perhaps, to sell) more than our competition.

Even though compression is needed in some situations, especially when audio is meant to be reproduced in noisy environments such as cars or public areas, it has quite often been over-used with commercial intentions. This situation, which began in the 60s, sped up at the end of the 1980s and reached its peak at the beginning of the 2000s, is usually called the “Loudness War”1, and it has reached a point where the product that producers are selling is almost harmful for the consumers (Vickers, 2010).

Lately there has been a lot of debate about the future of the audiovisual industry. As a result, the international associations in telecommunications and broadcasting have taken measures so that modern technology can be exploited while protecting the consumer, who has been complaining about the situation for years. Among other organizations, the International Telecommunication Union (ITU) has worked on the ITU BS. 1770-2 standard and the European Broadcasting Union (EBU) has worked on the EBU R128 recommendations. The aim of both is to set some ground rules for mixing in order to normalize loudness levels and to avoid big changes in loudness between TV programs or channels, but these rules could be applied to the music industry as well. The aim of this thesis is to study the implementation of these recommendations at a TV broadcaster, and what changes have to be made in the workflow so that these recommendations can be fulfilled. The specific broadcaster serving as a case study will be the Südwestrundfunk2 (SWR), a public German regional TV studio in Stuttgart.

This situation has been a problem in the music and in the broadcasting industry, but, as said, in this thesis we will only be focusing on the broadcasting industry. Nevertheless, it is going to be impossible not to mention the music industry constantly as they have evolved together, and one cannot exist without the other. (E.g. Music produced for Compact Disc often is also used in TV-Productions)

1 It is called Loudness war (also known as Loudness race) as there has been tough competition to get mixes louder than the competitors' and this, according to some experts (Shepherd, 2011), has been harmful to the public audience; therefore it is compared to a war, as both are competitive and harmful.

2 http://www.swr.de/


Content

Abstract
Prologue
Content
Figures
Tables
1. HISTORY OF LOUDNESS
  1.1. Getting technical
    1.1.1. Peak vs. loudness normalization
    1.1.2. Compression distortion
  1.2. Loudness in broadcasting
  1.3. Metering
    1.3.1. Peak Program Meter
    1.3.2. The VU meter
    1.3.3. New meters
2. RECOMMENDATIONS
  2.1. ITU BS. 1770
  2.2. EBU R128
    2.2.1. Program Loudness
    2.2.2. Loudness Range
    2.2.3. True Peak level
    2.2.4. Implementation in the production chain
    2.2.5. Loudness Parameters for short-form Content
  2.3. Other standards
3. STUDIOS SWR
  3.1. Structure
  3.2. Implementation of the Loudness regulations in the SWR
    3.2.1. Loudness control equipment
    3.2.2. Loudness in to the workflow
  3.3. My experience with loudness
    3.3.1. Loudness in live broadcasting
    3.3.2. Loudness in sound production and post-production
  3.4. Daily problems, abuses and shortcomings
    3.4.1. Automatic loudness level systems
    3.4.2. Tricky situations
    3.4.3. Commercial loudness abuse
4. CONCLUSION
5. TERMINOLOGY
6. BIBLIOGRAPHY

Figures

Figure i. Different track values according to the year of its release (Deruty, 2011)
Figure ii. Effects of peak normalization (Katz, 2013, p. 67)
Figure iii. Peak vs. Loudness normalization (EBU, 2011, p. 16)
Figure iv. Compression causes distortion to the signal (Katz, 2013, p. 56)
Figure v. Frequency domain changes due to clipping (Katz, 2013, p. 61)
Figure vi. Amplitude increasing and clipping at amplitude 1
Figure vii. 440 Hz wave spectrogram
Figure viii. Clipped 440 Hz wave spectrogram
Figure ix. 21533 Hz wave spectrogram
Figure x. Clipped 21533 Hz wave spectrogram
Figure xi. Aliased harmonics from a 21533 Hz signal
Figure xii. K-System scales (Katz, 2000, p. 8)
Figure xiii. ITU BS.1770 block diagram (ITU, 2015, p. 3)
Figure xiv. K-weighting curve
Figure xv. Signal over-sampling for True Peak detection (Fleischhacker, 2014)
Figure xvi. True-peak detection block diagram
Figure xvii. Loudness distribution, with gating thresholds and Loudness Range for the film “The Matrix” (EBU, 2016a)
Figure xviii. Different examples for Loudness Range depending on the replay environment (EBU, 2011, p. 19)
Figure xix. Scales of an "EBU mode" meter in LU (EBU, 2011, p. 24)
Figure xx. ARD structure
Figure xxi. Main display of a Lawo mc2 66 from SWR Stuttgart studios with loudness meter in LU
Figure xxii. Parameters in the Lawo console meters
Figure xxiii. External computer for Lawo mixing console monitor
Figure xxiv. RTW TM7 from Lawo mixing console
Figure xxv. RTW TM 9 touch monitor from a sound editing room in the SWR Stuttgart studios
Figure xxvi. Nugen Audio LM-Correct plug-in interface (NUGEN Audio, 2016)
Figure xxvii. Graphic representation of a loud-soft file transition
Figure xxviii. Graphic representation of a soft-loud file transition
Figure xxix. Measurements from actual loudness transitions during broadcasting
Figure xxx. Program scheme with -23LUFS voice level
Figure xxxi. Program scheme with -23LUFS program level
Figure xxxii. Voice level differences between programs
Figure xxxiii. Loudness level difference between a film and a commercial

Tables

Table a. Aliased frequencies from 21533 Hz harmonics
Table b. Summary of new Loudness units
Table c. Weights for every audio channel of a 5.1 surround system
Table d. Listening levels and alignment signals summary
Table e. Summary of the Loudness Parameters for short-form content (EBU, 2016b, p. 4)
Table f. Regional broadcasting stations members of the ARD

1. HISTORY OF LOUDNESS

The history of music is far too long to take all into account, and most of the information would be irrelevant for this case study. Therefore, we are going to start focusing on the 1940s.

In the 1940s, jukeboxes started to be popular in clubs and bars, where music could be played at any time and could be selected by the customers. At this point music started to be produced for the masses and the business started to grow. Those jukeboxes were pre-set to a loudness level by the owner of the club or bar in question, so the audience could hear the music properly.

Also in the 40s, the VU meter had just been born in the broadcasting industry, and it was then adopted by the music industry as well. This was the first meter that was intended to represent the way humans perceive loudness, instead of representing only the signal characteristics. (Audio Engineering Society, 2014)

Later on, during the 50s and 60s, 45 rpm singles appeared and producers started to figure out that louder mixes were played more frequently in the jukeboxes, and that they outsold the others. Therefore, producers started to mix louder discs (Sreedhar, 2007). Radio broadcasters also started to contribute to the war with the top 40 list: producers started to mix hotter singles in order to get into it. As always in a “louder-is-better” scenario, the louder a single sounded, the higher its probability of getting into the list. Advertising also started to be an important factor in album sales. (Devine, 2013)

During the 60s and 70s, artists used to make compilation albums with all their hits. At this time, when songs from different years were put together, they realized that older songs sounded softer than newer ones, so the older songs had to be remastered to make them sound louder.

That was the beginning of what we call the Loudness War. But because of the physical constraints of the recording medium at that time, recordings could not reach the level of compression that they have nowadays. Vinyl used to be the main carrier, and records were transferred to the vinyl disc as an analogue electroacoustic wave. This wave had to be readable with a needle, and needles cannot follow discontinuities in a wave. Such a limitation imposed a typical crest factor3 (CF) of 14dB.

With the digital revolution most physical constraints were gone, and with the CD we were theoretically able to record any kind of wave, down to a CF = 0dB (such as a square wave). Quantization distortion could not be avoided, but with 16 bits per sample and the clever use of dithering (turning quantization distortion into white noise), the Signal to Noise Ratio (SNR) had never been higher.
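As a rough illustration of the two figures mentioned above, the following Python sketch computes the crest factor of a sine and of a square wave, together with the usual textbook estimate of the quantization SNR of a full-scale sine, 6.02·N + 1.76 dB for N bits. The function names and the test signals are illustrative assumptions and are not part of the original study.

```python
import math

def crest_factor_db(samples):
    """Crest factor in dB: peak level relative to RMS level."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

def quantization_snr_db(bits):
    """Textbook SNR estimate for a full-scale sine with uniform quantization."""
    return 6.02 * bits + 1.76

# Ten full cycles of a sine, and the square wave with the same zero crossings
N = 1000
sine = [math.sin(2 * math.pi * 10 * n / N) for n in range(N)]
square = [1.0 if s >= 0 else -1.0 for s in sine]

cf_sine = crest_factor_db(sine)      # about 3 dB
cf_square = crest_factor_db(square)  # 0 dB: peak and RMS coincide
snr_16 = quantization_snr_db(16)     # about 98 dB for 16-bit audio
```

A square wave is the limiting case: no waveform can have a crest factor below 0 dB, which is why it marks the theoretical ceiling of loudness for a given peak level.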

It is at this point that producers and musicians, and also broadcasters, started to explore the possibilities of the digital era. At first, the dynamic range [1] was used to create impact with loud and soft passages, but as had happened with singles in the 50s, music started to become louder (and thus dynamics decreased) in order to sell more. Digital compressors and peak limiters were much more powerful than analog ones, so music was compressed beyond expectations.

3 The crest factor (CF) is the ratio, usually expressed in dB, between the maximum peak value and the effective root-mean-square average value of an alternating signal such as an audio signal.

This compression reached its peak around 2008, when the album Death Magnetic by Metallica was released (Shepherd, So, Justin Bieber is louder than Motorhead, AC/DC and The Sex Pistols… – wait, WHAT ?, 2011). It is one of the loudest albums ever made, and it was so compressed that customers quickly complained about the poor sound quality of the record.

However, the music industry has lately undergone a huge change with the arrival of new cloud-based playback technologies such as iTunes. These platforms changed the way people listen to music, and more recently they have also started to apply per-track loudness normalization by default. That means that they match all their content to a uniform loudness level, so people do not have to adjust levels when listening to different tracks. As a consequence, highly compressed works are penalized, as they sound softer and less dynamic than the more dynamic mixes. This can be a significant step towards ending the Loudness War, as these platforms are the main music distributors nowadays.

Also, many influential music producers and big music technology companies are pushing for a change in the industry to get dynamics back, accepting and following the new recommendations for loudness normalization.

1.1. Getting technical

To demonstrate all that has been explained we need some empirical facts. In 2011, the magazine Sound on Sound released an article4 where the RMS value of 4500 hits from 1969 until 2010 was compared and the results plotted in a timeline. The outcome was quite impressive as we can see clearly the loudness increase.

But that was not the only consequence of the increased loudness values: the corresponding reduction in dynamics (see CF) and also the increased percentage of peak overs (instantaneous values above -1dBFS) can also be seen.
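The two remaining quantities in that comparison can be sketched in a few lines of Python: the RMS level in dBFS and the percentage of peak overs, taken here as samples whose absolute value exceeds -1 dBFS. The synthetic sine and the `hard_clip` helper are assumptions made only for illustration; they do not reproduce the data set of the article.

```python
import math

def rms_dbfs(samples):
    """RMS level relative to digital full scale (amplitude 1.0)."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms)

def overs_percent(samples, threshold_dbfs=-1.0):
    """Percentage of samples whose absolute value exceeds the threshold."""
    limit = 10 ** (threshold_dbfs / 20)
    overs = sum(1 for s in samples if abs(s) > limit)
    return 100.0 * overs / len(samples)

def hard_clip(samples, gain):
    """Apply gain and clip everything that exceeds full scale."""
    return [max(-1.0, min(1.0, gain * s)) for s in samples]

N = 1000
quiet = [0.5 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]
loud = hard_clip(quiet, gain=4.0)

rms_quiet, rms_loud = rms_dbfs(quiet), rms_dbfs(loud)
overs_quiet, overs_loud = overs_percent(quiet), overs_percent(loud)
```

In this toy case the louder version gains several dB of RMS level, but most of its samples end up above -1 dBFS, which mirrors the trend described for the measured tracks.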

4 see Dynamic range and the loudness war – SoS, Sept. 2011 - http://www.soundonsound.com/sos/sep11/articles/loudness.htm


Figure i. Different track values according to the year of its release (Deruty, 2011)

1.1.1. Peak vs. loudness normalization

To achieve these high RMS values, we need to compress the signal. This compression is needed because peak normalization is nowadays common practice. Peak normalization consists of adjusting the level of a piece, using brick-wall limiters, until the highest peaks hit full scale. Compression is therefore used to reduce the distance between the peaks and the average level of the piece, driving the CF down to figures no higher than 3-6dB. By doing so, we can raise the average loudness level considerably.

Peak normalization has been one of the main contributors to the Loudness War, as pieces with high peaks, such as percussive ones, end up with a lower average loudness level than pieces with low peaks. As Bob Katz explains in his book “iTunes music”, this can make a string quartet sound louder than a symphony orchestra, as the orchestra has a much higher crest factor [2]. As shown in Figure ii, the string quartet has a higher average level when normalized.

In order to avoid that, producers tend to compress or peak-limit their works so that they can, crazy as it sounds, make an orchestra sound louder than a string quartet. This practice became more and more common, which is why we can see a clear increase in the RMS value of the main hits during the last decades, even though the new ways of distributing music seem to be reversing the trend.

Figure ii. Effects of peak normalization (Katz, 2013, p. 67)

This same trick has been used in TV. Commercials are over-compressed in order to have a much higher RMS value than a film or a program, so that they sound much louder. The more compressed it is, the louder your commercial will sound compared with your competitors’. As shown in the figure below, peak normalization encourages compression in order to sound louder than other competitors. Loudness normalization, however, as we will show in later sections, may be the solution for loudness changes between contents.


Figure iii. Peak vs. Loudness normalization (EBU, 2011, p. 16)
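The difference between the two strategies can be sketched with two synthetic signals, one dense (low crest factor) and one percussive (high crest factor). This is only a minimal sketch: plain RMS is used here as a crude stand-in for loudness, not the K-weighted, gated measurement of the recommendations, and all names and signals are illustrative assumptions.

```python
import math

def peak(samples):
    return max(abs(s) for s in samples)

def rms(samples):
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def db(ratio):
    return 20 * math.log10(ratio)

def normalize_peak(samples, target_peak=1.0):
    """Scale so that the highest peak hits the target."""
    gain = target_peak / peak(samples)
    return [s * gain for s in samples]

def normalize_rms(samples, target_rms):
    """Scale so that the average (RMS) level hits the target."""
    gain = target_rms / rms(samples)
    return [s * gain for s in samples]

N = 1000
dense = [0.3 * math.sin(2 * math.pi * 10 * n / N) for n in range(N)]  # low crest factor
percussive = [0.8 if n % 50 == 0 else 0.0 for n in range(N)]          # high crest factor

# Peak normalization: equal peaks, very different average levels
dense_pk, perc_pk = normalize_peak(dense), normalize_peak(percussive)
gap_after_peak_norm = db(rms(dense_pk) / rms(perc_pk))

# Level normalization: equal average levels, the dynamic signal keeps its peaks
dense_lv, perc_lv = normalize_rms(dense, 0.1), normalize_rms(percussive, 0.1)
gap_after_level_norm = db(rms(dense_lv) / rms(perc_lv))
```

After peak normalization the dense signal is about 14 dB hotter on average, so squashing the peaks pays off; after level normalization the gap disappears and the percussive signal simply keeps its higher peaks.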

1.1.2. Compression distortion

Compression has always been present in the history of music and broadcasting, and it can be used as an artistic resource, but over-compression can cause distortion that degrades the sound quality. Compression and distortion are closely related: whenever we compress, we inevitably introduce some distortion as well. We can see the effects of compression in the next figure:

Figure iv. Compression causes distortion to the signal (Katz, 2013, p. 56)

This distorted wave sounds much louder because we have increased the average level, and also because distortion usually generates high frequency content, which is perceived as louder than low frequency content.


This frequency content alteration can also appear by clipping the signal. Clipping means increasing the level of a signal until it loses the curvy form of a wave and it becomes squared. That happens because we are trying to increase the level beyond the maximum peak level, and the wave has to be clipped. By clipping we do not increase the sample peak level [3], but we do increase the average level and the true peak level [4] (intersample peak) [5].

Clipping does change the wave form and, therefore, it changes the frequency content of the signal. In an analog system it can be desired to enrich the sound as it would produce harmonic distortion. However, in a digital domain, the distortion caused is not completely harmonically related to the original wave, but it is partially caused by aliased harmonics. It should be kept in mind that digital clipping in a post-production context would appear after sampling, and thus, after the anti-aliasing stage. In the next figure, we can see the effects of very high frequency clipping in the frequency domain:

Figure v. Frequency domain changes due to clipping (Katz, 2013, p. 61)

This is an image of a 21.533 kHz signal with a sampling rate of 44100 samples per second. This frequency can barely be heard, but the distortion components that appear due to clipping are completely audible. Those new frequency components are aliased harmonics of the original frequency, created by clipping.

We ran our own tests in order to confirm Katz's results. To test the distortion due to clipping, we took a 440 Hz wave, progressively increased its amplitude and clipped it to get a clipped wave of amplitude 1. The waveform can be seen in the next figure:


Figure vi. Amplitude increasing and clipping at amplitude 1

If we analyze the first part of the wave we get a well-defined spectrum with a big peak at 440 Hz. However, if we analyze the second part (strongly clipped), we obtain a very different spectrum with many additional frequency components. We can see, though, that all the new frequencies are higher than the fundamental frequency. That is because, to square a wave, we need to add the odd harmonics of the fundamental frequency to the original signal. As we can see in Figure viii, the new peaks that appear in our spectrogram correspond to 1320 Hz, 2200 Hz, 3080 Hz, etc., which are in fact the odd harmonics of 440 Hz.

Figure vii. 440 Hz wave spectrogram


Figure viii. Clipped 440 Hz wave spectrogram
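The same observation can be reproduced with a short Python sketch that hard-clips a 440 Hz sine and measures the amplitude at the first eight multiples of the fundamental with a single-bin DFT. The gain of 4, the block length and the helper name `harmonic_amplitude` are assumptions chosen for the illustration; this is not the script used for the figures.

```python
import cmath
import math

FS = 44100   # sampling rate in Hz
F0 = 440     # fundamental frequency in Hz
N = 2205     # 0.05 s, which holds exactly 22 cycles of 440 Hz

def harmonic_amplitude(samples, freq_hz, fs=FS):
    """Amplitude of a single frequency component (one-bin DFT)."""
    acc = sum(s * cmath.exp(-2j * math.pi * freq_hz * i / fs)
              for i, s in enumerate(samples))
    return 2 * abs(acc) / len(samples)

# Sine driven about 12 dB too hot (gain 4) and hard-clipped at amplitude 1
clipped = [max(-1.0, min(1.0, 4 * math.sin(2 * math.pi * F0 * i / FS)))
           for i in range(N)]

amplitudes = {k * F0: harmonic_amplitude(clipped, k * F0) for k in range(1, 9)}
odd = [amplitudes[k * F0] for k in (1, 3, 5, 7)]    # 440, 1320, 2200, 3080 Hz
even = [amplitudes[k * F0] for k in (2, 4, 6, 8)]   # 880, 1760, 2640, 3520 Hz
```

With these settings the odd multiples carry clearly measurable energy, while the even multiples stay practically empty, as expected for a symmetrically clipped wave.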

But what would happen if the chosen fundamental frequency is really high? The harmonics of the fundamental frequency would be higher than the Nyquist frequency (22050 Hz in this case) and even higher than the sampling frequency; therefore, they will produce aliasing in our signal. To see the effects of this distortion we repeated the previous experiment, taking the exact same frequency as in Katz's experiment, 21533 Hz. We clipped at amplitude 1 and generated the spectra of the pure wave and of the clipped wave.

Figure ix. 21533 Hz wave spectrogram


Figure x. Clipped 21533 Hz wave spectrogram

We can see that the effect is similar to the previous experiment with the 440 Hz wave, but this time the new frequency components that appear in our spectrum are below the fundamental frequency, as they are aliased harmonics of the fundamental frequency, 21533 Hz.

If we calculate the odd harmonics of our signal we will get really high frequencies (for an audio signal). The first three odd harmonics of 21533 Hz are 64599 Hz, 107665 Hz and 150731 Hz. To calculate the aliased frequency of each harmonic, we just have to take its distance to the nearest multiple of the sampling frequency, which is 44100 Hz, 88200 Hz and 132300 Hz respectively. The resulting frequencies are 20499 Hz, 19465 Hz and 18431 Hz, all of them below the Nyquist frequency and below the original 21533 Hz tone.

Harmonic of 21533 Hz (f)    Closest multiple of the sampling frequency (fs)    Aliased frequency fa = |f − fs|
64599 Hz                    44100 Hz                                           20499 Hz
107665 Hz                   88200 Hz                                           19465 Hz
150731 Hz                   132300 Hz                                          18431 Hz

Table a. Aliased frequencies from 21533 Hz harmonics

If we look carefully at the spectrum of the clipped signal, we will see that the peaks that appear correspond to those frequencies.


Figure xi. Aliased harmonics from a 21533 Hz signal
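The folding of each harmonic back into the band below the Nyquist frequency can be checked with a few lines of Python. The helper `aliased_frequency` is a hypothetical name introduced for this sketch; it takes the distance from the harmonic to the nearest multiple of the sampling frequency.

```python
def aliased_frequency(freq_hz, fs=44100):
    """Fold a frequency into the 0..fs/2 band, as sampling at fs would do."""
    remainder = freq_hz % fs
    return min(remainder, fs - remainder)

F0 = 21533
odd_harmonics = [k * F0 for k in (3, 5, 7)]             # 64599, 107665, 150731 Hz
aliases = [aliased_frequency(f) for f in odd_harmonics]  # 20499, 19465, 18431 Hz
```

Because the fundamental lies 517 Hz below the Nyquist frequency, the k-th odd harmonic lands k times 517 Hz below it, so the aliased components walk down into the audible band in regular steps.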

All those artifacts are in fact happening nowadays with over-compressed, post-produced audio materials. We are getting used to the way it sounds, and it may sound natural for some people5. But it is not, and expert listeners would notice them. These are also reasons why regulation and standardization are needed.

1.2. Loudness in broadcasting

Broadcasting has also been affected by the loudness war. Radio broadcasters have clearly been affected, as they use the material that producers and musicians make, and this material has become louder over the past decades, although radio stations have also fed this war themselves.

Beyond that, the loudness war is also present in other aspects of broadcasting. Loudness level differences between programs or between channels are very frequent, and very loud intervening commercials are the main source of complaints, as they have become more and more common lately. Again, the louder a commercial is, the more it will stand out and eventually sell.

1.3. Metering

Loudness metering has always been an important issue. That is why there are many different types of meters and ballistics. There are meters for many different characteristics of a signal, but we are going to focus on those that have more importance in the loudness area.

5See Loudness Normalization: Paradigm Shift or Placebo for the Use of Hyper-Compression in Pop Music? - http://quod.lib.umich.edu/i/icmc/bbp2372.2014.143/1

The Peak Program Meter (PPM) measures the highest sample peaks of a signal (even though there are several meter variations). With this measure we can know the highest peak from our signal, but this does not give a good representation of its loudness.

The VU meter, on the other hand, was created to obtain a more realistic representation of loudness by giving out an averaged value of the signal. Even though it has been in use for many years, there was a need to create new meters to really represent how humans perceive loudness, and also to represent other characteristics of a sound like the Loudness Range.

1.3.1. Peak Program Meter

The peak program meter (PPM) is an instrument for measuring the level of an audio signal, which was first introduced in the professional field during the early 1930s (Yonge, 2008). It usually consisted of a needle moved mechanically by electromagnetic impulses created by the analogue audio signal. Nowadays it is used for digital audio signals, typically as a bar graph made of a light array displayed vertically or horizontally. This meter has had many different variations as well as many different scales.

Firstly, the basic PPM is the True Peak meter, which indicates the peak value no matter the duration of the peak.

Secondly, the Quasi Peak meter only indicates the peaks with a certain minimum duration of a few milliseconds, about 10 ms (Schmid, 1976). The peak duration time needed is determined by the integration time, so shorter peaks will not have enough weight and their true peak level will not be shown.

Another approach is the Sample Peak meter, which shows only the sample peaks of a digital audio signal, but not the true peak that may lie between two samples. Although it may have an integration function for true peaks, another solution is the Over-Sample Peak meter, which first oversamples the signal and then displays the sample peak of the oversampled signal.
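The gap between a sample peak and an oversampled peak can be shown with a worst-case test tone: a full-scale sine at a quarter of the sampling rate whose samples all fall 45 degrees away from the crests. The sketch below oversamples by zero-padding the spectrum of a short block, which is a simplification chosen for clarity (real meters use interpolation filters); the function names are illustrative assumptions.

```python
import cmath
import math

def dft(samples):
    n = len(samples)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
            for k in range(n)]

def oversample(samples, factor):
    """Band-limited interpolation by zero-padding the spectrum (O(n^2), short blocks only)."""
    n = len(samples)
    spectrum = dft(samples)
    m = n * factor
    out = []
    for j in range(m):
        acc = 0j
        # Bins -n/2+1 .. n/2-1 are used; the Nyquist bin is ignored in this sketch
        for k in range(-(n // 2) + 1, n // 2):
            acc += spectrum[k % n] * cmath.exp(2j * math.pi * k * j / m)
        out.append(acc.real / n)
    return out

def peak_dbfs(samples):
    return 20 * math.log10(max(abs(s) for s in samples))

# Full-scale sine at fs/4, sampled 45 degrees away from its crests
N = 64
signal = [math.sin(2 * math.pi * n / 4 + math.pi / 4) for n in range(N)]

sample_peak = peak_dbfs(signal)                        # about -3 dBFS
oversampled_peak = peak_dbfs(oversample(signal, 4))    # about 0 dBFS
```

The sample peak meter under-reads this tone by about 3 dB, while the four-times oversampled reading recovers the real crest of the waveform.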

Since it is an old technology, there have been many different variants of it. Many important broadcasters' associations have made their own scales for their convenience, and as a result there are a large number of scales in use. Engineers needed a new, reliable and standard technology to work with. Also, regarding loudness metering, the peak meter does not give good information about how humans perceive the loudness of a signal. The VU meter was then introduced.

1.3.2. The VU meter

The VU meter was first introduced in 1942, and it was first used in broadcasting stations. It is one of the simplest meters for audio metering, and it has been in use in the broadcasting and music industries ever since. It is also an instrument that measures the level of an audio signal, but it takes a different approach from the peak meter. The VU meter is intentionally slow; it does not react to rapid changes in the level of the audio signal. Instead, it averages the signal with a rise/fall time of 300 ms, and gives a better approximation of how humans perceive loudness (Johansen, 2006). It can be understood as an approximation of the root-mean-square (RMS) value of the signal.

The VU meter uses VU (Volume Units) as its unit of measurement, where 0 VU equals the intensity of a 1 kHz signal at a reference level that has been applied for 300 ms. The VU meter has to be calibrated at a reference level, which is usually 1.23 Vrms (+4 dBu, referenced to a 600 Ohm resistance) for a pure tone (Schmid, 1976).
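The relation between +4 dBu and 1.23 Vrms follows from the definition of the dBu, whose 0 dB reference is the voltage that dissipates 1 mW in a 600 Ohm load (about 0.775 V). A minimal Python sketch of the conversion, with illustrative function names:

```python
import math

# 0 dBu: the RMS voltage that dissipates 1 mW in 600 ohms, about 0.775 V
DBU_REFERENCE_VRMS = math.sqrt(0.001 * 600)

def dbu_to_vrms(level_dbu):
    return DBU_REFERENCE_VRMS * 10 ** (level_dbu / 20)

def vrms_to_dbu(voltage_vrms):
    return 20 * math.log10(voltage_vrms / DBU_REFERENCE_VRMS)

reference_voltage = dbu_to_vrms(4.0)   # about 1.23 V, the usual 0 VU alignment level
```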

1.3.3. New meters

As seen with the previous technologies there was a need to create a new metering system that could measure loudness better, and that could be used for all the parties interested. There was a need for standard metrics that could measure loudness once and for all and that could be used not only for all the different genres of music but also for all kinds of broadcasters, film makers and producers.

1.3.3.1. K-System

Bob Katz, a prestigious mastering engineer, proposed this system to set some ground rules for all situations6. He divided genres in three groups: those that do not need much dynamic range, such as broadcasting and radio content or pop music; those that need slightly more dynamic range, such as rock and country music and moderately-compressed content intended for home listening; and finally, those that do need a lot of dynamic range, such as classical music, hi-fi recordings and cinema. These three groups were divided in three different set ups, K-12, K-14 and K-20, respectively.

They are named after the headroom [6] they each have, so K-12 has 12 dB of headroom and so on. Because of these differences in dynamic range, the studio monitor gain must be adjusted before mixing, depending on what we are mixing. To do so, we have to generate a standard pink noise signal at -20dBFS, play it out loud through the studio monitors, and adjust the gain until we measure 83dBSPL at the sweet spot using the C-weighting curve. At the end of this process, our system is set to the K-20 standard. To set the system to K-14 or K-12, we have to lower the monitor gain by 6 and 8 dB respectively.

6 see http://www.aes.org/technical/documentDownloads.cfm?docID=65


Figure xii. K-System scales (Katz, 2000, p. 8)
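The relation between the three scales can be summarized in a small Python sketch. The dictionary layout and the function name are assumptions made for illustration; the numbers follow directly from the headroom of each scale and from the K-20 calibration described above.

```python
K20_HEADROOM_DB = 20   # the monitor calibration described above corresponds to K-20

def k_system_setup(headroom_db):
    """Meter zero point and monitor gain offset (relative to K-20) for a K scale."""
    return {
        "meter_zero_dbfs": -headroom_db,
        "monitor_gain_offset_db": headroom_db - K20_HEADROOM_DB,
    }

setups = {name: k_system_setup(headroom)
          for name, headroom in (("K-20", 20), ("K-14", 14), ("K-12", 12))}
```

For K-14 this gives a meter zero at -14 dBFS with the monitor gain lowered by 6 dB, and for K-12 a meter zero at -12 dBFS with the gain lowered by 8 dB.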

With these ground rules set, all producers could mix at the same level and, by including the set-up used in the metadata of the file, broadcasters or customers could reproduce the piece at the right loudness level, adjusting it for every file in order to get a constant loudness level.

1.3.3.2. LUFS, LKFS, LRA

Even though Katz's proposal did not have much practical success at first, it fostered the discussion that led international organizations in telecommunications and broadcasting to start developing new standards, which we will discuss in depth later on. With those standards came new metrics again, the LUFS and the LKFS. Both are the same unit but with different names, as they were proposed by different organizations.7

In BS.1770, the ITU proposed the LKFS as an absolute loudness unit. It is a Loudness, K-Weighted, referenced to digital Full Scale (LKFS) unit intended to measure the average perceived loudness of a piece and give a value for the whole audio content. That means that it is a psychoacoustic loudness unit, as the K-weighting curve represents how humans perceive sounds. The EBU proposed different units: the LUFS (Loudness Units, relative to digital Full Scale), which is the same concept and works in the same way as the LKFS, and the LU (Loudness Unit), which is a loudness unit relative to a target level that we will discuss later on. Finally, the LRA (Loudness Range) is meant to represent the dynamic range of a whole program. It was first suggested by TC Electronic and finally added to the EBU R128.

7 They were not the same at first: LKFS was presented in ITU-R BS. 1770, and the two units were not equivalent until the publication of ITU-R BS. 1770-2, in which the ITU included some corrections in line with the EBU R 128 recommendations.

As said before, LKFS and LUFS are the same unit, and a change of 1 LKFS or 1 LUFS corresponds to a change of 1 dB. The LU is a relative unit that describes loudness level differences, and it is also equivalent to the dB scale. LRA, on the other hand, describes the loudness range of the overall program, and it is measured in LU. In other words, it measures the number of LU between the softest and the loudest parts of the piece. However, it ignores extreme events so that they do not affect the overall measurement too much.

| Unit | Meaning |
|---|---|
| LU | Loudness Unit |
| LKFS | Loudness, K-weighting, with reference to Full Scale |
| LUFS | K-weighted Loudness Unit with reference to digital Full Scale |
| LRA | Loudness Range, measured in LU |

Table b. Summary of new Loudness units

These meters have another characteristic: they are gated. That means that they measure the average loudness of a whole program or piece of audio, but they omit the softer parts below a certain threshold, which will be discussed later on. The reason is that humans judge how loud a piece of audio is by its loud parts rather than by its soft ones. Moreover, these new meters are meant to measure loudness for all genres, from movies and ads to classical pieces and pop songs. Gating allows the meters to work no matter what they are measuring, and it makes it possible to match the loudness level of a 2-hour film with that of a 20-second commercial.

2. RECOMMENDATIONS

2.1. ITU BS. 1770

The Radiocommunication Sector from the ITU issued ITU BS.1770, a paper in the Broadcasting Service (sound) series, with recommendations of audio measurement algorithms to determine an objective approximation of the subjective loudness level of a program together with a true-peak measurement approach. It is one of the most important standards, as it is used by several other standards from different broadcasting unions.

There have been several versions of this recommendation, each of which has added new features in order to improve the algorithm. As the ITU states in the "recommends" section of the paper, the recommendation is intended to be updated when new algorithms have been developed. (ITU, 2015)

The ITU considered several aspects to build the algorithm, such as:

- the wide dynamic range that new technologies offer
- the fact that most productions are a mix of mono, stereo and multichannel signals
- the fact that listeners desire a uniform subjective loudness level

With all that in mind, they built the algorithm and recommended it to be used when an objective measurement of the loudness level of a multichannel audio is needed. Thus, indicators of loudness levels used in production and post-production should be based on these recommendations. (ITU, 2015)

As said before, the aim of this algorithm is to obtain an objective approximation of the subjective loudness level of a whole audio file. Therefore, the algorithm must take psychoacoustic concepts into account. To understand its implications and implementation, we will explain it step by step. However, in this thesis we will not review the background and the methodology used to develop the algorithm. For more detail, see ITU BS.1770-4, listed in the bibliography.

The multichannel algorithm is based on a previous one from a study (Soulodre, 2004) to obtain a loudness indicator for mono signals. This algorithm is known as Leq(RLB)[7]. It was designed to be very simple, and it is based on a high pass filter, known as the revised low-frequency B-curve (RLB), which is a modification of the weighting B-curve, followed by a root mean square of the sound level from a given time period, Leq (equivalent continuous sound level). After several subjective tests it was concluded that, despite its simplicity, this algorithm performed very well for monophonic signals.

Implementing a multichannel algorithm presents more challenges than the monophonic one, as it has to work for mono, stereo and multichannel signals. Given the good performance of the previous algorithm, the multichannel one is based on the monophonic sound level measurement algorithm.

Now we will study the multichannel algorithm. To begin with, a block diagram will help us understand the algorithm as a whole. As we can see in figure xiii, the algorithm is conceived as a multichannel algorithm based on a 5.1 system, but it does not take the low frequency channel into account to measure the loudness level. We can also see that the first step is to apply a K-filter [8], which gives a psychoacoustically based approximation of the sound: it compensates for the acoustic effects of the head and takes into account how differently frequencies are perceived. Then, the mean square value of the signal is calculated and the result is multiplied by a factor. This factor depends on the channel. The surround channels are boosted to compensate for the perceived gain of those channels due to their position at each side of the listener (Qualis Audio, Inc., 2013). Finally, all signals are summed and gated. This gating was first introduced in the third version of the recommendation.

Figure xiii. ITU BS.1770 block diagram (ITU, 2015, p. 3)

The K-filter is implemented in two stages. First, a shelving filter that compensates for the acoustic effect of the head is applied, followed by a simple high-pass filter, known as the RLB weighting. The coefficients of both filters depend on the sampling rate of the signal.

Figure xiv. K-weighting curve
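As an illustration of the two filter stages, the following minimal sketch evaluates the magnitude response of both biquads. The coefficients are the ones I recall from ITU-R BS.1770 for a 48 kHz sampling rate; they are not given in this thesis and should be checked against the recommendation before any practical use. The function names are hypothetical.

```python
import cmath
import math

# Assumed coefficients (ITU-R BS.1770, 48 kHz), quoted from memory.
SHELF_B = (1.53512485958697, -2.69169618940638, 1.19839281085285)
SHELF_A = (1.0, -1.69065929318241, 0.73248077421585)
RLB_B = (1.0, -2.0, 1.0)
RLB_A = (1.0, -1.99004745483398, 0.99007225036621)


def biquad_gain_db(b, a, freq_hz, fs=48000.0):
    """Gain in dB of a second-order section at freq_hz."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    numerator = b[0] + b[1] * z1 + b[2] * z1 * z1
    denominator = a[0] + a[1] * z1 + a[2] * z1 * z1
    return 20.0 * math.log10(abs(numerator / denominator))


def k_weighting_gain_db(freq_hz, fs=48000.0):
    """Gain of the cascade: shelving stage followed by the RLB high-pass."""
    return (biquad_gain_db(SHELF_B, SHELF_A, freq_hz, fs)
            + biquad_gain_db(RLB_B, RLB_A, freq_hz, fs))


if __name__ == "__main__":
    for f in (50.0, 997.0, 10000.0):
        print(f, round(k_weighting_gain_db(f), 2))
```

With these coefficients the cascade has a gain of roughly +0.69 dB around 1 kHz, which matches the calibration constant used in the loudness formula.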

Once the signal has been filtered, its mean square is calculated over a measurement interval of length $T$:

$$z_i = \frac{1}{T} \int_0^T y_i^2 \, dt$$

where $z_i$ is the mean square value, $y_i$ is the filtered input signal, and $i$ represents each channel.

Then, the signal loudness for an interval $T$ is expressed as:

$$L_K = -0.691 + 10 \log_{10} \sum_i G_i \cdot z_i \quad \text{LKFS}$$

This value is expressed in LKFS as it comes from a K-weighted signal and, as we can see in the formula, it is on a logarithmic scale. The constant -0.691 compensates for the gain introduced by the two filter stages of the K-weighting (the shelving filter and the RLB high-pass filter) (Carroll, Jones, & Williams, 2007). $G_i$ is the weighting factor of each channel.

| Channel | Weighting $G_i$ |
|---|---|
| Left ($G_L$) | 1.0 (0 dB) |
| Right ($G_R$) | 1.0 (0 dB) |
| Centre ($G_C$) | 1.0 (0 dB) |
| Left surround ($G_{Ls}$) | 1.41 (~+1.5 dB) |
| Right surround ($G_{Rs}$) | 1.41 (~+1.5 dB) |

Table c. Weights for every audio channel of a 5.1 surround system
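The loudness formula and the channel weights of Table c can be sketched as follows. The function and dictionary names are hypothetical; the input is assumed to be the K-weighted mean square of each channel, already computed.

```python
import math

# Channel weights G_i from Table c (the LFE channel is not measured).
CHANNEL_WEIGHTS = {"L": 1.0, "R": 1.0, "C": 1.0, "Ls": 1.41, "Rs": 1.41}


def loudness_lkfs(mean_squares):
    """mean_squares maps a channel name to its K-weighted mean square z_i."""
    weighted_sum = sum(CHANNEL_WEIGHTS[ch] * z for ch, z in mean_squares.items())
    return -0.691 + 10.0 * math.log10(weighted_sum)


if __name__ == "__main__":
    front = loudness_lkfs({"L": 0.5})
    surround = loudness_lkfs({"Ls": 0.5})
    # The same signal reads about 1.5 LU louder in a surround channel.
    print(round(front, 2), round(surround, 2), round(surround - front, 2))
```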

Once the loudness calculation process is understood, we need to know how the gated loudness level is measured. The gating function is needed because not all intensities contribute in the same way to the overall perceived loudness level: the loudness of a signal is determined mostly by its loud parts rather than by its soft ones.

The gated loudness level is calculated block by block. The signal of length $T$ is divided into gating blocks of $T_g = 400$ ms that overlap each other by 75%. The gated loudness measurements are performed on complete blocks only; an incomplete block at the end of the signal is left out. The mean square value of each block is calculated as:

$$z_{ij} = \frac{1}{T_g} \int_{T_g \cdot j \cdot step}^{T_g \cdot (j \cdot step + 1)} y_i^2 \, dt$$

where $step = 1 - overlap$, $j$ is the index of the gating block and $i$ represents each channel. As before, the loudness level of every block is calculated as:

$$l_j = -0.691 + 10 \log_{10} \sum_i G_i \cdot z_{ij}$$
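The block partition can be sketched for a sampled signal as follows, assuming that the input samples of one channel have already been K-weighted. The function name is hypothetical.

```python
def gating_block_mean_squares(samples, sample_rate, block_s=0.4, overlap=0.75):
    """Mean square of each complete 400 ms gating block of one channel."""
    block_len = int(round(block_s * sample_rate))
    hop = int(round(block_len * (1.0 - overlap)))  # step = 1 - overlap
    result = []
    start = 0
    # An incomplete block at the end of the signal is left out.
    while start + block_len <= len(samples):
        block = samples[start:start + block_len]
        result.append(sum(x * x for x in block) / block_len)
        start += hop
    return result


if __name__ == "__main__":
    # One second of a constant signal at a (toy) 1 kHz sampling rate.
    blocks = gating_block_mean_squares([0.5] * 1000, 1000)
    print(len(blocks), blocks[0])
```

With a 75% overlap a new block starts every 100 ms, so one second of audio yields seven complete blocks.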

If we set a loudness threshold $\Gamma$, we obtain the set of gating blocks whose loudness level is above the threshold, $J_g = \{j : l_j > \Gamma\}$, with $|J_g|$ elements. The number of elements in this set depends on the chosen threshold. The gating process is performed in two steps, first with an absolute threshold $\Gamma_a$ and then with a relative threshold $\Gamma_r$. The absolute threshold is fixed at $\Gamma_a = -70$ LKFS, and the relative threshold is calculated by subtracting 10 LU from the loudness value obtained with the absolute threshold. So the first gated loudness level of the interval $T$ is calculated as:

$$L_{KG1} = -0.691 + 10 \log_{10} \sum_i G_i \cdot \left( \frac{1}{|J_g|} \cdot \sum_{J_g} z_{ij} \right) \quad \text{LKFS}$$

where:

$$J_g = \{j : l_j > \Gamma_a\}, \qquad \Gamma_a = -70 \ \text{LKFS}$$

Then the relative threshold is calculated as:

$$\Gamma_r = L_{KG1} - 10 \ \text{LKFS}$$

And the final gated loudness level is:

$$L_{KG} = -0.691 + 10 \log_{10} \sum_i G_i \cdot \left( \frac{1}{|J_g|} \cdot \sum_{J_g} z_{ij} \right) \quad \text{LKFS}$$

where now:

$$J_g = \{j : l_j > \Gamma_r\}$$

This gating was first introduced in the third version of the recommendation, taking the EBU R128 recommendations into account.
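The two-step gating described above can be sketched as follows. The input is assumed to be, for every gating block, the channel-weighted sum of the mean squares; averaging these sums over the blocks is equivalent to the formula for the gated loudness. The function names are hypothetical. In this sketch the second step only considers blocks that already passed the absolute gate, which, as far as I know, is how BS.1770-4 words it.

```python
import math

ABSOLUTE_GATE = -70.0    # LKFS
RELATIVE_OFFSET = -10.0  # LU below the first gated loudness level


def block_loudness(power):
    """Loudness of one block from its channel-weighted mean square."""
    return -0.691 + 10.0 * math.log10(power)


def gated_loudness(block_powers):
    """block_powers[j] is the sum over channels of G_i * z_ij for block j."""
    powers = [p for p in block_powers if p > 0.0]
    # Step 1: absolute gate at -70 LKFS.
    stage1 = [p for p in powers if block_loudness(p) > ABSOLUTE_GATE]
    if not stage1:
        return float("-inf")
    # Step 2: relative gate, 10 LU below the loudness of the remaining blocks.
    relative_gate = block_loudness(sum(stage1) / len(stage1)) + RELATIVE_OFFSET
    stage2 = [p for p in stage1 if block_loudness(p) > relative_gate]
    return block_loudness(sum(stage2) / len(stage2))
```

For example, a program whose blocks sit at -20, -40 and -80 LKFS in equal numbers is measured at -20 LKFS: the -80 LKFS blocks fall below the absolute gate and the -40 LKFS blocks below the relative one.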

Another issue covered in this recommendation is peak detection. The ITU states that true-peak meters should be used for peak detection. Nowadays, in broadcasting, quasi-peak or sample-peak meters are normally used to detect peaks. Quasi-peak detectors are so called because they have a reaction time of approximately 10 ms, which means that shorter peaks are not properly detected. Furthermore, peaks may not always coincide with a sample, as there can be inter-sample peaks. Such peaks cannot be detected by a sample-peak meter, so the required headroom cannot be properly calculated, which can eventually cause distortion.

To solve those problems, the ITU recommends the use of an algorithm for accurate true-peak detection. This algorithm is based on some simple steps:

Firstly, the signal is attenuated by 12.04 dB (that is, to ¼ of the signal voltage) to leave enough headroom for the next steps of the process.

Then an over-sampling process is performed. The ITU assumes a 48 kHz sampling rate and therefore recommends 4-times over-sampling in order to reach a sampling rate of 192 kHz. This gives a more accurate representation of the waveform, so that the peaks are represented by samples.

Figure xv. Signal over-sampling for True Peak detection (Fleischhacker, 2014)

After that, the signal is filtered with a low-pass filter, and then the absolute value of the samples is calculated by inverting the negative values. At this point we have the true peak value of the signal. Nevertheless, we still have to increase the level by 12.04 dB to compensate for the previous attenuation. Once converted to a logarithmic scale, the result of this process should be expressed in dBTP (dB referenced to digital Full Scale, measured with a True Peak Meter (EBU, 2011)). The following block diagram summarizes the whole process.

Figure xvi. True-peak detection block diagram
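The steps of the block diagram can be sketched as follows. This is only an illustration: the interpolation filter is a Hann-windowed sinc chosen for simplicity, not the filter specified by the ITU, and the function names are hypothetical.

```python
import math


def interpolation_taps(factor=4, half_width=12):
    """Hann-windowed sinc low-pass for the zero-stuffed signal (assumed design)."""
    centre = half_width * factor
    length = 2 * centre + 1
    taps = []
    for n in range(length):
        x = (n - centre) / factor
        sinc = 1.0 if n == centre else math.sin(math.pi * x) / (math.pi * x)
        hann = 0.5 - 0.5 * math.cos(2.0 * math.pi * n / (length - 1))
        taps.append(sinc * hann)
    return taps


def true_peak_dbtp(samples, factor=4):
    attenuated = [0.25 * s for s in samples]   # -12.04 dB of headroom
    stuffed = []
    for s in attenuated:                       # 4x over-sampling: keep each
        stuffed.append(s)                      # sample and insert zeros
        stuffed.extend([0.0] * (factor - 1))   # between the originals
    taps = interpolation_taps(factor)
    peak = 0.0
    for n in range(len(stuffed)):              # low-pass filtering
        acc = 0.0
        for k, h in enumerate(taps):
            if 0 <= n - k < len(stuffed):
                acc += h * stuffed[n - k]
        peak = max(peak, abs(acc))             # absolute value
    return 20.0 * math.log10(peak) + 12.04     # undo the attenuation


def sample_peak_dbfs(samples):
    return 20.0 * math.log10(max(abs(s) for s in samples))


if __name__ == "__main__":
    # Sine at a quarter of the sampling rate with a 45 degree phase offset:
    # every sample is at +/-0.707, but the waveform peaks at 1.0 in between.
    tone = [math.sin(math.pi / 2 * n + math.pi / 4) for n in range(64)]
    print(round(sample_peak_dbfs(tone), 2), round(true_peak_dbtp(tone), 2))
```

In this example the sample-peak reading is about -3 dBFS, while the over-sampled measurement comes out close to 0 dBTP, which illustrates an inter-sample peak.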

2.2. EBU R128

The EBU R128 recommendation was developed by the EBU PLOUD research group. It is accompanied by additional documents on Loudness Metering Specifications (EBU Tech 3341), Loudness Range Descriptors (EBU Tech 3342), Loudness Production Guidelines (EBU Tech 3343) and Distribution Guidelines (EBU Tech 3344). Together they establish a well-specified workflow methodology to help professionals in the broadcasting sector identify and measure the loudness level of all content.

The EBU R128 is based on the ITU BS.1770, but it extends its content by introducing new concepts and defining some targets. In this recommendation, a Loudness Target Level has been defined, together with a gating method for loudness normalization to assure loudness matching between contents. In order to achieve this, three new key concepts have been introduced, Program Loudness, Loudness Range and Maximum Permitted True Peak Level.

2.2.1. Program Loudness

To understand program loudness, we need to define a program. Following the EBU, in this document a program is understood as any single piece of audiovisual content, whether it is a film, a show or a commercial. Knowing that, program loudness is defined as the long-term integrated loudness over the duration of a program (EBU, 2011, p. 11). It is expressed as a number (with one digit after the decimal point) that indicates the average loudness of a program in LUFS (or LKFS). This value is calculated following the ITU BS.1770 methodology explained in the previous section, including the gating function.

The gating function basically excludes from the measurement all the parts of the program that are softer than a certain threshold. After several series of listening tests, this threshold was fixed at -8 LU relative to the loudness level of the ungated program in LUFS. This gating function was not part of the original ITU BS.1770-0; it was introduced in the ITU BS.1770-2 version, although the ITU chose a threshold of -10 LU instead. After consideration, the EBU accepted the proposal, and its gating level has been set at -10 LU since then as well.

The tests also showed what the target loudness level for all programs should be. The target level should be -23.0 LUFS. However, a tolerance of ±1 LU is accepted for programs where the exact target cannot be met because of technical difficulties or unpredictable content, such as live shows.

2.2.2. Loudness Range

Loudness Range (LRA) quantifies (in LU) the variation of the loudness measurement of a program based on the statistical distribution of loudness within the program (EBU, 2011). Because it relies on the statistical distribution, isolated very loud elements do not affect the overall measure, as extreme cases are excluded.

The loudness range is calculated from a vector of loudness levels obtained with overlapping 3-second sliding windows and a cascaded gating method with an absolute and a relative gate, similar to the one ITU BS.1770 specifies. With this gating system, extremely soft events are eliminated from the measure thanks to the absolute gate, and the measure becomes independent of the signal level thanks to the relative gate.

The distribution of the gated loudness level values is then analyzed, and its width is quantified using a percentile range. The LRA is defined as the difference between the 10th and the 95th percentiles. This way extreme events are eliminated from the measurement; otherwise, for example, a fade-out at the end of a song or a single loud gunshot would skew the measure and increase the LRA drastically (EBU, 2016a).

Figure xvii. Loudness distribution, with gating thresholds and Loudness Range for the film “The Matrix” (EBU, 2016a)
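The calculation can be sketched as follows, starting from a list of short-term (3 s) loudness values. The relative gate of -20 LU is the value I recall from EBU Tech 3342 and is not stated in this text; the nearest-rank percentile used here is a simplification, and the function name is hypothetical.

```python
import math

ABSOLUTE_GATE = -70.0    # LUFS
RELATIVE_OFFSET = -20.0  # LU; assumed from EBU Tech 3342, not from the text


def loudness_range(short_term_lufs, low=10, high=95):
    """LRA in LU from a list of short-term loudness values in LUFS."""
    above_abs = [l for l in short_term_lufs if l > ABSOLUTE_GATE]
    if not above_abs:
        return 0.0
    # The relative gate is referred to the power average of the remaining values.
    mean_power = sum(10.0 ** (l / 10.0) for l in above_abs) / len(above_abs)
    relative_gate = 10.0 * math.log10(mean_power) + RELATIVE_OFFSET
    kept = sorted(l for l in above_abs if l > relative_gate)

    def percentile(p):
        # Nearest-rank percentile on the sorted values (simplified).
        return kept[int(round((len(kept) - 1) * p / 100.0))]

    return percentile(high) - percentile(low)


if __name__ == "__main__":
    # Short-term values spread evenly between -30 and -20 LUFS.
    values = [-30.0 + 0.1 * k for k in range(101)]
    print(round(loudness_range(values), 2))
```

In the example the values span 10 LU, but the LRA is 8.5 LU because the lowest 10% and the highest 5% of the distribution are discarded.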

A maximum Loudness Range is not defined in this recommendation, as no single loudness range can fulfil all demands. Every genre should have its own loudness range, because an action film cannot have the same loudness range as a news magazine. In addition, we cannot define one loudness range for all listening conditions. As shown in the next figure, every listening environment and playback system needs its own loudness range.

Figure xviii. Different examples for Loudness Range depending on the replay environment (EBU, 2011, p. 19)

For this reason, the EBU encourages the use of Loudness Range as a clear indicator of whether dynamics processing such as compression is needed. It can also reveal whether a process somewhere in the production chain has changed the original dynamic range.

2.2.3. True Peak level

As explained before (2.1. ITU BS. 1770), inter-sample peaks can be an issue, as they cannot be detected by the commonly used sample-peak meters. Therefore, in R128 the EBU encourages following the ITU BS.1770 recommendations to perform proper true-peak detection.

Although peak normalization is left behind, peaks remain a concern under loudness normalization, as their levels must still be controlled. Therefore, the EBU recommendation also sets a Maximum Permitted True Peak Level. It recommends -1 dBTP in order to avoid distortion that may occur in later stages of the production chain. To be able to detect and treat peaks, loudness meters with the “EBU mode”, which includes the True Peak meter, should be used in all stages of a production.

2.2.4. Implementation in the production chain

There are two ways to get the program to the target level. On the one hand, we can keep the mixing habits that we already have and do a level shift afterwards. On the other hand, we can change our mixing habit and focus the mixing towards the target level, so no level shifting will be needed afterwards.

The first method is legitimate, and it can be useful on some occasions. During the transition time (from peak to loudness normalization) it can be helpful, as engineers can get used to the new levels by comparing their mixes with the shifted ones. Also, in live programs where the target level could not be achieved due to unexpected events, this method can be really useful, although there is a tolerance of ±1 LU. It is worth considering that most of the time the required shift will be negative (attenuation), and therefore no further calculations will be needed. If the shift has to be positive, however, we have to check the dynamic range and the maximum true peak level again.
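The level shift of the first method, together with the check that a positive shift requires, can be sketched as follows. The function name is hypothetical; the target and the peak limit are the R128 values given in this chapter.

```python
TARGET_LUFS = -23.0
MAX_TRUE_PEAK_DBTP = -1.0


def plan_level_shift(program_loudness_lufs, max_true_peak_dbtp):
    """Return the gain in dB, the resulting true peak and whether it is permitted."""
    gain_db = TARGET_LUFS - program_loudness_lufs
    shifted_peak = max_true_peak_dbtp + gain_db
    return gain_db, shifted_peak, shifted_peak <= MAX_TRUE_PEAK_DBTP


if __name__ == "__main__":
    # A loud mix is attenuated; the peak only moves further below the limit.
    print(plan_level_shift(-20.0, -0.5))
    # A soft mix needs a positive shift that pushes the peak above -1 dBTP.
    print(plan_level_shift(-27.0, -3.0))
```

In the second case the shifted peak would exceed the permitted level, so the mix would need dynamics processing before the gain is applied.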

Even though the first method can be used, the second one is recommended. Changing the mixing habits is beneficial, as engineers can stop worrying about hitting the top and can mix by ear once they are used to it. This results in much more dynamic mixes and, as the Maximum True Peak Level is set at -1 dBTP, hitting the top will rarely be a concern.

To change the mixing habits and to be able to mix more freely and dynamically than before, engineers need a new loudness metering system that allows them to check the loudness level at any time. Therefore, EBU R128 also includes a new metering system.

2.2.4.1. EBU meters

This system is based on three different time scales: Momentary, Short-term and Integrated Loudness.

- The shortest time scale is the Momentary Loudness scale (abbreviated as “M”). It uses a sliding rectangular time window 400 ms long. Its measurements are not gated.
- The second shortest is the Short-term Loudness scale (abbreviated as “S”). It also uses a sliding rectangular time window, but this time it is 3 s long. Its measurements are not gated either.
- Finally, the longest scale is the Integrated Loudness scale (abbreviated as “I”). It measures the average loudness value of the whole program, no matter its length. This measurement is gated as ITU BS.1770 recommends [2.1. ITU BS. 1770]. (EBU, 2016c)

The new meters should also show the Loudness Range and the True Peak Level of the signal, as they are also very important parameters to follow the EBU recommendations. PLOUD group [9], and several manufacturers have agreed to produce new meters that include the “EBU mode” that should follow all the previous indications. All the specifications about the new meters can be found in the EBU Tech 3341 document. (EBU, 2016c)

EBU Tech 3341 also recommends two different meter scales, depending on the dynamics required by each production and on the comfort of the engineer. The two scales are the “EBU +9 scale” and the “EBU +18 scale”. The first one may be used by default, and it has a range of −18.0 LU to +9.0 LU (−41.0 LUFS to −14.0 LUFS); the second one has a range of −36.0 LU to +18.0 LU (−59.0 LUFS to −5.0 LUFS). (EBU, 2016c) The meters can show an absolute value expressed in LUFS, or a zero point can be fixed (according to the recommendation, -23.0 LUFS = 0.0 LU) so that the measurement is a relative value expressed in LU.
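The relation between the absolute and the relative scale is a simple offset by the target level, as the following sketch shows. The function and dictionary names are hypothetical.

```python
TARGET_LUFS = -23.0  # 0.0 LU on the relative scale

# Meter scale limits in LU, as given in the text.
EBU_SCALES_LU = {"EBU +9": (-18.0, 9.0), "EBU +18": (-36.0, 18.0)}


def lufs_to_lu(lufs):
    return lufs - TARGET_LUFS


def lu_to_lufs(lu):
    return lu + TARGET_LUFS


def scale_in_lufs(name):
    low, high = EBU_SCALES_LU[name]
    return lu_to_lufs(low), lu_to_lufs(high)


if __name__ == "__main__":
    print(scale_in_lufs("EBU +9"))   # limits of the default scale in LUFS
    print(lufs_to_lu(-18.0))         # a -18 LUFS reading on the relative scale
```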

Figure xix. Scales of an "EBU mode" meter in LU (EBU, 2011, p. 24)

2.2.4.2. What to measure

All the new meters created and explained before have to be used to measure the loudness characteristics of a signal. What matters here is to define which signal to measure in order to get representative information about the loudness level of a program.

EBU R128 recommends measuring the entire program, as this method ensures a correct measurement for any case or genre. However, there is another option. It is also possible to choose an anchor signal from the program (normally dialog) and determine its level alone, adjusting all the other parts in reference to it.

The anchor signal must be a central and important part of the sound scope. This method can be useful in wide loudness range programs, but choosing the signal is a complex process which requires experienced engineers, and it is only recommended once the operators are fully familiar with the loudness normalization process. It is worth saying though, that there are automatic anchor signal discriminator algorithms that may help in this process, but they do not work perfectly for all situations.

As said, this method can be useful for wide-LRA programs, but in narrow-LRA programs, such as commercials, the difference between the anchor signal and the whole program level may be small. As the greatest common denominator, R 128 recommends measuring the whole program with all its elements instead of anchors, even with wide-LRA material. (EBU, 2011)

2.2.4.3. File Based System

Nowadays, most broadcasters have a file-based production workflow. In this working scheme, all the previous loudness recommendations remain the same: loudness level normalization and dynamics control should be done during the production of new material. For older productions stored in the archive there are several options to normalize the loudness level. All of the following options are valid, and the choice depends on the company structure and workflows.

- Actually changing the loudness level of all the files in the archive and setting it to the target level. This may sound tedious, but there are automatic or semi-automatic hardware and software systems that perform well and relatively fast.

- Actually changing the loudness level only when needed. That means that when a file is picked from the archive, it is normalized before it is used or sent.

- Not modifying the file, but measuring its loudness level and adjusting the playout level before it is broadcast, so that the target level is achieved only when broadcasting. Of course, all the material has to be measured before the broadcast.

- Lastly, using the correct metadata, the loudness level of the file can be transmitted to the consumer's playback system, where the level is adjusted to achieve the target.

Whatever the choice, it is undeniable that metadata can play a big role in a file-based work structure.

2.2.4.4. Metadata

The metadata included in any broadcasting file can be either descriptive (format, copyright…) or active (changing the signal). Loudness normalization can be done by normalizing the signal during the mixing, or by shifting the signal level before it is broadcast or before it is played back, using the file metadata.

Of course, the first option is the one the EBU recommends, but it also recommends that the three main loudness measures of a signal be included in the file metadata: Program Loudness, Loudness Range and Maximum True Peak Level. Those three measures are already included in the header of the Broadcast Wave Format file (BWF)8 (EBU, 2011). For short-form programs, storing the Maximum Momentary Loudness Level and the Maximum Short-term Loudness Level in the file metadata is also recommended, as they are helpful dynamics control parameters.

Probably the most used metadata system is the Dolby Digital system. The Dolby AC-3 metadata system includes three characteristics of the signal that are of interest for loudness control: the program loudness level, the dynamic range and the down-mix coefficients. The program loudness is called dialnorm, as the Dolby system is oriented to anchor signal normalization, taking the dialog as the anchor signal; however, this value refers to the loudness level taking all components into account. The dynamic range parameter is called dynrng and the down-mix coefficients Centre/Surround Downmix Level.

8 See EBU Tech Doc 3285 for a detailed description of BWF.

The Program Loudness parameter should be set at -23 LUFS when the signal has been mixed and normalized following the recommendations. If the signal does not fit the recommendations, the metadata parameter has to be set at the current loudness level of the signal so the distribution systems can adjust the level live.

There are many situations where the consumer may want to reduce the loudness range of a program, so many home theater systems offer a loudness range control, but this requires loudness range information in the metadata. So, like the program loudness level, the loudness range can also be adjusted in the distribution system if the parameters are set in the metadata. In the Dolby AC-3 system, different compression presets are available to fit different situations.

Also, the down-mix coefficients are important in the loudness level calculation. It is so, as in the down-mixing the surround and center channels are mixed together with the Left front and Right front channels in order to have a 2-channel-stereo signal instead of a surround one, and as explained before, the surround channels are treated differently than the front channels in the multichannel loudness level calculation. The resulting loudness level of the down-mixed signal may depend on the down-mix coefficients used, on the content of the surround channels and on the limiting used to avoid overload in the stereo channels.

To avoid overload, good down-mix coefficients have to be used, and dynamics processing may also be useful. It has to be kept in mind that mixes with a lot of surround content will show a significant variation in loudness level once down-mixed, as the +1.5 dB gain factor of the surround channels will no longer be applied. Mixes with less surround presence will not show such a significant loudness level variation.

Metadata can be really helpful for easier loudness level control, but it also has to be checked, as a file with the wrong metadata will produce loudness variations. Therefore, the EBU recommends being careful with metadata, especially in files coming from an external source, as it can be set wrongly on purpose in order to sound louder than other productions.

2.2.4.5. Alignment signal

An alignment signal is needed in broadcasting in order to set an anchor point for all the equipment used. This signal is typically a 1 kHz sine wave at -18 dBFS. This level was specified in the EBU recommendation R68, created in 1992 and revised in 1995 and 2000 (EBU, 2000). This method is not altered by the new EBU loudness recommendations. It must be said, though, that on an EBU-compliant meter the signal will read -18 LUFS, or +5 LU on the relative scale (5 LU above the -23 LUFS target).

2.2.4.6. Monitoring level

The recommended monitoring level was defined by the EBU in the document EBU Tech Doc 3276-E ‘Listening conditions for the assessment of sound programme material’ and its supplement document Supplement 1, for Multichannel Sound. In those documents some formulas are provided to calculate the recommended listening level, one for a stereo system and another for a multichannel system.

To calibrate a stereo system, a pink noise test signal at -18 dBFS in digital devices should be sent to each loudspeaker separately. The gain of the loudspeaker should be adjusted so that the sound pressure level (SPL), measured at the sweet spot with an A-weighted, slow-response sound level meter, fulfils the following formula:

$$L_{LISTref} = 85 - 10 \log_{10}(n) \ \text{dB(A)}$$

where $n$ is the number of channels of the system. (EBU, 1998)

For a multichannel system the level produced by each loudspeaker of the system in the sweet-spot should be:

$$L_{LISTref} = 96 \ \text{dB SPL}$$

referenced to a digital Full Scale signal level. (EBU, 2004)

In this case a different test signal should be used: noise of equal energy per octave, covering the range from 500 Hz to 2 kHz. The level of this signal must be the same as in the previous case, -18 dBFS. With this signal sent to each channel of the system separately, the gain of the loudspeaker must be adjusted such that the sound pressure level (SPL) measured with a C-weighted, slow-response sound level meter is:

$$96 - 18 = 78 \ \text{dB SPL}$$
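Both reference levels can be checked with a few lines. The function names are hypothetical; the formulas are the ones given above.

```python
import math


def stereo_reference_level_dba(n_channels):
    """Listening level per loudspeaker, 85 - 10 log10(n), in dB(A)."""
    return 85.0 - 10.0 * math.log10(n_channels)


def multichannel_reference_level_dbc(test_signal_dbfs=-18.0, full_scale_spl=96.0):
    """Level per loudspeaker for a test signal below digital Full Scale."""
    return full_scale_spl + test_signal_dbfs


if __name__ == "__main__":
    print(round(stereo_reference_level_dba(2)))   # two-channel stereo
    print(multichannel_reference_level_dbc())     # 5.1 system
```

For two channels the first formula gives approximately 82 dB(A), the value listed for 2-channel stereo in the summary table.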

These listening reference levels were defined by the EBU before the loudness regulations were made. In the R128 recommendation, the EBU recommends not changing the listening levels or the alignment process. It has to be said, though, that once the recommendations are applied, the average program loudness level will be up to approximately 3 LU lower than in productions made before the recommendation. If this level decrease turns out to be a problem, the listening level and the alignment process will be revised in the future (EBU, 2011).

| Reproduction system | Listening level of reference | Noise range (see footnote 9) | Level |
|---|---|---|---|
| 2-channel stereo | $L_{LISTref}$ = 82 dBA SPL | 20 Hz – 20 kHz | -18 dBFS rms |
| 5.1 MCA | $L_{LISTref}$ = 78 dBC SPL | 500 Hz – 2 kHz | -18 dBFS rms |

Table d. Listening levels and alignment signals summary

9 Noise of equal energy per octave

2.2.4.7. Low Frequency Effects Channel

The EBU recommendation is based on the ITU algorithm, and this algorithm does not take the Low Frequency Effects (LFE) channel into account due to many uncertainties about the use of this channel.

This exclusion from the loudness level calculation can lead to overuse of the LFE channel. Although the channel may be included in the loudness level calculation in future revisions of the recommendation, further practical experience and investigation are needed. (EBU, 2011)

2.2.5. Loudness Parameters for short-form Content

In November 2014, the EBU released R128s1, a supplement to recommendation R128 with additional recommendations for short-form content such as advertisements, promos or other formats of short duration. This document has since been revised, and in January 2016 the EBU released a new version with some changes.

In the first version, the supplement recommended, as in the main recommendation, that the Maximum Permitted True Peak Level of a program should be -1 dBTP and that the Program Loudness Level should be normalized at -23 LUFS, but this time with a deviation of just ±0.5 LU. As complementary measures it recommended a Maximum Permitted Short-term Loudness Level of -18 LUFS (or +5 LU) or, alternatively, a Maximum Permitted Momentary Loudness Level of -15 LUFS (or +8 LU). These two measures were not meant to be applied simultaneously, but alternatively, depending on the needs of the situation.

In the new version, though, there are no longer two alternatives: the Maximum Permitted Momentary Loudness Level has been left out, and only a Maximum Permitted Short-term Loudness Level of -18 LUFS (or +5 LU) is recommended. The Program Loudness Level should still be normalized at -23 LUFS, with a deviation of ±0.5 LU. The rest of the recommendations and functions stated in the main recommendation R128 remain applicable.

| Parameter | Value |
|---|---|
| Programme Loudness | -23.0 LUFS ±0.5 LU |
| Maximum True Peak Level | -1 dBTP |
| Maximum Short-term Loudness | -18.0 LUFS (or +5.0 LU) |
| Loudness Range | Not applicable |

Table e. Summary of the Loudness Parameters for short-form content (EBU, 2016b, p. 4)
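The parameters of Table e translate directly into a compliance check, sketched below. The function name and the structure of the result are hypothetical.

```python
def check_short_form(program_lufs, max_true_peak_dbtp, max_short_term_lufs):
    """Check measured values against the short-form parameters of Table e."""
    return {
        "programme_loudness": abs(program_lufs - (-23.0)) <= 0.5,
        "true_peak": max_true_peak_dbtp <= -1.0,
        "short_term": max_short_term_lufs <= -18.0,
    }


if __name__ == "__main__":
    # A commercial at -23.2 LUFS, peaking at -1.5 dBTP, with a maximum
    # short-term loudness of -19.0 LUFS passes all three checks.
    print(check_short_form(-23.2, -1.5, -19.0))
```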

This supplement was created because the Loudness Range measurement was found not to be effective in these situations: it is based on a statistical analysis of the Short-term loudness values, and for such short programs there are not enough data points. Therefore, for such short content no maximum or minimum LRA value is specified. (EBU, 2016b)

As explained in section 3.4.3. Commercial loudness abuse, some broadcasters had already detected that the recommendation did not deal well enough with commercials and trailers of short duration, and they had already applied their own additional measures to deal with this issue.

2.3. Other standards

The loudness standardization is not only an issue in Europe, but all over the glove. Many different broadcast associations have also created their standards and some countries have even laws about it.

In the US, the ATSC (Advanced Television Systems Committee) in 2009 created the A/85, which is a standardization based on the ITU BS.1770. It recommended the normalization of the loudness level of an anchor signal for regular programs, but with commercials the recommendation is to normalize the loudness level taking all the signals into account. In contrast with the EBU R128, the target level is -24 LKFS (instead of -23 LUFS). This anchor signal based method was in 2011 revised and changed for all-source based measurements for all programs, and the A/85 was then based in the ITU BS.1770- 3, so a gating method was then included as specified in the algorithm from the ITU. (Lund, 2015)

This recommendation became a rule when Congress passed the CALM (Commercial Advertisement Loudness Mitigation) Act, which directed the FCC (Federal Communications Commission) to establish rules so that all commercials have the same average loudness level. These rules went into effect on December 13, 2012. (Federal Communications Commission, 2015)

In Japan, the Association of Radio Industries and Businesses (ARIB) created TR-B32. This standard for Japanese broadcasters is based on ITU-R BS.1770-2 and, as in the American version, the target level is -24 LKFS and a gating function is included. (Lund, 2015)

In Australia, Free TV released the operational practice OP-59. This recommendation is also based on ITU-R BS.1770. It recommends a loudness target level of -24 LUFS and a maximum true peak level of -2 dBFS. It recommends the anchor signal methodology, except for short-form content such as commercials and for material in which the dialogue is difficult to isolate; in those cases the full mix has to be measured. (Free TV, 2010)
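Since LKFS and LUFS denote the same unit, the target levels quoted above differ only by a fixed offset. The following sketch collects them in a lookup table and computes the static gain that would move compliant material from one target to another; the dictionary and function names are illustrative only.

```python
# Target levels as quoted in this section (LKFS and LUFS are the same unit).
TARGETS_LUFS = {
    "EBU R128": -23.0,
    "ATSC A/85": -24.0,
    "ARIB TR-B32": -24.0,
    "Free TV OP-59": -24.0,
}


def conversion_gain_db(source: str, destination: str) -> float:
    """Static gain (in dB) that moves material normalized to the source
    standard's target onto the destination standard's target."""
    return TARGETS_LUFS[destination] - TARGETS_LUFS[source]
```

For example, a program delivered at the EBU R128 target would need a gain of -1 dB to sit at the ATSC A/85 target. This only addresses the target level; other parameters, such as the true peak limit, differ between standards and are not covered here.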

3. STUDIOS SWR

This thesis studies the implementation and the effect of the previous regulations and recommendations in the SWR studios, mainly in the Stuttgart television studios. I completed a 6-month internship in the Stuttgart broadcast studios, and during this period I learned how those regulations are implemented in practice, how to work with them and what the everyday issues are, in order to provide useful information for other broadcasters or users.

3.1. Structure

SWR is the regional public television and radio broadcaster for south-western Germany, mainly the federal states of Baden-Württemberg and Rhineland-Palatinate. Nowadays SWR produces programs for several television channels (e.g. SWR Fernsehen, Das Erste, Phoenix…) and six radio stations.

It is part of the ARD, the consortium of German public broadcasters, which is the biggest public broadcaster in the world, with 20,616.5 permanent employees and a budget of €6,485 million last year. (Institut für Medien- und Kommunikationspolitik, 2015)

Regional ARD broadcasters      Abbreviation    Headquarters city   Budget in M. €
Westdeutscher Rundfunk         WDR             Köln                1,390.4
Südwestrundfunk                SWR             Stuttgart           1,171
Norddeutscher Rundfunk         NDR             Hamburg             1,078.4
Bayerischer Rundfunk           BR              München             1,024.6
Mitteldeutscher Rundfunk       MDR             Leipzig             681
Hessischer Rundfunk            HR              Frankfurt           491
Rundfunk Berlin-Brandenburg    RBB             Berlin/Potsdam      434.6
Saarländischer Rundfunk        SR              Saarbrücken         117.5
Radio Bremen                   Radio Bremen    Bremen              95.9

Table f. Regional broadcasting stations members of the ARD


Figure xx. ARD structure

With around 3,800 employees and a budget of €1,171 million, SWR is the second biggest regional broadcaster in Germany. It has main offices in three cities (Stuttgart, Baden-Baden and Mainz), along with ten studios and 23 regional offices. Programs are produced at all three main sites, although the headquarters are in Stuttgart.

3.2. Implementation of the Loudness regulations in the SWR

Because SWR is a member of the ARD, the decision to implement the new loudness recommendation at SWR, as at the other regional broadcasters, came from the ARD. The ARD, as a European broadcaster, follows the recommendations of the EBU, and when the loudness recommendation was released it began to study the possibility of implementing it. The decision followed extensive awareness training with experts from different broadcasters, such as the engineer Askan Siegfried from the NDR (Norddeutscher Rundfunk), who also took part in the creation of the EBU standard, and the engineer Florian Camerer from the ORF (Österreichischer Rundfunk¹⁰), who was also a member of the EBU expert group for surround sound. The ARD, the ZDF¹¹ and the ORF, together with several private TV stations in Germany and Austria, then decided to switch together to EBU R128 on August 31, 2012. Several other countries in Europe and worldwide followed the switch.

Although the loudness recommendation is fully implemented and functional in SWR television, this is not yet the case for radio and the internet. Only a few radio stations currently produce according to EBU R128. The first ARD radio station to implement it was SWRinfo in April 2015; Bayern2, a cultural station from BR (Bayerischer Rundfunk), followed in July 2015, and the remaining ARD radio stations will do so in the future. On the internet, material following different standards can still be found, and the recommendations are not yet a reality.

10 Austrian public broadcaster: http://der.orf.at/unternehmen/orf-english100.html
11 ZDF (Zweites Deutsches Fernsehen), Second German Television: http://www.zdf.de/


Before the loudness recommendations were implemented, most of the complaints SWR received from viewers concerned loudness level changes between different channels and between different parts of the program within one channel. Once the recommendations were implemented, the number of complaints about this topic decreased drastically.

3.2.1. Loudness control equipment

The implementation at the SWR TV facilities in Stuttgart was rather atypical: in 2012, while the recommendation was being implemented, a new production building was under construction and could therefore be designed and equipped to comply with it. RTW TM7 and TM9 EBU meters were bought and installed at all post-production workstations (sound studios, image editing rooms, sound control rooms in live studios, broadcasting room, technical control, etc.). Nugen loudness plug-ins were also bought and installed in the sound studios and the image editing rooms. At the other SWR sites, Baden-Baden and Mainz, the implementation was more gradual, but RTW meters were also installed at the key positions, mainly wherever a program is finished and prepared for broadcast. All these tools are explained below in section 3.3, My experience with loudness.

At the beginning of the implementation, not all archive material had the same loudness level, as it had been produced before the recommendation. A very important task, if not the most important one, was therefore to check that every program being broadcast had the right loudness level. To do so, SWR installed automatic loudness levelers in the Play Out Center (POC) in Baden-Baden. These levelers corrected, in real time, the loudness level of material that had not yet been normalized. One leveler used at that time was the IntelliGain from Evertz. It was chosen because SWR already had Evertz systems installed, so all that was needed was an upgrade with the IntelliGain system. After some practical experience, its performance turned out not to be satisfactory¹². The problem with this, as with all other real-time systems, was that it could not look into the future: it could not look ahead in the signal and therefore could not predict what gain changes would be needed. Instead, it adjusted the signal gain during broadcast, taking only the present signal into account. In addition, the gain transitions were slow, to avoid sudden gain changes.

The system used nowadays is the Minnetonka AudioTools Loudness Control Server. It levels the loudness of files that have not been normalized before they are broadcast. The process is simple, as only old files from the archive go through it: the system takes the video file to be normalized, separates the audio from the video, normalizes the loudness by adjusting the gain, and repacks the file. In this way the dynamics of the mix are respected, and only the overall gain is changed to reach the target level.
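The gain-only normalization described above comes down to simple arithmetic: the gain in dB is the difference between the target and the measured integrated loudness, and it is applied as one constant factor to every sample. The sketch below illustrates that idea only; it is not how the Minnetonka server is implemented, and the function names are hypothetical.

```python
def normalization_gain_db(measured_lufs: float, target_lufs: float = -23.0) -> float:
    """Static gain needed to move the integrated loudness onto the target."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (gain_db / 20.0)


def apply_gain(samples, gain_db: float):
    """Scale every sample by the same factor, leaving the dynamics untouched."""
    factor = db_to_linear(gain_db)
    return [s * factor for s in samples]
```

A file measured at -18 LUFS would therefore receive a gain of -5 dB. Because every sample is multiplied by the same factor, the relationship between loud and soft passages is preserved, which is the property the text highlights.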

At the SWR studios, no loudness metadata is used. The whole production system is designed to work without loudness metadata: all material produced in the studios should already comply with the recommendations, so no further adjustments, and therefore no metadata, are needed. The decision to build a “metadata-less” system was taken during the implementation phase because it was much simpler, and therefore cheaper, than a metadata-dependent system, in which all the systems used in the production chain would have to be compatible with the same metadata and file formats. Also, without metadata the mixing methodology has to change, as the target level must be reached at the mixing stage. This raises the quality of the mixes, since there is no longer any need to produce over-compressed material in order to be louder than the competition. The only metadata used at SWR is the metadata carried in Dolby E for 5.1 surround productions, where the down-mix coefficients must be included.

12 See more in the section 3.4 Daily problems

3.2.2. Loudness in the workflow

To fully understand the process of loudness control, it is important to understand the production workflow and the key points at which the loudness control equipment mentioned above is used, and why. We will therefore follow the workflow of an SWR production from recording to broadcast. Note that this is a standard workflow that may not apply to all productions, as they do not all have the same time or resource constraints.

A standard production workflow consists of the following steps:

• Recording: the raw material is recorded inside the studios or outside with recording teams.
• Ingest: the recorded raw material is ingested into the studios' server system.
• Video cutting: parts of the raw material are selected and edited to form the clip.
• Sound editing: all the sounds and music are mixed and, if needed, a voice-over is recorded and added to the mix.
• Image editing: graphics and effects are inserted into the clip.
• Color correction: the color of the clip is corrected.
• Scrutineering: an extensive technical check is carried out to verify that the clip meets all the technical specifications needed to continue to the next step; if not, it must be corrected in the previous steps.
• Archiving: the clip can be stored in the main storage system.
• Preparation for broadcasting: before broadcast, the clip must be loaded from the storage system to the broadcast server and checked.
• Broadcasting: the clip is sent through the play out center.

To start producing material compliant with the EBU recommendation at SWR, no significant workflow changes were needed. The control equipment changed, and engineers had to pay attention to different characteristics than before, but it was more a change of habits than a change of workflow. The following list repeats the workflow, this time showing where loudness control equipment is found and why.

 Recording  Ingest  Video cutting: RTW TM7 and Nugen Plug-ins are used, as some clips are finished in this stage because they are not significantly complex and there are big time constrains (news), so there is no time to do an extra sound editing. The RTW TM7 is used to control that the final product reach the targets of the recommendation. The Nugen Plug-in is sometimes used to normalize the loudness level of old material from the archive before to use

33 it and/or mix it with new material. If the material is not finished in this stage, the loudness level is just approximated to the target, as it is not the final product yet.  Sound editing: RTW TM9 and Nugen Plug-in are also used. All the sounds, music and voices are mixed and it is normally the final stage for sound edition. There are clips that are only part of another program, e.g. clips that are going to be played live in a TV show. Those clips are just approximated to the target level, but they have a deviation usually never bigger than ±1 LU, as the engineer in the live control will adjust the final volume of the whole program to reach the target. If the material that is being mixed is an entire program ready to be sent to the POC, the target is tried to be reached while mixing, but if there is a small deviation, the Nugen Plug-in is used to adjust all the parameters.  Image editing: RTW TM7 is also used here, as some of the graphics may contain a sound effect too and also because the editing rooms are also used for video cutting.  Color correction  Scrutineering: A RTW TM9 is used to control that the production has the technical specifications needed. If the specifications are not fulfilled they must be corrected. Depending on the time constrains, the product can be edited again, but if there is no time, an automatic loudness corrector will adjust the signal level before the broadcasting.  Archiving  Preparation for broadcasting: All the material produced before September 2012 has to be corrected before it is broadcasted. It can be done manually in the sound studios, but if there is no time, an automatic loudness corrector is used. The current loudness leveler used is the Minnetonka AudioTools Loudness Control Server, which analyses the file and adjusts the gain of the file to reach the target level.  Broadcasting

As seen, a special case that has to be taken into account is the procedure for old material that is not loudness normalized. When a file from the archive is used, it is normalized immediately, so that it can be mixed with new material without loudness level differences. If the file to be used is a whole program, the Minnetonka server in the Play Out Center analyses the file and normalizes its loudness level before it is broadcast.
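The handling of archive material described above can be summarized as a small decision rule. This is a simplified sketch under two assumptions of mine: that everything produced from September 2012 onward is already compliant, and that the production date is known for each file. The function name and the returned strings are illustrative.

```python
from datetime import date

# Assumption: material produced before September 2012 predates the switch.
R128_CUTOFF = date(2012, 9, 1)


def archive_handling(produced: date, whole_program: bool) -> str:
    """Return how a file should be treated before it is used or broadcast."""
    if produced >= R128_CUTOFF:
        return "no correction needed"
    if whole_program:
        # Whole programs are normalized by the server in the Play Out Center.
        return "normalize on the loudness server before broadcast"
    # Clips are normalized right away so they can be mixed with new material.
    return "normalize immediately before mixing"
```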

3.3. My experience with loudness

During my internship semester I carried out different functions at the broadcaster, which gave me an overview of the whole production chain and let me see how and where the loudness level recommendations are applied.

3.3.1. Loudness in live broadcasting

My first experience at the SWR studios was in live productions such as magazine shows, political debates and news. I first learned the working methods in the studio and in the control room.


The in-studio work basically consists of establishing communications between the control room and the studio, activating all the necessary microphones, routing all the necessary signals to and from the control room, and checking that everything works perfectly at all times. We will not focus on this part, as it is not relevant to the loudness regulations.

In the control room, the sound engineer has to take all those signals from the studio and mix them live, and of course the final result should have the correct Loudness Level, Loudness Range and Maximum Peak Level. To achieve this, the sound engineer has several tools available.

The sound mixer used in live control is the Lawo mc2 66, a professional mixing console from the German manufacturer Lawo. The model used in the SWR Stuttgart studios has 32+8 faders, which can be multiplied, as there can be up to 6 banks with two layers each. It is based on a routing system between the console and the processing unit, with redundant interconnection paths. It can be accessed from an external computer to ensure full control at all times. It supports mono, stereo and surround formats up to 7.1 as well as Dolby E, and it has an integrated loudness metering system compliant with ITU-R BS.1770. There are many more features to discuss, but we will focus on the loudness metering systems integrated in the console.¹³

The mc2 66 itself has an integrated metering system compliant with ITU-R BS.1770, which is largely compatible with EBU R128 and ATSC A/85; as a European broadcaster, SWR uses the EBU standard. This means that the metering offers momentary, short-term and integrated loudness measurements on two different scales, +9 and +18.

What I saw in my practical experience is that the display of the loudness measurement can be configured in different ways. The configuration in the SWR Stuttgart studios is as follows. In the central console panel, the integrated loudness value can be shown as a large central number, so that the sound engineer is fully aware of the program level at all times. The sound engineer can choose which parameter to display there (Loudness Level, time code, etc.), and the loudness level can be shown in LUFS or as a relative value in LU. The dynamics processing applied to the signal, such as compression, is also shown in the main display. The momentary, short-term and integrated loudness measurements can be shown there as well, but normally they are shown on a separate display by the external computer connected to the console. The scale used is the +9 LU scale, as it has enough dynamic range for broadcasting. All these parameters can be changed, and all the displays can be fully adapted to the engineer's preferences.
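The relation between the absolute and the relative reading is a fixed offset: 0 LU corresponds to the -23 LUFS target. The sketch below converts between the two and checks whether a value falls inside a meter scale. The scale ranges (-18 to +9 LU and -36 to +18 LU) are not stated in this section; they are given here as I understand them from the EBU metering specification (EBU Tech 3341), and the names are illustrative.

```python
TARGET_LUFS = -23.0  # 0 LU on an EBU meter

# Assumed display ranges in LU, per my reading of EBU Tech 3341.
SCALES_LU = {
    "EBU +9": (-18.0, 9.0),
    "EBU +18": (-36.0, 18.0),
}


def lufs_to_lu(lufs: float) -> float:
    """Convert an absolute loudness value to a value relative to the target."""
    return lufs - TARGET_LUFS


def fits_scale(lufs: float, scale: str = "EBU +9") -> bool:
    """Check whether a loudness value can be displayed on the given scale."""
    low, high = SCALES_LU[scale]
    return low <= lufs_to_lu(lufs) <= high
```

For example, a reading of -18 LUFS corresponds to +5 LU, which is well inside the +9 scale used at SWR.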

13 More information can be found in: https://www.lawo.com/products/audio-production-consoles/mc266.html


Figure xxi. Main display of a Lawo mc2 66 from SWR Stuttgart studios with loudness meter in LU

Of course, every signal assigned to a console fader has its own meter, mainly to show graphically which signals are present at a given moment. The meter chosen by default in the SWR studios is a peak meter with a peak hold of 3 seconds, used to check that the signals do not have excessive peaks. The meter for each fader can be switched between the following options: the peak meter just described, a fast peak meter with a 1 ms integration time, a momentary loudness meter, or a VU meter.

Figure xxii. Parameters in the Lawo console meters

The only signal that goes through the EBU R128 compliant loudness meter is the final mix of all the signals, as this meter is meant to measure the loudness of a whole program. The next figure shows the screen of the external computer linked to the mixing console, where EBU-compliant meters are displayed. The one with the three scales is metering the final sound of the program being broadcast. All the others show only the M scale for both channels of the stereo signal. This layout can be fully adapted to show different parameters for each signal.

Figure xxiii. External computer for Lawo mixing console monitor

The mc2 66 has yet another EBU meter integrated, but this one is independent of the others, as it is in fact an RTW TM7. The RTW TM7 is a touch monitor that is normally a separate module, but in this case, thanks to a business deal announced at the 2010 NAB (National Association of Broadcasters) Show in Las Vegas (RTW GmbH & Co., 2010), it is fully integrated in the Lawo mc2 66, right next to the main display.

The touch monitor is fully adaptable to every sound engineer. There are presets with different configurations that make it easier to reach the target. Some sound engineers like to see all three meters in the display (M, S and I); others prefer to see just the M and S measures, as the integrated one can also be seen in the main display. All these presets and configuration data are stored together with the Lawo mixing console project session of every sound engineer in the house.

Figure xxiv. RTW TM7 from Lawo mixing console

Even though the two metering systems (Lawo and RTW) are independent, they normally do not differ from each other by more than 0.1 LU, and any difference is mainly because they are not reset at exactly the same moment.

With this redundant, adaptable metering system, sound engineers have no difficulty reaching the -23 LUFS target level. Before a live program starts, all microphones are usually tested and the levels pre-adjusted for each presenter. All the music pieces and external videos are also checked, and short notes are made on how loud each signal is, because the signal may need to be attenuated or boosted when played live in order to reach the target. However, all reports produced at SWR should have an integrated loudness level of approximately -23 LUFS, so no major adjustments should be needed when playing them live.

Before the program starts, there is another thing to check. SWR has studios in three different cities, and before a program starts in one of them, the connection between the studio and the Play Out Center has to be checked. To do so, SWR does not follow the EBU R68 recommendation exactly. Instead of a 1 kHz tone at -18 dBFS, SWR uses a stepped 1 kHz signal that starts at -30 dBFS, rises to -18 dBFS after two seconds and to -9 dBFS after two more seconds; after another two seconds the sequence starts again. In this way the EBU recommendation is included and an extra parameter is checked: by stepping the level of the signal, the engineers can also detect any unwanted dynamics processing in the connection path between the two stations.
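A stepped line-up tone of this kind is easy to generate. The following sketch produces one cycle of the sequence (2 s at -30 dBFS, 2 s at -18 dBFS, 2 s at -9 dBFS) as a list of floating-point samples. The 48 kHz sample rate and the convention that a level in dBFS refers to the peak amplitude of the sine relative to full scale are my assumptions, and the function name is hypothetical.

```python
import math


def stepped_line_up_tone(sample_rate=48000, freq_hz=1000.0,
                         levels_dbfs=(-30.0, -18.0, -9.0), step_seconds=2.0):
    """Generate one cycle of a stepped sine tone, one step per level.

    Assumption: each dBFS level sets the peak amplitude of the sine
    relative to full scale (1.0).
    """
    samples = []
    samples_per_step = int(sample_rate * step_seconds)
    for level in levels_dbfs:
        amplitude = 10.0 ** (level / 20.0)
        offset = len(samples)  # keep the phase continuous across steps
        for n in range(samples_per_step):
            t = (offset + n) / sample_rate
            samples.append(amplitude * math.sin(2.0 * math.pi * freq_hz * t))
    return samples
```

If a compressor or limiter sits in the transmission path, the 12 dB and 9 dB steps between the levels will arrive smaller than they were sent, which is the extra check the stepped signal provides over a single -18 dBFS tone.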

3.3.2. Loudness in sound production and post-production

Once I had learned how the studio work was done, the next step in my internship was sound production and post-production. There, short clips for different programs are usually mixed, but bigger productions are also made. The material arrives ready to mix from the video editors, where the producer has already selected the ambient sounds and the music.

The post-production system is from Avid and consists of a Media Composer computer connected via Interplay to the ProTools computer. The two systems need to be connected, as they have to work together to provide video and sound at the same time. They do not have a master-slave relationship; instead, they simply run the files together.

The version used is ProTools 10 HD, controlled from a mixing console from Digidesign. This console is in fact not a mixer but a remote control for the ProTools system. There are many sound engineers in the house, and each of them has an individual ProTools session with its own fader layout and screen memories.

Once all the systems are on and working properly, mixing can begin. To achieve an appropriate sound level, ProTools has its usual peak meters to check that no signal is clipping, but the most important metering instrument is the RTW TM9. This touch monitor is very similar to the TM7 built into the Lawo mixing console, described previously, but here it is a separate module. It is a 9-inch touch monitor with a flexibly configurable graphical user interface, on which the engineer can choose which parameters to display. The configuration used at SWR consists of an 8-channel true-peak meter in dBTP, the three loudness scales (M, S and I) in LU, and a spectrogram as main indicators, a small phase and correlation display in the top right corner, and the LRA, maximum TP and S values in the bottom left corner.

Figure xxv. RTW TM 9 touch monitor from a sound editing room in the SWR Stuttgart studios

Before my internship I did not have much mixing experience and had no experience with loudness meters at all. Even so, after a couple of pointers from the engineers it was not difficult for me to produce a good mix that reached the loudness target.

The first indication given was: “You have to mix with your ears”. I had never done it before, but it was quite intuitive. Everything had to sound natural, and there was freedom for the loud parts to be loud and the soft parts to be soft. I just had to make sure all the elements present in the scene were coherent with each other, and that the music added transmitted what the producer or author wanted to transmit.

Secondly, a good monitoring level is needed to let the loud parts be loud and the soft ones soft. At the SWR studios there is no predetermined monitoring level. Instead, every sound engineer sets their own monitor gain according to their preferences and hearing. I noticed that if the monitor gain is too low, the loudness meter indicates that the mix is too loud, and vice versa. So, as we are mixing with our ears, the monitoring level is critical. The monitoring system used at SWR is a Genelec 5.1 system consisting of five 1238CF SAM™ Studio Monitors (with built-in DSP) and a 7270 SAM™ Studio Subwoofer. This system is used in the post-production and live studios, and it was calibrated with the Genelec auto-calibration technology AutoCal™ and the Genelec Loudspeaker Manager (GLM™) control network technology. All the monitors are connected to each other in a network, and the system automatically aligns every monitor on this network in terms of level, timing and equalization of room response anomalies. (Genelec Oy, 2015)

Finally, the third indication was that speech intelligibility is very important. I had to make sure that the voices in the films were understandable for everybody, in any listening environment. This is probably the most important indication, as without intelligibility the films lose their purpose: to communicate.

To make sure that all the dialogue was understandable, I used the voice loudness level as my reference point, which was -23 LUFS. That is, I made sure that all the voices had an average level of -23 LUFS and left the other elements free around the speech level. This technique is discussed in earlier parts of this work¹⁴, and also in the EBU Tech 3343 document, where the reference signal is called the anchor signal. To get the anchor signal level right, I always adjusted it according to the value of the Short-term loudness meter. I normally did not look at the Momentary meter, as it gave me no relevant information: voices should have momentary level fluctuations in order to be expressive and sound natural. To ensure speech intelligibility, filters were also applied to the voices to remove any disturbing frequencies. This method normally works, but it presents problems in some situations¹⁵.

By doing all those things, the target of -23 LUFS was normally reached or nearly reached, with a deviation of normally no more than ±1 LU. I also noticed that the deviation is bigger when mixing short clips than when mixing long films (approximately half an hour), where the target was usually reached exactly. When the target is not reached exactly, there are several options. If the piece being mixed will be played back in a live production, e.g. a magazine show, no further adjustments are needed, as the clip will be sent through the mixing console of the live studio and the sound engineer will adapt the level of all the signals to reach the target for the whole program. But if the piece being mixed is a whole program ready for broadcast, the loudness level has to be adjusted to reach the target exactly. To do that, SWR has a ProTools plug-in that analyses and adjusts the average loudness level of the mix.

The plug-in used is NUGEN Audio LM-Correct. It is a loudness analyzer and corrector capable of working up to 100 times faster than real time, and it handles mono, stereo and surround files up to 5.1. It is compliant with EBU R128, ITU-R BS.1770 and the CALM Act. Maximum Momentary and Short-term levels can be set, and an LRA target is optional (NUGEN Audio, 2016).

At the SWR studios this plug-in is set to comply with EBU R128, which means that the loudness target level is set to -23 LUFS and the maximum True Peak level to -1 dBTP. The application analyses the mix faster than real time and indicates whether the file should be adjusted. Then, by simply pressing render, it automatically adjusts the values to meet the targets of the recommendation. Extra parameters, such as LRA or Maximum Momentary or Short-term Loudness Level, can also be adjusted, but at the SWR studios those options are normally inactive. Normally four stereo tracks are saved with each clip, but only the main one is normalized, as it is the only one that will be broadcast.
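The two active settings (-23 LUFS target, -1 dBTP true peak limit) interact: a static gain that raises a quiet mix to the target also raises its peaks by the same amount. The following sketch shows that arithmetic only. It is not a description of how LM-Correct works internally, and the function name and the returned dictionary are illustrative.

```python
def plan_correction(integrated_lufs: float, true_peak_dbtp: float,
                    target_lufs: float = -23.0,
                    max_true_peak_dbtp: float = -1.0) -> dict:
    """Compute the static gain to reach the target and the resulting true peak."""
    gain_db = target_lufs - integrated_lufs
    peak_after = true_peak_dbtp + gain_db  # a static gain shifts peaks equally
    return {
        "gain_db": gain_db,
        "true_peak_after_dbtp": peak_after,
        "peak_ok": peak_after <= max_true_peak_dbtp,
    }
```

A mix measured at -24 LUFS with a true peak of -3 dBTP needs +1 dB and ends up at -2 dBTP, which is fine. A mix at -26 LUFS with the same peak would need +3 dB and end up at 0 dBTP, so gain alone would not be enough and the peaks would have to be handled separately.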

14 See 2.2.4.2 What to measure
15 See 3.4 Daily problems


Figure xxvi. Nugen Audio LM-Correct plug-in interface (NUGEN Audio, 2016)

The four stereo sound tracks are the following. The main one is the ST (Sende Ton, broadcast sound), which is the final mix of all the elements, ready for broadcast. The second one is the IT (Internationaler Ton, international sound), which contains all the ambience sounds and voices present in the video and all the background music. The third one is the IT ohne Musik (IT without music), which consists of the same elements as the IT but without the music. Finally, the fourth track is the voice-over track, which contains only the voice-over recorded in the studio. Every short clip is saved like this so that the material can be reused in any situation, as the video is stored with its ambience sound but without the music or voice-overs.

When a whole program is produced, the four stereo tracks of the file are organized differently: the first one carries the ST sound; the second one the IT sound; the third one, however, carries the audio-description track; and the last one carries the Dolby E information for surround systems. Only in the clean feed¹⁶ version of the program are the sound tracks organized as in the previous case (ST/IT/IT without music/voice-over).

3.4. Daily problems, abuses and shortcomings

All changes need a transition period, during which problems are detected and corrected until the optimal situation is found. Implementing these regulations meant a change of habits, and of course many issues were discussed in depth, as some shortcomings were detected in the first approach.

All the following situations and examples are real, experienced either by me or by sound engineers with whom I had the opportunity to discuss and share opinions and experiences during my internship.

16 A clean feed version of a program consists in the same program, but without all the graphics or color corrections made or inserted in the recorded video.

3.4.1. Automatic loudness level systems

One of the problems detected concerned the automatic loudness levelers at the Play Out Center in Baden-Baden. The system was the IntelliGain from Evertz, mentioned earlier. It could not see loudness changes before they happened, and it only changed the signal gain when it detected that the signal was too loud or too soft at that moment. This caused problems in the following situations:

• Loud-soft transition

When the real-time leveler detected that a program was too loud (e.g. -18 LUFS), it started reducing the gain until the target level was reached (e.g. a gain of -5 LU to reach -23 LUFS). When a file with the correct loudness level followed, the leveler could not predict its level, so the gain was still at -5 LU and the new file started at -28 LUFS. Because gain transitions were kept slow to avoid sudden loudness differences, it took a while for the level to reach the target again. During this transition, viewers had already used the remote control to turn up the volume, because the beginning of the program was too soft. Once the gain had readjusted, they had to use the remote again to turn the volume down, as the program was now too loud. This is the opposite of what the regulations aim for: viewers had to adjust the playback level twice because the gain adaptation was too slow.

Figure xxvii. Graphic representation of a loud-soft file transition

• Soft-loud transition

The same happened in the opposite situation. When a file was detected to be too soft, the leveler increased the gain to reach the target. If a louder file came next, the gain was still raised, and the file was therefore even louder at the beginning. By the time the leveler had detected the loudness level and reduced the gain, the viewers at home had already jumped off the sofa and turned the volume down. Again, once the leveler had reached the target level, they had to use the remote again to turn the volume up, as the program was now too soft.


Figure xxviii. Graphic representation of a soft-loud file transition

Of course, these situations caused trouble for the viewers, but also for the sound engineers, who saw their mixes degraded. These situations happened in real life: the graph below was obtained by measuring the signal of a broadcaster.

Figure xxix. Measurements from actual loudness transitions during broadcasting

This only happened at the beginning of the implementation, as it was then decided to replace the leveler. The current method used to normalize non-compliant material is quite different. The system used is a Minnetonka system and, as said before, only the files that have not been normalized go through the normalizing process. This method is much better than the old one, as it respects the dynamics of a mix and only adapts the gain of the whole signal. The only problem with this method is that it has to be done before the file is broadcast and it is time-consuming. Because of the complexity of the unpacking and packing process, the system works practically in real time. That means that normalizing a 90-minute program takes almost 90 minutes. This processing time has to be taken into account, because the file has to be ready for broadcast at the right time.
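The two properties of file-based normalization mentioned above, a single gain offset for the whole file and a processing time close to the program duration, can be sketched as follows. This is only an illustration under stated assumptions: the function names, the 15-minute safety margin and the example date are hypothetical and do not describe the Minnetonka system itself.

```python
from datetime import datetime, timedelta


def static_gain_db(measured_lufs, target_lufs=-23.0):
    """Single gain offset in dB applied to the whole file."""
    return target_lufs - measured_lufs


def db_to_linear(gain_db):
    """Convert a gain in dB to a linear amplitude factor."""
    return 10 ** (gain_db / 20)


def latest_start(air_time, duration_min, speed_factor=1.0, margin_min=15):
    """Latest moment at which normalization can start.

    speed_factor=1.0 means processing runs in real time, as described above.
    margin_min is a hypothetical safety margin.
    """
    processing = timedelta(minutes=duration_min / speed_factor)
    return air_time - processing - timedelta(minutes=margin_min)


# A file measured at -18 LUFS needs -5 dB on the whole signal.
print(static_gain_db(-18.0))

# A 90-minute program airing at 20:15 (hypothetical schedule).
air = datetime(2016, 6, 1, 20, 15)
print(latest_start(air, 90))
```

Because the same factor multiplies every sample, the internal dynamics of the mix stay untouched, which is the advantage over the real-time leveler.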

Another problem, related to the processing time, was faced at the beginning of the implementation. At that time, a lot of non-normalized material had to be broadcast, as almost all the archive material had been produced before the recommendation. As a result, the leveler had to work constantly and time was sometimes tight. Nowadays, most of the programs broadcast are already produced according to the recommendation and only one or two files per day have to be normalized.

3.4.2. Tricky situations

Despite all the techniques and technologies explained in this study, there are some situations in which reaching the target level may be a little bit tricky or, if it is reached, the result may not be very satisfying. In some situations, sudden loudness level changes may also occur despite following the recommendations. Here, I present some of these situations:

Films typically have more dynamics than broadcast material, with very loud parts and very soft ones. In broadcasting, though, there are not as many soft parts, partly because the listening environment does not permit it, but also because the material broadcast is mainly talk shows, news, magazines, etc. In this type of program the voice has the main role, and some music and effects may be added to it.

As explained before, the voice is intended to have a level of -23.0 LUFS in order to ensure speech intelligibility and because it is the main element in almost every production. This cannot always be done if the target is to be reached, and it may cause some trouble. The following situation is an example:

• Voice loudness level differences

Imagine a talk show with a moderator who presents the show and a guest who talks about some current topics. The show has several sections, and between them an energetic piece of music is played. This music should be loud in order to have some presence, to keep the attention of the viewer and to fill the transition moments in the show. To be coherent, the music should definitely be louder than the voices, but that may present some trouble. If the voices are leveled at -23 LUFS and the music is louder than the voices, the program loudness ends up above the target level.


Figure xxx. Program scheme with -23 LUFS voice level

Therefore, the levels have to be adjusted to reach the target while maintaining the coherence between the voice and the music. Now the voice is softer than -23 LUFS to leave some room for the music to be louder, and the target is reached exactly.

Figure xxxi. Program scheme with -23 LUFS program level
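The effect of the music on the program level can be estimated with a short calculation. The sketch below treats program loudness as a duration-weighted energy average of the section levels. It deliberately ignores the gating defined in ITU-R BS.1770, so it is only an approximation of an integrated loudness measurement. The section lengths and the music level of -19 LUFS are hypothetical values chosen for illustration.

```python
import math


def program_loudness(segments):
    """Ungated, duration-weighted energy average.

    segments: list of (duration_seconds, loudness_lufs) tuples.
    """
    total = sum(duration for duration, _ in segments)
    energy = sum(d * 10 ** (level / 10) for d, level in segments) / total
    return 10 * math.log10(energy)


# Hypothetical talk show: 40 min of voice at -23 LUFS, 5 min of music at -19 LUFS.
show = [(2400, -23.0), (300, -19.0)]
level = program_loudness(show)

# Offset that brings the whole program back to the target.
offset = -23.0 - level
adjusted = [(d, l + offset) for d, l in show]

print(round(level, 2), round(offset, 2))
print([round(l, 2) for _, l in adjusted])
```

With these example numbers the program lands at about -22.3 LUFS, so the whole mix has to come down by roughly 0.7 LU. The voice then sits below -23 LUFS, which is exactly the situation shown in the second scheme.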

Even though the target is reached, this may cause trouble because, as said before, voice is common to almost all productions. Viewers who change the channel while watching this program will perceive a loudness level change between the two channels, as the voice in the first program is softer than in the second one. This is the reason why some engineers would prefer dialog-level loudness normalization to the EBU R128 approach. Some studies have also shown that consumers prefer a constant dialog loudness level to a constant program loudness level (Carroll, Jones, & Williams, 2007).


Figure xxxii. Voice level differences between programs

A similar problem occurs in another particular situation. When mixing a sports program (normally football), it is typical to show the highlights of the game with voice-over commentary, followed by statements from the coach or players at the press conference with no added ambience. As said before, the voices should have a level of -23.0 LUFS in order to ensure intelligibility. Here, the voice-over is mixed together with the ambience of the football stadium, so the voice itself is mixed softer: the sum of ambience and voice-over should be near -23.0 LUFS, as this is the main part of the program in terms of duration. The statements of the coach during the press conference, however, have no added ambience and can be mixed at exactly -23.0 LUFS. The problem is that the two voices will have different loudness levels, and the consumer will perceive this difference as a loudness jump even though both parts of the program match the target. It is also incoherent, as the voice-over should be the main voice in this case.

• Sudden loudness level changes

I noticed a sudden loudness jump while watching an action movie on a private German broadcaster. Action movies normally have more loud parts than other kinds of movies, as there are lots of gunshots, explosions, car chases and loud music. As the average loudness level of those films must be -23.0 LUFS when broadcast, the soft parts of action movies (e.g. dialogue) are very soft in order to reach the target. In this case, a commercial break was inserted during one of those soft parts. The commercial was played at the correct loudness level, but it sounded very loud in comparison with the soft part of the movie.


Figure xxxiii. Loudness level difference between a film and a commercial

Another example of a sudden loudness level change can occur in the transition between two programs. Imagine that a program is being broadcast live and the sound engineer realizes, almost at the end of the program, that he has been mixing too soft and still has some “room” to mix louder. Then, in order to create impact and to enhance the final music of the show, he mixes the end of the program louder. When the next program starts, it will be much softer than the end of the previous one, as the ending of one program and the beginning of the next have very different levels. A further recommendation might be that the beginnings and endings of programs should not deviate too far from the target level (±1.0 LU, for example) in order to avoid this kind of loudness level change.
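The suggestion above could be checked automatically on a series of short-term loudness readings. The following is a minimal sketch, assuming one short-term reading per second and a hypothetical 10-second window at each end of the program. Neither the window length nor the ±1.0 LU tolerance comes from EBU R128 itself.

```python
def edges_within_tolerance(short_term, target=-23.0, tol=1.0, window=10):
    """Check the first and last `window` short-term readings of a program.

    short_term: list of short-term loudness values in LUFS (assumed one per
    second). Returns a (head_ok, tail_ok) tuple.
    """
    def ok(values):
        return all(abs(value - target) <= tol for value in values)

    return ok(short_term[:window]), ok(short_term[-window:])


# Hypothetical live show: on target for a minute, then a loud finale at -19 LUFS.
live_show = [-23.0] * 60 + [-19.0] * 10
print(edges_within_tolerance(live_show))
```

A program flagged at its tail would be a candidate for a gentler ending, or for extra attention at the junction with the following program.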

3.4.3. Commercial loudness abuse

It is well known that the main problem before the loudness regulations was loud commercials. With the new recommendations this has changed significantly, but there are still some cases in which commercials take advantage of the regulation.

A particular case is that of commercials for medicines. These commercials carry a safety message stating that the consumer should read the precautions and talk to the pharmacist before taking the medicine. This message is typically placed at the end of the commercial and lasts approximately 4 seconds. It may not seem much, but in an advertisement it is a significant amount of time. The strategy of these commercials is to make the commercial content louder than the target level and the final safety message much softer, so that the average loudness level is -23 LUFS. By doing this, these commercials can be louder than others, again producing loudness level differences between programs.

Because of situations like these, some broadcasters apply extra rules to their content, as the EBU R128 is just a recommendation. Some have added a maximum Momentary and Short-term level, while others have set a maximum Loudness Range and/or True Peak level, depending on their preferences and needs.
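How much headroom this strategy buys can be estimated with the same energy-average reasoning. The sketch below is a simplified, ungated model with hypothetical numbers (a 20-second spot and a safety message at -30 LUFS); the function name is an assumption as well. Note that a real ITU-R BS.1770 measurement applies a relative gate, so material more than 10 LU below the ungated level would be excluded from the integrated value, which this sketch does not model.

```python
import math


def max_content_level(total_s, message_s, message_lufs, target=-23.0):
    """Loudest level the main content can have while the ungated average
    of content plus safety message still equals the target."""
    content_s = total_s - message_s
    energy_budget = total_s * 10 ** (target / 10)
    remaining = energy_budget - message_s * 10 ** (message_lufs / 10)
    return 10 * math.log10(remaining / content_s)


# Hypothetical 20 s spot with a 4 s safety message mixed at -30 LUFS.
content = max_content_level(20, 4, -30.0)
print(round(content + 23.0, 2), "LU above target")
```

With these example values the main content can sit a little less than 1 LU above the target. The gain is modest, but it is obtained entirely at the expense of the intelligibility of the safety message.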
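Such additional rules can be thought of as a small set of limits checked against the measured values of a file. The sketch below shows one possible shape for such a check. The class name, the dictionary keys and all threshold values are placeholders for a hypothetical house specification, not figures taken from any particular broadcaster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class HouseLimits:
    """Hypothetical delivery limits; None means 'not checked'."""
    target_lufs: float = -23.0
    tolerance_lu: float = 1.0
    max_true_peak_dbtp: Optional[float] = -1.0
    max_short_term_lufs: Optional[float] = -18.0
    max_momentary_lufs: Optional[float] = None
    max_lra_lu: Optional[float] = None


def violations(measured, limits):
    """Return the names of the measured values that break a limit.

    measured: dict with keys 'integrated', 'true_peak', 'short_term_max',
    'momentary_max' and 'lra' (key names are assumptions).
    """
    found = []
    if abs(measured["integrated"] - limits.target_lufs) > limits.tolerance_lu:
        found.append("integrated")
    upper_limits = [
        ("true_peak", limits.max_true_peak_dbtp),
        ("short_term_max", limits.max_short_term_lufs),
        ("momentary_max", limits.max_momentary_lufs),
        ("lra", limits.max_lra_lu),
    ]
    for key, limit in upper_limits:
        if limit is not None and measured[key] > limit:
            found.append(key)
    return found


# A spot that averages -23 LUFS but has a loud main section.
spot = {"integrated": -23.0, "true_peak": -1.5, "short_term_max": -16.0,
        "momentary_max": -12.0, "lra": 9.0}
print(violations(spot, HouseLimits()))
```

A spot like the one in the example would pass a check on integrated loudness alone but fail the short-term limit, which is how such extra rules catch commercials that are loud in parts.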

4. CONCLUSION

In the 2000s the loudness war was causing problems, also in broadcasting, with uncomfortable loudness level differences and very loud commercials. To put an end to it, loudness level recommendations were created, and they have been one of the most significant changes in the history of broadcast sound. Thanks to a lot of effort from regulatory institutions, broadcasters, manufacturers and professionals, the recommendations of the EBU R128 are nowadays a reality in many European television and radio stations. After its creation, a big debate started in all broadcast companies to figure out whether or not it was a good solution for the loudness level differences that were causing many complaints at that time.

At the SWR, and in Germany in general, it was decided to take the step and make the change in 2012, but new technology and knowledge were needed to do so. At the SWR a relatively simple solution was found for the implementation, proving that it is possible to follow the recommendations without making significant changes to the workflow or to the equipment used in production. To make it possible, the professionals of the house had to be trained, as the mixing methodology had to change. Some equipment also had to be purchased, such as EBU-compliant meters, plug-ins and automatic loudness levelers. As time passed, experience was acquired and some problems or opportunities for improvement were detected, so some adjustments to the original plan were made and some equipment was replaced.

Now, four years after its introduction in the SWR studios, the recommendations are fully implemented and integrated in the daily work of the engineers. After many hours of discussion with the professionals, we concluded that, although no rule can be perfect for all situations, the EBU R128 is a significant step in the right direction. There are some situations which the recommendation does not fit well enough, but it was also mentioned that those situations already occurred before the recommendation was implemented. Because of these situations, some professionals think that dialog-level normalization, together with loudness range recommendations, would better fit the needs of the broadcasting industry, and that after all this time it would be a good idea to revise the topic.

These recommendations are not being applied by all European broadcasters, and loudness level differences between programs and channels still exist. Therefore, in my opinion, a model similar to the one implemented in the USA, where maintaining a specified integrated loudness level is mandatory by law (CALM Act), would help put an end to this situation.

Radio stations are a good example of broadcasters that have not implemented the recommendations yet. This thesis focuses only on the implementation of the loudness level recommendations in television, but a very interesting extension of the topic would be their implementation in radio. This has not been covered here because of time constraints, as the dynamics processing used in radio is relatively complex. Therefore, I strongly recommend that future researchers write about this topic. Also, the SWR is a television channel with no commercials, and it would have been interesting to see how commercials are treated before they are broadcast. It would also be interesting to focus on surround sound productions, as more parameters have to be taken into account while mixing in order to reach the target level. This topic is not extensively covered in this thesis, as I did not have the opportunity to work on any surround sound production during my praxis semester.

With this thesis I increased my knowledge of the topic. I also gathered and summarized relevant information for other broadcasters that want to implement the recommendations. Furthermore, I intended to increase the awareness of consumers too, as almost everyone has a TV at home and can detect loudness level differences. After reading this thesis, everyone should be able to understand what is happening when a loudness level change is detected, and I hope that this will help consumers choose and appreciate the content they want to see and, of course, to hear.

5. TERMINOLOGY

[1] Dynamic Range: the difference between the softest and the loudest level of a signal.

[2] Crest Factor: Traditionally defined as the difference between the highest peak of a signal and its average level, measured in dB. With the introduction of program loudness, it can also be defined as the difference between the signal's highest sample peak and its average loudness level.

[3] Sample Peak Level: the highest absolute numeric value of the samples of a file.

[4] True Peak Level: the peak level of a signal as detected by a True Peak meter, which upsamples the signal and estimates the values between samples. It can therefore detect intersample peaks, which are often higher than sample peaks.

[5] Intersample Peak: Additional peaks that are not represented by the samples of a file that can occur between samples when filtering, changing the sample rate or playing the signal through a DAC. They are normally higher than sample peaks and they are common in highly processed material with lots of compression, limiting, etc. They can be detected with a True Peak meter by upsampling the signal. That is why they can also be known as True Peaks.

[6] Headroom: The space (measured in dB) between the highest peak of a signal and the level where clipping and distortion starts to occur due to overloading.

[7] Leq(RLB): measure of the average energy of an audio signal in a period of time using the RLB (revised low-frequency B-curve) weighting curve.

[8] K-filter: Filter used in ITU BS.1770, implemented in two stages. First, a pre-filter consisting of a shelving filter that boosts the high frequencies to compensate for the acoustic effect of the head, and then a high-pass filter, known as RLB (Revised Low-frequency B-curve), that approximates the frequency dependence of human loudness perception.

[9] PLOUD Group: The loudness project group with over 240 participants including creative and technical experts from the European Broadcasting Union (EBU), creators of the EBU R128 Loudness Recommendation and all the complementary documentation.
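Some of the level-related terms above ([2], [3] and [8]) can be illustrated numerically. The sketch below computes the sample peak and the classic crest factor of a test tone, and evaluates the frequency response of the two K-filter stages using the 48 kHz biquad coefficients published in ITU-R BS.1770. The function names are my own, and the crest factor shown is the traditional peak-to-RMS version, not the loudness-based one.

```python
import cmath
import math


def sample_peak_dbfs(samples):
    """Highest absolute sample value, in dB relative to full scale."""
    return 20 * math.log10(max(abs(s) for s in samples))


def rms_dbfs(samples):
    """Average (RMS) level in dB relative to full scale."""
    mean_square = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(mean_square)


def crest_factor_db(samples):
    """Traditional crest factor: sample peak minus RMS level."""
    return sample_peak_dbfs(samples) - rms_dbfs(samples)


# K-filter stages for 48 kHz, as ((b0, b1, b2), (a0, a1, a2)).
PRE_FILTER = ((1.53512485958697, -2.69169618940638, 1.19839281085285),
              (1.0, -1.69065929318241, 0.73248077421585))
RLB_FILTER = ((1.0, -2.0, 1.0),
              (1.0, -1.99004745483398, 0.99007225036621))


def biquad_gain_db(coeffs, freq_hz, fs=48000.0):
    """Magnitude response of one biquad stage at freq_hz (freq_hz > 0)."""
    (b0, b1, b2), (a0, a1, a2) = coeffs
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    h = (b0 + b1 * z1 + b2 * z1 * z1) / (a0 + a1 * z1 + a2 * z1 * z1)
    return 20 * math.log10(abs(h))


def k_weighting_db(freq_hz):
    """Combined gain of pre-filter and RLB high-pass."""
    return biquad_gain_db(PRE_FILTER, freq_hz) + biquad_gain_db(RLB_FILTER, freq_hz)


# One second of a full-scale 1 kHz sine at 48 kHz.
sine = [math.sin(2 * math.pi * 1000 * n / 48000) for n in range(48000)]
print(round(crest_factor_db(sine), 2))
for f in (20, 100, 1000, 10000):
    print(f, round(k_weighting_db(f), 2))
```

The sine has a crest factor of about 3 dB, while the K-filter response shows the attenuation of low frequencies by the RLB stage and the high-frequency boost of the shelving stage.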

6. BIBLIOGRAPHY

Audio Engineering Society. (2014, June 13). An audio timeline. Retrieved from Audio Engineering Society: http://www.aes.org/aeshc/docs/audio.history.timeline.html

Bernard, P. (1987). Leq, sel, what? why? when? Retrieved from Brüel & Kjær.

Camerer, F. (2010, September 6). On the way to Loudness nirvana - audio levelling with EBU R 128. Retrieved from EBU: https://tech.ebu.ch/docs/techreview/trev_2010-Q3_loudness_Camerer.pdf

Carroll, T., Jones, G. A., & Williams, E. A. (2007). Chapter 5.18: Audio for digital television. In National Association of Broadcasters Engineering Handbook (Tenth Edition) (pp. 1309-1330). Focal Press.

Deruty, E. (2011, September). Sound On Sound. Retrieved from http://www.soundonsound.com/sos/sep11/articles/loudness.htm

Devine, K. (2013). Imperfect sound forever: loudness wars, listening formations,. Retrieved from City University London: http://openaccess.city.ac.uk/3883/

Digital Domain, Inc. (2013). Part II: how to make better recordings in the 21st Century - An integrated approach to metering, monitoring, and leveling practices. Retrieved from Digital Domain: http://www.digido.com/how-to-make-better-recordings-part-2.html

EBU. (1998, May). Tech 3276 – Listening conditions for the assessment of sound programme material: monophonic and two–channel stereophonic. Retrieved from https://tech.ebu.ch/docs/tech/tech3276.pdf

EBU. (2000). EBU technical recommendation R68-2000 alignment level in digital audio production equipment and in digital audio recorders. Retrieved from https://tech.ebu.ch/docs/r/r068.pdf

EBU. (2004, May). EBU Tech 3276-E Listening conditions for the assessment of sound programme material - Supplement 1. Retrieved from https://tech.ebu.ch/docs/tech/tech3276s1.pdf

EBU. (2011, August). Tech 3343 - Practical guidelines for production and implementation in accordance with EBU R 128. Retrieved from https://tech.ebu.ch/docs/tech/tech3343.pdf

EBU. (2014, November). EBU R128s1 - Loudness parameters for short-form content - Version 1.0. Retrieved from EBU Tech: https://tech.ebu.ch/docs/r/r128s1v1_0.pdf

EBU. (2016a, January). Tech 3342 - Loudness Range measure to supplement loudness normalisation. Retrieved from https://tech.ebu.ch/docs/tech/tech3342.pdf

EBU. (2016b, January 25). EBU R128s1 - Loudness parameters for short-form content - Version 2.0. Retrieved from EBU Tech: https://tech.ebu.ch/docs/r/r128s1.pdf

EBU. (2016c, January). Tech 3341 - ‘EBU Mode’ metering to supplement Loudness normalisation. Retrieved from https://tech.ebu.ch/docs/tech/tech3341.pdf

Federal Communications Commission. (2015, December 15). Loud commercials. Retrieved from Federal Communications Commission: https://www.fcc.gov/media/policy/loud-commercials

Fleischhacker, S. (2014, March). Audio loudness analysis - Technical white paper. Retrieved from Sencore: http://www.sencore.com/sites/default/files/Audio%20Loudness%20Analysis.pdf

Free TV. (2010, July). Free TV australia operational practice OP-59. Retrieved from http://www.freetv.com.au/media/Engineering/OP59_Measurement_and_management_of_Loudness_in_Soundtracks_for_Television_Broadcasting_-_Issue_1_-_July_2010.pdf

Genelec Oy. (2015, October). Designed to Adapt - Genelec Smart Active Monitoring (SAM™) Systems. Retrieved from Genelec Studio Monitors: http://www.genelec.com/sites/default/files/media/Studio%20monitors/Catalogues/genelec_sam_brochure_2015.pdf

Institut für Medien- und Kommunikationspolitik. (2015, October 12). ARD. Retrieved from Media database: http://www.mediadb.eu/en/data-base/international-media-corporations/ard.html

ITU. (2015, October). Recommendation ITU-R BS.1770-4. Algorithms to measure audio programme loudness and true-peak audio level. Retrieved from ITU: https://www.itu.int/rec/R-REC-BS.1770/en

Johansen, L. G. (2006). and audibility - fundamental aspects of the human hearing. Retrieved from University College of Aarhus: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.580.3611&rep=rep1&type=pdf

Katz, B. (2000, September). An integrated approach to metering, monitoring, and levelling practices. Retrieved from Digital Domain, Inc.

Katz, B. (2013). iTunes Music: mastering high resolution audio delivery: produce great sounding music with Mastered for iTunes. Burlington, MA: Focal Press.

Lawo AG. (2016). mc2 66. Retrieved from https://www.lawo.com/products/audio-production-consoles/mc266.html

Lund, T. (2011, April). ITU-R BS.1770 Revisited. Risskov, Denmark: TC Electronic A/S.

Lund, T. (2015). TC Electronic. Retrieved from http://www.tcelectronic.com/loudness/

NUGEN Audio. (2016). LM-Correct 2. Retrieved from NUGEN Audio: http://www.nugenaudio.com/lm-correct-loudness-correction-automatic-quick-fix-plugin-aax-au-vst_19#features

Qualis Audio, Inc. (2013, May 06). Loudness variation when downmixing. Retrieved from Qualis Audio: http://www.qualisaudio.com/documents/TechNote-4-5-6-2013.pdf

RTW GmbH & Co. (2010, April 14). RTW und LAWO kündigen zur NAB 2010 Zusammenarbeit an. Retrieved from RTW News: https://www.rtw.com/de/ueber-rtw/news/nachrichtendetails/article/rtw-und-lawo-kuendigen-zur-nab-2010-zusammenarbeit-an.html

Schmid, H. (1976, October 4). Audio Program Level, the VU Meter, and the Peak-Program Meter. Retrieved from IEEE.

Shepherd, I. (2011). Loudness war research. Retrieved from Dynamic Range Day: http://dynamicrangeday.co.uk/research/

Shepherd, I. (2011, February 20). So, Justin Bieber is louder than Motorhead, AC/DC and The Sex Pistols… – wait, WHAT ? Retrieved from Production Advice: http://productionadvice.co.uk/loudness-war-infographic/

Shepherd, I. (2011). Dynamic Range Day. Retrieved from http://dynamicrangeday.co.uk/about/

Soulodre, G. A. (2004). Evaluation of objective measures of loudness. Retrieved from Canadian Acoustics.

Sreedhar, S. (2007, August 7). The future of music. Retrieved from IEEE Spectrum: http://spectrum.ieee.org/computing/software/the-future-of-music

SWR. (2016). Das ist der SWR. Retrieved from Organisation: http://www.swr.de/unternehmen/organisation/-/id=7687068/6dr3wg/index.html

Taylor, G. (2012, February 20). What's the difference between LKFS and LUFS? Retrieved from Game audionoise: http://gameaudionoise.blogspot.com.es/2012/02/whats-difference-between-lkfs-and-lufs.html

Vickers, E. (2010, November 4-7). The Loudness War: background, speculation and recommendations. Retrieved from http://www.sfxmachine.com/docs/loudnesswar/loudness_war.pdf

Yonge, M. (2008, April). Audio wave forms and meters - Extra. Retrieved from Line Up: http://www.ips.org.uk/files/10b_Audio_Waveforms_And_Meters_Extra.pdf
