Pyloudnorm: a Simple Yet Flexible Loudness Meter in Python
Total Page:16
File Type:pdf, Size:1020Kb
pyloudnorm: A simple yet flexible loudness meter in Python Christian J. Steinmetz 1 Joshua D. Reiss 1 Abstract dardize, simplify, and improve upon previous approaches, the ITU-R BS.1770 recommendation was proposed (ITU-R The ITU-R BS.1770 recommendation for measur- BS.1770-4), and has now seen widespread adoption (Lund, ing the perceived loudness of audio signals has 2011; 2012). The proposed metering algorithm was later seen widespread adoption in broadcasting, and included in the EBU R 128 recommendation, which dictates due to its simplicity, this algorithm has now found loudness for broadcast material (EBU R 128). applications across audio signal processing. Here we describe pyloudnorm, a Python package that The ITU-R BS.1770 recommendation proposes a straight- enables the measurement of integrated loudness forward algorithm consisting of frequency-weighting fil- following the recommendation. While a number ters and gated energy measurements. The algorithm has of implementations are available, ours provides been shown to correlate well with the perceived loudness an easy to install package, a simple interface, and of broadband content, is computationally efficient, and rela- the ability to adjust the algorithm parameters, a tively easy to implement. For all these reasons it has now feature that other implementations neglect. We seen widespread adoption in the broadcast industry, and with discuss a set of modifications that we incorporate the rise of online streaming platforms, interest in content based upon recent literature that aim to improve normalization has sustained (Katz, 2015; Grimm, 2019). the robustness of the loudness measurement. Fi- While this recommendation has been found to correlate nally, we perform an evaluation comparing the ac- well with broadband content, further listening studies have curacy and runtime of pyloudnorm with six other discovered that this is not always the case, especially for nar- implementations, identifying issues with several rowband content (Cabrera et al., 2008; Pestana et al., 2013; of theses implementations. Fenton & Lee, 2017; Fenton, 2018). These investigations have led to a series of proposed modifications to the original recommendation, which generally consists of adjustments 1. Introduction to the algorithm parameters for the frequency-weighting filters and gating block sizes. While these modifications The nonlinear nature of the human auditory system makes are relatively straightforward to incorporate, there has been measurement of the perceived loudness of sound challeng- limited adoption thus far. ing (Stevens, 1955). While subjective loudness has been an active area of research in psychoacoustics over the last half While originally intended for broadcast scenarios, due to its century (Stevens, 1956; Zwicker & Scharf, 1965; Moore simplicity and efficacy, the ITU-R BS.1770 recommenda- & Glasberg, 1996; Moore, 2014), these models are often tion has now found applications across audio signal process- complex and not applicable to measuring the loudness of ing (Olive et al., 2013; Jillings et al., 2015; Friberg et al., streaming or recorded audio. For this reason, there has 2014; Schoeffler et al., 2013; Ward et al., 2012; Mansbridge been a longstanding interest within the broadcast industry in et al., 2012; Ward & Reiss, 2016; Fenton, 2018). In or- simple loudness models, as they enable the ability to moni- der to meet the demand for applications outside its original tor and control the listener experience (Bauer et al., 1967; scope, a number of implementations now exist that provide a Jones & Torick, 1981; Bauer & Torick, 1966; Skovenborg & programmatic interface. Unfortunately, most of these imple- Nielsen, 2004; Soulodre, 2004; Lund, 2006). Concurrently, mentations are either difficult to install, provide an interface there has been interest in methods for measuring the loud- that is not efficiently accessed from Python, or do not allow ness of music such as Vickers’ loudness (Vickers, 2001) for modifications of the algorithm parameters. For these and ReplayGain (Robinson, 2002). In an effort to stan- reasons, we built pyloudnorm1, a Python package that is 1 easily installed and integrated into existing projects, while Centre for Digital Music, Queen Mary University of Lon- also providing the ability to adjust the underlying algorithm don, London, UK. Correspondence to: Christian J. Steinmetz <[email protected]>. parameters. 1https://github.com/csteinmetz1/pyloudnorm pyloudnorm Figure 1. Measurement of integrated loudness following the ITU-R BS.1770 recommendation (ITU-R BS.1770-4). The simplicity and flexibility of this package has lead to block is then given by an interest in pyloudnorm, which a variety of works now X utilize. Current applications include pre-processing for au- lj = −0:691 + 10 log10 gi · zi;j; dio machine learning datasets, such as the CLEAR dataset i for acoustic question answering (Abdelnour et al., 2018), where gi = [1; 1; 1; 1:41; 1:41], for the left, right, centre, a dataset of multichannel hearing aid recordings (Fischer left surround, and right surround, respectively. et al., 2020), the LibriMix dataset which generated mixtures of speech for source separation (Cosentino et al., 2020), the The final step involves applying a gate in order to reduce the creation of music mixtures in the Slakh dataset (Manilow influence of blocks with low energy. An absolute threshold et al., 2019), and the OrchideaSOL dataset (Cella et al., is given by Γa = −70 dB LUFS, along with a second rela- 2020), which features orchestral recordings. In addition to tive threshold Γr, which is determined by first measuring dataset pre-processing, there have also been applications in the loudness of all the blocks above the absolute threshold feature extraction for machine learning with the Surfboard li- and subtracting 10 brary (Lenain et al., 2020), as well as in data augmentation in ! X 1 X Scaper (Salamon et al., 2017), and as a final post-processing Γr = −0:691 + 10 log10 gi zi;j − 10; jJgj step in voice conversion (Chen et al., 2020). i Jg The structure of the paper is as follows. In Section2 we de- where Jg = fj : lj > Γag, and jJgj is the number of scribe the algorithm and introduce recently proposed modifi- blocks above the threshold. This enables us to compute the cations. Then in Section3 we introduce pyloudnorm, along final integrated loudness in the same way by summing only with other existing implementations. Section4 presents an blocks that fall above both thresholds evaluation of these implementations using the compliance ! material provided by the recommendation, as well as our X 1 X LKG = −0:691 + 10 log10 gi zi;j ; own collection of examples. We finally present conclusions jJgj i Jg in Section5. this time where Jg = fj : lj > Γa and lj > Γrg. 2. Algorithm 2.1. Proposed modifications The proposed algorithm is outlined for the stereo case at The recommendation makes clear that loudness measure- a high level in Fig.1. First, the “K”-frequency weighting ments correlate well with perception only when the signal consists of a high-shelf filter that aims to mimic the response being measured is broadband in nature. of the head, followed by a highpass filter that reduces the influence of low frequencies. Then we take the filtered It should be noted that while this algorithm has signal of each channel y , and split this into overlapping i been shown to be effective for use on audio pro- blocks of 400 ms, with an overlap of 75%. We then compute grammes that are typical of broadcast content, the the energy of each block j in each channel i algorithm is not, in general, suitable for use to es- N timate the subjective loudness of pure tones. (ITU- 1 X z = y [n]2; R BS.1770-4) i;j N i;j n=1 While this is not often an issue when employed in broadcast where N is the number of samples in each block, and n is scenarios, the use of this recommendation in other appli- the sample index within the block. The loudness of each cations has continued to increase, potentially leading to pyloudnorm inaccurate measurements. For example, when measuring This enables users to carry out loudness measurements fol- the loudness of narrowband or percussive content, such as lowing the recommendation without any underlying knowl- isolated instruments, sound effects, impulse responses, as edge of the algorithm and its parameters. But, as outlined well as other recordings. in the previous section, a number of modifications have been proposed, which improve performance for some use The degree to which measurements produced by the rec- cases. To facilitate the integration of these modifications, ommendation deviate from perception has been studied. A pyloudnorm additionally exposes the underlying algorithm number of modifications have been proposed that aim to parameters to users if desired. To enable use of the proposed improve the performance and robustness of measurements. modifications we include pre-defined filter specifications. Cabrera et al.(2008) proposed some of the first adjustments, This facilitates the use of these modifications by simply informed by a number of listening studies. This involved passing a corresponding string while instantiating the meter. raising the cutoff frequency of the highpass filter to 149 Hz, We hope that this will enable users to make more accurate and the replacement of the high-shelf filter by a notch filter measurements in the growing and diverse applications of centered at 1 kHz. Pestana & Barbosa(2012) first identified the loudness recommendation. potential shortcomings of the recommendation for common multitrack sources, and later suggested adopting a smaller In addition to the modifications that have been proposed gating block size of 280 ms in combination with a +10 dB thus far, pyloudnorm also includes the ability to easily adopt gain on the high-shelf filter (Pestana et al., 2013). new modifications. This includes any adjustments to the gating block size and frequency-weighting filters. This even More recently, Fenton & Lee(2017) provided two alterna- enables users to define any cascade of arbitrary second-order tive frequency-weighting filters.