New tools for use in the musicology of record production

Kirk McNally, George Tzanetakis, Steven R. Ness
University of Victoria, Music and Computer Science
[email protected], [email protected], [email protected]

Abstract

This paper introduces a stereo 3-D panning visualization tool based on methods borrowed from the field of Music Information Retrieval (MIR). This tool helps to illustrate and quantify production decisions and recording practices used by engineers and producers in the record production process. The tool is also valuable for pedagogical purposes, providing students with visual feedback on what they are (or are not) hearing in recordings as they develop their critical listening skills. A case study comparing a body of work by Tchad Blake and Rick Rubin illustrates the value of this tool to the musicology of record production.

1. Introduction

The field of Music Information Retrieval (MIR) is defined as "a multi-disciplinary research endeavor that strives to develop innovative content-based searching schemes, novel interfaces, and evolving networked delivery mechanisms in an effort to make the world's vast store of music accessible to all" [3]. How this relates to the musicology of record production may at first appear a tenuous link. Consider, however, that in an effort to better understand the music being searched, MIR researchers seek to develop tools that examine and extract features of the music. A system might evaluate the timbral similarities between different styles of music, e.g. rap vs. jazz, and produce genre classifications based on these results. Expand this thinking to incorporate all of the different technical treatments and techniques at the disposal of the recording engineer and producer and, with accurate and robust enough tools, it may be possible to classify not simply genres, styles or tempos, but also the engineer's and producer's individual stylistic techniques or "signature". Just as the study of scores by young composers provides insight into the masters' techniques and trademarks, the 3-D panning visualization tool introduced in this paper gives student engineers and producers insight into the technical attributes that make an album distinct, i.e. a "Rick Rubin" album vs. a "Phil Spector" album.

1.1 Justification

Prior work with colleagues in the field of MIR shows that including stereo panning features to classify record production style improves genre classification scores for a music database compared with "classic" methods based solely on timbral similarities. Classification accuracies improved in the order of ten percent for the task of distinguishing 1960s garage music from 1980s-90s grunge music. An increase of twenty percent was seen for the task of distinguishing acoustic vs. electric jazz [1]. While this may seem a trivial task to the audio engineer or producer (indeed it is hypothesized that a trained listener would score higher than the MIR system), it should not be discounted. The work clearly showed that there are panning decisions made as part of the production process that can be identified as "stylistic" or "accepted" for a given genre. It is argued that this part of the production process is natural, indeed instinctive, for an industry professional: it just sounds right! Again, expand this concept to an individual producer or engineer with years of experience, one who has evolved to have a so-called "sound", and classification of this individual style or "genre" may be possible. This work also identified that the current means for visualizing this panning information was poor and non-intuitive. Developing a new method for visualizing this information, one that would make it more accessible to scholars, was the initial goal of this project.

1.2 Panning Pedagogy

In designing and developing the visualization tool it became clear that a valuable pedagogical tool was also being created. Students of audio engineering and record production work to develop their listening skills through critical listening exercises and practical work in the studio. The process of mixing (the place where most panning decisions are made) is a complex problem of balancing levels vs. panning vs. frequency vs. effects. There is an established history of thinking about this mixing process visually, where individual instruments are drawn by students to create a "picture" of their final mix [4]. The analysis of commercial recordings is the counterpoint to this and will guide students in their own mixing decisions. What is missing from this methodology is that while students are developing their technical ear they simply may not hear all the elements in a given mix. The tool introduced provides visual feedback of these elements to aid students in the development of their critical listening skills.

2. The Tool

2.1 Basic Design

The stereo 3-D panning visualization tool allows users to visualize the panning of different elements in a recording, with panning graphed on the x-axis, frequency on the y-axis, and the real-time component creating a waterfall-type display on the z-axis. The tool computes the panning index, a frequency-domain source identification system based on a cross-channel metric as described in Avendano [2]. This panning index, giving values of -1.0 to 1.0, full left to full right respectively, allows for the mapping of the individual frequency components in different FFT (Fast Fourier Transform) or MFCC (Mel-Frequency Cepstral Coefficient) bins along the left-right panning dimension. As each FFT or MFCC bin is computed it is mapped to zero time on the z-axis. As each new bin is calculated the previous bin moves along the z-axis (away from the user), thus creating a "picture" of the track being visualized. Magnitude (level) of the individual frequency components is mapped using a colour scale from green (low level) to red (high level). This should be familiar to users of any digital audio workstation (DAW). Using the opening bars of The Beatles' In My Life to illustrate this, we see the opening guitar and bass lines panned fully to the left (panning index = -1.0). John Lennon's vocals then enter panned fully right (panning index = +1.0) and Ringo Starr's drums enter, again panned left (panning index = -1.0).

An important side effect of this plotting process is that the ambience or reverb present in the given track is also clearly visualized. If the sound source is a "dry" amplitude-panned source, such as the opening lines of the Rubin-produced Jay-Z track 99 Problems, we see a single line in the center of the display. Conversely, if the source is panned to the center but accompanied by reverb, as in the Blake produced track, Name, by Artist, we see a wider image that "spreads" between left and right on the display.

2.2 Software Design

Two versions of the software were developed. The first is a web-based version of the tool using a Flash application embedded within a standard HTML web page. Due to the recent addition of native 3D support to Flash 10, it was possible to achieve near real-time performance in displaying the 3D panning information for a given track. Marsyas (http://marsyas.sf.net), a Music Information Retrieval framework, was used to generate files containing data points that represented the 3D panning information for a given track. This version of the tool has the advantage of being easily accessed by users from around the world using a wide variety of computers and operating systems. However, in order to accurately visualize the transient information of a given track, large amounts of data need to be drawn on the screen in a very short period of time. Even with the 3D support in its latest version, Flash is unable to display the requisite number of data points in real time, yielding a display that lacks the resolution required.

For this reason, a second version of the software was developed, again using the Marsyas programming framework. This version was combined with hardware-accelerated 3D using the OpenGL toolkit and a GUI created using the Qt framework. This application takes advantage of the hardware 3D graphics cards present in modern computer systems to display large numbers of data points simultaneously, and as the entire application is written in C++, it is able to provide real-time performance in many use cases. This tool is open source, free software and is available for download at the main Marsyas website (http://marsyas.sf.net).

3. A case study: Rick Rubin vs. Tchad Blake

Two established industry professionals were selected for a case study with the goal of clearly identifying an individual style or technique with regards to panning in the production process.

Rick Rubin is a Grammy award-winning producer successful in a variety of genres, which provides a body of work well suited to test the hypothesis that a "signature" exists in the albums he produces. Rubin's productions are known to exhibit a stripped-down sound, one that eschews the use of reverb, instead using naked vocals and bare instrumentation. Rubin is quoted in a New York Times article saying, "There's just a natural human element to a great song that feels immediately satisfying. I like the song to create the mood" [5]. Though he describes himself as "no expert at the technical aspects" of the record production process [6], this underlying production goal of allowing the song to "create the mood" would certainly seem to dictate albums that are very sparse and natural. The selected works for Rubin include: Beastie Boys Licensed to Ill, Jay-Z The Black Album, Dixie Chicks Taking the Long Way, Red Hot Chili Peppers Blood Sugar Sex Magik, Tom Petty Wildflowers and Johnny Cash American IV: The Man Comes Around.

Tchad Blake is an engineer who describes himself as "…really an engineer/producer, not a composer/producer…I like working with artists who have strengths in arrangements and are musically adept. I think I am good at contributing atmosphere and helping them flesh things out" [7]. When asked about his ideology and approach to panning on the Gearslutz forum, Blake responds, "Anywhere and everywhere. Mostly hard. Outside the speakers if I can" [8]. Blake is also known for his use of binaural techniques and an affinity towards dynamic mixes with unique sonic textures. The selected works for Blake include: The Globe Sessions, Redemption Son, Binaural, Brutal Youth and Beauty and Crime.

3.1 Methodology

A total of seventy-two Blake produced/engineered tracks and seventy-four Rubin produced tracks were used for the case study. Observations were first made using the visualization tool. It was found that the Rubin produced tracks exhibit limited dynamic panning and are primarily driven by very dry, mono vocals. Panning of mono elements in the mixes tended towards extreme L/C/R panning. The stereo elements, e.g. drums or piano, generally were seen to be spread full left to right, again with little or no ambience or reverb associated with them. The Blake tracks show more instances of dynamic panning and a much greater use of reverb and effects. Blake's vocals are often presented with reverb or effects, but even when presented without reverb, this presentation will inevitably change, for example in a chorus or bridge where the vocal is augmented by reverb or effects. This classic "lifting" of the chorus is also present in Rubin produced tracks, but his technique uses track arrangements rather than effects. A final observation was in the way that Blake and Rubin treated low frequency information (below 500Hz) with regards to reverb. Blake has a tendency to spread information below 500Hz through the use of reverb or ambiences. In the instances where Rubin does use reverb it doesn't generally extend into this low frequency region. This creates a "thicker", although less dynamic, sound for Blake's tracks in this frequency range, with Rubin's sounding more precise and dynamic.

3.2 Machine Learning Classification

In order to support these observations and provide empirical data showing them to be true, the selected tracks were run through an MIR machine learning classifier. The basic analysis tool is the same as that previously described, computing the panning index for individual frequency components of FFT or MFCC bins. For this component of the study the process of plotting the data for visualization has simply been replaced by computational analysis. In order to compare equal segments of audio the tracks were first segmented into clips of 30 seconds each. Any end sections of the songs that were less than 30 seconds in length were removed. The panning information, or "features", were then extracted from each of the clips using Marsyas. The features extracted were the stereo panning index for low (0Hz-250Hz), middle (250Hz-2500Hz) and high (2500Hz-22050Hz) frequency bands as outlined in Tzanetakis [1] and Avendano [2]. The output of this process provided the mean and standard deviation of these audio features calculated over a number of different window sizes. These audio feature vectors were then used to train a Support Vector Machine (SVM) classifier. This trained machine learning classifier was then used to classify the tracks using a standard 10-fold cross validation methodology. The results of this process showed that 70.6 percent of the time the tracks were classified correctly for the given task of identifying Blake vs. Rubin.

4. Conclusions

The case study shows that the observations made using the visualization tool can be trusted and that there is a difference in the way the two producer/engineers, Rubin and Blake, treat panning in the record production process. It must be acknowledged that panning alone is quite a crude measure of production style. However, any result showing that record production can be quantified must be seen as positive for this area of study. If the process can be quantified then it can be separated from the artist, and thus the producer/engineer's role and impact on the finished track can be clearly studied and appreciated.

Readers interested in this tool should contact Steven Ness for download instructions at [email protected]. They can also obtain the MarPanning source code by downloading Marsyas from http://marsyas.sf.net

5. Future Work

The primary focus of future work will be to create a VST plug-in version of the tool that can be used in any DAW. This would allow users to visualize the track throughout the recording and mixing process to help guide their work. Another goal is to increase the number of features for the classification of record production style. Ideas include tools to quantify the use of compression, delay and other effects, and to deconstruct track arrangements through sound-… With an increased number of features, an even greater ability to classify individual production techniques will be possible.

6. Acknowledgements

I would like to acknowledge Randy Jones for the ideas and discussions that led to this work, Peter Driessen for his valuable input, Steven Ness for …

References

[1] Tzanetakis, G., Jones, R. and McNally, K. "Stereo Panning Features for Classifying Recording Production Style". In Proceedings of the International Conference on Music Information Retrieval (ISMIR), Vienna, Austria, September 2007.

[2] Avendano, C. "Frequency-domain source identification and manipulation in stereo mixes for enhancement, suppression and re-panning applications". In Proceedings of the IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), pages 55-58, 2003.

[3] Downie, J. Stephen. "Toward the Scientific Evaluation of Music Information Retrieval Systems". In Holger H. Hoos and David Bainbridge (eds.), Proceedings of the Fourth International Conference on Music Information Retrieval: ISMIR 2003, pp. 25-32.

[4] Eargle, John. Handbook of Recording Engineering, 3rd ed. New York: Van Nostrand Reinhold Co., 1986.

[5] Hirschberg, Lynn. "The Music Man". The New York Times (September 2007). (accessed 30 October 2009).

[6] Tyrangiel, Josh. "Rick Rubin: Hit Man". Time Magazine (February 2007). (accessed 30 October 2009).

[7] Bonzai, Mr. "Working in the Real World: Engineer/Producer Tchad Blake". Mix Magazine (February 2005). (accessed 30 October 2009).

[8] Q&A with Tchad Blake. (accessed 30 October 2009).
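As a rough illustration of the panning index described in Section 2.1, the following Python sketch computes a value between -1.0 (full left) and +1.0 (full right) for one frequency bin. It assumes the general form of a cross-channel similarity measure in the spirit of Avendano's method: a similarity term that equals 1 when both channels carry the bin equally and 0 when only one channel does. The function name `panning_index`, the use of a direct magnitude comparison to choose the sign, and the handling of silent bins are assumptions made for this example; this is not the Marsyas implementation.

```python
def panning_index(left: complex, right: complex) -> float:
    """Panning index of one spectral bin: -1.0 is full left, +1.0 is full right.

    `left` and `right` are the complex spectral values of the same bin in the
    left and right channels (hypothetical inputs for this sketch).
    """
    power = abs(left) ** 2 + abs(right) ** 2
    if power == 0.0:
        return 0.0  # assumption: a silent bin is reported as centred

    # Cross-channel similarity: 1.0 for identical channels, 0.0 for one-sided bins.
    similarity = 2.0 * abs(left * right.conjugate()) / power

    # Sign convention chosen to match the -1.0 = left, +1.0 = right range in the text.
    if abs(right) > abs(left):
        direction = 1.0
    elif abs(left) > abs(right):
        direction = -1.0
    else:
        direction = 0.0

    return (1.0 - similarity) * direction
```

Under these assumptions, a bin present only in the left channel gives -1.0, a bin present only in the right channel gives +1.0, and a bin shared equally by both channels gives 0.0, which corresponds to the single centre line seen for a dry, centre-panned source.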
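The clip segmentation and band-wise features of Section 3.2 can be sketched in a similar way. The example below assumes that a panning index has already been computed for every bin of every analysis frame in a clip; it then averages the index within the low, middle and high bands and reports the mean and standard deviation of those averages over the clip. The names `clip_boundaries` and `band_features`, the half-open band edges, and the use of a single window covering the whole clip are assumptions for illustration. The study itself used Marsyas and several window sizes.

```python
from statistics import mean, pstdev

# Band edges in Hz as listed in Section 3.2; treated here as half-open [lo, hi).
BANDS = {"low": (0.0, 250.0), "mid": (250.0, 2500.0), "high": (2500.0, 22050.0)}


def clip_boundaries(n_samples, sample_rate, clip_seconds=30):
    """Return (start, end) sample indices of full clips; a shorter tail is dropped."""
    clip_len = int(clip_seconds * sample_rate)
    n_clips = n_samples // clip_len
    return [(i * clip_len, (i + 1) * clip_len) for i in range(n_clips)]


def band_features(frames, bin_freqs):
    """Mean and standard deviation of the per-frame band averages.

    frames:    list of frames, each a list with one panning index per bin
    bin_freqs: centre frequency in Hz of each bin
    Returns a dict mapping band name to (mean, std); empty bands are skipped.
    """
    features = {}
    for name, (lo, hi) in BANDS.items():
        idx = [i for i, f in enumerate(bin_freqs) if lo <= f < hi]
        if not idx:
            continue  # no bins fall inside this band
        per_frame = [mean(frame[i] for i in idx) for frame in frames]
        features[name] = (mean(per_frame), pstdev(per_frame))
    return features
```

For a 95-second track sampled at 44100 Hz, `clip_boundaries` yields three 30-second clips and discards the final 5 seconds, which mirrors the removal of short end sections described in the methodology.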
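The green-to-red level colouring mentioned in Section 2.1 can be approximated with a simple linear blend. The sketch below is only one plausible mapping: the function name `level_to_rgb`, the normalised level range, and the linear interpolation in RGB space are assumptions, since the exact colour ramp used by the tool is not specified here.

```python
def level_to_rgb(level, min_level=0.0, max_level=1.0):
    """Map a magnitude to an (r, g, b) tuple: green for low levels, red for high.

    Levels outside [min_level, max_level] are clamped to the nearest end.
    """
    span = max_level - min_level
    t = 0.0 if span <= 0 else (level - min_level) / span
    t = min(1.0, max(0.0, t))  # clamp to the valid range
    return (t, 1.0 - t, 0.0)
```

A renderer would call this once per plotted frequency component, passing the component's magnitude and the level range of the display.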