<<

HUMDRUMR: A NEW TAKE ON AN OLD APPROACH TO COMPUTATIONAL

Nathaniel Condit-Schultz Claire Arthur Georgia Institute of Technology Georgia Institute of Technology [email protected] [email protected]

ABSTRACT proaches. Humanist scholars’ deep, nuanced knowl- edge of musical culture, structure, and practice could be Musicology research is a fundamentally humanistic en- an invaluable asset to the computational/empirical music deavor. However, despite the productive work of a small research community. Unfortunately, the training neces- niche of -trained computational musicologists, sary for fruitful computational scholarship is absent from most cutting-edge digital music research is pursued by most music curricula, which cover neither the fundamen- scholars whose primary training is scientific or compu- tal methodological principals of empirical research nor the tational, not humanistic. This unfortunate situation is necessary coding skills. The need to assist, and convince, prolonged, at least in part, by the daunting barrier that music scholars to learn programming has been an area of coding presents to humanities scholars with no active discussion for some time [7], in particular there is technical training. In this paper, we present humdrumR a need for software tools crafted to support their learning (“hum-drummer”), a software package designed to afford and research goals [32]. computational musicology research for both advanced and To make computational research truly appealing and novice computer coders. Humdrum is a powerful and in- accessible to traditional humanists and seasoned compu- fluential existing computational musicology framework in- tational researchers alike, we must juggle three conflict- cluding the humdrum syntax—a flexible text data format ing factors: flexibility, power, and usability. User-friendly with tens of thousands of extant scores available—and the interfaces like the Josquin Research Project’s Analysis Bash-based humdrum toolkit. HumdrumR is a modern Tools 1 , Theme Finder 2 , or rapscience.net’s Visualizer 3 replacement for the humdrum toolkit, based in the data- can be utilized by scholars with no special training. How- analysis/statistical programming language R. By combin- ever, such tools allow only variations of hard-coded anal- ing the flexibility and transparency of the humdrum syn- yses applied to limited, fixed databases. On the op- tax with the powerful data analysis tools and concise syn- posite extreme, any skilled programmer can code their tax of R, humdrumR offers an appealing new approach to own symbolic music data formats and analysis/parsing would-be computational musicologists. HumdrumR lever- software “from scratch”—as many computational projects ages R’s powerful metaprogramming capabilities to create [4, 11] have done—, allowing for the unlimited power an extremely expressive and composable syntax, allowing of the programming language of their choice, but requir- novices to achieve usable analyses quickly while avoiding ing substantially more effort and experience. The most many coding concepts that are commonly challenging for prominent modern computational musicology toolkit— beginners. music21 [10]—, in our estimation, falls too close to the later extreme, proving quite daunting to novice coders. 1. INTRODUCTION This paper describes humdrumR (“hum-drummer”), a software toolkit for symbolic musicological data analysis Though digital musicology has been a productive area of intended to be appealing and accessible to traditional mu- research for several decades (e.g., [2, 8, 12, 16, 18–20, 22– sicologists while also being useful to more experienced 24,26,28,29]), it remains a niche field within the musical- computational researchers. Learning lessons from the suc- side of academia. In fact, most cutting-edge, scientific cesses and failures of existing tools, humdrumR strikes a music research has been pursued by researchers with pri- powerful new balance between flexibility, power, and us- mary training in computer and psychology. Fortu- ability: nately, recent years have seen a flourishing of “digital hu- manities” research in general, with increasing numbers of 1. Using the humdrum data syntax, humdrumR is ex- traditional humanities scholars adopting computational ap- tremely flexible and general in scope, allowing users to study any type of performance-art data that can c Nathaniel Condit-Schultz, Claire Arthur. Licensed un- be represented symbolically—musical scores, dance der a Creative Commons Attribution 4.0 International License (CC BY steps, trumpet fingerings, etc. 4.0). Attribution: Nathaniel Condit-Schultz, Claire Arthur. “hum- drumR: A New Take on an Old Approach to Computational Musicology”, 1 http://josquin.stanford.edu/search/ 20th International Society for Music Information Retrieval Conference, 2 http://www.themefinder.org/ Delft, The Netherlands, 2019. 3 http://rapscience.net/Analysis/gui.html

715 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019

2. Based in the R programming language, humdrumR programming environment (Python, R, MATLAB, Julia, is extremely powerful, capable of complex data ma- etc.) to achieve more complex data manipulation, statisti- nipulation and interfacing with R’s statistics, visual- cal testing, or data visualization. The burden of parsing and ization, and libraries. manipulating humdrum data in the higher-level language falls on the user, substantially increasing the need for cod- humdrumR 3. is designed to present a relatively low ing skills and prolonging the workflow. To make matters barrier of entry for non-technical researchers, offer- worse, installation of the humdrum toolkit can be fairly ing a concise, expressive syntax for applying music complicated, especially for Windows users, who must in- analyses while avoiding difficult coding paradigms. stall a Unix emulator to use the toolkit at all. In 2005, musicologist and psychologist Nicholas Cook ob- HumdrumR is a successor to the humdrum toolkit, re- served that to truly engage in computational musicology placing all the original toolkit’s functionality while adding “proper,” scholars must achieve “sufficient understanding significantly more. However, humdrumR uses the origi- of the symbolical processing and data representation on nal humdrum syntax specification, and is thus compatible which it’s based” [7]. We believe that the combination of with all existing humdrum data. In fact, both the techni- humdrum and R represents an ideal avenue for computa- cal design and methodological philosophy of the humdrum tional novices to develop this “sufficient” understanding: syntax are fundamental to humdrumR. enough to pursue quality computational research but with- out having to engage with general coding paradigms that 2.1 The Humdrum Syntax are irrelevant to their research. The humdrum syntax is an extremely general scheme for In this paper, we first review the relevant philosophi- representing musicological data in plain-text. The syntax cal and technical features of humdrum and R, noting how is basically tabular (tab-delineated columns) but with a few humdrumR incorporates these features (sections 2 and additional complexities: 3). We next contrast humdrum(R) coding philosophy and style with that of music21 (section 4). Finally, we de- • Columns of data are interpreted as spines, which scribe the principle features of the humdrumR package can dynamically start, end, split, or merge, creating and humdrumR syntax (section 5), including numerous spine paths. Spine paths allow the syntax to repre- code examples. sent many complex cases from music notation: For example, if the upper staff of a piano score splits temporarily into two voices (one with beams up, the 2. HUMDRUM other with beams down), this can be encoded by Humdrum 4 is a system for computational musicology re- splitting the spine representing that staff into two search, created by [17] (circa 1995) and columns (with a *^ token) before merging them maintained by Craig Sapp at Stanford’s Center for Com- again after the split passage (*v tokens). puter Assisted Research in the Humanities. Though no one digital musicology framework has ever truly dominated the • Humdrum records are tagged as interpretation field, humdrum is certainly among the most widely used records (representing local metadata) or data and influential systems, being cited as the direct inspiration records (notes, chords, etc.). By simply interspers- for some of its most successful competitors—music21 ing interpretation and data records, metadata can be [10] and MusicXML [13]—and with tens of thousands of concisely associated with specific data points, pas- scores available in humdrum format. Humdrum actually sages, or sections. For example, key information can has two distinct components: the humdrum syntax and the be quickly read, edited, or added to scores by simply humdrum toolkit. inserting lines containing tokens like *f#: (f# mi- The humdrum toolkit is a collection of Unix command- nor). This approach contrasts with the more verbose, line tools for parsing and analyzing humdrum data, largely hierarchical attributes and tags used in XMLs. written in Bash and AWK but with more recent “extra” • Finally, each “cell” (i.e., each line-column coor- 5 commands written in C++. The humdrum toolkit is fun- dinate) in a humdrum file can contain multiple damentally entwined with the Bash shell, in particular the data tokens—a feature commonly used to represent grep and sed commands, which rely heavily on regu- chords but with myriad other potential use cases. lar expressions to parse and filter tokens. The humdrum toolkit also includes a number of analysis tools, notably The humdrum syntax provides a data structure but says the sophisticated windowing and n-gram tool context nothing about how information is actually encoded in data and the pattern finder patt. Using the toolkit, basic pars- tokens. Rather, specific interpretation schemes must be ing and analysis can be achieved quickly and easily in defined. A few well-specified interpretations for music Bash command pipes, but true Bash scripting is required are widely used, most notably the **kern representation. for even mildly complicated tasks. In practice, after initial However, users can freely define new interpretations, with parsing, humdrum users must transition into a higher-level the appropriate degree of rigour/precision/complexity for the task at hand. Thus, humdrum is not limited to tradi- 4 http://www.humdrum.org 5 Sapp also maintains a C++ development library for humdrum tools tional Western notation, but rather affords the study of non- called humlib (https://humlib.humdrum.org/). Western, vernacular, or avant-garde , as any type of

716 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019 musical feature, data, or metadata to be easily encoded in mation to humdrum data tokens while maintaining, and vi- the same place. This presents a stark contrast with nearly sualizing the simple, readable humdrum syntax with each all other music encoding schemes, including MuseData transformation. This dramatically improves the process of [15], abc [30], TinyNotation [9, ch. 16], LilyPond [25], and debugging and makes the series of steps from input to out- MusicXML, all of which are limited to representing music put clearer for novice users. Humdrum, thus, truly supports as, or in terms of, Western music notation—for instance, Cook’s [7] call for a direct understanding of symbolic rep- representing diatonic note names. Though **kern repre- resentations and processes. Consistent with this philoso- sents music in similar terms, the humdrum syntax in gen- phy, humdrumR commands also reconstruct and display eral has no such limitation. 6 Though the actively devel- readable humdrum data even as the data is manipulated, oping MEI [14] encoding standard can be extended to in- maintaining a clarity and transparency which is easily lost corporate arbitrary music information, the emphasis of the when coding with complex data structures. MEI community is nonetheless representing music nota- Though alternative data encodings have proven highly tion(s) for purposes of library science and score preserva- useful in contexts of research (MuseData, abc, MEI), tion and dissemination, not analysis. file interchange (MusicXML), composition and engraving Data encoding schemes inevitably balance read/ (TinyNotation, LilyPond, MEI), and performance (MIDI), writeability with power and flexibility [9, ch. 16]). No- the humdrum syntax offers an optimal combination of flex- tation systems such as abc notation and TinyNotation ibility, read/writeability, and epistemological transparency, achieve nearly effortless human read/writeability, but with and is thus an ideal target for a new computational musi- severely limited representational possibilities. 7 On the cology toolkit. Fortunately, our focus on a single encod- opposite extreme, MusicXML and MEI can encode ex- ing scheme is not a limiting factor, as software to translate tremely complex, sophisticated score information, but are between **kern and most important representations— difficult to read/write directly. Humdrum achieves a bal- including MIDI, MusicXML, and MEI—is already avail- ance between these extremes: though not quite as extend- able. In fact, fruitful cross-fertilization between musical able as MEI, the humdrum syntax nonetheless affords a data ecosystems is common: for instance, MEI’s extraordi- huge variety of approaches to encoding data. On the other nary Verovio score viewer has already been adapted as the hand, though humdrum’s top-down, tab-delineated format Verovio Humdrum Viewer 8 , making it easier than ever to certainly makes editing slightly more cumbersome than visualize and edit humdrum data. editing abc or TinyNotation, most humdrum interpreta- tions are comparably readable. 3. R Like other easy notation schemes (abc, tinyNotation, LilyPond), humdrum interpretations often embed multi- R [27] is a free, cross-platform, open-source “environment ple pieces of information in each token. For instance, the for statistical computing and graphics”—a domain specific **kern token (4.aL/ encodes information about slurs programming language designed specifically for data anal- ((), rhythm (4.), pitch (a), beaming (L), and stem direc- ysis. R has a large ecosystem of data-analysis and statis- tion (/). This “dense” approach allows a large window tics packages, most available through the Comprehensive of musical time (for instance, several measures of multi- R Archive Network (CRAN); As of now, the only pro- part music) can be seen on a single screen. This contrasts gramming environment with a comparable ecosystem for with, for instance, MEI which spreads specific pieces of in- data analysis is Python. However, being less popular than formation across nested tags, making it impossible to see Python for general programming, R’s ecosystem is com- more than a few notes in a single screen. Such complicated paratively focused and uncluttered, with a higher quality tokens can also be written and edited rapidly, once you are floor. R’s standard library (“base” R) itself contains anal- familiar with the encoding. However, kern’s “dense” ap- ysis and visualization features which will satisfy the needs proach is but one option given humdrum’s flexible syntax: of many musicological research projects. Even when us- information can be spread across multiple spines/columns ing external libraries, R users rarely encounter dependency (like MuseData) in whatever manner is most appropriate issues—in fact, the process of installing R, humdrumR, for data analysis. For example, the MCFlow corpus of and its dependencies is trivial on any operating system, rap transcriptions [6] encodes eight pieces of information even for beginner programmers, with no need for exter- across eight separate spines. nal package managers, virtual environments, “sandboxes,” Human read/writeability is not just important to the pro- etc. cess of data creation and curation, but is in fact essential to R excels at exploratory, ad hoc data analysis in short humdrum’s entire methodological philosophy: humdrum scripts or in the read-eval-print loop (REPL), making it emphasizes epistemological transparency by forcing users easy to quickly manipulate, filter, and visualize data “on to engage directly with their symbolic data representations, the fly.” Indeed, a focus on simple scripting and “con- even as they are filtered and transformed. Indeed, in tradi- strained” projects, without concern for more general soft- tional humdrum work flows, we apply repeated transfor- ware development issues, is core to R (and humdrumR) philosophy, making R an excellent avenue for learning pro- 6 Conversely, humdrum/**kern is not as optimized for representing the details of music engraving as LilyPond, MEI, or MusicXML. gramming purely for data analysis. This coding paradigm 7 However, music21 includes an API for extending TinyNotation, useful to those with the requisite coding skills. 8 http://verovio.humdrum.org/

717 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019 is well suited to the constrained, stand-alone projects typ- 4. music21 ical of theory-driven musicological research projects. The R ecosystem also includes a number of useful, free, soft- Music21 is a Python library for symbolic music gen- ware tools for enhancing the productivity, reproducibility, eration and analysis, with an extensive set of tools ex- and presentation/sharing of research conducted in R. No- tending well beyond the capabilities of the humdrum tably, RStudio is a free, high quality integrated develop- toolkit, and which easily integrates with Python’s extensive ment environment for R. R is also one target of the Jupyter ecosystem (statistics and graphics libraries, etc.). Though project [21], with RStudio too incorporating a suite of music21 and humdrumR overlap significantly in use mark-up, notebook, and presentation tools. Finally, RStu- case, humdrumR offers a fundamentally different coding dio’s shiny package [5] provides an easy means of creat- philosophy and style. ing interactive R data visualizations in the browser. Python syntax is famously simple to read/learn, yet music21 Though R is generally best suited to a combination the Pythonic coding style of nonetheless of procedural and functional styles of programming, it presents challenges to would-be computational musicol- music21 nonetheless includes Object-Oriented Programming fea- ogists. Working with requires one to engage tures suitable for defining simple classes. R’s S4 ob- directly with a hierarchy of complex classes (with numer- ject system is oriented around multiple dispatch—generic ous attributes and methods) and write many explicit control functions can be defined which call specific methods based and looping structures, including (in typical analyses) mul- for on the types of any or all of the function’s arguments, not tiple nested loops. In contrast to humdrum, which re- music21 just their first argument. In languages featuring multiple lies on plain-text strings to encode information, dispatch (Julia, Common Lisp, Smalltalk, etc.), methods parses musical scores into numerous complex data ob- music21.Note are not bound within classes. As a result, the object system jects. Notably, contains a rich set of at- operates in the background: novice users benefit from the tributes and methods for describing “notes.” This complex- features afforded by the object system—for instance, com- ity, though highly useful to experienced coders, is a bar- mon functions like summarize can be applied to nearly rier to entry for novices, for whom the practical reality of any type of R object, getting useful results—without ever carrying out a computational musicology project is all but music21 having to consciously engage with the system. impossible. This is in part due to the style of ’s User’s guide, which necessarily gets bogged down explain- One of the core philosophies of R is “vectorization”: ing detailed functionality of how to represent, extract and treating data collections (especially vectors and arrays) manipulate low-level features (e.g., parts, notes, etc.) at as conceptually singular objects. HumdrumR leans into the expense of explaining larger-scale processes like, for this philosophy, allowing users to think and operate on instance, how to search through a corpus of music to com- humdrumR data collections holistically. One concrete re- pare n-grams. Finally, Music21’s object hierarchy pri- alization of this approach is that humdrumR users can marily represents musical score features—representations completely avoid explicit iteration (i.e., loops): Iteration for arbitrary extra-musical data (e.g., dance steps, finger- is abstracted by the humdrumR API, allowing users to de- ings) or musical metadata (e.g., formal labels, manual an- cide solely what processes to apply to data tokens, not how notations) is not supported. to apply them to each token. HumdrumR defines a num- Music21’s Stream-based object model is (purposely) ber of useful data classes, yet the R approach makes these extremely flexible [1]. As discussed above, the humdrum classes an implementation detail that novice programmers syntax includes some scope for flexible variation (using need not understand. spine paths, for instance), yet humdrumR’s data back- end (section 5.1) nonetheless always has the same struc- 3.1 Metaprogramming ture; thus, one can always assume the same data struc- ture and thus that certain commands/routines will always The final key to the R ecosystem’s concise data analysis work. In contrast, music21 users must always first syntax is its subtle use of metaprogramming [31, ch. 17– determine the Stream-hierarchy of their data: are notes 21]. In particular, numerous R packages use a shared do- nested within measures, within parts, or within chords, etc? main specific language for specifying statistical models us- Similarly, music21’s highly sophisticated data classes ing R “formula” [31] objects. 9 Created using the ~ op- (like music21.Note) are fairly inflexible, whereas hum- erator, R formulae capture (“quote”) surrounding R ex- drum’s plain-text tokens are completely flexible. Again, pressions as well as their local namespace. R’s metapro- this later approach is consistent with humdrumR philoso- gramming features allow the programmatic manipulation phy, forcing users to think about and grapple with the trans- of these formulae, for instance, using the update routine. parent details of how their musical information is encoded Again, though metaprogramming is essential to R code, and not about details of data structure. only advanced developers will ever need to explicitly en- As dynamically typed, interpreted languages, explicit gage with metaprogrammming concepts. loops in either R or Python are notoriously slow. How- ever, whereas music21 makes much explicit use of for 9 For example, Y ~ X*Z + (1|G) describes a linear model pre- loops, humdrumR enables users to exclusively use “vec- dicting the variable Y using predictors X and Z, and the interaction be- tween X and Z, with random-effect intercepts specified for each level of torized” R and an optimized split-apply-combine back- the grouping factor G. end, to achieve fast execution of most commands. As a re-

718 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019 sult, humdrumR is generally much faster than music21. token, encoding the structure of the original humdrum data so that it can be reconstructed for visual inspection 5. HUMDRUMR after each modification. Users can freely create new fields; for instance, **kern tokens can be parsed into various The humdrumR library defines a number of data object pieces of information, each saved into a separate field: For classes and a suite of functions for manipulating these example, the commands classes. Most significant is the humdrumR class itself, chor %hum>% as.recip -> chor$Duration which encapsulates a corpus of parsed humdrum files, and chor %hum>% as. -> chor$MIDI serves as the primary interface through which users inter- create two new fields—Duration and MIDI—by apply- act with humdrum data. HumdrumR includes a complete ing the functions as.recip 11 and as.midi to the de- humdrum syntax parser, which reads any valid humdrum fault Token field. These new fields can then be referenced data—including all valid spine paths—into a humdrumR like any other field in subsequent calls. data object. Invalid humdrum files are automatically flagged and skipped, with line-by-line syntax error reports 5.2 API generated on request. Consistent with the humdrumR phi- losophy, the syntax for reading files is concise and power- Much of R’s expressive power arises from a subtle us- data.frame ful: users simply specify one or more regular expressions age of metaprogramming to manipulate s: subset with matching files on their local disc. For instance, the com- Several base R functions—including , , and within mand —allow users to input arbitrary R expressions which are then evaluated within the data set using the ta- readHumdrum("Bach/Chorales/chor.*krn")->chor ble’s named fields as a local namespace. HumdrumR ex- validates, reads, and parses all files matching the tends this paradigm to humdrum data, allowing users to ap- regular expression "chor.*krn" in the directory ply arbitrary expressions to humdrum data stored in the un- "Bach/Chorales" (370 files on our machine), wraps derlying data.table back-end. Users capture expres- them in a humdrumR corpus object, and assigns this ob- sions as R formulae and humdrumR API applies them to ject to the variable chor. A set of summary functions are the data. These expressions can refer to any field in the included to quickly describe the size, content, and struc- data, including fields created by the user. The command ture of a loaded humdrumR data objects. In addition, an extensive suite of functions and classes for representing chor %hum>% ~table(MIDI[Duration == "4"]) and manipulating pitch and rhythm data is also included— (using the MIDI and Duration fields defined in the these tools reproduce much of the functionality of the orig- previous code block) extracts all MIDI values where the inal humdrum toolkit—as well as music21’s core class corresponding rhythm is a quarter note—using the stan- hierarchy—, with some significant improvements and ad- dard R indexing ([]) operator—and then applies the base ditions. R table function, to tabulate these MIDI values. In a A great strength of the original humdrum toolkit is its sense, this approach is an abstraction of function defini- use of the Bash | (“pipe”) operator to concisely chain a tion: One creates an expression—equivalent to the body of series of operations—a syntax that is highly accessible to a function—but specifies no function arguments, as all data novice programmers. In recent years, the piping approach fields from the humdrum data are automatically passed into has become popular in R [3], especially magrittr’s the expression if referenced. %>% pipe operator. HumdrumR too incorporates a pipe The true power of the humdrumR arises through spe- operator—%hum>%—which appears in all our subsequent cial keyword formulae which modify the API’s behavior. code examples. Most notably, the by keyword can be used to split-apply- combine humdrum data. For instance, the command 5.1 Data Model chor %hum>% c(~table(MIDI[Duration == "4"]), R’s primary native data structure is the tabular by ~ File) data.frame. HumdrumR utilizes a popular exten- applies the exact same processing as the previous code 10 sion of the base R data.frame, the data.table : block except the expression is applied separately to each an optimized data.frame which achieves database file in the corpus, creating 370 separate tables. Since the manipulation performance comparable to Python’s Pandas humdrumR data fields include all data and metadata in the module, including an extremely fast split-apply-combine dataset, any field can be used to group data. routine. The HumdrumR class stores humdrum data in Other keyword formulae afford complex windowing, a list of data.tables, with each individual data token including n-grams, overlapping fixed-length windows, and assigned to a single row. Data and metadata for each token various dynamic windowing possibilities. Finally, another are encoded in named columns of the data.table, set of keywords can be used to directly manipulate R’s called fields. The original string is encoded in the Token built-in visualization settings. Since formulae (including field, with global and local metadata associated with that keyword formulae) are first-class objects in R, all of these token spread across other fields. Additional fields encode expressions can be easily saved, composed, manipulated, the “location” (which file, spine, path, record, etc.) of each 11 “Recip” is short for “reciprocal”—the humdrum term for standard 10 https://cran.r-project.org/web/packages/data.table/index.html Western duration categories (eighth-notes, sixteenth-notes, etc.).

719 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019 and combined. For instance, we could save the tabling ex- expressions to match that information. The REMDS can pression used above and use it in combination with various then be used to create generic functions which read an in- keyword formulae: put string and dispatch the appropriate method based on tabQuarters <- ~table(MIDI[Duration == "4"]) matching regular expressions—what’s more, these meth- chor %hum>% c(tabQuarters, by ~ File) ods can (optionally) be applied “in place,” only affect- chor %hum>% c(tabQuarters, by ~ Spine) ing the substring which matches the desired regular ex- chor %hum>% c(tabQuarters, by ~ Clef) pression. For example, the humdrumR pitch module de- humdrumR The API also includes routines for filter- fines a number of functions which translate specific pitch ing and indexing a humdrum corpus. The standard R in- encodings (note names, solfege, intervals, etc.) to/from [] [[]] dexing operators ( and ) can be used to select humdrumR’s common tonalInterval pitch represen- pieces, records, or spines, either numerically or by match- tation. The REMDS is then used to generate generic pitch ing regular expressions. More sophisticated filtering can translation functions which call the desired method when humdrumR be achieved through the use of formulae: For a specific regular expression is matched. For instance, the instance, the command function as.midi can be applied to strings containing a filterHumdrum(chor, variety of pitch representations, even when embedded with ~Token %~% ’[EeAa]-’, by ~ File) other information (i.e., rhythm, beaming): selects all the files in the data set which contain the notes as.midi("4.cc#J") # "cc#" => **kern => 73 E[ or A[. as.midi("4.C#5J") # "C#5" => **pitch => 73 To bring it all together, a simple, yet complete as.midi("4.-M9J") # "M9" => **mint => -14 humdrumR analysis script might look like this: as.midi("4.soJ") # "so" => **solfa => 7 readHumdrum("Bach/Chorale/chor.*krn")->chor chor %hum>% (~ as.midi(Token)) This approach allows users to use humdrumR functions %hum>% (~ Pipe - 60) without having to explicitly manipulate strings or use reg- %hum>% c(doplot ~hist(Pipe, ular expressions, one of the major barriers to learning in title = Clef, the original humdrum toolkit. Many of humdrumR’s own xlim = c(30, 80)), by ~ Clef, functions (like as.midi) are written using the REMDS, mfcol ~ c(2,2)) and developers can utilize it to significantly reduce coding These commands load the Bach chorale dataset, convert effort when defining new functions. the original data tokens to MIDI numbers, subtract these numbers by 60 (to center them on middle C), then create a separate histogram for each clef in the data (in a 2x2 6. FUTURE grid 12 ). Note the use of the base R hist (histogram) The package source of HumdrumR 0.3.0 is currently function, including the use of the title and xlim (x- available on github (natsguitar/humdrumR); when devel- axis limits) arguments to give each plot a meaningful title opment solidifies, version 1.0.0 will be made available and place all plots on the same scale. on CRAN under the terms of the GNU General Public Li- 5.3 Developer Tools cense. However, releasing code is not enough to support humanist scholars interested in coding—it is imperative to HumdrumR is designed to be highly extensible. Even provide high quality documentation and learning materials novice users can quickly begin to create and save their own in a style that is digestible for users who may be new to routines as R functions, or simpler yet, as combinations of computer programming. The most important contribution humdrumR formulae. However, humdrumR also includes of the original humdrum project was neither the syntax nor several features to support development by more sophis- the toolkit, but Huron’s extensive user guide [17]. 13 The ticated coders. All humdrumR data classes are intended Humdrum User Guide offers a gentle introduction to em- to serve as extensible bases for further development— pirical/computational research from a humanistic perspec- for instance, developers might choose to implement coun- tive, walking readers through the practical and philosophi- terpoint analysis algorithms using humdrumR’s basic cal details and challenges of digital humanities work and tonalInterval class and its wealth of useful meth- the conceptual transformations necessary to convert hu- ods (transposition, inversion, etc.). However, the most manistic thought into concrete code, using examples from significant tool for developers is humdrumR’s Regular- real musicology projects. HumdrumR too will ultimately Expression Method Dispatch System (REMDS). Interact- be accompanied by a humdrumR User Guide, including in- ing with humdrum data requires extensive string manipu- teractive online content. Our goal is to not just teach the lation, typically using regular expressions, as one works mechanics of operating software in a friendly, hands-on to extract the information one is interested in from hum- format, but also the conceptual framework needed to think drum’s “dense” character tokens. Using the REMDS, de- about music as data, introducing key scientific principles/ velopers need only define normal R functions to manipu- methods (data sampling, statistics, hypotheses, etc.) while late the information they are concerned with and regular maintaining a holistic, humanistic perspective. 12 The keyword mfcol is a base R graphics parameter which controls the layout of plots in a grid. 13 http://www.humdrum.org/guide/

720 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019

7. REFERENCES [14] Andrew Hankinson, Perry Roland, and Ichiro Fujinaga. The music encoding initiative as a document-encoding [1] Christopher Ariza and Michael Scott Cuthbert. The framework. In Proceedings of the International Society music21 Stream: A New Object Model for Repre- for Music Information Retrieval, pages 293–298, 2011. senting, Filtering, and Transforming Symbolic Musical Structures. In Proceedings of the International Com- [15] Walter B. Hewlett. MuseData: Multipurpose Rep- puter Music Conference. Citeseer, 2011. resentation, chapter 27, pages 404–407. MIT Press, [2] Claire Arthur. Taking harmony into account. Music 1997. Perception: An Interdisciplinary Journal, 34(4):405– 423, 2017. [16] David Huron. Note-Onset Asynchrony in J. S. Bach’s Two-Part Inventions. Music Perception, 10(4):435– [3] Stefan Milton Bache and Hadley Wickham. magrittr: 443, July 1993. A Forward-Pipe Operator for R, 2014. R package ver- sion 1.5. [17] David Huron. Music Research Using Humdrum: A User’s Guide. Stanford, California: Center for Com- [4] John Ashley Burgoyne, John Wild, and Ichiro Fuji- puter Assisted Research in the Humanities, 1999. naga. An Expert Ground Truth Set for Audio Chord Recognition and Music Analysis. In A Klapuri and [18] David Huron. Tone and Voice: A Derivation of the C. Leider, editors, Proceedings of the 12th Interna- Rules of Voice-Leading from Perceptual Principles. tional Society for Music Information Retrieval (ISMIR) Music Perception, 19(1):1–64, September 2001. Conference, pages 633–638, Miami, FL, 2011. [19] David Huron and Deborah A. Fantini. The Avoidance [5] Winston Chang, Joe Cheng, JJ Allaire, Yihui Xie, and of Inner-Voice Entries: Perceptual Evidence and Mu- Jonathan McPherson. shiny: Web Application Frame- sical Practice. Music Perception, 7(1):43–47, October work for R, 2018. R package version 1.2.0. 1989. [6] Nathaniel Condit-Schultz. The Musical Corpus of Flow: A digital corpus of rap transcriptions. Empiri- [20] Nori Jacoby, Naftali Tishby, and Dmitri Tymoczko. An cal Musicology Review, 11(2):124–147, 2016. Information Theoretic Approach to Chord Categoriza- tion and Functional Harmony. Journal of New Music [7] Nicholas Cook. Towards the Complete Musicologist? Research, 44(3):219–244, 2015. Invited talk at ISMIR, 2005. [21] Thomas Kluyver, Benjamin Ragan-Kelley, Fernando [8] Tim Crawford and David Lewis. Early modern cover Pérez, Brian E Granger, Matthias Bussonnier, Jonathan detection: finding concordances in mixed- Frederic, Kyle Kelley, Jessica B Hamrick, Jason Grout, notation corpora of early music. In Music Encoding Sylvain Corlay, et al. Jupyter notebooks-a publishing Conference, Tours France, 2017. format for reproducible computational workflows. In [9] Michael Scott Cuthbert. music21: User’s Guide, 2011. ELPUB, pages 87–90, 2016.

[10] Michael Scott Cuthbert and Christopher Ariza. mu- [22] David Lewis, Tim Crawford, and Daniel Müllensiefen. sic21: A toolkit for computer-aided musicology and Instrumental idiom in the 16th century: Embellishment symbolic music data. In Proceedings of the Interna- patterns in of vocal music. In Proceed- tional Society of Music Information Retrieval. Proceed- ings of the International Society for Music Information ings of the International Society for Music Information Retrieval, 2016. Retrieval, 2010. [23] Kristen Masada and Razvan Bunescu. Chord Recog- [11] Trevor de Clercq and David Temperley. A Cor- nition in Symbolic Music Using Semi-Markov Condi- pus Analysis of Rock Harmony. , tional Random Fields. In Proceedings of the 18th Inter- 30(01):47–70, January 2011. national Society for Music Information Retrieval Con- [12] Johanna Devaney, Claire Arthur, Nathaniel Condit- ference, pages 272–278, 2017. Schultz, and Kirsten Nisula. Theme and Variation En- codings with Roman Numerals: A New Data Set [24] Eric Nichols, Dan Morris, Sumit Basu, and Christo- for Symbolic Music Analysis. In Meinard Müller and pher Raphael. Relationships Between and Frans Wiering, editors, Proceedings of the 16th In- Melody in Popular Music. In Proceedings of the 10th ternational Society for Music Information Retrieval International Society for Music Information Retrieval (ISMIR) Conference, pages 728–734, Malaga, Spain, (ISMIR) Conference, pages 471–476. ISMIR, 2009. 2015. [25] Han-Wen Nienhuys and Jan Nieuwenhuizen. Lily- [13] Michael Good. MusicXML: An internet-friendly for- Pond, a system for automated music engraving. In Pro- mat for . In XML Conference and Expo, ceedings of the XIV Colloquium on Musical Informat- pages 03–04, 2001. ics (XIV CIM 2003), volume 1, pages 167–171, 2003.

721 Proceedings of the 20th ISMIR Conference, Delft, Netherlands, November 4-8, 2019

[26] A. D. Patel and J. R. Daniele. Stress-Timed vs. Syllable-Timed Music? A Comment on Huron and Ollen (2003). Music Perception: An Interdisciplinary Journal, 21(2):273–276, 2003.

[27] R Core Team. R: A Language and Environment for Sta- tistical Computing. R Foundation for Statistical Com- puting, Vienna, Austria, 2018.

[28] David Temperley and T. de Clercq. Statistical Analysis of Harmony and Melody in . Journal of New Music Research, 42(3):187–204, 2013.

[29] Paul Von Hippel and David Huron. Why Do Skips Pre- cede Reversals? The Effect of Tessitura on Melodic Structure. Music Perception, 18(1):59–85, October 2000.

[30] Chris Walshaw. ABC2MTEX: An easy way of tran- scribing folk and traditional music, Version 1.0. Uni- versity of Greenwich, London, 1993.

[31] Hadley Wickham. Advanced R. Chapman and Hall, 2 edition, 2019.

[32] Frans Wiering and Emmanouil Benetos. Digital Musi- cology and MIR: Papers, Projects and Challenges. In Proceedings of the International Society for Music In- formation Retrieval, 2013.

722