AI SONG CONTEST: HUMAN-AI CO-CREATION IN SONGWRITING

Cheng-Zhi Anna Huang 1, Hendrik Vincent Koops 2, Ed Newton-Rex 3, Monica Dinculescu 1, Carrie J. Cai 1
1 Google Brain, 2 RTL Netherlands, 3 ByteDance
annahuang,noms,[email protected], h.v.koops,[email protected]

arXiv:2010.05388v1 [cs.SD] 12 Oct 2020

© Cheng-Zhi Anna Huang, Hendrik Vincent Koops, Ed Newton-Rex, Monica Dinculescu, Carrie J. Cai. Licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0). Attribution: Cheng-Zhi Anna Huang, Hendrik Vincent Koops, Ed Newton-Rex, Monica Dinculescu, Carrie J. Cai. "AI Song Contest: Human-AI Co-Creation in Songwriting", 21st International Society for Music Information Retrieval Conference, Montréal, Canada, 2020.

ABSTRACT

Machine learning is challenging the way we make music. Although research in deep generative models has dramatically improved the capability and fluency of music models, recent work has shown that it can be challenging for humans to partner with this new class of algorithms. In this paper, we present findings on what 13 musician/developer teams, a total of 61 users, needed when co-creating a song with AI, the challenges they faced, and how they leveraged and repurposed existing characteristics of AI to overcome some of these challenges. Many teams adopted modular approaches, such as independently running multiple smaller models that align with the musical building blocks of a song, before re-combining their results. As ML models are not easily steerable, teams also generated massive numbers of samples and curated them post-hoc, used a range of strategies to direct the generation, or algorithmically ranked the samples. Ultimately, teams not only had to manage the "flare and focus" aspects of the creative process, but also juggle them with a parallel process of exploring and curating multiple ML models and outputs. These findings reflect a need to design machine learning-powered music interfaces that are more decomposable, steerable, interpretable, and adaptive, which in turn will enable artists to more effectively explore how AI can extend their personal expression.

1. INTRODUCTION

Songwriters are increasingly experimenting with machine learning as a way to extend their personal expression [12]. For example, in the symbolic domain, the band Yacht used MusicVAE [60], a variational-autoencoder neural network, to find melodies hidden in between songs by interpolating between the musical lines in their back catalog, commenting that "it took risks maybe we aren't willing to" [47]. In the audio domain, Holly Herndon uses neural networks trained on her own voice to produce a polyphonic choir [15, 46]. In large part, these human-AI experiences were enabled by major advances in machine learning and deep generative models [18, 66, 68], many of which can now generate coherent pieces of long-form music [17, 30, 39, 55].

Although substantial research has focused on improving the algorithmic performance of these models, much less is known about what musicians actually need when songwriting with these sophisticated models. Even when composing short, two-bar counterpoints, it can be challenging for novice musicians to partner with a deep generative model: users desire greater agency, control, and sense of authorship vis-à-vis the AI during co-creation [45].

Recently, the dramatic diversification and proliferation of these models have opened up the possibility of leveraging a much wider range of model options, for the potential creation of more complex, multi-domain, and longer-form musical pieces. Beyond using a single model trained on music within a single genre, how might humans co-create with an open-ended set of deep generative models, in a complex task setting such as songwriting?

In this paper, we conduct an in-depth study of what people need when partnering with AI to make a song. Aside from the broad appeal and universal nature of songs, songwriting is a particularly compelling lens through which to study human-AI co-creation, because it typically involves creating and interleaving music in multiple mediums, including text (lyrics), music (melody, harmony, etc.), and audio. This unique conglomeration of moving parts introduces unique challenges surrounding human-AI co-creation that are worthy of deeper investigation.

As a probe to understand and identify human-AI co-creation needs, we conducted a survey during a large-scale human-AI songwriting contest, in which 13 teams (61 participants) with mixed (musician-developer) skill sets were invited to create a 3-minute song, using whatever AI algorithms they preferred. Through an in-depth analysis of survey results, we present findings on what users needed when co-creating a song using AI, what challenges they faced when songwriting with AI, and what strategies they leveraged to overcome some of these challenges.

We discovered that, rather than using large, end-to-end models, teams almost always resorted to breaking down their musical goals into smaller components, leveraging a wide combination of smaller generative models and re-combining them in complex ways to achieve their creative goals. Although some teams engaged in active co-creation with the model, many leveraged a more extreme, multi-stage approach of first generating a voluminous quantity of musical snippets from models, before painstakingly curating them manually post-hoc. Ultimately, use of AI substantially changed how users iterate during the creative process, imposing a myriad of additional model-centric iteration loops and side tasks that needed to be executed alongside the creative process. Finally, we contribute recommendations for future AI music techniques to better place them in the music co-creativity context.

In sum, this paper makes the following contributions:
• A description of common patterns these teams used when songwriting with a diverse, open-ended set of deep generative models.
• An analysis of the key challenges people faced when attempting to express their songwriting goals through AI, and the strategies they used in an attempt to circumvent these AI limitations.
• Implications and recommendations for how to better design human-AI systems to empower users when songwriting with AI.

2. RELATED WORK

Recent advances in AI, especially in deep generative models, have renewed interest in how AI can support mixed-initiative creative interfaces [16] to fuel human-AI co-creation [28]. Mixed initiative [33] means designing interfaces where a human and an AI system can each "take the initiative" in making decisions. Co-creative [44] in this context means humans and AI working in partnership to produce novel content. For example, an AI might add to a user's drawing [21, 50], alternate writing sentences of a story with a human [14], or auto-complete missing parts of a user's music composition [4, 34, 38, 45].

Within the music domain, there has been a long history of using AI techniques to model music composition [23, 31, 53, 54], by assisting in composing counterpoint [22, 32], harmonizing melodies [13, 43, 51], more general infilling [29, 35, 41, 61], exploring more adventurous chord progressions [27, 37, 49], semantic controls to music and sound generation [11, 24, 36, 63], building new instruments through custom mappings or unsupervised learning [20, 25], and enabling experimentation in all facets of symbolic and acoustic manipulation of the musical […] to steer the AI and express their creative goals. Recent work has also found that users desire to retain a certain level of creative freedom when composing music with AI [26, 45, 65], and that semantically meaningful controls can significantly increase human sense of agency and creative authorship when co-creating with AI [45]. While much of prior work examines human needs in the context of co-creating with a single tool, we expand on this emerging body of literature by investigating how people assemble a broad and open-ended set of real-world models, data sets, and technology when songwriting with AI.

3. METHOD AND PARTICIPANTS

The research was conducted during the first three months of 2020, at the AI Song Contest organized by VPRO [67]. The contest was announced at ISMIR in November 2019, with an open call for participation. Teams were invited to create songs using any artificial intelligence technology of their choice. The songs were required to be under 3 minutes long, with the final output being an audio file of the song. At the end, participants reflected on their experience by completing a survey. Researchers obtained consent from teams to use the survey data in publications. The survey consisted of questions to probe how teams used AI in their creative process:
• How did teams decide which aspects of the song to use AI for, and which to be composed by musicians by hand? What were the trade-offs?
• How did teams develop their AI system? How did teams incorporate their AI system into their workflow, and generated material into their song?

In total, 13 teams (61 people) participated in the study. The teams ranged in size from 2 to 15 (median = 4). Nearly three fourths of the teams had 1 to 2 experienced musicians. A majority of teams had members with a dual background in music and technology: 5 teams had 3 to 6 members each with this background, and 3 teams had 1 to 2 members.

We conducted an inductive thematic analysis [2, 5, 6] on the survey results to identify and better understand patterns and themes found in the teams' survey responses. One researcher reviewed the survey data, identifying important sections of text, and two researchers collaboratively examined relationships between these sections, iteratively converging on a set of themes.
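The latent-space interpolation that Yacht used with MusicVAE (Section 1) can be illustrated with a minimal sketch. In a variational autoencoder, two melodies are encoded to latent vectors; taking evenly spaced points on the line between them and decoding each point yields melodies "in between" the originals. The snippet below shows only the latent-space step with toy vectors; the encoder and decoder are hypothetical stand-ins, not Magenta's actual MusicVAE API.

```python
import numpy as np

def interpolate_latents(z_start, z_end, num_steps):
    """Return evenly spaced points on the line between two latent vectors.

    In a real system, z_start and z_end would come from encoding two
    melodies, and each returned point would be decoded into a new melody.
    """
    alphas = np.linspace(0.0, 1.0, num_steps)
    return [(1.0 - a) * z_start + a * z_end for a in alphas]

# Toy 4-dimensional latent vectors standing in for two encoded melodies.
z_a = np.zeros(4)
z_b = np.ones(4)
path = interpolate_latents(z_a, z_b, num_steps=5)
# The midpoint blends both sources equally.
print(path[2])  # [0.5 0.5 0.5 0.5]
```

Note that production systems often interpolate along a spherical rather than linear path through the latent space, but the linear version above conveys the idea.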
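The "generate massively, then curate post-hoc" workflow described in the Abstract and Introduction can be sketched as a three-stage pipeline: sample many snippets from a model, rank them algorithmically with a scoring heuristic, and keep only a small shortlist for manual curation. The toy melody generator and the stepwise-motion scoring heuristic below are illustrative assumptions, not any team's actual system.

```python
import random

def generate_snippet(rng):
    """Stand-in for sampling from a generative model: an 8-note
    sequence of MIDI pitches in the C4-C5 octave."""
    return [rng.randint(60, 72) for _ in range(8)]

def score(snippet):
    """Toy ranking heuristic: prefer stepwise motion (small intervals
    between consecutive notes), a crude proxy for singability."""
    total_leap = sum(abs(a - b) for a, b in zip(snippet, snippet[1:]))
    return -total_leap  # smaller leaps -> higher score

def generate_and_curate(num_samples=200, keep=5, seed=0):
    """Sample many snippets, rank them algorithmically, and return a
    small shortlist for a human to curate by hand."""
    rng = random.Random(seed)
    samples = [generate_snippet(rng) for _ in range(num_samples)]
    samples.sort(key=score, reverse=True)
    return samples[:keep]

shortlist = generate_and_curate()
print(len(shortlist))  # 5
```

Swapping in a real model and a learned or musical scoring function turns this same skeleton into the multi-stage approach the teams describe, with the human entering only at the final curation step.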