Generationmania: Learning to Semantically Choreograph
Total Page:16
File Type:pdf, Size:1020Kb
GenerationMania: Learning to Semantically Choreograph Zhiyu Lin, Kyle Xiao and Mark Riedl Georgia Institute of Technology Atlanta, Georgia, United States fzhiyulin,[email protected], [email protected] Abstract to get a high score as well as reconstruct the original music, a player needs to press the correct button at the correct time Beatmania is a rhythm action game where players play the for playable objects, as well as not press any button when role of a DJ that performs music by pressing specific con- not being instructed. The controller used in this game series troller buttons to mix ”Keysounds” (audio samples) at the correct time, unlike other rhythm action games such as Dance is also unique: It features both 7 buttons and a “turntable” Dance Revolution. It has an active amateur Chart (Game control which the player scratches instead of presses. stage) creation community, though chart authoring is con- A fundamental difference between BMIIDX and many sidered a difficult and time-consuming task. We present a other rhythm action games like DDR is that BMIIDX is con- deep neural network based process for automatically gener- sidered a game with keysounds, which means every object in ating Beatmania charts for arbitrary pieces of music. Given the chart has an audio sample counterpart which plays if and a raw audio track of a song, we identify notes according to only if the corresponding action is executed. This even in- instrument, and use a neural network to classify each note as playable or non-playable. The final chart is produced by cludes non-playable objects; their actions are automatically executed. For comparison, Guitar Hero (Miller 2009) is an- mapping playable notes to controls. We achieve an F1-score on the core task of Sample Selection that significantly beats other keysound based rhythm action game where each note LSTM baselines. in the game represents a guitar maneuver. In BMIIDX, how- ever, each note can represent an audio sample from different instruments. Introduction These differences in the underlying mechanics yields a Rhythm action games such as Dance Dance Revolution, unique paradigm for creating BMIIDX charts. Requiring Guitar Hero, and Beatmania challenge players to press keys a clear binding between objects and instrument placement or make dance moves in response to audio playback. The based on an underlying score means BMIIDX charts cannot set of actions timed to the music is a chart and is presented be overmapped. Overmapping, which happens frequently in to the player as the music plays. Charts are typically hand- DDR, describes the situation where patterns of actions are crafted, which limits the songs available to those that have unrelated to instruments being played or occur when no note accompanying charts. Learning to choreograph (Donahue, is being played by any instrument at that moment. That is, Lipton, and McAuley 2017) is the problem of automatically the creation of BMIIDX charts is strictly constrained by the generating a chart to accompany an a priori unknown piece semantic information provided by the underlying music. We of music. refer to this challenge as “Learning to semantically choreo- Beatmania IIDX (BMIIDX) is a rhythm action game, sim- graph” (LtSC). ilar to Dance Dance Revolution, with an active commu- Due to the strict relationship between chart and music—as nity of homebrew chart choreographers (Chan 2004). Un- well as other differences such as charts with several simul- like Dance Dance Revolution, players play the role of a taneously actions—the prior approach used to choreograph DJ and must recreate a song by mixing audio samples by charts for Dance Dance Revolution (Donahue, Lipton, and pressing controller buttons as directed by on-screen charts. McAuley 2017) cannot be used to generate BMIIDX charts. In BMIIDX, some notes from some instruments are played We approach the challenge of learning to semantically automatically to create a complete audio experience. That is choreograph charts for BMIIDX as a four-part process. (1) there are “playable” and “non-playable” notes in each song. we train a neural network to identify the instruments used A playable object is defined as that which appears visu- in audio sample files and the timing of each note played by ally on the chart and is available for players to perform, each instrument. (2) we automatically label the difficulty of with a one-to-one correspondence to an audio sample; on charts in our training set, which we find improves chart gen- the other hand, a non-playable object is one that is auto- eration accuracy. (3) We train a supervised neural network matically played as part of the background music. In order that translates a musical context into actions for each time slice in the chart. Unlike Dance Dance Convolution (Don- ahue, Lipton, and McAuley 2017), which used an LSTM, Figure 1: A visualization of a Beatmania IIDX homebrew chart, Poppin’ Shower. Notes in this screen shot are labeled with their author-created filenames. Note that only objects in the columns starting with A are playable objects that is actually visible to players; Others are Non-playable objects used as background. we find a feed forward model works well for learning to se- Figure 1 shows an example of a BMIIDX homebrew mantically choreograph when provided a context containing chart. The objects in the “A” columns are playable objects the instrument class of each sample, intended difficulty label with keysounds. of each sample, the beat alignment, and a summary of prior instrument-to-action mappings.1 (4) Notes predicted to be Rhythm Action Game Chart Choreography playable by the network are mapped to controls and the final There is a handful of research efforts in chart choreogra- chart is constructed. In addition, we introduce the BOF2011 phy for rhythm action games, including rule-based gener- dataset for Beatmania IIDX chart generation. ation (OKeeffe 2003; Smith et al. 2009) and genetic algo- rithms using hand-crafted fitness functions (Nogaj 2005). Background and Related Work Dance Dance Convolutions is the first deep neural network Procedural Content Generation (PCG) is defined as “the cre- based approach to generate DDR charts (Donahue, Lipton, ation of game content through algorithmic means”. Machine and McAuley 2017). Donahue et al. refer to the problem of learning approaches treat content generation as (a) learning learning chart elements from data as Learning To Choreo- a generative model and (b) sampling from the model during graph. Alemi et al. (2017) suggest that this approach can creation time (Summerville et al. 2017; Guzdial and Riedl reach real-time performance if properly tuned. 2016; Summerville and Mateas 2015; Hoover, Togelius, and Dance Dance Convolutions uses a two-stage approach. Yannakis 2015; Summerville and Mateas 2016). Onset detection is a signal analysis process to determine the salient points in an audio sample (drum beats, melody notes, Beatmania IIDX etc.) where steps should be inserted into a chart. Step selec- The homebrew community of BMIIDX is arguably one of tion uses a long-short term memory neural network (Hochre- the oldest and most mature one of its kind (Chan 2004), iter and Schmidhuber 1997) to learn to map onsets to spe- with multiple emulators, an open format (Be-music Source2, cific chart elements. BMIIDX chart generation differs from BMS) and peer-reviewed charts published in semi-yearly DDR in that the primary challenge is determining whether proceedings. Despite the community striving to provide the each note for each instrument should be playable as a stage highest quality charts, the process of generating such a chart object or non-playable (i.e., automatically played for the is considered a heavy workload, and due to the strict se- player). mantic bindings, usually the author of the music or a vet- eran chart author has to participate in the generation of the Data chart. Many aspiring amateur content creators start by build- We compiled a dataset of songs and charts from the “BMS ing charts for rhythm action games without keysounds (i.e. Of Fighters 2011” community driven chart creation initia- Dance Dance Revolution charting is considered by the com- tive. During this initiative, authors created original music munity to be easier). Furthermore, there is a strong demand and charts from scratch. The dataset thus contains a wide for customized charts: players have different skill levels and variety of music and charts and was composed by various different expectations on the music-chart translation, and groups of people. Although the author is not required to cre- such charts are not always available to them. ate a defined spread of different charts for a single piece of 1A fixed window feed-forward network can often outperform a music for the event, authors frequently build 3 to 4 charts for recurrent network when short-term dependencies have more impact each song. The dataset, which we refer to as “BOF2011”, than long-term ones (Miller and Hardt 2018). consists of 1,454 charts for 366 songs. Out of 4.3M total ob- 2https://en.wikipedia.org/wiki/Be-Music_ jects, 28.7%, or 1.24M of them are playable ones. Table 1 Source summarizes the dataset. is available, we focus on Sample Classification, Challenge Table 1: BOF2011 dataset summary. Modeling and Sample Selection which is unique to BMIIDX # Songs 366 stage generation. # Charts 1,454 # Charts per song 3.97 Sample Classification # Unique audio samples 171,808 # Playable objects 1,242,394 Sample classification is a process by which notes from dif- # Total objects 4,320,683 ferent instruments in the audio samples are identified. The Playable object % 28.7 BMS file format associates audio samples with timing infor- mation. That is, a chart is a set of sample-time pair contains a pointer to the file system where an audio file for the sam- We find that modeling the difficulty of charts plays an ple resides.