Automatic Generation of Rap Lyrics Using Sequence-To-Sequence Learning
DeepLyricist: Automatic Generation of Rap Lyrics using Sequence-to-Sequence Learning

Nils Hulzebosch
Student ID: 10749411

A thesis presented for the degree of Bachelor Artificial Intelligence
18 ECTS
Faculty of Science
University of Amsterdam
Netherlands
July 2, 2017

First supervisor: M.Sc. Mostafa Dehghani, University of Amsterdam, [email protected]
Second supervisor: Dr. Sander van Splunter, University of Amsterdam, [email protected]

Abstract

This thesis demonstrates the use of sequence-to-sequence learning for the automatic generation of novel English rap lyrics, with the goal of generating lyrics with similar qualities to those of humans in terms of rhyme, idiom, structure, and novelty. The sequence-to-sequence model tries to learn the best parameters for generating target sequences given source sequences, and is trained on over 1.6 million source-target pairs, containing lyrics from 348 different rap artists. The automatic evaluations of each of the four characteristics show that idiom and structure have the best performance, whereas rhyme and novelty should be improved to be in a similar range of human lyrics. Future research could focus on using hierarchical models to improve the learning and generation of rhyme, structure, and possibly novelty, and implementing a sampled probability to increase the uniqueness of generated lyrics.

Contents

Acknowledgements
1 Introduction
2 Related work
3 Characteristics of rap lyrics
  3.1 Rhyme
  3.2 Rhyme schemes
  3.3 Song structure
  3.4 Idiom
  3.5 Novelty
4 Evaluation Methodology
  4.1 Rhyme
    4.1.1 Modified rhyme density
    4.1.2 End rhyme score
  4.2 Song structure
  4.3 Idiom
  4.4 Novelty
5 Model architecture
  5.1 Explanation of model
  5.2 Attention mechanism
  5.3 Modified loss function
6 Dataset
  6.1 Gathering, selection, and cleansing of data
  6.2 Addition of structure
  6.3 Preparing the data for the model
    6.3.1 Tokenization
    6.3.2 Source-target pairs
    6.3.3 Vocabulary
    6.3.4 Word embeddings
7 Experiments
  7.1 Training the model
  7.2 Generation of lyrics
8 Results
  8.1 Evaluation of rhyme density
  8.2 Evaluation of end rhyme score
  8.3 Evaluation of structure
  8.4 Evaluation of idiom
  8.5 Evaluation of novelty
  8.6 Unknown token
9 Conclusion
10 Discussion
  10.1 Interpretation of results
  10.2 Possible limitations
  10.3 Future work
Appendix A. List of 348 rappers and rap groups used for training data.
Appendix B. Example of generated lyrics.
Appendix C. Example of learned word embedding.

Acknowledgements

"Under pressure, I've been feeling under pressure." - Logic (Under Pressure)

The line above, which is cited from one of my favourite rappers, pretty much describes my state of mind when finishing this thesis. I've had a lot of setbacks during this project, but due to the support of several people, I now finished the project with a sense of satisfaction.

First of all, I would like to express my thanks to Mostafa Dehghani for giving me the opportunity to research this subject. His knowledge, enthusiasm, and willingness were the main factors in making this project possible. I enjoyed our collaboration! Secondly, I would like to thank Sander van Splunter for the analytic guidance throughout this project, which helped me approach the scientific edge of this fairly subjective subject. Furthermore, I would like to thank my parents and brother, along with my family and friends for their support, with a special thanks to my father, who helped me develop the criteria for the model and evaluation, using his knowledge of cognitive science and music. Finally, I thank my girlfriend for her loving support during this intensive project.

Nils Hulzebosch
Amsterdam, July 2, 2017.
1 Introduction

The writing of rap lyrics is a complex process that requires creativity and skill to create an interesting story that involves stylistic elements such as rhyme. This thesis describes research into the automated generation of rap lyrics, which is motivated as follows. First, to explore the capacities of a deep-learning model with respect to the generation of completely new lyrics instead of the reproduction of existing ones. Secondly, to research how deep learning can be used to generate unique, creative lyrics instead of choosing common or existing ones. Lastly, to determine the best evaluation methods for generated rap lyrics.

Rap lyrics often have a clear structure, due to rhyme, along with structural parts such as intro, chorus, bridge, verse, and outro. One aim is to let the model learn these aspects to enable the generation of new lyrics with rhyme and structural parts. Furthermore, characteristics such as idiom (the words used in rap) and novelty (the uniqueness of generated lyrics) will be addressed.

In this research, the generation of rap lyrics is seen as a sequence-to-sequence problem, in which, given a sequence of tokens, a new sequence is generated (Sutskever et al., 2014). This approach is used, for example, in Neural Machine Translation, where a sentence in the source language maps to a sentence in the target language (Bahdanau et al., 2014; Cho et al., 2014). It is also used in automatic summarisation, where each sequence containing a full article maps to a summary sequence (Chopra et al., 2016; Nallapati et al., 2016a,b; Rush et al., 2015). For the generation of rap lyrics, the input sequence is a line from a song and the output sequence is the following line. This way, the model could learn to predict an appropriate next line in terms of content and style.

The main research question is how a sequence-to-sequence model can be used to generate novel English rap lyrics with similar qualities to those of human rappers.
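The line-to-next-line framing described above can be illustrated with a minimal Python sketch. It only shows how consecutive lines of one song could be paired as (source, target) examples; the function name `make_source_target_pairs` and the sample lines are hypothetical, and the tokenization and structure tokens that the thesis covers in its Dataset section are deliberately left out.

```python
from typing import List, Tuple


def make_source_target_pairs(song_lines: List[str]) -> List[Tuple[str, str]]:
    """Pair each non-empty line of a song with the line that follows it.

    Hypothetical helper for illustration only; it is not the thesis's
    preprocessing code.
    """
    # Drop blank lines and surrounding whitespace before pairing.
    lines = [line.strip() for line in song_lines if line.strip()]
    # zip stops at the shorter sequence, so the last line has no target.
    return list(zip(lines, lines[1:]))


# Placeholder lines standing in for one song.
song = [
    "first line of the verse",
    "second line of the verse",
    "",
    "third line of the verse",
]
pairs = make_source_target_pairs(song)
for source, target in pairs:
    print(f"{source!r} -> {target!r}")
```

A song with n non-empty lines yields n - 1 pairs under this scheme, and pairs are built per song, so the last line of one song is never paired with the first line of the next.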
To evaluate the novelty of the generated lyrics, two methods are used, inspired by the methods of Potash et al. (2016). The first counts the uniquely generated sequences, the duplicates within the generated dataset, and the duplicates of lines in the training dataset. The second compares the cosine similarities of training and generated data. For the evaluation of similarity to human rap lyrics, three important properties of rap lyrics are examined. The first is rhyme, using a method similar to rhyme density (Malmi et al., 2015; Potash et al., 2016), along with 'end rhyme score', which is a newly developed method. Secondly, idiom is measured by comparing the number of words characteristic of rap in the training and generated datasets. Thirdly, structure is measured by calculating the number of correctly generated structure tokens, which are added to the training dataset during the pre-processing of the data.

The hypothesis is that a sequence-to-sequence model could generate novel English rap lyrics with similar qualities to those of human rappers, when trained on a sufficiently large dataset enriched with structural information, including an attention mechanism to improve rhyme.

The proposed approach is a data-driven machine learning method, which requires the construction of a large dataset for training the model. The resulting dataset contains over 2 million lines from 348 different rap artists. The sequence-to-sequence model is implemented in TensorFlow (Abadi et al., 2016). The model in general consists of two LSTM Recurrent Neural Networks (Hochreiter and Schmidhuber, 1997) for encoding and decoding. Furthermore, a word-level attention mechanism is employed during decoding (Bahdanau et al., 2014).
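The first novelty measure mentioned in this section (unique sequences, duplicates within the generated set, and duplicates of training lines) could be computed roughly as in the sketch below. This is an assumption about how such counts might be defined, not the thesis's evaluation code: the function `novelty_counts`, the dictionary keys, and the toy data are all hypothetical, lines are compared by exact string match, and the cosine-similarity comparison is not covered.

```python
from collections import Counter
from typing import Dict, Iterable, List


def novelty_counts(generated: List[str], training: Iterable[str]) -> Dict[str, int]:
    """Count distinct, repeated, and training-copied lines (exact match).

    Hypothetical helper; the thesis may define these counts differently.
    """
    training_set = set(training)      # set gives fast membership tests
    occurrences = Counter(generated)  # maps each line to how often it appears
    return {
        "total": len(generated),
        "distinct": len(occurrences),
        # Occurrences beyond the first appearance of each line.
        "repeats_within_generated": len(generated) - len(occurrences),
        # Generated lines that also appear verbatim in the training data.
        "copied_from_training": sum(1 for line in generated if line in training_set),
    }


# Toy data standing in for generated and training lines.
generated_lines = ["a b c", "a b c", "d e f", "g h i"]
training_data = ["d e f", "x y z"]
stats = novelty_counts(generated_lines, training_data)
print(stats)
```

Exact string matching is the strictest notion of duplication; a line that differs from a training line by a single token counts as new here, which is one reason a similarity-based measure is a useful complement.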
The contributions of this thesis are the following: 1) the demonstration of a sequence-to-sequence approach for the generation of novel rap lyrics, 2) the demonstration of computational, quantitative evaluation methods for rhyme, idiom, structure, and novelty, 3) the evaluation of lyrics generated by a sequence-to-sequence model with attentional LSTM Recurrent Neural Networks compared to those of human rappers, and 4) the construction of a large dataset of structured rap lyrics.

The thesis is divided into the following sections. The first is a discussion of relevant work. The second is an overview of the characteristics of rap lyrics, followed by a section that discusses the evaluation methodology. Next, the architecture of the sequence-to-sequence model is explained, along with the attention mechanism and a modification to the model. The fifth section discusses the acquisition of the dataset, including the gathering, pre-processing, and structuring of data. The sixth discusses the experiments, including training the model and the generation of lyrics. This is followed by an examination of the results of generation and evaluation. Finally, the research is concluded, along with a discussion regarding interpretations of results, possible limitations, and future work.

2 Related work

Recent research on the automated generation of rap lyrics has been done by Malmi et al. (2015), using a retrieval-based model. When this deep-learning model is trained with a large corpus of rap lyrics, it learns the automated combination of existing lyrics with substantive and stylistic similarities, having a new verse with closeness in content and style as a result. However, this model has the following limitations: 1) the requirement of pre-defined functions for the measurement of content and style, 2) ambiguity about the uniqueness of generated lyrics, and 3) the limited granularity of the model's generative capacities, which is on the line-level and not on the word-level.
An alternative deep-learning architecture, a neural language model, is used by Potash et al. (2015) to learn the language model of rap lyrics.