Multi-Dimensional Computer-Driven Quantitative Analysis of the Music and Lyrics of the Beatles
Cedric McDougal
Northeastern University

Data retrieved through computer-driven quantitative textual analysis of Beatles lyrics was combined with data retrieved from computer-driven quantitative auditory analysis of Beatles music in order to build an extensive dataset that can answer a wide variety of multi-dimensional questions. This dataset spans the Beatles' career and provides metrics such as danceability, energy, positive emotions, negative emotions, self-referential lyrics, and big words. It can be indexed by year, album, songwriter, chart position, and more. This study explores a number of general relationships and trends within the data in order to best demonstrate the types of questions that can be answered with this dataset.

Though the official Beatles career lasted less than a decade, they were extremely prolific during their time as a band. They wrote and performed over 200 songs, many of which were never recorded, never released, or released after the band broke up. Throughout this time, their songwriting style changed dramatically, and they consistently broke new ground in the pop music world. There has been much study of these changes using a number of different techniques. Riley [1] analyzes the music and lyrics within the context of the historical events of the Beatles' lives. Inglis [2] examines how the group's approach to love changed over time by categorizing their love songs. Whissell [3] and Petrie [4] use computer-driven linguistic analysis to reveal the emotional and stylistic progression of the band's lyrics over time. This study aims to expand on the last approach by combining computer-driven linguistic analysis with computer-driven auditory analysis.

The effectiveness of computer-driven analysis is important to consider before relying on it as a tool for a study. Computers have a few distinct disadvantages compared with human beings in the task of parsing natural language: a program might not be flexible enough to correctly parse a sentence with multiple subjects, it often cannot handle words that are not in the English dictionary, and it is limited in its ability to extract high-level patterns from an unstructured source of data [5]. On the other hand, a computer program can quickly analyze large amounts of text, eliminate human error, avoid human bias, elucidate a high number of interconnected patterns, and achieve results deterministically [6]. Furthermore, research has already suggested that computer-driven textual analysis can correctly deduce stylistic and emotional properties when given the lyrics to Beatles music [7]. This suggests that computer-driven textual analysis is acceptable for the purposes of this study.

Currently, computers are better at analyzing audio than they are at parsing natural language. A program can verifiably extract (or fail to extract) tempo, key, mode, duration, time signature, and loudness from a given audio file. A computer program can also effectively extrapolate more generic concepts, such as how suitable a song is for dancing, the intensity of a Beatles song, and the presence of spoken words [8]. These metrics would be extremely difficult and bias-prone for a human to produce, which suggests that a computer is also suitable for the auditory analysis within this study.

Method

Selection of Songs

This study uses all the songs written by John Lennon, Paul McCartney, George Harrison, and/or Ringo Starr that were released on Beatles albums or as singles between 1962 and 1970. This includes thirteen albums (counting Yellow Submarine and Magical Mystery Tour) and 184 songs.

Information about each song, such as year, album, and songwriter, was taken from Wikipedia [9]. For the most part, this dataset uses the primary songwriter when a song was credited to Lennon-McCartney but written chiefly by one or the other; some songs, such as "One After 909," have the songwriter listed as "Lennon, with McCartney." The Wikipedia data was the most convenient and comprehensive source available. It is possible that some of the information is incorrect, but if so, the amount of incorrect data is likely small enough to have little effect on the results, as Wikipedia can usually be trusted for basic facts about well-known subjects.

Lyrical Analysis

The lyrics were downloaded from two sources (beatlesnumber9.com/; sing365.com). No cleansing was done on the lyrics, and not every song was checked for accuracy. This means some analysis might be thrown off by a repeated chorus, vocal sounds, non-English words, or incorrect lyrics. These insufficiencies may affect the metrics related to lyrical analysis and should be taken into account when interpreting the results.

The lyrical analysis itself was done with a program called Linguistic Inquiry and Word Count (LIWC) [10]. LIWC uses a special dictionary of almost 4,500 words, each of which is assigned to one or more categories or subcategories, such as sadness, overall affect, verb, and past tense. This dictionary has been in development for over ten years and has undergone numerous updates and revisions. When LIWC analyzes a text, it uses this dictionary to give the text a score for each word-type category. The free version of LIWC, which was used for this study, gives access to seven language dimensions, which are explained on the LIWC website. "Self references" are words such as "I," "me," and "my"; people who use a lot of self-referential words tend to be "more insecure, nervous, and possibly depressed." "Social words" reference other people; a high use of social words suggests that a person is outgoing. "Positive emotion words" suggest a person is optimistic, while "negative emotion words" suggest a person is anxious or neurotic. "Overall cognitive words" are words that "reflect how much people are actually thinking about their writing topic." "Articles" (a, an, the), in large amounts, suggest a person is more concrete. "Big words" (words with more than six letters) suggest a person is less emotional and more detached. These interpretations have been crafted through research, but they do not apply to every situation; the metrics themselves must still be viewed within the context of their generation.

Along with LIWC, a custom script was written to calculate simple lyrical metrics, including word count, unique word count, and longest word.
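For illustration, the following is a minimal Python sketch of the kind of custom lyrical-metrics script described above. The small self-reference word list is a hypothetical stand-in for one LIWC dictionary category; it is not the real LIWC dictionary, and the actual category scores in this study came from LIWC itself.

    # Minimal sketch of per-song lyrical metrics plus a LIWC-style category score.
    # SELF_REFERENCES is an illustrative stand-in for a dictionary category,
    # not the actual LIWC dictionary.
    import re

    SELF_REFERENCES = {"i", "me", "my", "mine", "myself"}

    def lyrical_metrics(lyrics):
        words = re.findall(r"[a-z']+", lyrics.lower())
        if not words:
            return {"word_count": 0, "unique_word_count": 0,
                    "longest_word": "", "self_references_pct": 0.0}
        self_refs = sum(1 for w in words if w in SELF_REFERENCES)
        return {
            "word_count": len(words),              # total words
            "unique_word_count": len(set(words)),  # distinct words
            "longest_word": max(words, key=len),   # longest token
            # LIWC reports category scores as a percentage of total words
            "self_references_pct": 100.0 * self_refs / len(words),
        }

    print(lyrical_metrics("I saw her standing there and my heart went boom"))

Run over every downloaded lyric file, a function like this yields one row of lyrical metrics per song.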
Auditory Analysis

The auditory analysis was done with a program called The Echo Nest (the.echonest.com). The Echo Nest offers an online API through which developers can upload music to be analyzed. The analysis provides simple metrics, such as tempo, key, duration, mode (major/minor), time signature, and loudness (dB). It also offers some derived metrics. "Danceability" is determined by tempo, rhythm stability, beat strength, and overall regularity. "Energy" is determined by dynamic range, perceived loudness, timbre, onset rate, and general entropy. The Echo Nest can also determine the amount of spoken words ("speechiness") and whether the recording is a live performance or not ("liveness"). The audio files that were uploaded for analysis were from Blackboard (blackboard.neu.edu).

One of the major downsides to The Echo Nest is that it cannot detect multiple keys. This is especially important with the music of the Beatles because their songs often change keys. Any analysis that uses the key must be tweaked to account for this limitation.
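The Echo Nest's exact request and response formats are not reproduced here. As an illustration only, the following Python sketch assumes that each track's analysis summary has already been saved locally as one JSON file per song, with fields named after the metrics listed above; the directory layout and field names are assumptions made for this example, not a description of the actual API.

    # Sketch: load per-track audio metrics assumed to have been exported
    # from The Echo Nest as one JSON file per song.  The directory layout
    # and field names are hypothetical, chosen to match the metrics named
    # in the text.
    import json
    from pathlib import Path

    AUDIO_FIELDS = ["tempo", "key", "mode", "duration", "time_signature",
                    "loudness", "danceability", "energy",
                    "speechiness", "liveness"]

    def load_audio_metrics(analysis_dir):
        """Return {song title: {metric: value}} for every saved analysis file."""
        metrics = {}
        for path in Path(analysis_dir).glob("*.json"):
            with open(path) as f:
                summary = json.load(f)
            # Keep only the metrics used in this study; missing fields become None.
            metrics[path.stem] = {field: summary.get(field) for field in AUDIO_FIELDS}
        return metrics

    # Hypothetical usage:
    # audio = load_audio_metrics("echonest_analyses/")
    # print(audio["Tomorrow Never Knows"]["danceability"])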
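To suggest how the lyrical and auditory metrics can be combined into the single dataset indexed by year, album, and songwriter that this study describes, a small pandas sketch follows. The column names, merge key, and input shapes are illustrative assumptions rather than the study's actual code; the last function mirrors the kind of per-songwriter summary reported in Fig. 2.

    # Sketch: merge song metadata, lyrical metrics, and audio metrics into one
    # table, then average danceability per songwriter.  Column names are
    # illustrative assumptions.
    import pandas as pd

    def build_dataset(meta_rows, lyric_metrics, audio_metrics):
        """meta_rows: list of dicts with 'title', 'year', 'album', 'songwriter'.
        lyric_metrics and audio_metrics: {title: {metric: value}} dictionaries
        whose metric names are assumed not to collide."""
        meta = pd.DataFrame(meta_rows)
        lyrics = pd.DataFrame.from_dict(lyric_metrics, orient="index")
        audio = pd.DataFrame.from_dict(audio_metrics, orient="index")
        # Join the lyrical and audio metrics onto the metadata by song title.
        return (meta.set_index("title")
                    .join(lyrics, how="left")
                    .join(audio, how="left"))

    def danceability_by_songwriter(df):
        # Mean danceability per songwriter, expressed as a percentage to match
        # the "% Danceability" column in Fig. 2 (assuming the raw metric is
        # reported on a 0-1 scale).
        return (100 * df.groupby("songwriter")["danceability"].mean()).sort_values()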
Results

Danceability

It is generally accepted that the earlier Beatles songs were much better to dance to than the later Beatles songs, because many later songs, such as "Revolution 9" and "Tomorrow Never Knows," were a lot more experimental and non-traditional. This suggests that the danceability metric retrieved from The Echo Nest should decrease over time. In fact, this is not what the data shows (Fig. 1): 1963 has one of the lowest danceabilities, and 1969 has one of the highest. To explain this unexpected result, it is useful to break danceability down by songwriter (Fig. 2). Harrison has the lowest danceability, which is to be expected because of his songs that include free-form Indian rhythms. Lennon has the next lowest danceability; this is likely due to his tendency to write his music to fit the rhythm of his lyrics.

Fig. 1. Danceability by year.

Fig. 2. Average danceability by songwriter.

  Songwriter          % Danceability    # of Songs
  Harrison                 48               22
  Lennon                   52               74
  Lennon/McCartney         55               13
  McCartney                58               70
  All                      60                3
  Starkey                  61                2

  Note. "% Danceability" is the average across songs.

Notes

1. Riley, Tim. "For the Beatles: Notes on Their Achievement," Popular Music, Vol. 6, No. 3, Beatles Issue (Oct., 1987): 257-271.
2. Inglis, Ian. "Variations on a Theme: The Love Songs of the Beatles," International Review of the Aesthetics and Sociology of Music, Vol. 28, No. 1 (Jun., 1997): 37-62.
3. Whissell, Cynthia. "Traditional and Emotional Stylometric Analysis of the Songs of Beatles Paul McCartney and John Lennon," Computers and the Humanities, Vol. 30, No. 3 (1996): 257-265.
4. Petrie, Keith J., James W. Pennebaker and Borge Sivertsen. "Things We Said Today: A Linguistic Analysis of the Beatles," Psychology of Aesthetics, Creativity, and the Arts, Vol. 2, No. 4 (2008): 197-202.
5. Bright, Melissa A. and Dawn O'Connor. "Qualitative Data Analysis: Comparison Between Traditional and Computerized Text Analysis," The Osprey Journal of Ideas and Inquiry, All Volumes, Paper 21 (2007): 1.
6. Bright and O'Connor, "Qualitative Data Analysis," 2-3.
7. Whissell, "Emotional Stylometric Analysis," 257.
8. Schindler, Alexander and Andreas Rauber. "Capturing the Temporal Domain in Echonest Features for Improved Classification Effectiveness," 4. Accessed April 22, 2013. http://www.ifs.tuwien.ac.at/~schindler/pubs/AMR2012.pdf
9. Wikipedia. "List of songs recorded by the Beatles." Last modified April 21, 2013. http://en.wikipedia.org/wiki/List_of_songs_recorded_by_the_Beatles.
10. Pennebaker, James W., Roger J. Booth and Martha E. Francis. Linguistic Inquiry and Word Count (2007). Accessed April 22, 2013. http://www.liwc.net/index.php