Multivariate Analysis of Korean Pop Music Audio Features
MULTIVARIATE ANALYSIS OF KOREAN POP MUSIC AUDIO FEATURES

Mary Solomon

A Thesis Submitted to the Graduate College of Bowling Green State University in partial fulfillment of the requirements for the degree of MASTERS OF SCIENCE

May 2021

Committee:
John Chen, Advisor
Junfeng Shang

Copyright © May 2021 Mary Solomon
All rights reserved

ABSTRACT

John Chen, Advisor

K-pop, or Korean pop music, is a genre originating from South Korea that features various musical styles such as hip hop, R&B, and electronic dance. Modern K-pop started with Seo Taiji and Boys in 1992 and has since evolved through stylistic eras called ‘generations’ to become a worldwide sensation. K-pop’s global popularity can be recognized by the success of groups such as BTS and BlackPink. How do the musical qualities of K-pop songs contribute to the genre’s popularity? Furthermore, how have these musical qualities contributed to the genre’s evolution into the global phenomenon it is today? To explore these questions and more, multivariate analysis will be performed on a curated dataset of 12,012 K-pop songs and their audio features. The audio features, collected with Spotify’s Web API, include variables such as Danceability, Loudness, Acousticness, and Valence. The audio features’ contribution to, and trends in, the evolution of K-pop will be analyzed with nonparametric statistical approaches, Multiple Linear Regression (MLR), and Logistic Regression models. MLR and Logistic Regression will also be used to examine the relationship between the audio features and popularity. Finally, dimension reduction of the audio features performed by Principal Components Analysis, paired with K-means clustering, will be utilized to explore the possibility of optimizing song clusters within K-pop.

This thesis is dedicated, in memoriam, to Kim Jonghyun, Choi ‘Sulli’ Jin-Ri, and Goo Hara. Their artistry, talent, hard work, and influence in Korean pop music will continue to live on in their legacy.
ACKNOWLEDGMENTS

First, I would like to thank my advisor, Dr. John Chen for his supportive guidance and encouragement throughout this process. Additionally, I would like to thank Dr. Junfeng Shang for serving on my committee and providing valuable feedback on my work. Overall, I extend my gratitude to the Bowling Green State University Mathematics and Statistics department for being a supportive community, providing endless amounts of help and always encouraging my exploration of creative research pursuits. Additionally, I would like to thank all of the music teachers who have nurtured my lifelong passion for music. I would like to extend a special thanks to my friend Minso Choi for providing Korean to English translations, allowing me to thoroughly perform this research. Finally, I give my greatest appreciation to all of my friends and family who have cheered me on throughout all of my endeavors.

TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION
CHAPTER 2 BACKGROUND
    2.1 Defining K-pop
    2.2 K-pop Generations
CHAPTER 3 DATA COLLECTION
    3.1 Overview
    3.2 Spotify Audio Features
    3.3 Data Filtering and Selection Criteria
    3.4 Distribution of Data
CHAPTER 4 MODELING STRATEGIES FOR SPOTIFY AUDIO FEATURES
    4.1 Overview
    4.2 Multiple Linear Regression
    4.3 Binary Logistic Regression
    4.4 Variable Selection
    4.5 Regularized Regression
CHAPTER 5 NONPARAMETRIC ANALYSIS OF AUDIO FEATURES
    5.1 Introduction
    5.2 Methodology
        5.2.1 Wilcoxon Sum Rank Test
        5.2.2 Kruskal-Wallis Test
    5.3 Comparing K-pop Generations
    5.4 Comparing Male and Female Artists
    5.5 Comparing Group and Solo Artists
CHAPTER 6 CLASSIFYING NEW GENERATION SONGS
    6.1 Introduction
    6.2 Minor Mode Results
    6.3 Major Mode Results
    6.4 Comparison of Minor and Major Mode Models
CHAPTER 7 PREDICTING SONG RELEASE DATE
    7.1 Introduction
    7.2 Data Preparation
    7.3 Model Assumptions
    7.4 Minor Mode Results
    7.5 Major Mode Results
    7.6 Comparison of Minor and Major Mode Models
CHAPTER 8 PREDICTING POPULARITY
    8.1 Introduction
    8.2 Linear Regression Approach
        8.2.1 Data Preparation
        8.2.2 Minor Mode Results
        8.2.3 Major Mode Results
        8.2.4 Comparison of Minor and Major Mode Models
    8.3 Logistic Regression Approach
        8.3.1 Minor Mode Results
        8.3.2 Major Mode Results
        8.3.3 Comparison of Minor and Major Mode Models
    8.4 Comparing Linear and Logistic Regression Approach
CHAPTER 9 PRINCIPAL COMPONENTS ANALYSIS
    9.1 Introduction
    9.2 Methodology
        9.2.1 Principal Components Analysis
        9.2.2 K-means Clustering
    9.3 Dimension Reduction: PCA
        9.3.1 Data Preparation
        9.3.2 Minor Mode Results
        9.3.3 Major Mode Results
        9.3.4 Comparison of Minor and Major Mode Results
    9.4 Clustering on the Principal Components
CHAPTER 10 RESEARCH LIMITATIONS
CHAPTER 11 CONCLUSION
BIBLIOGRAPHY
APPENDIX A WILCOXON PAIRWISE COMPARISON RESULTS
APPENDIX B MLR AND LOGISTIC REGRESSION MODEL RESULTS

LIST OF FIGURES

3.1 Idology’s Generation Theory Table
3.2 Translated Idology Generation Theory Table
3.3 Distribution of Popularity, Acousticness, Instrumentalness, and Speechiness.
3.4 Distribution of Energy and Loudness.
3.5 Distribution of Duration, Danceability, Tempo, and Valence.
3.6 Frequency of Musical Keys
7.1 Distribution of Month Release
7.2 Distribution of Transformed Month Release
7.3 Diagnostic Plots for Minor Mode Song Release Dates
7.4 Diagnostic Plots for Major Mode Song Release Dates
8.1 Distribution of Popularity
8.2 Distribution of Transformed Popularity
8.3 Diagnostic Plots for Minor Mode Popularity
8.4 Diagnostic Plots for Major Mode Popularity
9.1 Scree Plot for Minor Mode Principal Components
9.2 Scree Plot for Major Mode Principal Components
9.3 Silhouette Plots for Optimal K
9.4 K-means Clustering Scatter-plots
1 Full Logistic model for Minor Mode Generation Classification
2 Stepwise Logistic Model for Minor Mode Generation Classification
3 All Possible Subsets Logistic Model for Minor Mode Generation Classification
4 Full Logistic Model for Major Mode Generation Classification
5 Stepwise Logistic Model for Major Mode Generation Classification
6 All Possible Subsets Logistic Model for Major Mode Generation Classification
7 Full MLR Model for Minor Mode Song Release Date
8 Stepwise Linear Regression Model for Minor Mode Song Release Date
9 All Possible Subsets Linear Regression Model for Minor Mode Song Release Date
10 Full MLR Model for Major Mode Song Release Date
11 Stepwise Linear Regression Model for Major Mode Song Release Date
12 All Possible Subsets Linear Regression Model for Major Mode Song Release Date
13 Full MLR model for Minor Mode Song Popularity
14 Stepwise Linear Regression Model for Minor Mode Song Popularity
15 All Possible Subsets Linear Regression Model for Minor Mode Song Popularity
16 Full MLR Model for Major Mode Song Popularity
17 Stepwise Linear Regression Model for Major Mode Song Popularity
18 All Possible Subsets Linear Regression Model for Major Mode Song Popularity
19 Full Logistic Model for Minor Mode Popularity Classification
20 Stepwise Logistic Model for Minor Mode Popularity Classification
21 All Possible Subsets Logistic Model for Minor Mode Popularity Classification
22 Full Logistic Model for Major Mode Popularity Classification
23 Stepwise Logistic Model for Major Mode Popularity Classification
24 All Possible Subsets Logistic Model for Major Mode Popularity Classification

LIST OF TABLES

4.1 Classification Assessment: Confusion