An Exploration of Parliamentary Speeches in the Irish Parliament Using Topic Modeling
Technological University Dublin
ARROW@TU Dublin
Dissertations, School of Computer Sciences, 2018

Fiona Leheny
Technological University Dublin

Part of the Computer Sciences Commons

Recommended Citation
Leheny, Fiona (2018). An exploration of parliamentary speeches in the Irish parliament using topic modeling. Masters dissertation, DIT, 2018.

This Dissertation is brought to you for free and open access by the School of Computer Sciences at ARROW@TU Dublin. It has been accepted for inclusion in Dissertations by an authorized administrator of ARROW@TU Dublin.

This work is licensed under a Creative Commons Attribution-Noncommercial-Share Alike 4.0 License.

An Exploration of Parliamentary Speeches in the Irish Parliament using Topic Modeling

Fiona Leheny

A dissertation submitted in partial fulfilment of the requirements of Dublin Institute of Technology for the degree of M.Sc. in Computing (Data Analytics)

September 2017

Declaration

I certify that this dissertation, which I now submit for examination for the award of MSc in Computing (Data Analytics), is entirely my own work and has not been taken from the work of others save and to the extent that such work has been cited and acknowledged within the text of my work.

This dissertation was prepared according to the regulations for postgraduate study of the Dublin Institute of Technology and has not been submitted in whole or part for an award in any other Institute or University.

The work reported in this dissertation conforms to the principles and requirements of the Institute's guidelines for ethics in research.

Signed:
Date:

Abstract

The only resource available in the public domain which highlights parliamentary activity is parliamentary questions.
Until about ten years ago, these were classified by manual content analysis. More recently, machine learning techniques have been used to automatically classify and analyse these data sets. This study analyses the verbal parliamentary speeches in the Irish Parliament (known as the Dáil) over a ten-year period using unsupervised machine learning. It does so by applying a less utilised topic modeling technique, known as Non-negative Matrix Factorisation (NMF), to detect the latent themes in these speeches. A two-layer dynamic approach using NMF is applied to extract the themes raised in these speeches at a point in time and over the entire period. The findings suggest that the themes raised vary from very niche subject matter areas to more general areas and have evolved over time. The trend in the topics raised over the entire period gives an indication of what the political agenda was during these Dáil terms. Furthermore, reviewing the topics at a party and individual TD level demonstrates what their political priorities are. Conversely, reviewing the topics that parties and TDs are not discussing gives an insight into the themes that they have no interest in.

Keywords: Parliamentary questions, Parliamentary speeches, Text mining, Topic modeling, Non-negative Matrix Factorisation, Clustering.

Acknowledgments

I would like to give a sincere thank you to my supervisor, Dr. Sarah Jane Delaney, for her guidance and sharing of expertise throughout this research project. Also, a special thanks to Dr. Derek Greene for providing the data set, the published tools and additional guidance along the way. Finally, I would not have been in a position to complete this Masters course without the support and patience of my husband, Colm, and my parents. Thank you all.

Contents

Declaration
Abstract
Acknowledgments
Contents
List of Figures
List of Tables
List of Acronyms

1 Introduction
  1.1 Research Project/problem
  1.2 Research Objectives
  1.3 Research Methodologies
  1.4 Scope and Limitations
  1.5 Document Outline

2 Review of existing literature
  2.1 Irish Parliament Overview
  2.2 Parliamentary Questions
  2.3 Techniques for analysing Parliamentary Questions
  2.4 Challenges and Limitation of PQs
  2.5 Overview of Machine Learning
    2.5.1 Supervised Machine Learning
    2.5.2 Unsupervised Machine Learning
  2.6 Topic modeling
    2.6.1 Applications
    2.6.2 LDA
  2.7 NMF
    2.7.1 Theory of NMF
    2.7.2 Example of NMF
    2.7.3 Application of NMF
    2.7.4 Dynamic Topic modeling
  2.8 Validity and Interpretation of Topic Models
    2.8.1 Topic Coherence
  2.9 Parameter Selection
  2.10 Strengths and Limitations
  2.11 Summary of Literature Review

3 Design and Methodology
  3.1 Overview of Approach/Design
  3.2 Data Understanding
    3.2.1 Structure of data set
  3.3 Data Preparation
    3.3.1 Data Cleansing
    3.3.2 Data Construction/Partitioning
    3.3.3 Data Pre-processing
  3.4 Modeling
    3.4.1 Modeling Layer 1 NMF
    3.4.2 Modeling Layer 2 NMF
    3.4.3 Build Word2Vec model
    3.4.4 Evaluation of Number of Topics for each Time Window
    3.4.5 Rerun layer 1 NMF
    3.4.6 Evaluation of Number of Dynamic Topics
    3.4.7 Review Outputs and Refine Model using Parameterisations
    3.4.8 Reformat and Load Modeling outputs
    3.4.9 Identify Labels for Dynamic Topics
    3.4.10 Visualise Final Results
  3.5 Evaluation Methods
    3.5.1 Window Topic Modeling Evaluation
    3.5.2 Dynamic Topic Modeling Evaluation
  3.6 Strength & Limitations of the Solution/Approach

4 Implementation, Evaluation and Results
  4.1 Introduction
  4.2 Key Implementation details
    4.2.1 Data Pre-processing
    4.2.2 Modeling
  4.3 Results
    4.3.1 Modeling Results
    4.3.2 Evaluation of Results
    4.3.3 Manual Review & Labelling of Topics
  4.4 Trend Analysis
    4.4.1 Topic priorities overall in the Dáil
    4.4.2 Topic priorities by party and constituency
    4.4.3 Further detailed analysis on specific topics
  4.5 Further Discussion
  4.6 Summary of Analysis

5 Conclusion
  5.1 Research Overview
  5.2 Problem Definition
  5.3 Design/Experimentation, Evaluation & Results
  5.4 Contributions and Impact
  5.5 Future Work & Recommendations

References

A Additional tables and figures
  A.1 Top ranked speeches in Financial Crisis topic across all time windows
  A.2 Heat Maps - Topics and TDs by % of total speeches a TD made

B Python Code (Sample)
  B.1 Data pre-processing - extract nouns code

List of Figures

2.1 Literature Review Chapter layout
2.2 Supervised Machine Learning process
2.3 Topic Modeling Approach
2.4 Topic modeling using NMF
2.5 NMF Multiplicative Update Rule
2.6 NMF Topic Model Example of TDM (A)
2.7 NMF Topic Model Example of W and H factors
2.8 Dynamic NMF Topic Model - Layer 1 and 2
3.1 CRISP-DM project life cycle (Chapman et al., 2000)
3.2 Percentage of speeches made in Irish by political party from 2006 to 2015
3.3 Speeches made per Dáil term by opposition or government party members
3.4 Speeches made per year
3.5 Speeches made per quarter per year
3.6 Speeches made in monthly category across full period 2006-2015
4.1 Number of Topics versus Number of Speeches for each time window
4.2 Number of topics selected for each time window
4.3 Topic Coherence across each time window
4.4 Financial Crisis by top terms within related Window Topics
4.5 Water by top terms within related Window Topics
4.6 Health by top terms within related Window Topics
4.7 Dynamic topics importance over ten year period
4.8 Dynamic topics importance over ten year period
4.9 Dynamic topics by number of speeches and by party
4.10 Niche dynamic topics by number of speeches and by constituency
4.11 Dynamic topics by number of speeches (5 selected)
4.12 Financial Crisis - Number of speeches by government/opposition member over time
4.13 Water - Number of speeches by Year
4.14 Health - Number of speeches by Year
4.15 Guards/Crime - Number of speeches by Year
4.16 Housing - Number of speeches by Year
4.17 Topic: TD by number of speeches (Top 20)
4.18 Topic: TD by number of speeches (Top 20)
4.19 TDs by number of speeches (Top 20)
4.20 % of TDs in a party which have covered all 18 topics
4.21 TDs - less than 50% topic coverage - % topic coverage versus % years speaking
4.22 Fianna Fáil TDs - number topics versus number years speaking
4.23 Independents - TD by topic coverage % of total speeches made by a TD
4.24 Socialists - TD by topic coverage by % of total speeches made by a TD
A.1 Labour - TD by topic coverage % of total speeches made by a TD
A.2 Sinn Féin - TD by topic coverage % of total speeches made by a TD
A.3 Green Party - TD by topic coverage % of total speeches made by a TD