Machine Learning and Sentiment Analysis Approaches for the Analysis of Parliamentary Debates
Thesis submitted in accordance with the requirements of the University of Liverpool for the degree of Doctor in Philosophy by Zaher Ibrahim Saleh Salah

Faculty of Science
Department of Computer Science
May 2014

"It is not best that we should all think alike; it is a difference of opinion that makes horse races." Mark Twain

Abstract

In this thesis the author seeks to establish the most appropriate mechanism for conducting sentiment analysis with respect to political debates: firstly, to predict their outcome and, secondly, to support the visualisation of such debates in the context of further analysis. To this end two alternative approaches are considered, a classification-based approach and a lexicon-based approach. In the context of the second approach both generic and domain-specific sentiment lexicons are considered. Two techniques for generating domain-specific sentiment lexicons are also proposed: (i) direct generation and (ii) adaptation. The first is founded on the idea of generating a dedicated lexicon directly from labelled source data. The second is founded on the idea of taking an existing general-purpose lexicon and adapting it so that it becomes a specialised lexicon with respect to some domain.

The operation of both the generic and the domain-specific sentiment lexicons is compared with the classification-based approach. The comparison between the potential sentiment mining approaches was conducted by predicting the attitude of individual debaters (speakers) in political debates, using a corpus of labelled political speeches extracted from political debate transcripts taken from the proceedings of the UK House of Commons. The reported comparison indicates that the attitude of speakers can be effectively predicted using sentiment mining.
The author then goes on to propose a framework, the Debate Graph Extraction (DGE) framework, for extracting debate graphs from transcripts of political debates. The idea is to represent the structure of a debate as a graph with speakers as nodes and "exchanges" as links. Links between nodes were established according to the exchanges between the speeches. Nodes were labelled according to the "attitude" (sentiment) of the speakers, "positive" or "negative", using one of the three proposed sentiment mining approaches. The attitude of the speakers was then used to label the graph links as being either "supporting" or "opposing". If both speakers had the same attitude (both "positive" or both "negative") the link was labelled as being "supporting"; otherwise the link was labelled as being "opposing". The resulting graphs capture the abstract representation of a debate where two opposing factions exchange arguments on related content.

Finally, the author moves to discuss mechanisms whereby debate graphs can be structurally analysed using network mathematics and community detection techniques. To this end the debate graphs were conceptualised as networks in order to conduct appropriate network analysis. The significance was that the network mathematics and community detection processes can draw conclusions about the general properties of debates in parliamentary practice through the exploration of the embedded patterns of connectivity and reactivity between the exchanging nodes (speakers).

Keywords: Sentiment Analysis, Machine Learning, Debate Visualisation, Debate Analysis & Information Retrieval.
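The link-labelling rule stated in the abstract (same attitude gives a "supporting" link, different attitudes give an "opposing" link) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `label_links`, the dictionary and tuple representation, and the speaker names are hypothetical and chosen only for illustration, and the node attitudes are assumed to have been predicted already by one of the sentiment mining approaches.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: each speaker (node) maps to a predicted
# attitude, and each exchange (link) is a pair of speaker names.
Attitudes = Dict[str, str]
Exchange = Tuple[str, str]


def label_links(attitudes: Attitudes,
                exchanges: List[Exchange]) -> List[Tuple[str, str, str]]:
    """Label each exchange as 'supporting' or 'opposing'.

    A link is 'supporting' when both speakers share the same attitude
    (both 'positive' or both 'negative'); otherwise it is 'opposing'.
    """
    labelled = []
    for first, second in exchanges:
        same_attitude = attitudes[first] == attitudes[second]
        label = "supporting" if same_attitude else "opposing"
        labelled.append((first, second, label))
    return labelled


# Illustrative data only; these speakers and attitudes are invented.
attitudes = {
    "Speaker A": "positive",
    "Speaker B": "negative",
    "Speaker C": "positive",
}
exchanges = [("Speaker A", "Speaker B"), ("Speaker A", "Speaker C")]

debate_graph = label_links(attitudes, exchanges)
for first, second, label in debate_graph:
    print(f"{first} -- {label} --> {second}")
```

In this sketch the exchange between Speaker A and Speaker B is labelled "opposing" and the exchange between Speaker A and Speaker C is labelled "supporting". How the exchanges themselves are identified is a separate step that the sketch takes as given.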
Contents

Abstract
Contents
List of Figures
List of Tables
List of Algorithms
Dedication
Acknowledgement

1 Introduction
  1.1 Overview
    1.1.1 Political sentiment mining
    1.1.2 UK House of Commons debates
  1.2 Motivation
  1.3 Research Objectives
  1.4 Research Methodology
  1.5 Research Contributions
  1.6 Thesis Structure
  1.7 Published Work
  1.8 Summary

2 Previous Work
  2.1 Introduction
  2.2 The classifier-based approach to sentiment extraction
  2.3 The lexicon-based approach to sentiment extraction
    2.3.1 Generic lexicon-based sentiment mining
    2.3.2 Domain specific lexicon-based sentiment mining
  2.4 Related work on visualising the debate structure
  2.5 Related work on graph networks analysis for political sentiment mining
  2.6 Related work on sentiment analysis in the political domain
  2.7 Summary

3 The UK House of Commons Political Debates Corpus
  3.1 The UK Parliamentary System
  3.2 Political parties
  3.3 Parliamentary debates
  3.4 The UK House of Commons political debates datasets
  3.5 Summary

4 Political Sentiment Mining Using Classification
  4.1 Preprocessing
  4.2 Classifier Generation
  4.3 Evaluation
    4.3.1 Classification using speech data only
    4.3.2 Classification using speech data augmented with "party affiliation" and "debate ID" information
    4.3.3 Classification using "party affiliation" and "debate ID" only
  4.4 Summary

5 Political Sentiment Mining Using Generic Sentiment Lexicons
  5.1 Part-Of-Speech Tagging (POST)
  5.2 Preprocessing
  5.3 Attitude detection using generic sentiment lexicons
  5.4 Results obtained using the generic lexicon-based approach
  5.5 Summary

6 Political Sentiment Mining Using Domain Specific Sentiment Lexicons
  6.1 Part-Of-Speech-Tagging (POST)
  6.2 Preprocessing
  6.3 Sentiment score and polarity calculation
  6.4 Lexicon generation
  6.5 Evaluation framework for domain specific sentiment lexicons
  6.6 Evaluation results for domain specific lexicons
  6.7 Summary

7 Global Comparison Between The Sentiment Mining Approaches
  7.1 Comparison
  7.2 Summary

8 The Debate Graph Extraction (DGE) Framework
  8.1 Preprocessing
    8.1.1 Preprocessing for the sentiment lexicon-based approach for attitude detection and node labelling
    8.1.2 Preprocessing for classification-based approach for attitude detection and node labelling
  8.2 Attitude detection and node labelling
    8.2.1 Attitude detection and node labelling using the sentiment lexicon based approach
    8.2.2 Attitude detection and node labelling using the classification-based approach
  8.3 Link identification and labelling
    8.3.1 Link identification using semantic similarity
    8.3.2 Link identification using interruptions
    8.3.3 Link identification using relevant interruptions
    8.3.4 Link labelling
  8.4 Debate graph generation
  8.5 Illustrative example
    8.5.1 Sentiment similarity debate graph
    8.5.2 Interruption graph
    8.5.3 Relevant interruption graph
  8.6 Summary

9 Debate Graph Analysis
  9.1 The debates
  9.2 Exemplar questions
  9.3 The debate graphs (networks)
    9.3.1 The approval of the invasion in Iraq debate networks
    9.3.2 The military intervention in Syria debate networks
  9.4 Analysis of debate graphs (networks)
    9.4.1 Assortativity
    9.4.2 Community structures
    9.4.3 Assortativity: Answering question Q1
      9.4.3.1 Disassortativity with respect to party affiliation
      9.4.3.2 Disassortativity with respect to voting profile
      9.4.3.3 Disassortativity in interruption vs. relevant interruption networks
      9.4.3.4 Disassortativity significance testing
    9.4.4 Community detection: Answering question Q2
  9.5 Summary

10 Conclusion
  10.1 Summary
  10.2 Main Findings
  10.3 Research Contributions
  10.4 Research Future Extensions

A Parliamentary Stop Words List
Bibliography

List of Figures

2.1 Training and testing a machine learning classifier
2.2 Argument structure visualisation produced using the Rationale software tool. Source: Wikimedia Commons
2.3 Simple debate graph of the form proposed in this thesis
2.4