Stance Detection in Danish Politics Thesis Project, Course Code: KISPECI1SE IT University of Copenhagen by Rasmus Lehmann (
[email protected]) supervised by Leon Derczynski June 3, 2019 Contents Abstract v List of figures vi List of tables vii 1 Introduction 1 1.1 Motivation . .1 1.2 Research problem . .1 1.2.1 Dataset creation . .2 1.2.2 Stance detection task . .2 1.3 Approach . .2 1.3.1 Deep Learning for stance detection . .2 1.4 Organization of thesis . .3 2 Background 4 2.1 Data driven approaches to politics . .4 2.2 Natural Language Processing . .5 2.2.1 Feature engineering within NLP . .5 2.2.2 Stance detection . .6 2.2.3 NLP for Danish . .7 3 Dataset 8 3.1 Political quotes and their stances . .8 3.2 Data sources . .9 3.2.1 Determining media outlets to be included . .9 3.2.2 Ritzau as objective data source . 13 3.3 Delimitation of search space . 13 3.3.1 Choice of stance topic . 13 i 3.3.2 Specification of the topic of immigration policy . 15 3.3.3 Choice of politicians . 15 3.4 Data gathering and parsing . 17 3.4.1 Parsing PDF data . 17 3.4.2 Identifying quotees . 18 3.5 Data cleaning . 19 3.5.1 Creating automated cleaning procedures . 19 3.5.2 Identifying false positives . 20 3.6 Data labelling . 21 3.6.1 Determining quote subtopic . 21 3.6.2 Determining quote stance . 22 3.7 The final dataset . 23 3.7.1 Assessing representativity in the dataset .