Semantic Network Analysis
Techniques for Extracting, Representing, and Querying Media Content

Wouter van Atteveldt

Reading committee:
prof.dr. Enrico Motta
dr. Gertjan van Noord
prof.dr. Guus Schreiber
prof.dr. Klaus Schönbach
prof.dr. Philip A. Schrodt

© Wouter van Atteveldt 2008

You are allowed to copy and distribute this book in whole or in part and to make a derived work under the terms of the Creative Commons Attribution-Noncommercial 3.0 Netherlands License (http://creativecommons.org/licenses/by-nc/3.0/nl/).

An electronic version of this book is available from http://vanatteveldt.com/dissertation. This book can be purchased from http://www.amazon.com/.

SIKS Dissertation Series No. 2008-30
The research reported in this thesis has been carried out under the auspices of SIKS, the Dutch Research School for Information and Knowledge Systems.

Published by BookSurge Publishers, Charleston SC
ISBN: 1-4392-1136-1

VRIJE UNIVERSITEIT

Semantic Network Analysis
Techniques for Extracting, Representing, and Querying Media Content

ACADEMIC DISSERTATION
for obtaining the degree of Doctor at the Vrije Universiteit Amsterdam, by authority of the rector magnificus prof.dr. L.M. Bouter, to be defended in public before the doctoral committee of the Faculty of Exact Sciences on Friday 14 November 2008 at 13.45 in the auditorium of the university, De Boelelaan 1105

by
Wouter Hendrik van Atteveldt
born in Texel

Supervisors (promotoren): prof.dr. F.A.H. van Harmelen, prof.dr. J. Kleinnijenhuis
Co-supervisor (copromotor): dr. K.S. Schlobach

Preface

Undoubtedly, this thesis contains many inaccuracies and omissions. One of the most glaring is the single author name on the cover: I doubt any Ph.D. thesis is really the sole work of its defender, and an interdisciplinary work such as this one is especially the product of many hours of talking and collaborating.
I started this thesis with little knowledge of either Communication Science or Knowledge Representation, and I am very grateful for the patient explanations and pointers from my supervisors and other colleagues. I think I’ve been very lucky with my supervisors: I will not quickly forget the 5 AM e-mails from Jan when I was in Turkey working on one of our papers last year; or the hours spent in front of the whiteboard with Stefan while working on modal logic; or the innocent-sounding questions Frank always asked to keep me on track both substantively and in terms of planning. Between the three of them, I think I’ve received incredible support both on the content of my work and on the procedure and planning needed to get me here. I especially appreciate the way it was always stressed that, after everybody had had their say, it was my thesis, and my responsibility to decide what I wanted to study, how I wanted to write it down, and when I wanted to finish it. This gave me the confidence to write down and defend this thesis with my name on it, even though the work is neither finished, nor perfect, nor solely mine.

According to Frank, the only thing worse than having no desk is having two desks, but I am very happy that I was a member of both the Communication Science and Knowledge Representation groups. The KR group (both on the AI and BI side of the invisible line) was and is a dynamic group with a lot of room for discussion and learning. Apart from my supervisors, I especially appreciate the long talks, about work or otherwise, with Mark, Laura, Michel, and Willem. On the other side of the tram line, I was very lucky to start my PhD just after Rens and Lonneke started theirs, and I fondly remember the hours we spent looking at data and models together. Although Dirk wisely turned down my request to become co-promotor, he was always there to talk about work but especially about non-work.
He taught me to think about why I do the things I do, and to concentrate on doing the things that really matter. Anita showed a surprising side to her character when she was stuck in Amsterdam during a storm and we drank champagne for her birthday and played Catan until 2 AM. I am also very happy we stole back Janet from the UvA for the Contested Democracy project; we can always use more Klaverjas players to join us at the ICA conferences. I really look forward to continuing my collaboration and friendship with all these colleagues.

I would also like to use the opportunity to thank my professors from the University College Utrecht and University of Edinburgh, especially Maarten Prak, who showed me how university education should be done, Mark Steedman and Graeme Ritchie, who got me hooked on natural languages, and of course Miles Osborne, Claire Grover, Bonnie Webber and all the others.

The ACLU lawyer Clarence Darrow once said that the first half of our life is ruined by our parents, and the second half by our children. In contrast, I feel that my childhood was very pleasant and helped me to value thinking, knowledge, and discussion. I have always felt that my parents supported me in whatever I did, and I believe that I am very lucky to have had such a wonderful family. Bas, I sometimes miss the early days of 2AT, exploring the game called running a company, and I am glad that things are going so well with the company. I am also thrilled that we finally got the catamaran working, and I hope we will have a very windy summer next year. Nienke, I am looking forward to our first publication together, using text boxes and arrows between them to explain the human condition, and I look forward to more roof terrace parties if you still find Amsterdam liveable after your stay in New York.

If you compare the above list with my list of co-authors, one name is conspicuously lacking.
I would probably end up on the couch if I listed Nel among my colleagues and collaborators, even though we did spend a lot of time working, discussing, and writing together. However, that is completely insignificant compared to her contribution to my real life. Since meeting her I’ve learned invaluable lessons on people, emotions, and insecurity, and I feel that I’ve become a much better person over the last four years, or at least a better dressed person.

Contents

1 Introduction
    1.1 Introduction
    1.2 Research Question
    1.3 Domain and Data
    1.4 Contributions
    1.5 Thesis Outline

I Background

2 Content Analysis
    2.1 Introduction
    2.2 Content Analysis in Communication Science
    2.3 Semantic Network Analysis
    2.4 The NET method
    2.5 The 2006 Dutch parliamentary elections
    2.6 Computer Content Analysis
    2.7 Conclusion

3 Natural Language Processing
    3.1 Introduction
    3.2 The Preprocessing Pipeline
    3.3 Thesauri
    3.4 Evaluation metrics
    3.5 Conclusion

4 Knowledge Representation and the Semantic Web
    4.1 Introduction
    4.2 The Semantic Web
    4.3 The Semantic Web as a Knowledge Representation framework
    4.4 Conclusion

II Extracting Semantic Networks

5 Extracting Associative Frames using Co-occurrence
    5.1 Introduction
    5.2 Frames as Associations
    5.3 A Probabilistic Model of Associative Framing
    5.4 Use Case: Terrorism in the News
    5.5 Conclusion

6 Using Syntax to find Semantic Source, Subject, Object
    6.1 Introduction
    6.2 Determining Semantic Roles using Syntax Patterns
    6.3 Determining Validity
    6.4 Results
    6.5 Error Components Analysis
    6.6 Discussion / Conclusion

7 Determining valence using Sentiment Analysis
    7.1 Introduction
    7.2 Polarity in Political Communication
    7.3 Task: Classifying NET relations
    7.4 Sentiment Analysis
    7.5 Method
    7.6 Results
    7.7 Validation
    7.8 Conclusion

III Reasoning with Media Data

8 Using RDF to store Semantic Network Data
    8.1 Introduction
    8.2 Representing Media Data: Statements about Statements
    8.3 Representing Political Background Knowledge
    8.4 A political ontology
    8.5 Using OWL for a richer ontology
    8.6 Conclusions

9 Querying, Analysing, and Visualising Semantic Network Data
    9.1 Introduction
    9.2 Querying the Semantic Network
    9.3 Searching the News
    9.4 Using the system: Parties in the news
    9.5 Conclusion

IV System Description

10 The AmCAT Infrastructure
    10.1 Introduction
    10.2 The AmCAT Navigator and Database
    10.3 The iNet Coding Program
    10.4 Conclusion

11 Discussion and Conclusion

Bibliography

Samenvatting (Dutch Summary)

CHAPTER 1
Introduction

‘No-campaign Wilders is a circus