Investigating Topic Modeling Techniques for Historical Feature Location

Total Pages: 16

File Type: PDF, Size: 1020 KB

Lukas Schulte
Faculty of Health, Science and Technology
Master thesis in Computer Science, Second Cycle, 30 hp (ECTS)
Supervisor: Dr. Sebastian Herold, Karlstad University
Examiner: Dr. Muhammad Ovais Ahmad
Karlstad, June 28th, 2021

I Abstract

Software maintenance and the understanding of where in the source code features are implemented are two strongly coupled tasks that make up a large portion of the effort spent on developing applications. The concept of feature location investigated in this thesis can serve as a supporting factor in those tasks, as it facilitates the automation of otherwise manual searches for source code artifacts. Challenges in this subject area include the aggregation and composition of a training corpus from historical codebase data for models, as well as the integration and optimization of qualified topic modeling techniques. Building on previous research, this thesis provides a comparison of two different techniques and introduces a toolkit that can be used to reproduce and extend the results discussed. Specifically, this thesis pursues a changeset-based approach to feature location and applies it to a large open-source Java project. The project is used to optimize and evaluate the performance of Latent Dirichlet Allocation models and Pachinko Allocation models, as well as to compare the accuracy of the two models with each other. As discussed at the end of the thesis, the results do not indicate a clear favorite between the models. Instead, the outcome of the comparison depends on the metric and viewpoint from which it is assessed.

Keywords: feature location, topic modeling, changesets, latent Dirichlet allocation, pachinko allocation, mining software repositories, source code comprehension

II Acknowledgments

First, I would like to acknowledge the work that the responsible instances at Karlstad University and the University of Applied Sciences Osnabrück have put into their partnership within the ERASMUS program, which made my thesis possible. Further, I would like to give special thanks to the supervisor of my thesis, Dr. Sebastian Herold, for his support and help. In the same way, I would like to thank Dr. Muhammad Ovais Ahmad for taking on the role of examiner for my work. Finally, I would like to express my gratitude to my family and friends, who provided support and distraction during the time I worked on this thesis.

III Table of Contents

1 INTRODUCTION
  1.1 BACKGROUND
  1.2 PROBLEM DESCRIPTION
  1.3 THESIS GOAL
  1.4 THESIS OBJECTIVE
  1.5 ETHICS AND SUSTAINABILITY
  1.6 METHODOLOGY
  1.7 STAKEHOLDERS
  1.8 DELIMITATIONS
  1.9 OUTLINE
2 BACKGROUND AND RELATED WORK
  2.1 FEATURE LOCATION
    2.1.1 Definition and Taxonomy
    2.1.2 Tools for Feature Location
    2.1.3 Datasets for Benchmarking
  2.2 TEXT MINING
  2.3 TOPIC MODELING
  2.4 DIRICHLET DISTRIBUTIONS
  2.5 LATENT DIRICHLET ALLOCATION AND PACHINKO ALLOCATION
  2.6 RELATED WORK
3 METHODS
  3.1 OVERVIEW
  3.2 DATA PREPARATION
    3.2.1 Data Mining
    3.2.2 Text Cleaning
  3.3 TOPIC MODELING
    3.3.1 Topic Model Parameters
    3.3.2 Hyperparameter Tuning
  3.4 FEATURE LOCATION
    3.4.1 Search Query
    3.4.2 Performance Metrics
    3.4.3 Goldset-based Validation
4 IMPLEMENTATION
  4.1 GOALS AND CONSTRAINTS
  4.2 GENERAL SOLUTION STRATEGY
  4.3 IMPORTER APPLICATION
    4.3.1 Context, Scope, and Solution Strategy
    4.3.2 Building Block View
  4.4 FEATURE LOCATION
    4.4.1 Context, Scope, and Solution Strategy
    4.4.2 Building Block View
5 EVALUATION
  5.1 SETUP
    5.1.1 General Data Structures
    5.1.2 Target System
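
As a concrete illustration of the changeset-based approach summarized in the abstract, the sketch below indexes a handful of toy changeset documents with an LDA model and ranks them against a feature query. It assumes the gensim library and plain whitespace tokenization; the thesis's actual toolkit, the Pachinko Allocation variant, and the hyperparameter tuning described in Chapter 3 are not reproduced here.

```python
# Minimal sketch: LDA-based feature location over changeset "documents".
# Assumes gensim is installed; the corpus contents are purely illustrative.
from gensim.corpora import Dictionary
from gensim.models import LdaModel
from gensim.similarities import MatrixSimilarity

# Each changeset is treated as one document: commit message plus changed identifiers.
changesets = [
    "fix null pointer in login handler session token".split(),
    "add pagination support to search results query".split(),
    "refactor session token renewal and logout flow".split(),
]

dictionary = Dictionary(changesets)
corpus = [dictionary.doc2bow(doc) for doc in changesets]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=1)

# Index the changesets by their topic distributions and rank them against a feature query.
index = MatrixSimilarity(lda[corpus], num_features=lda.num_topics)
query_bow = dictionary.doc2bow("user login session".split())
similarities = index[lda[query_bow]]
ranking = sorted(enumerate(similarities), key=lambda pair: pair[1], reverse=True)
print(ranking)  # changeset indices ordered by topical similarity to the query
```

In a changeset-based setting, a ranked list like this would typically be mapped back to the source code artifacts each changeset touches, which is what turns the topic model into a feature location tool.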
Recommended publications
  • Multi-View Learning for Hierarchical Topic Detection on Corpus of Documents
    Multi-view learning for hierarchical topic detection on corpus of documents. Juan Camilo Calero Espinosa, Universidad Nacional de Colombia, Facultad de Ingeniería, Departamento de Ingeniería de Sistemas e Industrial, Bogotá, Colombia, 2021. Thesis presented as a partial requirement for the degree of Master in Systems and Computing Engineering. Advisor: Ph.D. Luis Fernando Niño V. Research line: natural language processing. Research group: Laboratorio de investigación en sistemas inteligentes (LISI).
    To my parents Maria Helena and Jaime. To my aunts Patricia and Rosa. To my grandmothers Lilia and Santos.
    Acknowledgements: To Camilo Alberto Pino, as the original thesis idea was his, and for his invaluable teaching of multi-view learning. To my thesis advisor, Luis Fernando Niño, and the Laboratorio de investigación en sistemas inteligentes (LISI), for constantly allowing me to learn new knowledge, and for their valuable recommendations on the thesis.
    Abstract: Topic detection on a large corpus of documents requires a considerable amount of computational resources, and the number of topics increases the burden as well. However, even a large number of topics might not be as specific as desired, or the topic quality simply starts decreasing after a certain number. To overcome these obstacles, we propose a new methodology for hierarchical topic detection, which uses multi-view clustering to link different topic models extracted from document named entities and part-of-speech tags.
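    The methodology above links topic models built from two document views (named entities and part-of-speech tags) through multi-view clustering. The fragment below is only a rough, simplified illustration of that idea: the two views are given as pre-extracted token strings, scikit-learn's LDA provides each view's document-topic proportions, and plain k-means over the concatenated proportions stands in for the proposed multi-view clustering. All data and parameters are invented for the example.

```python
# Rough illustration: two per-view topic models linked by clustering documents across views.
# The entity/POS token strings and all parameters are made up for this example.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.cluster import KMeans

entity_view = ["acme corp berlin", "acme corp london", "widget inc berlin"]
pos_view = ["NOUN VERB NOUN", "NOUN VERB ADJ", "NOUN NOUN ADV"]

def topic_matrix(texts, n_topics=2):
    # Fit a small LDA model on one view and return its document-topic proportions.
    counts = CountVectorizer().fit_transform(texts)
    lda = LatentDirichletAllocation(n_components=n_topics, random_state=0)
    return lda.fit_transform(counts)

# Concatenate the per-view topic proportions and cluster documents jointly.
joint = np.hstack([topic_matrix(entity_view), topic_matrix(pos_view)])
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(joint)
print(labels)  # one cluster label per document, informed by both views
```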
  • Generating Commit Messages from Git Diffs
    Generating Commit Messages from Git Diffs. Sven van Hal, Mathieu Post, Kasper Wendel (Delft University of Technology); [email protected], [email protected], [email protected].
    ABSTRACT: Commit messages aid developers in their understanding of a continuously evolving codebase. However, developers do not always document code changes properly. Automatically generating commit messages would relieve this burden on developers. Recently, a number of different works have demonstrated the feasibility of using methods from neural machine translation to generate commit messages. This work aims to reproduce a prominent research paper in this field, as well as attempt to improve upon their results by proposing a novel preprocessing technique. A reproduction of the reference neural machine translation model was able to achieve slightly better results on the same dataset. When applying more rigorous preprocessing, however, the performance dropped significantly.
    be exploited by machine learning. The hypothesis is that methods based on machine learning, given enough training data, are able to extract more contextual information and latent factors about the why of a change. Furthermore, Allamanis et al. [1] state that source code is “a form of human communication [and] has similar statistical properties to natural language corpora”. Following the success of (deep) machine learning in the field of natural language processing, neural networks seem promising for automated commit message generation as well. Jiang et al. [12] have demonstrated that generating commit messages with neural networks is feasible. This work aims to reproduce the results from [12] on the same and a different dataset. Additionally, efforts are made to improve upon these results by applying a
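    A reproduction like the one described needs (diff, commit message) pairs as training data. The sketch below shows one hedged way such pairs could be mined from a local clone with the git command line; the repository path and commit limit are placeholders, and the paper's actual preprocessing and neural model are not reproduced here.

```python
# Sketch: mine (diff, commit message) training pairs from a local git clone.
# The repository path and the number of commits are placeholder values.
import subprocess

def git(*args, repo="path/to/repo"):
    # Run a git command inside the clone and return its stdout as text.
    return subprocess.run(["git", "-C", repo, *args],
                          capture_output=True, text=True, check=True).stdout

shas = git("log", "--no-merges", "--pretty=format:%H", "-n", "100").splitlines()

pairs = []
for sha in shas:
    message = git("show", "-s", "--format=%s", sha).strip()  # subject line only
    diff = git("show", "--format=", "--unified=0", sha)       # patch without the commit header
    if message and diff:
        pairs.append((diff, message))

print(len(pairs), "training pairs collected")
```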
  • Nonstationary Latent Dirichlet Allocation for Speech Recognition
    Nonstationary Latent Dirichlet Allocation for Speech Recognition. Chuang-Hua Chueh and Jen-Tzung Chien, Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, ROC; {chchueh,chien}@chien.csie.ncku.edu.tw.
    Abstract: Latent Dirichlet allocation (LDA) has been successful for document modeling. LDA extracts the latent topics across documents. Words in a document are generated by the same topic distribution. However, in real-world documents, the usage of words in different paragraphs is varied and accompanied with different writing styles. This study extends the LDA and copes with the variations of topic information within a document. We build the nonstationary LDA (NLDA) by incorporating a Markov chain which is used to detect the stylistic segments in a document. Each segment corresponds to a particular style in composition of a document. This NLDA can exploit the topic information between documents as well as the word variations within a document.
    model was presented for document representation with time evolution. Current parameters served as the prior information to estimate new parameters for next time period. Furthermore, a continuous time dynamic topic model [12] was developed by incorporating a Brownian motion in the dynamic topic model, and so the continuous-time topic evolution was fulfilled. Sparse variational inference was used to reduce the computational complexity. In [6][7], LDA was merged with the hidden Markov model (HMM) as the HMM-LDA model where the Markov states were used to discover the syntactic structure of a document. Each word was generated either from the topic or the syntactic state. The syntactic words and content words were modeled separately. This study also considers the superiority of HMM in
  • Large-Scale Hierarchical Topic Models
    Large-Scale Hierarchical Topic Models. Jay Pujara, Department of Computer Science, University of Maryland, College Park, MD 20742, [email protected]; Peter Skomoroch, LinkedIn Corporation, 2029 Stierlin Ct., Mountain View, CA 94043, [email protected].
    Abstract: In the past decade, a number of advances in topic modeling have produced sophisticated models that are capable of generating hierarchies of topics. One challenge for these models is scalability: they are incapable of working at the massive scale of millions of documents and hundreds of thousands of terms. We address this challenge with a technique that learns a hierarchy of topics by iteratively applying topic models and processing subtrees of the hierarchy in parallel. This approach has a number of scalability advantages compared to existing techniques, and shows promising results in experiments assessing runtime and human evaluations of quality. We detail extensions to this approach that may further improve hierarchical topic modeling for large-scale applications.
    1 Motivation: With massive datasets and corresponding computational resources readily available, the Big Data movement aims to provide deep insights into real-world data. Realizing this goal can require new approaches to well-studied problems. Complex models that, for example, incorporate many dependencies between parameters have alluring results for small datasets and single machines but are difficult to adapt to the Big Data paradigm. Topic models are an interesting example of this phenomenon. In the last decade, a number of sophisticated techniques have been developed to model collections of text, from Latent Dirichlet Allocation (LDA) [1] through extensions using statistical machinery such as the nested Chinese Restaurant Process [2][3] and Pachinko Allocation [4].
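    The abstract's core idea is to learn a topic hierarchy by applying a flat topic model repeatedly and handling subtrees independently, which is what makes the approach parallelizable. The recursive sketch below follows that outline under simplifying assumptions: scikit-learn's LDA as the flat model, documents assigned to their highest-probability topic, and sequential rather than parallel recursion over subtrees.

```python
# Sketch: build a topic hierarchy by recursively applying a flat topic model
# to the documents assigned to each topic. Parallel subtree processing is omitted.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

def build_hierarchy(docs, branching=2, depth=2, min_docs=4):
    # Stop when the subtree is deep enough or too small to split further.
    if depth == 0 or len(docs) < min_docs:
        return {"docs": docs, "children": []}
    counts = CountVectorizer().fit_transform(docs)
    doc_topic = LatentDirichletAllocation(
        n_components=branching, random_state=0
    ).fit_transform(counts)
    # Assign every document to its highest-probability topic, then recurse per topic.
    buckets = [[] for _ in range(branching)]
    for doc, dist in zip(docs, doc_topic):
        buckets[dist.argmax()].append(doc)
    children = [build_hierarchy(b, branching, depth - 1, min_docs) for b in buckets if b]
    return {"docs": docs, "children": children}

docs = ["sports football match goal", "sports tennis court serve",
        "politics election vote party", "politics parliament law debate",
        "music guitar concert band", "music piano classical orchestra"] * 2
tree = build_hierarchy(docs)
print(len(tree["children"]), "subtrees at the first level")
```

    Because each recursive call only sees the documents of its own subtree, the calls are independent and could be dispatched to separate workers, which is the scalability argument the abstract makes.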
  • Sistemas De Control De Versiones De Última Generación (DCA)
    Tema 10 - Sistemas de Control de Versiones de última generación (DCA) [Unit 10: Latest-generation version control systems (DCA)]. Antonio-M. Corbí Bellot.
    Contents: 1. What is a Version Control System (VCS)? 2. What does version control consist of? 3. General VCS concepts (I). 4. General VCS concepts (II). 5. Types of VCS. 6. Centralized vs. distributed in 90 seconds. 7. What options do we have available? 8. What can we do with a VCS? 9. Types of branches. 10. Ways of integrating one branch into another (I). 11. Ways of integrating one branch into another (II). 12. The VCSs we will work with. 13.-31. Git (I)-(XIX). 32. Git: related videos. 33.-38. Mercurial (I)-(VI). 39.
  • A Survey on Correlation Between the Topic and Documents Based on the Pachinko Allocation Model (Journal of Applied Sciences Research)
    Copyright © 2015, American-Eurasian Network for Scientific Information publisher. JOURNAL OF APPLIED SCIENCES RESEARCH, ISSN: 1819-544X, EISSN: 1816-157X. Journal home page: http://www.aensiweb.com/JASR. 2015 October; 11(19): pages 50-55. Published Online 10 November 2015. Research Article.
    A Survey on Correlation between the Topic and Documents Based on the Pachinko Allocation Model. Dr. C. Sundar (Associate Professor, Department of Computer Science and Engineering, Christian College of Engineering and Technology, Dindigul, Tamil Nadu 624619, India) and V. Sujitha (PG Scholar, Department of Computer Science and Engineering, Christian College of Engineering and Technology, Dindigul, Tamil Nadu 624619, India). Received: 23 September 2015; Accepted: 25 October 2015. © 2015 AENSI PUBLISHER. All rights reserved.
    ABSTRACT: Latent Dirichlet allocation (LDA) and other related topic models are increasingly popular tools for summarization and manifold discovery in discrete data. In the existing system, a novel information filtering model, the Maximum matched Pattern-based Topic Model (MPBTM), is used. The patterns are generated from the words in the word-based topic representations of a traditional topic model such as the LDA model. This ensures that the patterns can well represent the topics, because these patterns are comprised of the words which are extracted by LDA based on sample occurrence of the words in the documents. The maximum matched patterns, which are the largest patterns in each equivalence class that exist in the received documents, are used to determine the relevant words that represent topics. However, LDA does not capture correlations between topics and does not find the hidden topics in a document. To deal with the above problem, the pachinko allocation model (PAM) is proposed.
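    The survey contrasts LDA with the pachinko allocation model, whose extra layer of super-topics captures correlations between sub-topics. The snippet below is only a toy generative simulation of a four-level PAM (root, super-topics, sub-topics, words) written with numpy; the hyperparameters and vocabulary are arbitrary and no inference is performed.

```python
# Toy generative simulation of four-level Pachinko Allocation (no inference).
import numpy as np

rng = np.random.default_rng(0)
vocab = ["data", "model", "topic", "word", "graph", "node"]
S, K, V = 2, 4, len(vocab)  # super-topics, sub-topics, vocabulary size

# Each sub-topic has its own distribution over words.
phi = rng.dirichlet(np.full(V, 0.1), size=K)

def generate_document(n_words=10):
    theta_root = rng.dirichlet(np.full(S, 1.0))           # per-document mixture over super-topics
    theta_super = rng.dirichlet(np.full(K, 0.5), size=S)  # per super-topic mixture over sub-topics
    words = []
    for _ in range(n_words):
        s = rng.choice(S, p=theta_root)      # pick a super-topic
        z = rng.choice(K, p=theta_super[s])  # pick a sub-topic under that super-topic
        words.append(rng.choice(vocab, p=phi[z]))
    return words

print(generate_document())
```

    Because sub-topics are reached through shared super-topics, sub-topics that sit under the same super-topic tend to co-occur in documents, which is the correlation structure plain LDA cannot express.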
  • Incorporating Domain Knowledge in Latent Topic Models
    INCORPORATING DOMAIN KNOWLEDGE IN LATENT TOPIC MODELS, by David Michael Andrzejewski. A dissertation submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy (Computer Sciences) at the UNIVERSITY OF WISCONSIN–MADISON, 2010. © Copyright by David Michael Andrzejewski 2010, All Rights Reserved. For my parents and my Cho.
    ACKNOWLEDGMENTS: Obviously none of this would have been possible without the diligent advising of Mark Craven and Xiaojin (Jerry) Zhu. Taking a bioinformatics class from Mark as an undergraduate initially got me excited about the power of statistical machine learning to extract insights from seemingly impenetrable datasets. Jerry’s enthusiasm for research and relentless pursuit of excellence were a great inspiration for me. On countless occasions, Mark and Jerry have selflessly donated their time and effort to help me develop better research skills and to improve the quality of my work. I would have been lucky to have even a single advisor as excellent as either Mark or Jerry; I have been extremely fortunate to have them both as co-advisors. My committee members have also been indispensable. Jude Shavlik has always brought an emphasis on clear communication and solid experimental technique which will hopefully stay with me for the rest of my career. Michael Newton helped me understand the modeling issues in this research from a statistical perspective. Working with prelim committee member Ben Liblit gave me the exciting opportunity to apply machine learning to a very challenging problem in the debugging work presented in Chapter 4. I also learned a lot about how other computer scientists think from meetings with Ben.
  • Extração De Informação Semântica De Conteúdo Da Web 2.0
    Master's in Informatics Engineering, Dissertation, Final Report: Extraction of Semantic Information from Web 2.0 Content. Ana Rita Bento Carvalheira, [email protected]. Advisor: Paulo Jorge de Sousa Gomes, [email protected]. Date: 1 July 2014.
    Acknowledgments: I would like to begin by thanking Professor Paulo Gomes for his professionalism and unconditional support, for his sincere friendship, and for the total availability he showed throughout the year. His support was not only decisive for the preparation of this thesis, it also always motivated me to want to know more and to want to do better. To my grandmother Maria and grandfather Francisco, for always being there when I needed them, for their care and affection, and for all the effort they made so that I never lacked anything. I hope one day to be able to repay, in some way, everything they have done for me. To my parents, for the lessons and values they passed on, for everything they have given me, and for all the availability and dedication they constantly offer me. Everything I am, I owe to you. To David, I am grateful for all the help and understanding throughout the year, for all the care and support shown in all my decisions, and for always encouraging me to follow my dreams. I admire you above all for your competence and humility, and for the strength and confidence you give me at every moment.
    Abstract: The massive proliferation of blogs and social networks has made user-generated content, present on platforms such as Twitter or Facebook, very valuable because of the amount of information that can be extracted and explored.
  • Bulk Operation Orchestration in Multirepo CI/CD Environments (Brno University of Technology)
    BRNO UNIVERSITY OF TECHNOLOGY, FACULTY OF INFORMATION TECHNOLOGY, DEPARTMENT OF INFORMATION SYSTEMS. BULK OPERATION ORCHESTRATION IN MULTIREPO CI/CD ENVIRONMENTS. MASTER'S THESIS. Author: Bc. Jakub Víšek. Supervisor: Ing. Michal Koutenský. Brno, 2021.
    Brno University of Technology, Faculty of Information Technology, Department of Information Systems (DIFS). Academic year 2020/2021. Master's Thesis Specification. Student: Víšek Jakub, Bc. Programme: Information Technology and Artificial Intelligence. Specialization: Computer Networks. Title: Bulk Operation Orchestration in Multirepo CI/CD Environments. Category: Networking.
    Assignment: 1. Familiarize yourself with the principle of CI/CD and existing solutions. 2. Familiarize yourself with the multirepo approach to software development. 3. Analyze the shortcomings of existing CI/CD solutions in the context of multirepo development with regard to user comfort. Focus on scheduling and deploying bulk operations on multiple interdependent repositories as part of a single logical branching pipeline. 4. Propose and design a solution to these shortcomings. 5. Implement said solution. 6. Test and evaluate the solution's functionality in a production environment.
    Recommended literature: Humble, Jez, and David Farley. Continuous Delivery. Upper Saddle River, NJ: Addison-Wesley, 2011. Forsgren, Nicole, Jez Humble, and Gene Kim. Accelerate: Building and Scaling High Performing Technology Organizations. Portland, OR: IT Revolution Press, 2018. Brousse, Nicolas. 2019. The issue of monorepo and polyrepo in large enterprises. In Proceedings of the Conference Companion of the 3rd International Conference on Art, Science, and Engineering of Programming (Programming '19). Association for Computing Machinery, New York, NY, USA, Article 2, 1-4.
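    Point 3 of the assignment concerns scheduling bulk operations across multiple interdependent repositories as one logical pipeline. As a rough illustration of the scheduling aspect only, the sketch below orders repositories topologically by their declared dependencies using Python's standard graphlib module and then calls a placeholder trigger per repository; the dependency map and trigger_pipeline function are invented for the example, and no real CI/CD system is involved.

```python
# Sketch: run one bulk operation across interdependent repos in dependency order.
# The dependency graph and trigger_pipeline() are placeholders, not a real CI/CD API.
from graphlib import TopologicalSorter

# repo -> set of repos it depends on (which must be processed first)
dependencies = {
    "lib-core": set(),
    "lib-api": {"lib-core"},
    "service-a": {"lib-api"},
    "service-b": {"lib-api", "lib-core"},
}

def trigger_pipeline(repo, branch="feature/bulk-change"):
    # Placeholder: a real implementation would call the CI server's API here.
    print(f"triggering pipeline for {repo} on {branch}")

# Predecessors are yielded before their dependents, so shared libraries run first.
for repo in TopologicalSorter(dependencies).static_order():
    trigger_pipeline(repo)
```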
  • An Unsupervised Topic Segmentation Model Incorporating Word Order
    An Unsupervised Topic Segmentation Model Incorporating Word Order. Shoaib Jameel and Wai Lam, The Chinese University of Hong Kong. SIGIR 2013, Dublin, Ireland.
    One-line summary of the work: we will see how maintaining the document structure, such as paragraphs, sentences, and the word order, helps improve the performance of a topic model.
    Outline: Motivation; Related Work (Probabilistic Unigram Topic Model (LDA), Probabilistic N-gram Topic Models, Topic Segmentation Models); Overview of our model (Our N-gram Topic Segmentation model, NTSeg); Text Mining Experiments of NTSeg (Word-Topic and Segment-Topic Correlation Graph, Topic Segmentation Experiment, Document Classification Experiment, Document Likelihood Experiment); Conclusions and Future Directions.
    Motivation: Many works in the topic modeling literature assume exchangeability among the words. As a result we see many ambiguous words in topics. For example, consider a few topics obtained from the NIPS collection using the Latent Dirichlet Allocation (LDA) model:
    Topic 1: architecture, recurrent, network, module, modules
    Topic 2: order, first, second, analysis, small
    Topic 3: connectionist, role, binding, structures, distributed
    Topic 4: potential, membrane, current, synaptic, dendritic
    Topic 5: prior, bayesian, data, evidence, experts
    The problem with the LDA model: words in topics are not insightful.
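    The slides argue that assuming exchangeability (ignoring word order) yields ambiguous topic words. The fragment below is not the NTSeg model; it only shows a much simpler, common way to feed some word-order information into a standard topic model, namely merging frequent bigrams with gensim's Phrases before training LDA. The corpus and thresholds are illustrative.

```python
# Not NTSeg: a simple alternative that keeps some word order by merging frequent
# bigrams (e.g. "neural_network") into single tokens before training plain LDA.
from gensim.models import Phrases, LdaModel
from gensim.corpora import Dictionary

sentences = [
    "neural network training data".split(),
    "neural network hidden layer".split(),
    "recurrent neural network architecture".split(),
    "bayesian prior over model parameters".split(),
]

bigram = Phrases(sentences, min_count=2, threshold=1.0)
docs = [bigram[s] for s in sentences]  # frequent adjacent pairs become single tokens

dictionary = Dictionary(docs)
corpus = [dictionary.doc2bow(d) for d in docs]
lda = LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10, random_state=1)
print(lda.show_topics(num_words=4))  # topic words now include merged bigrams
```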
  • Making the Most of Git and Github
    Contributing to Erlang: Making the Most of Git and GitHub. Tom Preston-Werner, Cofounder/CTO, GitHub, @mojombo.
    Quick Git Overview (a series of diagram slides): Git is distributed. Git is snapshot-based. Snapshots have zero or more parents. Branching in Git is easy. Merging in Git is easy too. A branch is just a pointer to a snapshot. Branches move as new snapshots are taken. Tags are like branches that never move.
    Contributing to Erlang: Fork, Clone, and Configure. Install and configure Git: git config --global user.name "Tom Preston-Werner"; git config --global user.email [email protected]. Sign up on GitHub. Fork github.com/erlang/otp. Copy your clone URL. Clone the repo locally: git clone [email protected]:mojombo/otp.git (replace with your username). Verify the clone worked: $ cd otp; $ ls (AUTHORS, bootstrap, EPLICENCE, configure.in, INSTALL-CROSS.md, erl-build-tool-vars.sh, INSTALL-WIN32.md, erts, INSTALL.md, lib, Makefile.in, make). View the history: $ git log. Add a remote for the upstream (erlang/otp): $ git remote add upstream git://github.com/erlang/otp.git. Repositories: GitHub erlang/otp (upstream), GitHub mojombo/otp (origin), local otp.
    Create a branch. List all branches: $ git branch (* dev). Create a branch off of "dev" and switch to it: $ git checkout -b mybranch. Both branches now point to the same commit.
    Make Changes. Each commit should: contain a single logical change, compile cleanly, not contain any cruft, have a good commit message. Review your changes: $ git status; $ git diff. Commit your
  • A New Method and Application for Studying Political Text in Multiple Languages
    The Pennsylvania State University, The Graduate School. THE RADICAL RIGHT IN PARLIAMENT: A NEW METHOD AND APPLICATION FOR STUDYING POLITICAL TEXT IN MULTIPLE LANGUAGES. A Dissertation in Political Science by Mitchell Goist, Submitted in Partial Fulfillment of the Requirements for the Degree of Doctor of Philosophy, May 2020.
    The dissertation of Mitchell Goist was reviewed and approved by the following: Burt L. Monroe, Liberal Arts Professor of Political Science, Dissertation Advisor, Chair of Committee; Bruce Desmarais, Associate Professor of Political Science; Matt Golder, Professor of Political Science; Sarah Rajtmajer, Assistant Professor of Information Science and Technology; Glenn Palmer, Professor of Political Science and Director of Graduate Studies.
    ABSTRACT: Since a new wave of radical right support in the early 1980s, scholars have sought to understand the motivations and programmatic appeals of far-right parties. However, due to their small size and dearth of data, existing methodological approaches did not allow the direct study of these parties' behavior in parliament. Using a collection of parliamentary speeches from the United Kingdom, Germany, Spain, Italy, the Netherlands, Finland, Sweden, and the Czech Republic, Chapter 1 of this dissertation addresses this problem by developing a new model for the study of political text in multiple languages. Using this new method allows the construction of a shared issue space where each party is embedded regardless of the language spoken in the speech or the country of origin. Chapter 2 builds on this new method by explicating the ideological appeals of radical right parties. It finds that in some instances radical right parties behave similarly to mainstream, center-right parties, but distinguish themselves by a focus on individual crime and an emphasis on negative rhetorical frames.