Stylometric Analysis of Parliamentary Speeches: Gender Dimension


Justina Mandravickaitė
Vilnius University, Lithuania
Baltic Institute of Advanced Technology, Lithuania

Tomas Krilavičius
Vytautas Magnus University, Lithuania
Baltic Institute of Advanced Technology, Lithuania

Abstract

The relation between gender and language has been studied by many authors; however, there is still some uncertainty regarding the influence of gender on language usage in the professional environment. Often, the studied data sets are too small, or the texts of individual authors are too short, to capture differences of language usage with respect to gender successfully. This study draws on a larger corpus of speech transcripts of the Lithuanian Parliament (1990–2013) to explore language differences in political debates by gender via stylometric analysis. The experimental setup consists of stylistic features that indicate lexical style and do not require external linguistic tools, namely the most frequent words, in combination with unsupervised machine learning algorithms. Results show that gender differences in language use remain in the professional environment, not only in the usage of function words and preferred linguistic constructions, but in the presented topics as well.

1 Introduction

The influence of gender on language usage has been extensively studied (Lakoff, 1973; Holmes, 2006; Holmes, 2013; Argamon et al., 2003) without fully reaching a common agreement. Understanding gender differences in a professional environment would assist in a more balanced atmosphere (Herring and Paolillo, 2006; Mullany, 2007); however, results on the extent of variation depending on the context of communication in a professional setting are inconclusive (Newman et al., 2008).

Most studies rely on relatively small data sets, or the texts of individual authors are too short to capture the differences in language due to gender (Newman et al., 2008; Herring and Martinson, 2004). Some results show that gender differences in language depend on the context, e.g., people assume male language in a formal setting and female language in an informal environment (Pennebaker, 2011). We investigate the impact of gender on language use in a professional setting, i.e., transcripts of speeches from Lithuanian Parliament debates. We study language with respect to style, i.e., male and female styles of language usage, by applying computational stylistics, or stylometry. Stylometry is based on two hypotheses: (1) the human stylome hypothesis, i.e., each individual has a unique style (Van Halteren et al., 2005); (2) the unique style of an individual can be measured (Stamatatos, 2009). Stylometry allows gaining meta-knowledge (Daelemans, 2013), i.e., what can be learned from the text about the author: gender (Luyckx et al., 2006; Argamon et al., 2003; Cheng et al., 2011; Koppel et al., 2002), age (Dahllöf, 2012), psychological characteristics (Luyckx and Daelemans, 2008), political affiliation (Dahllöf, 2012), etc.

As in most studies of gender and language (Yu, 2014; Herring and Martinson, 2004), biological sex was used as the criterion for gender in this study. We compare differences in gender-related language use at the group level (faction). The Lithuanian language allows easy distinction between male and female legislators based on their names in the transcripts.[1]

We investigate several questions: (1) How well do simple stylistic features distinguish the genders of members of the Lithuanian Parliament? (2) Which differences in language use by female and male members of the Lithuanian Parliament are the selected features and methods able to capture?

2 Data Set

A corpus of parliamentary speeches in the Lithuanian Parliament[2] is used. It consists of transcripts of parliamentary speeches from March 1990 to December 2013: 10 727 by female members of Parliament (MPs) and 100 181 by male MPs, overall 23 908 302 words (2 357 596 by female MPs and 21 550 706 by male MPs; see Table 1 for the details). Only speeches of at least 100 words, and only MPs with at least 200 such speeches, were included in the corpus (Kapočiūtė-Dzikienė and Utka, 2014). This could have reduced the number of speeches by female MPs included in the corpus, and in our analysis as well. However, the choice of an unsupervised learning approach reduces the class imbalance problem, i.e., the significant difference in the number of transcribed parliamentary speeches made by female and male MPs.

Table 1: Statistics of the corpus of parliamentary speeches in the Lithuanian Parliament.

MPs by gender | No. of samples | No. of words | No. of unique words
Female        | 10 727         | 2 357 596    | 93 611
Male          | 100 181        | 21 550 706   | 268 030

Lithuanian is a highly inflective language, i.e., nouns have grammatical gender and number, and semantic relations between them are expressed with 7 cases; adjectives have to match nouns in gender, number and case; verbs have 4 tenses and participles for each of them, with the ending marking tense, person and number; gender and case for the participles are also marked morphologically at the ending. All these features produce a substantial number of inflected forms for one lemma. Thus, in order to avoid data sparseness, we did not lemmatize the corpus for our experiments.

To get around the "fingerprint" of individual authorship as much as possible, all the samples were concatenated into two large documents based on gender, and each was then partitioned into 15 parts. Thus, for analysis we had 15 samples of parliamentary speech made by female MPs and another 15 made by male MPs.

3 Stylistic Features and Statistical Measures

We use the most frequent words (MFW) (Burrows, 1992; Hoover, 2007; Eder, 2013b; Rybicki and Eder, 2011; Eder and Rybicki, 2013; Eder, 2013a), which usually coincide with function words (Hochmann et al., 2010; Sigurd et al., 2004), as features, because they are considered to be topic-neutral and perform well (Juola and Baayen, 2005; Holmes et al., 2001; Burrows, 2002). The Stylo package for stylometric analysis in R (Eder et al., 2014) is used for the experiments.

Experiments are performed in batches using different numbers of MFW. First, using the whole corpus, a raw frequency list of features is generated and then normalized using z-scores, which measure the distance of feature frequencies in the corpus in terms of their proximity to the mean (Hoover, 2004). The z-scores are defined as $z = \frac{A_i - \mu}{\sigma}$, where $A_i$ is the frequency of a feature, $\mu$ is the mean frequency of a certain feature in one document, and $\sigma$ is the standard deviation.

Dissimilarity between the text samples is calculated using selected distances (see below), and a distance matrix is generated. Then, hierarchical clustering is applied to group samples by similarity (Everitt et al., 2011), and dendrograms are used to visualize the results.

Typically, Burrows's Delta distance is used for stylometric analysis (Burrows, 2002; Rybicki and Eder, 2011). However, Delta depends on z-scores, the number of documents, the balance of terms in documents, and the length and number of authors (Stamatatos, 2009). While Burrows's Delta is effective for English and German, it is less successful for highly inflective languages, e.g., Latin and Polish (Rybicki and Eder, 2011). Hence we used Eder's Delta, i.e., a modified Burrows's Delta that gives more weight to the frequent features and rescales less frequent ones to avoid random infrequent ones (Eder et al., 2014), with Canberra distance

$$\delta_{(AB)} = \sum_{i=1}^{n} \frac{|A_i - B_i|}{|A_i| + |B_i|},$$

where $n$ is the number of most frequent features, $A$ and $B$ are documents, and $A_i$ and $B_i$ are the frequencies of a given feature in documents $A$ and $B$ in the corpus, respectively (Eder et al., 2014). It was reported to be suitable for inflective languages, although it is sensitive to rare vocabulary (Eder et al., 2014), e.g., words that occurred only once or twice.

The goal is to identify stylistic dissimilarities and to map the positions of the text samples in relation to each other, not to classify female and male legislators; hence hierarchical clustering with Ward linkage, which minimizes total within-cluster variance (Everitt et al., 2011), was chosen. Though it is sensitive to changes in the number of features or methods of grouping (Eder, 2013a; Luyckx et al., 2006), in this study it shows stable results. Robustness of the clustering results was examined using a bootstrap procedure (Eder, 2013a). It includes extensions of Burrows's Delta (Argamon, 2008; Eder et al., 2014) and bootstrap consensus trees (Eder, 2013a) as a way to improve the reliability of cluster analysis dendrograms.

4 Experiments

From 20 to 10 000 most frequent features were used for each experiment. We use hierarchical clustering with Ward linkage and Canberra distance, and visualize the results in dendrograms to map the positions of the samples in relation to each other. We focus on identifying variation in female and male parliamentary speech, and do not analyze smaller clusters and the dynamics inside them. A more detailed investigation of separate features (e.g., specific words, part-of-speech tags or their sequences) that are characteristic of female MPs and male MPs individually is part of future plans, while in this paper we focus on the most frequent words.

Experiments with more MFW (from 7000 up to 9910) successfully separated samples of parlia-

Figure 1: Results with 7000 MFW as features.
Figure 2: Bootstrap Consensus Tree with Canberra and 100–10000 MFW.
Figure 3: Results with 200 MFW (starting at 6800 MFW).

[1] Of course, all information about members of parliament is available on-line.
[2] The corpus of parliamentary speeches in the Lithuanian Parliament was created in the project "Automatic Authorship Attribution and Author Profiling for the Lithuanian Language" (ASTRA) (No. LIT-8-69), 2014–2015.

Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, pages 102–107, Valencia, Spain, 4 April 2017. © 2017 Association for Computational Linguistics
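The feature pipeline described in Section 3 (most frequent words, z-score normalization, Canberra distance, distance matrix) can be illustrated with a minimal Python sketch. This is not the authors' implementation, which relies on the Stylo package in R. The function names, the toy samples, the use of the population standard deviation across samples, the choice to compute distances on the z-scored values, and the handling of zero denominators are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, pstdev


def most_frequent_words(samples, n):
    """Return the n most frequent words across all samples (the MFW list)."""
    counts = Counter(word for sample in samples for word in sample)
    return [word for word, _ in counts.most_common(n)]


def relative_frequencies(sample, features):
    """Relative frequency of each feature word within one tokenized sample."""
    counts = Counter(sample)
    return [counts[word] / len(sample) for word in features]


def z_scores(freq_table):
    """Apply z = (A_i - mu) / sigma per feature (column) across samples.

    Assumption: mu and sigma are taken over all samples, using the
    population standard deviation; constant features map to 0.0.
    """
    columns = list(zip(*freq_table))
    mus = [mean(col) for col in columns]
    sigmas = [pstdev(col) for col in columns]
    return [
        [(a - mu) / sigma if sigma > 0 else 0.0
         for a, mu, sigma in zip(row, mus, sigmas)]
        for row in freq_table
    ]


def canberra(a, b):
    """Canberra distance: sum over features of |A_i - B_i| / (|A_i| + |B_i|).

    Assumption: terms whose denominator is zero are skipped.
    """
    total = 0.0
    for x, y in zip(a, b):
        denom = abs(x) + abs(y)
        if denom > 0:
            total += abs(x - y) / denom
    return total


# Hypothetical toy samples (already tokenized); not data from the corpus.
samples = [
    "ir tai yra ir tai".split(),
    "ir tai bet ir yra".split(),
    "bet kad bet kad ir".split(),
]
features = most_frequent_words(samples, 3)
table = z_scores([relative_frequencies(s, features) for s in samples])
distance_matrix = [[canberra(a, b) for b in table] for a in table]
```

The resulting `distance_matrix` is symmetric with a zero diagonal and is the kind of input a hierarchical clustering routine with Ward linkage would take in the next step; that step is left out here to keep the sketch dependency-free.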