Stylometric Techniques for Multiple Author Clustering Shakespeare‘S Authorship in the Passionate Pilgrim
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Sonnets. Edited by C. Knox Pooler
Presented to the LIBRARY of the UNIVERSITY OF TORONTO hy The 'Estate of the late PROFESSOR A. S. P. WOODHOIISE Head of the Department of English -» University College 1944-1964 \ '^/i^ /F. ^r:y r. -1 "^ NiL- ' 7^ ( ^S, U , - ^ ^' ^ ^/f '^i>-, '^Si^6,i(i? THE ARDEN SHAKESPEARE GENERAL EDITOR : W. J. CRAIG 1899-1906: R. H. CASE, 1909 SONNETS J^' THE WORKS OF SHAKESPEARE SONNETS EDITED BY C. KNOX POOLER ? METHUEN AND CO. LTD. 36 ESSEX STREET : STRAND LONDON First Published in igi8 z£4S CONTENTS PAOE Introduction ^* Dedication ^ Sonnets ..... 3 A Lover's Complaint *45 INTRODUCTION According to the Stationers' Registers, a license to print a book called Shakespeare's Sonnets was granted to Thomas Tjiprpe on the 20th of May, 1609. It appeared with the : Sonnets Never before following title-page Shake-speares | | At London G. Eld for T. T. and are to be ] Imprinted. | | by solde William Some instead of by Apsley. \ 1609. copies " " William have " lohn at Christ Apsley Wright, dwelling j Church gate," an indication that these two publishers shared in the venture. The publication cannot have been long delayed, for Edward Alleyn, the actor, bought a copy (for ^d.) in June. " " The words never before imprinted are not strictly accurate, as two of the sonnets, cxxxviii. and cxliv., had already ap- peared in The Passionate Pilgrim (1599). The book seems to have been issued without Shakespeare's his are knowledge, certainly without super\'ision ; misprints the often both sense unusually frequent ; punctuation neglects and and there are other errors of more rhythm ; consequence which no author or competent reader could have overlooked. -
Quantitative Authorship Attribution: a History and an Evaluation of Techniques
QUANTITATIVEAUTHORSHIP ATTRIBUTION: A HISTORYAND AN EVALUATIONOF TECHNIQUES A thesis submitted in partial fulfillment of the requirements for the degree of IN THE DEPARTMENT OF LINGUISTICS O Jack Grieve 2005 SIMONFRASER UNI~RSITY Summer 2005 All rights reserved. This work may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author Name: Jack William Grieve Degree: Master of Arts Title of Thesis: Quantitative Authorship Attribution: A History and an Evaluation of Techniques Examining Committee: Dr. Zita McRobbie Chair Associate Professor, Department of Linguistics Dr. Paul McFetridge Senior Supervisor Associate Professor, Department of Linguistics Dr. Maria Teresa Taboada Supervisor Assistant Professor, Department of Linguistics Dr. Fred Popowich External Examiner Professor, School of Computing Science Date Defended: SIMON FRASER UNIVERSITY PARTIAL COPYRIGHT LICENCE The author, whose copyright is declared on the title page of this work, has granted to Simon Fraser University the right to lend this thesis, project or extended essay to users of the Simon Fraser University Library, and to make partial or single copies only for such users or in response to a request from the library of any other university, or other educational institution, on its own behalf or for one of its users. The author has further granted permission to Simon Fraser University to keep or make a digital copy for use in its circulating collection. The author has further agreed that permission for multiple copying of this work for scholarly purposes may be granted by either the author or the Dean of Graduate Studies. It is understood that copying or publication of this work for financial gain shall not be allowed without the author's written permission. -
Stylometric Analysis of Parliamentary Speeches: Gender Dimension
Stylometric Analysis of Parliamentary Speeches: Gender Dimension Justina Mandravickaite˙ Tomas Krilaviciusˇ Vilnius University, Lithuania Vytautas Magnus University, Lithuania Baltic Institute of Advanced Baltic Inistitute of Advanced Technology, Lithuania Technology, Lithuania [email protected] [email protected] Abstract to capture the differences in the language due to the gender (Newman et al., 2008; Herring and Relation between gender and language has Martinson, 2004). Some results show that gen- been studied by many authors, however, der differences in language depend on the con- there is still some uncertainty left regard- text, e.g., people assume male language in a for- ing gender influence on language usage mal setting and female in an informal environ- in the professional environment. Often, ment (Pennebaker, 2011). We investigate gender the studied data sets are too small or texts impact to the language use in a professional set- of individual authors are too short in or- ting, i.e., transcripts of speeches of the Lithua- der to capture differences of language us- nian Parliament debates. We study language wrt age wrt gender successfully. This study style, i.e., male and female style of the language draws from a larger corpus of speeches usage by applying computational stylistics or sty- transcripts of the Lithuanian Parliament lometry. Stylometry is based on the two hypothe- (1990–2013) to explore language differ- ses: (1) human stylome hypothesis, i.e., each in- ences of political debates by gender via dividual has a unique style (Van Halteren et al., stylometric analysis. Experimental set 2005); (2) unique style of individual can be mea- up consists of stylistic features that indi- sured (Stamatatos, 2009), stylometry allows gain- cate lexical style and do not require exter- ing meta-knowledge (Daelemans, 2013), i.e., what nal linguistic tools, namely the most fre- can be learned from the text about the author quent words, in combination with unsu- - gender (Luyckx et al., 2006; Argamon et al., pervised machine learning algorithms. -
Detection of Translator Stylometry Using Pair-Wise Comparative Classification and Network Motif Mining
Detection of Translator Stylometry using Pair-wise Comparative Classification and Network Motif Mining Heba Zaki Mohamed Abdallah El-Fiqi M.Sc. (Computer Sci.) Cairo University, Egypt B.Sc. (Computer Sci.) Zagazig University, Egypt SCIENTIA MANU E T MENTE A thesis submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the School of Engineering and Information Technology University of New South Wales Australian Defence Force Academy © Copyright 2013 by Heba El-Fiqi [This page is intentionally left blank] i Abstract Stylometry is the study of the unique linguistic styles and writing behaviours of individuals. The identification of translator stylometry has many contributions in fields such as intellectual-property, education, and forensic linguistics. Despite the research proliferation on the wider research field of authorship attribution using computational linguistics techniques, the translator stylometry problem is more challenging and there is no sufficient machine learning literature on the topic. Some authors even claimed that detecting who translated a piece of text is a problem with no solution; a claim we will challenge in this thesis. In this thesis, we evaluated the use of existing lexical measures for the transla- tor stylometry problem. It was found that vocabulary richness could not identify translator stylometry. This encouraged us to look for non-traditional represen- tations to discover new features to unfold translator stylometry. Network motifs are small sub-graphs that aim at capturing the local structure of a real network. We designed an approach that transforms the text into a network then identifies the distinctive patterns of a translator by employing network motif mining. -
Identifying Idiolect in Forensic Authorship Attribution: an N-Gram Textbite Approach Alison Johnson & David Wright University of Leeds
Identifying idiolect in forensic authorship attribution: an n-gram textbite approach Alison Johnson & David Wright University of Leeds Abstract. Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist “approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [. ], their own idiolect” (Coulthard, 2004: 31). However, given the diXculty in empirically substantiating a theory of idiolect, there is growing con- cern in the Veld that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite oUers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ‘n-gram textbites’, small textual segments that characterise that author’s writing, providing DNA-like chunks of identifying ma- terial? Drawing on a corpus of 63,000 emails and 2.5 million words written by 176 employees of the former American energy corporation Enron, a case study approach is adopted, Vrst showing through stylistic analysis that one Enron em- ployee repeatedly produces the same stylistic patterns of politely encoded direc- tives in a way that may be considered habitual. -
Sweet Cytherea”
Proving Oxfordian Authorship in “Sweet Cytherea” The Wind-Up Oxford’s poems do not resemble Shakespeare’s. They were two different writers. Such is Academe’s preclusive claim that a literary chasm exists between the known, usually early, writings of Edward de Vere, 17th Earl of Oxford, and the collected works we recognize by the spectacular epithet ‘Shakespeare’. (Baldrick, l7-18; Elliott, The Shakespeare Files; Kathman, website; Low, letter NY Times; Nelson, quoted, “Shakespeare Matters”, 7; Nelson, website) Since Lord Oxford published under a series of pseudonyms and proxies in order to carry on an artistic vocation shunned by his class, only three subscribed poems after his youth have survived. (“Shakespeare” Vol I, 553) There are no original notes and manuscripts to document an Oxford/’Shakespeare’ stylistic evolution. His plays are said to have been lost. (Sidney Lee, in “Shakespeare” Vol I, 112) The 1951 Encyclopaedia Britannica noted only, “He was a lyric poet of no small merit.” Orthodoxy therefore may prefer the slanted odds of comparing The Sonnets, ‘Shakespeare’s masterpiece, with Oxford’s juvenilia, involving a gap of twenty-five to thirty-five years in a life full of writing and personal change. Lacking the autograph work, critics who credit Oxford as the mind behind the name 'Shakespeare' must build their evidence from logical deduction, similar phrasing and poetic devices, biographical allusion, vocabulary, allegorical reference, and a recombination of previously disparate sources. But these investigative techniques apply to any author’s unprovenanced writings. The literary detective work is no different. Should it link an unattributed work to Francois Marie Arouet, for instance, which means simultaneously to his pseudonym Voltaire, it would be a red-letter day for literature. -
I599-I6oi: the Author Brought Into Print Play Scene: "Loves Labors Lost" to "Troilus and Cressida "
PART THREE I599-I6oi: the author brought into print Play scene: "Loves Labors Lost" to "Troilus and Cressida " "How many tales to please me she hath coined." "W. Shakespere," The Passionate Pilgrim, Poem 7 In his monumental 1790 edition, tided The Plays and Poems of William Shakspeare, Edmund Malone performs a curious editorial procedure. As rhe final poem to William Jaggard's The Passionate Pilgrim (1599, 1612) he prints "The Phoenix and Turtle," lifted from Robert C hester's Love's MartyJ; or Rosa/ins Complaint (r6or). Malone's procedure may obscure the textual independence of these rwo volumes of early modern poetry, bur it nonetheless highlights a shared genealogy for them: the genre of the printed miscellany.' In bringing rhe rwo volumes together, Malone makes available a set of comparisons berween rwo works of erotic verse printed at about rhe same nme. Together, rhe printing of The Passionate Pilgrim and "The Phoenix and Turde" near rhe mid-point ofShal<espeare's professional career represents a second phase of rhe national poet-playwright in print. Unlike rhe 1593-94 Venus and Lucrece, however, these works do nor show the a uthor presenting himself as a poet through the medium of print, but rather they are works rhat show a manuscript poet brought into print by others. The nature of rhe appropriation differs, as do rhe roles of Jaggard and C hester in Eliza bethan culture. Jaggard was a publisher and businessman on rhe look for a market success; Chester, an "obscure poet" in search of a patron (Burrow, ed., Sonnets and Poems, 82). -
The Sonnets and Shorter Poems
AN ANALYSIS OF A NOTEBOOK OF JAMES ORCHARD HALLIWELL-PHILLIPPS THE SONNETS AND SHORTER POEMS by ELIZABETH PATRICIA PRACY A thesis submitted to the Faculty of Arts of The University of Birmingham for the degree of MASTER OF PHILOSOPHY The Shakespeare Institute Faculty of Arts The University of Birmingham March 1999 University of Birmingham Research Archive e-theses repository This unpublished thesis/dissertation is copyright of the author and/or third parties. The intellectual property rights of the author or third parties in respect of this work are as defined by The Copyright Designs and Patents Act 1988 or as modified by any successor legislation. Any use made of information contained in this thesis/dissertation must be in accordance with that legislation and must be properly acknowledged. Further distribution or reproduction in any format is prohibited without the permission of the copyright holder. O t:O SYNOPSIS The thesis starts with an Introduction which explains that the subject of the work is an analysis of the Notebook of J. O. Halliwell-Phillipps dealing with the Sonnets and shorter poems of Shakespeare owned by the Shakespeare Centre Library, Stratford-upon- Avon. This is followed by an explanation of the material and methods used to examine the pages of the Notebook and a brief account of Halliwell-Phillipps and his collections as well as a description of his work on the life and background of Shakespeare. Each page of the Notebook is then dealt with in order and outlined, together with a photocopy of Halliwell-Phillipps1 entry. Entries are identified where possible, with an explanation and description of the work referred to. -
Learning Stylometric Representations for Authorship Analysis
0 Learning Stylometric Representations for Authorship Analysis STEVEN H. H. DING, School of Information Studies, McGill University, Canada BENJAMIN C. M. FUNG, School of Information Studies, McGill University, Canada FARKHUND IQBAL, College of Technological Innovation, Zayed University, UAE WILLIAM K. CHEUNG, Department of Computer Science, Hong Kong Baptist University, Hong Kong Authorship analysis (AA) is the study of unveiling the hidden properties of authors from a body of expo- nentially exploding textual data. It extracts an author’s identity and sociolinguistic characteristics based on the reflected writing styles in the text. It is an essential process for various areas, such as cybercrime investigation, psycholinguistics, political socialization, etc. However, most of the previous techniques criti- cally depend on the manual feature engineering process. Consequently, the choice of feature set has been shown to be scenario- or dataset-dependent. In this paper, to mimic the human sentence composition process using a neural network approach, we propose to incorporate different categories of linguistic features into distributed representation of words in order to learn simultaneously the writing style representations based on unlabeled texts for authorship analysis. In particular, the proposed models allow topical, lexical, syn- tactical, and character-level feature vectors of each document to be extracted as stylometrics. We evaluate the performance of our approach on the problems of authorship characterization and authorship verification -
The Passionate Pilgrim
CHAPTE R 5 "TJ.ates ' . cozne· d" : ''W. S''-na kespeare , zn· Jagg' ar,d 's The Passionate Pilgrim [William Jaggard was] an infamous pirate, liar, and thief [who pro duced a] worthless litrle volume of stolen and mutilated poetry, parched up and padded our with dirty and dreary doggerel. Algernon Charles Swinburne, Studies in Prose and Poetry (1894), 90 W ith the 1623 First Folio and rhe 1599 and 1612 editions of The Passionate PiLgrim, William Jagga rd had primed the first collecrions of both Shakespeare's plays and his poems. Margrcra de Grazia, Sbakespeare Verbntim (1991). 167 T he above epigraphs pi npoint changing critical perceptions of William Jaggard's role in Shal(espeare's professional career. At the end of the nineteenth century, Swinburne works from a "Romantic" view of the autonomous author to judge Jaggard morally and The Passionate Pilgrim aesthetically. Jaggard is a cheat and the poetry poor. Since the poems' only begetter is a pirate, liar, and thief, and his li ttle volume stolen, mutilated, patched, padded, dirty, dreary, and worthless, who could find interest in the enterprise? A hundred years later, de Grazia helps us begin to under stand why. Even if we condemn Jaggard, he occupies a historic position in the printing of the national poet-playwright. He is the first to anticipate modern editors, including Malone, in the publication of both "the plays and poems ofWill iam Shakspeare." In between Swinburne and de Grazia, W illiam Empson gets at the crux of the histori cal matter when he remarks, "The Passionate Pilgrim (1599) is a cheat, by a pirate who is very appreciative of the work of Shakespeare" ("Narrative Poems," u ). -
A Comparison of Classifiers and Features for Authorship Authentication of Social Networking Messages
CONCURRENCY AND COMPUTATION: PRACTICE AND EXPERIENCE Concurrency Computat.: Pract. Exper. (2016) Published online in Wiley Online Library (wileyonlinelibrary.com). DOI: 10.1002/cpe.3918 SPECIAL ISSUE PAPER A comparison of classifiers and features for authorship authentication of social networking messages Jenny S. Li1, Li-Chiou Chen1,*,†, John V. Monaco1, Pranjal Singh2 and Charles C. Tappert1 1Seidenberg School of Computer Science and Information Systems, Pace University, New York City, NY 10038, USA 2VISA Inc. Technology Center, Bangalore, India SUMMARY This paper develops algorithms and investigates various classifiers to determine the authenticity of short social network postings, an average of 20.6 words, from Facebook. This paper presents and discusses several experiments using a variety of classifiers. The goal of this research is to determine the degree to which such postings can be authenticated as coming from the purported user and not from an intruder. Various sets of stylometry and ad hoc social networking features were developed to categorize 9259 posts from 30 Facebook authors as authentic or non-authentic. An algorithm to utilize machine-learning classifiers for investigating this problem is described, and an additional voting algorithm that combines three classifiers is investigated. This research is one of the first works that focused on authorship authentication in short messages, such as postings on social network sites. The challenges of applying traditional stylometry techniques on short messages are discussed. Experimental results demonstrate an average accuracy rate of 79.6% among 30 users. Further empirical analyses evaluate the effect of sample size, feature selection, user writing style, and classification method on authorship authentication, indicating varying degrees of success compared with previous studies. -
Profiling a Set of Personality Traits of Text Author: What Our Words Reveal About Us*
Research in Language, 2016, vol. 14:4 DOI: 10.1515/rela-2016-0019 PROFILING A SET OF PERSONALITY TRAITS OF TEXT AUTHOR: WHAT OUR WORDS REVEAL ABOUT US* TATIANA LITVINOVA Voronezh State Pedagogical University [email protected] PAVEL SEREDIN Voronezh State University [email protected] OLGA LITVINOVA Voronezh State Pedagogical University [email protected] OLGA ZAGOROVSKAYA Voronezh State Pedagogical University [email protected] Abstract Authorship profiling, i.e. revealing information about an unknown author by analyzing their text, is a task of growing importance. One of the most urgent problems of authorship profiling (AP) is selecting text parameters which may correlate to an author’s personality. Most researchers’ selection of these is not underpinned by any theory. This article proposes an approach to AP which applies neuroscience data. The aim of the study is to assess the probability of self- destructive behaviour of an individual via formal parameters of their texts. Here we have used the “Personality Corpus”, which consists of Russian-language texts. A set of correlations between scores on the Freiburg Personality Inventory scales that are known to be indicative of self-destructive behaviour (“Spontaneous Aggressiveness”, “Depressiveness”, “Emotional Lability”, and “Composedness”) and text variables (average sentence length, lexical diversity etc.) has been calculated. Further, a mathematical model which predicts the probability of self- destructive behaviour has been obtained. Keywords: authorship profiling, neurolinguistics, language personality, computational stylometry, discourse production * The study was funded by the RF President's grants for young scientists N° МК-4633.2016.6 “Predicting the Probability of Suicide Behavior Based on Speech Analysis”. 409 © by the author, licensee Łódź University – Łódź University Press, Łódź, Poland.