Stylistics Versus Statistics: a Corpus Linguistic Approach to Combining Techniques in Forensic Authorship Analysis Using Enron E

Total Page:16

File Type:pdf, Size:1020Kb

Stylistics Versus Statistics: a Corpus Linguistic Approach to Combining Techniques in Forensic Authorship Analysis Using Enron E Stylistics versus Statistics: A corpus linguistic approach to combining techniques in forensic authorship analysis using Enron emails David Wright Submitted in accordance with the requirements for the degree of Doctor of Philosophy The University of Leeds School of English September 2014 The candidate confirms that the work submitted is his/her own, except where work which has formed part of jointly authored publications has been included. The contribution of the candidate and the other authors to this work has been explicitly indicated below. The candidate confirms that appropriate credit has been given within the thesis where reference has been made to the work of others. The work in Chapter 6 of the thesis is an expansion of the experiment performed and reported in: Johnson, Alison and David Wright. 2014. Identifying idiolect in forensic authorship attribution: an n-gram textbite approach. Language and Law (Linguagem e Direito) 1(1), 37–69. In the above article, I was responsible for the experiment and the following sections are directly attributable to me: ‘Computational tools’ (p. 44) ‘Jaccard’s similarity coefficient’ (p. 44–5) ‘Experimental set up’ (p. 45–6) ‘Attribution task’ (p. 56–7) ‘Results’ (p. 57–60) In the above article, the following sections are directly attributable to Dr Alison Johnson: ‘James Derrick’ (p. 43–4) ‘Case study of James Derrick’ (p. 46) ‘Identifying Derrick’s professional authorial style’ (p. 46–54) ‘Conclusion’ (p. 62–63) The remaining sections of the above article are co-authored: ‘Introduction’ (p. 38–40) ‘N-grams, idiolect and email’ (p. 40–42) ‘The Enron Email Corpus’ (p. 42–43) ‘Summary of findings’ (p. 54–56) ‘Derrick’s discriminating n-grams’ (p. 60–62) This copy has been supplied on the understanding that it is copyright material and that no quotation from the thesis may be published without proper acknowledgement. © 2014 The University of Leeds and David Wright The right of David Wright to be identified as Author of this work has been asserted by him in accordance with the Copyright, Designs and Patents Act 1988. Acknowledgements First and foremost, my deepest thanks go to my wonderful parents Janet and Joseph William Wright, for their unwavering support throughout my PhD and beyond. Their selflessness is such that when the idea of me doing a PhD first arose, they offered to sell the family home to finance it. Thankfully, that was not necessary (I would never have allowed them to sell it, anyway). They have had to maintain me during the times when I worked from home in the North East, though. I hope that they know that I love them very much. I would also like to thank the rest of my close family and friends, who have been consistently supportive during my seven years at University. Special mention must go to someone I have known and loved for the last three of those years: my girlfriend Rachael Groenendaal. She has known me only as a PhD student, so I am looking forward to sharing with her my less anxious, nervous and restless side. The patience and understanding she has is immeasurable, and every day I am reminded that I am lucky to have such an extraordinary person in my life. This PhD would not have been possible without the exceptional supervision of Dr Alison Johnson. Alison has the unique ability to calm me when I am worried, and inspire me when I am doubtful. It is that ability, combined with her remarkable trust in me, which has guided me through this work. I hope that one day, I can mentor someone in the way Alison has mentored me. Special thanks also go to David Woolls for enduring countless ridiculous questions and requests from me. David has responded every time without complaint, and his software and insightful comment have been invaluable to this research. It was him, along with Alison, who first introduced me to the Enron corpus. Finally, I would like to extend my thanks to past and present lecturing staff in the School of English at the University of Leeds, in particular: Clive Upton, Anthea Fraser Gupta and Fiona Douglas. Without their inspirational teaching and critical feedback during my time as an undergraduate and postgraduate, I would not be pursuing an academic career. This study is funded by an Arts and Humanities Research Council Doctoral Studentship (Grant number AH/J502686/1). Abstract This thesis empirically investigates how a corpus linguistic approach can address the main theoretical and methodological challenges facing the field of forensic authorship analysis. Linguists approach the problem of questioned authorship from the theoretical position that each person has their own distinctive idiolect (Coulthard 2004: 431). However, the notion of idiolect has come under scrutiny in forensic linguistics over recent years for being too abstract to be of practical use (Grant 2010; Turell 2010). At the same time, two competing methodologies have developed in authorship analysis. On the one hand, there are qualitative stylistic approaches, and on the other there are statistical ‘stylometric’ techniques. This study uses a corpus of over 60,000 emails and 2.5 million words written by 176 employees of the former American company Enron to tackle these issues in the contexts of both authorship attribution (identifying authors using linguistic evidence) and author profiling (predicting authors’ social characteristics using linguistic evidence). Analyses reveal that even in shared communicative contexts, and when using very common lexical items, individual Enron employees produce distinctive collocation patterns and lexical co-selections. In turn, these idiolectal elements of linguistic output can be captured and quantified by word n-grams (strings of n words). An attribution experiment is performed using word n-grams to identify the authors of anonymised email samples. Results of the experiment are encouraging, and it is argued that the approach developed here offers a means by which stylistic and statistical techniques can complement each other. Finally, quantitative and qualitative analyses are combined in the sociolinguistic profiling of Enron employees by gender and occupation. Current author profiling research is exclusively statistical in nature. However, the findings here demonstrate that when statistical results are augmented by qualitative evidence, the complex relationship between language use and author identity can be more accurately observed. Table of Contents List of Tables .......................................................................................................................... i List of Figures ....................................................................................................................... ii 1 Introduction .................................................................................................................. 1 1.1 Thesis outline .......................................................................................................... 4 2 Theoretical and methodological issues in authorship attribution and profiling .... 7 2.1 The usefulness and utility of a theory of idiolect .................................................... 7 2.2 Approaches to identifying authorship ................................................................... 11 2.2.1 Lexis in stylistic approaches to authorship attribution ................................. 12 2.2.2 Lexis in stylometric approaches to authorship attribution ............................ 14 2.2.3 Benefits and drawbacks of stylistic and stylometric approaches .................. 19 2.2.4 Combining the stylistic and the stylometric.................................................. 23 2.3 Collocation, idiolect and word n-grams ................................................................ 26 2.4 The emergence of author profiling ....................................................................... 34 2.4.1 Linguistic profiling and criminal profiling ................................................... 37 2.5 Chapter conclusion: addressing the issues ............................................................ 41 3 The Enron Email Corpus .......................................................................................... 43 3.1 Introducing Enron and the corpus ......................................................................... 43 3.2 Extracting and preparing the data ......................................................................... 47 3.3 Ethical considerations ........................................................................................... 51 3.4 Breakdown of the corpus ...................................................................................... 52 3.4.1 The full Enron Email Corpus (EEC) ............................................................. 53 3.4.2 The Pilot Corpus ........................................................................................... 58 3.4.3 The twelve-author sample (EEC12) ............................................................. 60 3.4.4 The eighty-author sample (EEC80) .............................................................. 63 3.5 Chapter conclusion ............................................................................................... 66 4 Methodology: tools and techniques .......................................................................... 67 4.1 Wordsmith Tools ................................................................................................... 67 4.1.1 ‘Keyword’ analysis ......................................................................................
Recommended publications
  • Systemic Functional Features in Stylistic Text Classification
    Systemic Functional Features in Stylistic Text Classification Casey Whitelaw Shlomo Argamon School of Information Technologies Department of Computer Science University of Sydney, NSW, Australia Illinois Institute of Technology [email protected] 10 W. 31st Street, Chicago, IL 60616, USA [email protected] Abstract of meaning, we are unlikely to gain any true insight into the nature of the stylistic dimension(s) under study. We propose that textual ‘style’ should be best de- fined as ‘non-denotational meaning’, i.e., those as- How then should we approach finding more theoretically- pects of a text’s meaning that are mostly inde- motivated stylistic feature sets? First consider the well- pendent of what the text refers to in the world. accepted notion that style is indicated by features that in- To make this more concrete, we describe a lin- dicate the author’s choice of one mode of expression from guistically well-motivated framework for compu- among a set of equivalent modes. At the surface level, this tational stylistic analysis based on Systemic Func- may be expressed by specific choice of words, syntactic struc- tional Linguistics. This theory views a text as a tures, discourse strategy, or combinations. The causes of such realisation of multiple overlapping choices within surface variation are similarly heterogeneous, including the a network of related meanings, many of which re- genre, register, or purpose of the text, as well as the educa- late to non-denotational (traditionally ‘pragmatic’) tional background, social status, and personality of the au- aspects such as cohesion or interpersonal distance. thor and audience.
    [Show full text]
  • The Empirical Base of Linguistics: Grammaticality Judgments and Linguistic Methodology
    UCLA UCLA Previously Published Works Title The empirical base of linguistics: Grammaticality judgments and linguistic methodology Permalink https://escholarship.org/uc/item/05b2s4wg ISBN 978-3946234043 Author Schütze, Carson T Publication Date 2016-02-01 DOI 10.17169/langsci.b89.101 Data Availability The data associated with this publication are managed by: Language Science Press, Berlin Peer reviewed eScholarship.org Powered by the California Digital Library University of California The empirical base of linguistics Grammaticality judgments and linguistic methodology Carson T. Schütze language Classics in Linguistics 2 science press Classics in Linguistics Chief Editors: Martin Haspelmath, Stefan Müller In this series: 1. Lehmann, Christian. Thoughts on grammaticalization 2. Schütze, Carson T. The empirical base of linguistics: Grammaticality judgments and linguistic methodology 3. Bickerton, Derek. Roots of language ISSN: 2366-374X The empirical base of linguistics Grammaticality judgments and linguistic methodology Carson T. Schütze language science press Carson T. Schütze. 2019. The empirical base of linguistics: Grammaticality judgments and linguistic methodology (Classics in Linguistics 2). Berlin: Language Science Press. This title can be downloaded at: http://langsci-press.org/catalog/book/89 © 2019, Carson T. Schütze Published under the Creative Commons Attribution 4.0 Licence (CC BY 4.0): http://creativecommons.org/licenses/by/4.0/ ISBN: 978-3-946234-02-9 (Digital) 978-3-946234-03-6 (Hardcover) 978-3-946234-04-3 (Softcover) 978-1-523743-32-2
    [Show full text]
  • Newcastle University Eprints
    Newcastle University ePrints Knight D, Adolphs S, Carter R. CANELC: constructing an e-language corpus. Corpora 2014, 9(1), 29-56. Copyright: The definitive version of this article, published by Edinburgh University Press, 2014, is available at: http://dx.doi.org/10.3366/cor.2014.0050 Always use the definitive version when citing. Further information on publisher website: www.euppublishing.com Date deposited: 23-07-2014 Version of file: Author Accepted Manuscript This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License ePrints – Newcastle University ePrints http://eprint.ncl.ac.uk CANELC – Constructing an e-language corpus ___________________________________________________________________________ Dawn Knight1, Svenja Adolphs2 and Ronald Carter2 This paper reports on the construction of CANELC: the Cambridge and Nottingham e- language Corpus 3 . CANELC is a one million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent; collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in both similar and different ways to spoken and written records of communication (as evidenced by the BNC - British National Corpus). Keywords: Blogs, Tweets, SMS, Discussion Boards, e-language, Corpus Linguistics 1. Introduction Communication in the digital age is a complex many faceted process involving the production and reception of linguistic stimuli across a multitude of platforms and media types (see Boyd and Heer, 2006:1).
    [Show full text]
  • Ehrensperger Report
    American Name Society 51st Annual Ehrensperger Report 2005 A Publication of the American Name Society Michael F. McGoff, Editor PREFACE After a year’s hiatus the Ehrensperger Report returns to its place as a major publication of the American Name Society (ANS). This document marks the 51st year since its introduction to the membership by Edward C. Ehrensperger. For over twenty-five years, from 1955 to 1982, he compiled and published this annual review of scholarship. Edward C. Ehrensperger 1895-1984 As usual, it is a partial view of the research and other activity going on in the world of onomastics, or name study. In a report of this kind, the editor must make use of what comes in, often resulting in unevenness. Some of the entries are very short; some extensive, especially from those who are reporting not just for themselves but also for the activity of a group of people. In all cases, I have assumed the prerogative of an editor and have abridged, clarified, and changed the voice of many of the submissions. I have encouraged the submission of reports by email or electronically, since it is much more efficient to edit text already typed than to type the text myself. For those not using email, I strongly encourage sending me written copy. There is some danger, however, in depending on electronic copy: sometimes diacritical marks or other formatting matters may not have come through correctly. In keeping with the spirit of onomastics and the original Ehrensperger Report, I have attempted where possible to report on research and publication under a person’s name.
    [Show full text]
  • RASLAN 2015 Recent Advances in Slavonic Natural Language Processing
    RASLAN 2015 Recent Advances in Slavonic Natural Language Processing A. Horák, P. Rychlý, A. Rambousek (Eds.) RASLAN 2015 Recent Advances in Slavonic Natural Language Processing Ninth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2015 Karlova Studánka, Czech Republic, December 4–6, 2015 Proceedings Tribun EU 2015 Proceedings Editors Aleš Horák Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] Pavel Rychlý Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] Adam Rambousek Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the Czech Copyright Law, in its current version, and permission for use must always be obtained from Tribun EU. Violations are liable for prosecution under the Czech Copyright Law. Editors ○c Aleš Horák, 2015; Pavel Rychlý, 2015; Adam Rambousek, 2015 Typography ○c Adam Rambousek, 2015 Cover ○c Petr Sojka, 2010 This edition ○c Tribun EU, Brno, 2015 ISBN 978-80-263-0974-1 ISSN 2336-4289 Preface This volume contains the Proceedings of the Ninth Workshop on Recent Advances in Slavonic Natural Language Processing (RASLAN 2015) held on December 4th–6th 2015 in Karlova Studánka, Sporthotel Kurzovní, Jeseníky, Czech Republic.
    [Show full text]
  • Preprint Corpus Analysis in Forensic Linguistics
    Nini, A. (2020). Corpus analysis in forensic linguistics. In Carol A. Chapelle (ed), The Concise Encyclopedia of Applied Linguistics, 313-320, Hoboken: Wiley-Blackwell Corpus Analysis in Forensic Linguistics Andrea Nini Abstract This entry is an overview of the applications of corpus linguistics to forensic linguistics, in particular to the analysis of language as evidence. Three main areas are described, following the influence that corpus linguistics has had on them in recent times: the analysis of texts of disputed authorship, the provision of evidence in cases of trademark disputes, and the analysis of disputed meanings in criminal or civil cases. In all of these areas considerable advances have been made that revolve around the use of corpus data, for example, to study forensically realistic corpora for authorship analysis, or to provide naturally occurring evidence in cases of trademark disputes or determination of meaning. Using examples from real-life cases, the entry explains how corpus analysis is therefore gradually establishing itself as the norm for solving certain forensic problems and how it is becoming the main methodological approach for forensic linguistics. Keywords Authorship Analysis; Idiolect; Language as Evidence; Ordinary Meaning; Trademark This entry focuses on the use of corpus linguistics methods and techniques for the analysis of forensic data. “Forensic linguistics” is a very general term that broadly refers to two areas of investigation: the analysis of the language of the law and the analysis of language evidence in criminal or civil cases. The former, which often takes advantage of corpus methods, includes areas as wide as the study of the language of the judicial process and courtroom interaction (e.g., Tkačuková, 2015), the study of written legal documents (e.g., Finegan, 2010), and the investigation of interactions in police interviews (e.g., Carter, 2011).
    [Show full text]
  • Out of Style: Reanimating Stylistic Study in Composition and Rhetoric
    Utah State University DigitalCommons@USU All USU Press Publications USU Press 2008 Out of Style: Reanimating Stylistic Study in Composition and Rhetoric Paul Butler Follow this and additional works at: https://digitalcommons.usu.edu/usupress_pubs Part of the Rhetoric and Composition Commons Recommended Citation Butler, Paul, "Out of Style: Reanimating Stylistic Study in Composition and Rhetoric" (2008). All USU Press Publications. 162. https://digitalcommons.usu.edu/usupress_pubs/162 This Book is brought to you for free and open access by the USU Press at DigitalCommons@USU. It has been accepted for inclusion in All USU Press Publications by an authorized administrator of DigitalCommons@USU. For more information, please contact [email protected]. 6679-0_OutOfStyle.ai79-0_OutOfStyle.ai 5/19/085/19/08 2:38:162:38:16 PMPM C M Y CM MY CY CMY K OUT OF STYLE OUT OF STYLE Reanimating Stylistic Study in Composition and Rhetoric PAUL BUTLER UTAH STATE UNIVERSITY PRESS Logan, Utah 2008 Utah State University Press Logan, Utah 84322–7800 © 2008 Utah State University Press All rights reserved. ISBN: 978-0-87421-679-0 (paper) ISBN: 978-0-87421-680-6 (e-book) “Style in the Diaspora of Composition Studies” copyright 2007 from Rhetoric Review by Paul Butler. Reproduced by permission of Taylor & Francis Group, LLC., http:// www. informaworld.com. Manufactured in the United States of America. Cover design by Barbara Yale-Read. Library of Congress Cataloging-in-Publication Data Library of Congress Cataloging-in- Publication Data Butler, Paul, Out of style : reanimating stylistic study in composition and rhetoric / Paul Butler. p. cm. Includes bibliographical references and index.
    [Show full text]
  • CONTENT Unit I 2-18 Communication and Language INTRODUCTION to LINGUISTICS Major Linguists and Their Contribution
    1 CONTENT Unit I 2-18 Communication and Language INTRODUCTION TO LINGUISTICS Major Linguists and their Contribution Unit II 19-27 Phonology of English and Phonetics Mechanism of Speech Production Organs of speech The Respiratory System The Phonatory System The Articulatory System The Air-stream Mechanism Unit III 29-43 Phonology of English and Phonetics (Continued) Description and Classification of Vowels and Consonants Description of Vowel Sounds Description of Pure Vowels in English Description of Dipthongs or Glides in English Occurrence of Vowel Sounds in English Description of Consonant Sounds Unit IV 44-56 STRESS AND INTONATION Word Accent/Stress Intonation Unit V 57-70 SOCIOLINUISTICS Definitions of Sociolinguistics Language Variation or Varieties of Language Dialects Sociolect Idiolect Register Language Contact: Pidgin and Creole 2 UNIT-I INTRODUCTION TO LINGUISTICS 1.0 1.1 Linguistics/lɪŋˈɡwɪstɪks/ refers to the scientific study of language and its structure, including the study of grammar, syntax, and phonetics. Specific branches of linguistics include sociolinguistics, dialectology, psycholinguistics, computational linguistics, comparative linguistics, and structural linguistics. WHAT IS LINGUISTICS? Linguistics is defined as the scientific study of language.It is the systematic study of the elements of language and the principles governing their combination and organization. Linguistics provides for a rigorous experimentation with the elements or aspects of language that are actually in use by the speech community. It is based on observation and the data collected thereby from the users of the language, a scientific analysis is made by the investigator and at the end of it he comes out with a satisfactory explanation relating to his field of study.
    [Show full text]
  • A Pragmatic Stylistic Framework for Text Analysis
    International Journal of Education ISSN 1948-5476 2015, Vol. 7, No. 1 A Pragmatic Stylistic Framework for Text Analysis Ibrahim Abushihab1,* 1English Department, Alzaytoonah University of Jordan, Jordan *Correspondence: English Department, Alzaytoonah University of Jordan, Jordan. E-mail: [email protected] Received: September 16, 2014 Accepted: January 16, 2015 Published: January 27, 2015 doi:10.5296/ije.v7i1.7015 URL: http://dx.doi.org/10.5296/ije.v7i1.7015 Abstract The paper focuses on the identification and analysis of a short story according to the principles of pragmatic stylistics and discourse analysis. The focus on text analysis and pragmatic stylistics is essential to text studies, comprehension of the message of a text and conveying the intention of the producer of the text. The paper also presents a set of standards of textuality and criteria from pragmatic stylistics to text analysis. Analyzing a text according to principles of pragmatic stylistics means approaching the text’s meaning and the intention of the producer. Keywords: Discourse analysis, Pragmatic stylistics Textuality, Fictional story and Stylistics 110 www.macrothink.org/ije International Journal of Education ISSN 1948-5476 2015, Vol. 7, No. 1 1. Introduction Discourse Analysis is concerned with the study of the relation between language and its use in context. Harris (1952) was interested in studying the text and its social situation. His paper “Discourse Analysis” was a far cry from the discourse analysis we are studying nowadays. The need for analyzing a text with more comprehensive understanding has given the focus on the emergence of pragmatics. Pragmatics focuses on the communicative use of language conceived as intentional human action.
    [Show full text]
  • Stylistic and Linguistic Analysis of a Literary Text Using Systemic Functional Grammar+῍
    ῍ῌ Stylistic and Linguistic Analysis of a Literary Text Using Systemic Functional Grammar+῍ Noriko Iwamoto Ώ῝M.A.K.Halliday ῜ Ίῑ῔ῢ`Ὼῌtransitivity῍ ῜Ῡῤ Ί῭ ῜ῗ῏ῢ῍ `Ὼ῝ῷῸ῜&ῧ ῜῞Ῐ*ῗ῏ῢ+,Ὼ- ῌthe ideational function῍ ῤ῿ῶ0ῢ 1῜ῷῸ 3ῗ῏ῡῌ 6῾`ῌ 8῵`ῌ ;`ῌ <ῷ`ῌ ῼ >`ῌ ?@`Ῐῲ`Ίῑ῔ῢBCDEῒῠῢ῍ ῕ΰῠῤJKῪ ῦ ῜N P6Q Ί῭ῌ ῲP6῜S῾ῌ ῼ>TUῌ P6V῜W ῺXῌ YῠΊZΰῠῤ[ῖ] ῜Q῕ῐῘ0ῢ`῰+ῤῬc ῑd ῟Ὸf῜g ῒῠhῠῒΊdῐῘij῍ ῕῜dῐῚ ῝ῌ l ῳῨῥῦpῤq1Ῐῖῌ ZΰΎ῱῜Ῠῥῦp ῌ uῌΌ´ ῚῙ῍ Ί ῭΅ῗ῏ῢ῕Ῐῤz{ῌ ῕ΰ|ῗ}῎῜ῘYΰ ῖΐῬc Ῐ Ῐ῜ῤῴ1ῢ ῜ῗ῏ῢ῍ Key words : functional grammar ; transitivity ; stylistics ; narrative ; gender +. Introduction This article explores the relationship between linguistic structures and socially constructed meaning in a narrative text. By employing Halliday’s transitivity framework, the article attempts ῍ῌ to reveal the ideology and power relations that underpin a literary text from a semantico-grammatical point of view. This study seeks common ground where systemic grammar and narrative, which have long been considered separate disciplines, can meet. +. + Narrative as a linguistically constructed world We humans beings often put our experiences and thoughts into stories. Narrative refers to storytelling, both written and spoken, including oral narrative. A narrative constructs a world using various linguistic resources. A narrative is a microcosm of how people act, feel, and think, and what they value as an individual or as a member of a community or institution. There are various methods for, and theories of, narrative analysis and its presentation. One of the most widely adopted is that of Labov and Waletsky ῌ+301῍, who presented structural stages for narrative analysis that have been widely accepted.
    [Show full text]
  • Identifying Idiolect in Forensic Authorship Attribution: an N-Gram Textbite Approach Alison Johnson & David Wright University of Leeds
    Identifying idiolect in forensic authorship attribution: an n-gram textbite approach Alison Johnson & David Wright University of Leeds Abstract. Forensic authorship attribution is concerned with identifying authors of disputed or anonymous documents, which are potentially evidential in legal cases, through the analysis of linguistic clues left behind by writers. The forensic linguist “approaches this problem of questioned authorship from the theoretical position that every native speaker has their own distinct and individual version of the language [. ], their own idiolect” (Coulthard, 2004: 31). However, given the diXculty in empirically substantiating a theory of idiolect, there is growing con- cern in the Veld that it remains too abstract to be of practical use (Kredens, 2002; Grant, 2010; Turell, 2010). Stylistic, corpus, and computational approaches to text, however, are able to identify repeated collocational patterns, or n-grams, two to six word chunks of language, similar to the popular notion of soundbites: small segments of no more than a few seconds of speech that journalists are able to recognise as having news value and which characterise the important moments of talk. The soundbite oUers an intriguing parallel for authorship attribution studies, with the following question arising: looking at any set of texts by any author, is it possible to identify ‘n-gram textbites’, small textual segments that characterise that author’s writing, providing DNA-like chunks of identifying ma- terial? Drawing on a corpus of 63,000 emails and 2.5 million words written by 176 employees of the former American energy corporation Enron, a case study approach is adopted, Vrst showing through stylistic analysis that one Enron em- ployee repeatedly produces the same stylistic patterns of politely encoded direc- tives in a way that may be considered habitual.
    [Show full text]
  • An Introduction to Forensic Linguistics: Language in Evidence
    An Introduction to Forensic Linguistics ‘Seldom do introductions to any fi eld offer such a wealth of information or provide such a useful array of exercise activities for students in the way that this book does. Coulthard and Johnson not only provide their readers with extensive examples of the actual evidence used in the many law cases described here but they also show how the linguist’s “toolkit” was used to address the litigated issues. In doing this, they give valuable insights about how forensic linguists think, do their analyses and, in some cases, even testify at trial.’ Roger W. Shuy, Distinguished Research Professor of Linguistics, Emeritus, Georgetown University ‘This is a wonderful textbook for students, providing stimulating examples, lucid accounts of relevant linguistic theory and excellent further reading and activities. The foreign language of law is also expertly documented, explained and explored. Language as evidence is cast centre stage; coupled with expert linguistic analysis, the written and spoken clues uncovered by researchers are foregrounded in unfolding legal dramas. Coulthard and Johnson have produced a clear and compelling work that contains its own forensic linguistic puzzle.’ Annabelle Mooney, Roehampton University, UK From the accusation of plagiarism surrounding The Da Vinci Code, to the infamous hoaxer in the Yorkshire Ripper case, the use of linguistic evidence in court and the number of linguists called to act as expert witnesses in court trials has increased rapidly in the past fi fteen years. An Introduction to Forensic Linguistics provides a timely and accessible introduction to this rapidly expanding subject. Using knowledge and experience gained in legal settings – Coulthard in his work as an expert witness and Johnson in her work as a West Midlands police offi cer – the two authors combine an array of perspectives into a distinctly unifi ed textbook, focusing throughout on evidence from real and often high profi le cases including serial killer Harold Shipman, the Bridgewater Four and the Birmingham Six.
    [Show full text]