Forensic Linguistics

‘The substantially revised edition of this pioneering forensic linguistics textbook combines the formidable expertise and wisdom of two of the field’s most respected leaders with the fresh perspective of one of its bright new scholars. Students will love the fascinating illustrative data from a large number of real cases from five continents.’
Diana Eades, University of New England, Australia

Praise for the First Edition

‘Seldom do introductions to any field offer such a wealth of information or provide such a useful array of exercise activities for students in the way that this book does. Coulthard and Johnson not only provide their readers with extensive examples of the actual evidence used in the many law cases described here but they also show how the linguist’s “toolkit” was used to address the litigated issues. In doing this, they give valuable insights about how forensic linguists think, do their analyses and, in some cases, even testify at trial.’
Roger W. Shuy, Distinguished Research Professor of Linguistics, Emeritus, Georgetown University, USA

‘This is a wonderful textbook for students, providing stimulating examples, lucid accounts of relevant linguistic theory and excellent further reading and activities. The foreign language of law is also expertly documented, explained and explored. Language as evidence is cast centre stage; coupled with expert linguistic analysis, the written and spoken clues uncovered by researchers are foregrounded in unfolding legal dramas. Coulthard and Johnson have produced a clear and compelling work that contains its own forensic linguistic puzzle.’
Annabelle Mooney, Roehampton University, UK

An Introduction to Forensic Linguistics

An Introduction to Forensic Linguistics: Language in Evidence has established itself as the essential textbook written by leading authorities in this expanding field.
The second edition of this bestselling textbook begins with a new introduction and continues in two parts.

Part One deals with the language of the legal process, and begins with a substantial new chapter exploring key theoretical and methodological approaches. In four updated chapters it goes on to cover the language of the law, initial calls to the emergency services, police interviewing, and courtroom discourse.

Part Two looks at language as evidence, with substantially revised and updated chapters on the following key topics:

• the work of the forensic linguist
• forensic phonetics
• authorship attribution
• the linguistic investigation of plagiarism
• the linguist as expert witness.

The authors combine an array of perspectives on forensic linguistics, using knowledge and experience gained in legal settings – Coulthard in his work as an expert witness for cases such as the Birmingham Six and the Derek Bentley appeal, and Johnson as a former police officer.

Research tasks, further reading, web links, and a new conclusion ensure that this remains the core textbook for courses in forensic linguistics and language and the law. A glossary of key terms is also available at https://www.routledge.com/products/9781138641716 and on the Routledge Language and Communication Portal.

Malcolm Coulthard is Emeritus Professor of Forensic Linguistics at Aston University, UK, and Visiting Professor at the Federal University of Santa Catarina, Brazil. He is the co-editor of the recently launched international journal Language and Law, Linguagem e Direito and author of many books including An Introduction to Discourse Analysis (1985).

Alison Johnson is Lecturer in English Language at the University of Leeds, UK. She is the co-editor (with Malcolm Coulthard) of The Routledge Handbook of Forensic Linguistics (2010) and an editor of The International Journal of Speech, Language and the Law.

David Wright is Lecturer in Linguistics at Nottingham Trent University, UK, and Reviews editor of Language and Law, Linguagem e Direito.

An Introduction to Forensic Linguistics
Language in Evidence
Second edition
Malcolm Coulthard, Alison Johnson and David Wright

Second edition published 2017 by Routledge
2 Park Square, Milton Park, Abingdon, Oxon OX14 4RN
and by Routledge
711 Third Avenue, New York, NY 10017

Routledge is an imprint of the Taylor & Francis Group, an informa business

© 2017 Malcolm Coulthard, Alison Johnson and David Wright

The right of Malcolm Coulthard, Alison Johnson and David Wright to be identified as authors of this work has been asserted by them in accordance with sections 77 and 78 of the Copyright, Designs and Patents Act 1988.

All rights reserved. No part of this book may be reprinted or reproduced or utilised in any form or by any electronic, mechanical, or other means, now known or hereafter invented, including photocopying and recording, or in any information storage or retrieval system, without permission in writing from the publishers.

Trademark notice: Product or corporate names may be trademarks or registered trademarks, and are used only for identification and explanation without intent to infringe.

First edition published by Routledge 2007

British Library Cataloguing-in-Publication Data
A catalogue record for this book is available from the British Library

Library of Congress Cataloging-in-Publication Data
Names: Coulthard, Malcolm, author. | Johnson, Alison, 1959- author. | Wright, David, 1948 December 10-
Title: An Introduction to forensic linguistics/ by Malcolm Coulthard, Alison Johnson and David Wright.
Description: Abingdon, Oxon ; New York, NY ; Routledge, [2016] | Includes bibliographical references and index.
Identifiers: LCCN 2016007458 | ISBN 9781138641709 (hardback) | ISBN 9781138641716 (pbk.) | ISBN 9781315630311 (ebook)
Subjects: LCSH: Forensic linguistics.
Classification: LCC K2287.5 C68 2016 | DDC 363.25–dc23
LC record available at http://lccn.loc.gov/2016007458

ISBN: 978-1-138-64170-9 (hbk)
ISBN: 978-1-138-64171-6 (pbk)
ISBN: 978-1-3156-3031-1 (ebk)

Typeset in Times by Cenveo Publisher Services

This book is affectionately dedicated to our families

Contents

List of illustrations
Acknowledgements

1 Introduction
   Legal words, murder, plagiarism, trademarks and a voice hoax
   Who is this book for?
   Organisation of the book
   Reading and research tasks and how they function
   The second edition

PART I The Language of the Legal Process

2 Critical, theoretical, and methodological approaches to language in legal settings
   Introduction – Powerful professionals and ordinary people
   Sociolinguistics and forensic linguistics
   Pragmatics and legal language
   (Critical) Discourse and Conversation Analysis
   Corpus linguistics
   Conclusion
   Further reading
   Research task

3 The language of the law
   Introduction
   Legal style and register
   Ordinary and special meanings
   On applying the law
   Conclusion
   Further reading
   Research tasks

4 Emergency service calls and police interviewing
   Collecting evidence in first encounters with witnesses and suspects
   Introduction
   First encounters – calls to the emergency services
   Active listening in police negotiations when making an arrest
   Police interviews and statements
   Vulnerable witnesses – on interviewing children and rape victims
   Context, intertextuality and audience design
   Conclusion
   Further reading
   Research tasks

5 Trial discourse
   Introduction – into the courtroom
   The trial as a complex genre
   Trial genres – from jury selection to deliberation and verdict
   Examination and cross-examination of witnesses
   Narrative in the courtroom
   The expert witness in the courtroom
   Children in the courtroom
   Conclusion
   Further reading
   Research tasks

PART II Language as Evidence

6 The work of the forensic linguist
   Introduction
   Morphological meaning and phonetic similarity
   Syntactic complexity
   Lexico-grammatical ambiguity
   Lexical meaning
   Pragmatic meaning
   The recording of interaction in written form – police interview notes
   Narrative analysis of a disputed statement
   The challenges for non-native speakers
   Conclusion
   Further reading
   Research tasks

7 Forensic phonetics
   The work of the forensic phonetician
   Transcription and disputed utterances
   Analysing the human voice
   Speaker profiling
   Speaker comparison
   Naïve speaker recognition, earwitnesses and voice parades
   Conclusion
   Further reading
   Research tasks

8 Authorship attribution
   Introduction
   A brief history of authorship attribution
   Linguistic variation and style markers
   Consistency and distinctiveness
   The Jenny Nicholl case
   Combining stylistics and statistics: The Amanda Birks case
   Corpus methods in authorship attribution
   Conclusion
   Further reading
   Research tasks

9 On textual borrowing
   Introduction
   The history of plagiarism
   Universities and plagiarism
   Plagiarism and translation
   Do people repeat themselves?
   The evidential value of single identical strings
   Coda
   Further reading
   Research tasks

10 The linguist as expert witness
   On being an expert witness
   Admissible evidence
   Expressing opinions
   Measuring success statistically
   Consulting and testifying as tour guides
   Further reading
   Research task

11 Conclusion
   Contributing to society
   Codes of conduct
   New developments and future directions
   The next generation
   Closing statement

References
Index

List of illustrations

Figures
2.1 Concordance
Recommended publications
  • Newcastle University Eprints
    Newcastle University ePrints Knight D, Adolphs S, Carter R. CANELC: constructing an e-language corpus. Corpora 2014, 9(1), 29-56. Copyright: The definitive version of this article, published by Edinburgh University Press, 2014, is available at: http://dx.doi.org/10.3366/cor.2014.0050 Always use the definitive version when citing. Further information on publisher website: www.euppublishing.com Date deposited: 23-07-2014 Version of file: Author Accepted Manuscript This work is licensed under a Creative Commons Attribution-NonCommercial 3.0 Unported License ePrints – Newcastle University ePrints http://eprint.ncl.ac.uk CANELC – Constructing an e-language corpus. Dawn Knight, Svenja Adolphs and Ronald Carter. This paper reports on the construction of CANELC: the Cambridge and Nottingham e-language Corpus. CANELC is a one million word corpus of digital communication in English, taken from online discussion boards, blogs, tweets, emails and SMS messages. The paper outlines the approaches used when planning the corpus: obtaining consent; collecting the data and compiling the corpus database. This is followed by a detailed analysis of some of the patterns of language used in the corpus. The analysis includes a discussion of the key words and phrases used as well as the common themes and semantic associations connected with the data. These discussions form the basis of an investigation of how e-language operates in both similar and different ways to spoken and written records of communication (as evidenced by the BNC - British National Corpus). Keywords: Blogs, Tweets, SMS, Discussion Boards, e-language, Corpus Linguistics 1. Introduction Communication in the digital age is a complex many faceted process involving the production and reception of linguistic stimuli across a multitude of platforms and media types (see Boyd and Heer, 2006:1).
  • RASLAN 2015 Recent Advances in Slavonic Natural Language Processing
    RASLAN 2015 Recent Advances in Slavonic Natural Language Processing A. Horák, P. Rychlý, A. Rambousek (Eds.) RASLAN 2015 Recent Advances in Slavonic Natural Language Processing Ninth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2015 Karlova Studánka, Czech Republic, December 4–6, 2015 Proceedings Tribun EU 2015 Proceedings Editors Aleš Horák Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] Pavel Rychlý Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] Adam Rambousek Faculty of Informatics, Masaryk University Department of Information Technologies Botanická 68a CZ-602 00 Brno, Czech Republic Email: [email protected] This work is subject to copyright. All rights are reserved, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting, reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication or parts thereof is permitted only under the provisions of the Czech Copyright Law, in its current version, and permission for use must always be obtained from Tribun EU. Violations are liable for prosecution under the Czech Copyright Law. Editors © Aleš Horák, 2015; Pavel Rychlý, 2015; Adam Rambousek, 2015 Typography © Adam Rambousek, 2015 Cover © Petr Sojka, 2010 This edition © Tribun EU, Brno, 2015 ISBN 978-80-263-0974-1 ISSN 2336-4289 Preface This volume contains the Proceedings of the Ninth Workshop on Recent Advances in Slavonic Natural Language Processing (RASLAN 2015) held on December 4th–6th 2015 in Karlova Studánka, Sporthotel Kurzovní, Jeseníky, Czech Republic.
  • Actus Reus and Mens Rea of Murder; Understand Coke’s Definition of Murder; Explain How the Definition of Murder Has Changed and Evolved
    Criminal Law [G153] OFFENCES AGAINST THE PERSON: MURDER By the end of this unit, you will be able to: explain the actus reus and mens rea of murder; understand Coke’s definition of murder; explain how the definition of murder has changed and evolved. You will also be able to: critically evaluate the current law, and possible reforms. HOMEWORK During this unit, you will be set the following. In completing homework, you will be expected to do your own research and supplement your own notes. This is essential to show understanding. 1. How far does the case of Kiranjit Ahluwalia highlight the problems with the current law on murder and voluntary manslaughter? In your opinion, what should she have been liable for and why, and how did the law respond and why? END OF UNIT ASSESSMENT As with AS, you will sit a DRAG test but not until we have looked at voluntary and involuntary manslaughter as well. Remember, you will have the choice to answer 20 out of 60 questions, reflecting your understanding and knowledge of the subject. At the end of each unit on manslaughter, we will look at a section B question, but for now you will not complete an essay question on the subject (hmmm... think ahead to mocks!) Murder Murder is generally accepted as one of the worst crimes imaginable. It is a common law crime, which means that the courts are able to develop the definition and the crime itself through case law using ……………………. However, this can also be a problem because it means that the definition is constantly changing and it can be a little tricky to work out the exact meaning of the law.
  • Children Online: a Survey of Child Language and CMC Corpora
    Children Online: a survey of child language and CMC corpora Alistair Baron, Paul Rayson, Phil Greenwood, James Walkerdine and Awais Rashid Lancaster University Contact author: Paul Rayson Email: [email protected] The collection of representative corpus samples of both child language and online (CMC) language varieties is crucial for linguistic research that is motivated by applications to the protection of children online. In this paper, we present an extensive survey of corpora available for these two areas. Although a significant amount of research has been undertaken both on child language and on CMC language varieties, a much smaller number of datasets are made available as corpora. Especially lacking are corpora which match requirements for verifiable age and gender metadata, although some include self-reported information, which may be unreliable. Our survey highlights the lack of corpus data available for the intersecting area of child language in CMC environments. This lack of available corpus data is a significant drawback for those wishing to undertake replicable studies of child language and online language varieties. Keywords: child language, CMC, survey, corpus linguistics Introduction This survey is part of a wider investigation into child language and computer-mediated communication (CMC) corpora. Its aim is to assess the availability of relevant corpora which can be used to build representative samples of the language of children online. The Isis project, of which the survey is a part, carries out research in the area of online child protection. Corpora of child and CMC language serve two main purposes in our research. First, they enable us to build age- and gender-based standard profiles of such language for comparison purposes.
  • OPEC Ministers Admit Failure
    24 - THE HERALD. Thuni.. Aug. 20. IW Plenty of cents here, Best ideas may ■ M and service their, goods. Ruppman does not .,.page 16 numbers in pribited advertising but Ruppman N E W Y O R K (U P I) - The "kOO ” telepbone said it must be mbauntial. sell products of its own. ___ . line systeifi is a wonderful aid to nuirketiiig Rm pm an’s 244ioor 800 n ^ b w but not, everywhere He said the use of 800 numbers In but, like everything else revoluUonary, it has calledWaloguo MarketWg. When a caU com­ marketing stiil is growing i|t an astoniming produced some unforeseen problems. es in the c l ^ l r s t asks, “ What Is your postal ppce despite softness in the general economic By Barbara Richmond ^ bank, a savings bank, is different For one, says Charles Riippman, bead of clinute. His company alone will handle two from the com mercial banks. He said Ruppman Marketing Services of Peoria, 111., number is ^ h e d into ^ Herald Reporter million such toll-free calls for infWmatlon peoploisave coins in banks at home if you advertise an 800 number on radio or puter the names and addresses o f the cIosMt While local banken aren't exactly and turn them into the savings bank. about specific prodgeU o r services this year Cool tonight; Manchester, Conn. television, the roof m ay faU in on you. driers for the products or ■iiijpng "Penniet from Heaven,” He said if other banka run short of and thousands o f companies are using 800- “ You just never know how many people are customer asked about appear on the cterks sunny Saturday ttere doesn’t seem to be a dearth of pennies his bank tries to help them going to pick up their phones in the nest few number lines.
  • Email in the Australian National Corpus
    Email in the Australian National Corpus Andrew Lampert CSIRO ICT Centre, North Ryde, Australia and Macquarie University, Australia 1. Introduction Email is not only a distinctive and important text type but one that touches the lives of most Australians. In 2008, 79.4% of Australians used the internet, ahead of all Asia Pacific countries except New Zealand (Organisation for Economic Co-operation and Development, 2008). A recent Nielsen study suggests that almost 98% of Australian internet users have sent or received email messages in the past 4 weeks (Australian Communications and Media Authority, 2008), making email the most used application on the internet by a significant margin. It seems logical to embrace a communication medium used by the vast majority of Australians when considering the text types and genres that should be included in the Australian National Corpus. Existing corpora such as the British National Corpus (2007) and the American National Corpus (Macleod, Ide, & Grishman, 2000) provide many insights and lessons for the creation and curation of the Australian National Corpus. Like many existing corpora, the British National Corpus and the American National Corpus contain language data drawn from a wide variety of text types and genres, including telephone dialogue, novels, letters, transcribed face-to-face dialogue, technical books, newspapers, web logs, travel guides, magazines, reports, journals, and web data. Notably absent from this list are email messages. In many respects, email has replaced more traditional forms of communication such as letters and memoranda, yet this reality is not reflected in existing corpora. This lack of email text is a significant gap in existing corpus resources.
  • © 2014 Gourab Kundu DOMAIN ADAPTATION WITH MINIMAL TRAINING
    © 2014 Gourab Kundu DOMAIN ADAPTATION WITH MINIMAL TRAINING BY GOURAB KUNDU DISSERTATION Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate College of the University of Illinois at Urbana-Champaign, 2014 Urbana, Illinois Doctoral Committee: Professor Dan Roth, Chair Professor Chengxiang Zhai Assistant Professor Julia Hockenmaier Associate Professor Hal Daumé III, University of Maryland Abstract Machine learning models trained on labeled data of a domain degrade performance severely when tested on a different domain. Traditional approaches deal with this problem by training a new model for every new domain. In Natural language processing, top performing systems often use multiple interconnected models and therefore training all of them for every new domain is computationally expensive. This thesis is a study on how to adapt to a new domain, using the system trained on a different domain, avoiding the cost of retraining. This thesis identifies two key ingredients for adaptation without training: broad coverage resources and constraints. We show how resources like Wikipedia, VerbNet, WordNet that contain comprehensive coverage of entities, semantic roles and words in English can help a model adapt to a new domain. For the task of semantic role labeling, we show that in the decision phase, we can replace a linguistic unit (e.g. verb, word) with another equivalent linguistic unit residing in the same cluster defined in these resources (e.g. VerbNet, WordNet) such that after replacement, text becomes more like text on which the model was trained. We show that the model's output is more accurate on the transformed text than on original text.
  • Shail, Robert, British Film Directors
    BRITISH FILM DIRECTORS INTERNATIONAL FILM DIRECTORS Series Editor: Robert Shail This series of reference guides covers the key film directors of a particular nation or continent. Each volume introduces the work of 100 contemporary and historically important figures, with entries arranged in alphabetical order as an A–Z. The Introduction to each volume sets out the existing context in relation to the study of the national cinema in question, and the place of the film director within the given production/cultural context. Each entry includes both a select bibliography and a complete filmography, and an index of film titles is provided for easy cross-referencing. BRITISH FILM DIRECTORS: A CRITICAL GUIDE Robert Shail British national cinema has produced an exceptional track record of innovative, creative and internationally recognised filmmakers, amongst them Alfred Hitchcock, Michael Powell and David Lean. This tradition continues today with the work of directors as diverse as Neil Jordan, Stephen Frears, Mike Leigh and Ken Loach. This concise, authoritative volume analyses critically the work of 100 British directors, from the innovators of the silent period to contemporary auteurs. An introduction places the individual entries in context and examines the role and status of the director within British film production. Balancing academic rigour with accessibility, British Film Directors provides an indispensable reference source for film students at all levels, as well as for the general cinema enthusiast. Key Features • A complete list of each director’s British feature films • Suggested further reading on each filmmaker • A comprehensive career overview, including biographical information and an assessment of the director’s current critical standing Robert Shail is a Lecturer in Film Studies at the University of Wales Lampeter.
  • Perceptions of Safety, Fear and Social Change in the Public's Prodeath
    Perceptions of safety, fear and social change in the public's pro-death penalty discourse in mid twentieth-century Britain Article (Accepted Version) Seal, Lizzie (2017) Perceptions of safety, fear and social change in the public’s pro-death penalty discourse in mid twentieth-century Britain. Crime, History and Societies, 21 (1). pp. 13-34. ISSN 1422-0857 This version is available from Sussex Research Online: http://sro.sussex.ac.uk/id/eprint/68090/ This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version. Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University. Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available. Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way. http://sro.sussex.ac.uk Perceptions of safety, fear and social change in the public’s pro-death penalty discourse in mid twentieth-century Britain.
  • Albert Pierrepoint and the Cultural Persona of the Twentieth- Century Hangman
    CORE Metadata, citation and similar papers at core.ac.uk Provided by Sussex Research Online Albert Pierrepoint and the cultural persona of the twentieth- century hangman Article (Accepted Version) Seal, Lizzie (2016) Albert Pierrepoint and the cultural persona of the twentieth-century hangman. Crime, Media, Culture, 12 (1). pp. 83-100. ISSN 1741-6590 This version is available from Sussex Research Online: http://sro.sussex.ac.uk/60124/ This document is made available in accordance with publisher policies and may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the URL above for details on accessing the published version. Copyright and reuse: Sussex Research Online is a digital repository of the research output of the University. Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. To the extent reasonable and practicable, the material made available in SRO has been checked for eligibility before being made available. Copies of full text items generally can be reproduced, displayed or performed and given to third parties in any format or medium for personal research or study, educational, or not-for-profit purposes without prior permission or charge, provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way. http://sro.sussex.ac.uk Introduction Despite his symbolic importance, the figure of the English hangman remains largely ignored by scholars.1 In an article dating from the mid-s, ‘oi : oted that it is supisig that geate attetio has ot ee dieted to the eeutioe ad this oseatio eais petiet.
  • The Teddy Boy Subculture in Britain Kristýna Slepičková
    University of West Bohemia in Pilsen, Faculty of Arts. Bachelor's Thesis: The Teddy Boy Subculture in Britain. Kristýna Slepičková. Plzeň 2020. University of West Bohemia in Pilsen, Faculty of Arts, Department of English Language and Literature. Study programme: Philology. Field of study: Foreign Languages for Commercial Practice. Combination: English – German. Bachelor's Thesis: The Teddy Boy Subculture in Britain. Kristýna Slepičková. Supervisor: PhDr. Alice Tihelková, Ph.D., Department of English Language and Literature, Faculty of Arts, University of West Bohemia in Pilsen. Plzeň 2020. I declare that I wrote this thesis independently and used only the sources and literature listed. Plzeň, May 2020 ……………………… I would like to thank PhDr. Alice Tihelková, PhD. for her invaluable advice on the content and style of the thesis, in particular on the selection of adequate literature and resources. I would also like to express my gratitude to my family for their immense support. Table of Contents: Introduction; 1. Great Britain in the 1950s; 1.1. Introduction; 1.2. The Remains of the Second World War; 1.3. Military Operations; 1.3.1. The Suez Crisis
  • CS224N Section 3 Corpora, Etc. Pi-Chuan Chang, Friday, April 25
    CS224N Section 3 Corpora, etc. Pi-Chuan Chang, Friday, April 25, 2008 Some materials borrowed from Bill’s notes in 2006: http://www.stanford.edu/class/cs224n/handouts/cs224n-section3-corpora.txt Proposal for Final project due two weeks later (Wednesday, 5/7) Look for interesting topics? → go through the syllabus : http://www.stanford.edu/class/cs224n/syllabus.html → final projects from previous years : http://nlp.stanford.edu/courses/cs224n/ → what data / tools are out there? → collect your own dataset 1. LDC (Linguistic Data Consortium) o http://www.ldc.upenn.edu/Catalog/ 2. Corpora@Stanford o http://www.stanford.edu/dept/linguistics/corpora/ o Corpus TA: Anubha Kothari o The inventory: http://www.stanford.edu/dept/linguistics/corpora/inventory.html o Some of them are on AFS; some of them are available on DVD/CDs in the linguistic department 3. http://nlp.stanford.edu/links/statnlp.html 4. ACL Anthology o http://aclweb.org/anthology-new/ 5. Various Shared Tasks o CoNLL (Conference on Computational Natural Language Learning) 2006: Multi-lingual Dependency Parsing 2005, 2004: Semantic Role Labeling 2003, 2002: Language-Independent Named Entity Recognition 2001: Clause Identification 2000: Chunking 1999: NP bracketing o Machine Translation shared tasks: 2008, 2007, 2006, 2005 o Pascal Challenges (RTE, etc) o TREC (IR) o Senseval (Word sense disambiguation) o … Parsing Most widely-used : Penn Treebank 1. English: (LDC99T42) Treebank-3 (see Bill’s notes) 2. In many different languages, like Chinese (CTB6.0), Arabic Other parsed corpora 1. Switchboard (spoken) 2. German: NEGRA, TIGER, Tueba-D/Z • There’s an ACL workshop on German Parsing this year… 3.