Introduction
Ever since corpus linguistics entered the mainstream, it has become increasingly difficult to keep track of its most recent developments due to the sheer volume of corpus-based, corpus-driven or corpus-informed studies. For twelve years, the conferences known as PALC (Practical Applications in Language and Computers), organized at the University of Łódź in Poland, have served the international community of corpus and computational linguists by providing a forum for the exchange of views and ideas on how corpora and computational tools can be effectively employed to explore and advance our understanding of language. The conferences and the ensuing volumes have attempted to reflect the widening scope of, and perspectives on, language and computers. The present volume is no different in that it documents new developments and explorations in these areas, encompassing an array of topics and themes ranging from national corpora and corpus tools through cognitive processes, discourse and ideology, academic discourse, translation, and lexicography to language teaching and learning. In keeping with the PALC tradition, it is our policy to publish contributions both from seasoned researchers and from colleagues who are recently initiated members of the corpus linguistics community. The contributions are drawn from papers presented at the 7th Practical Applications in Language and Computers (PALC) conference held at the University of Łódź in 2009. The plenary speakers were Khurshid Ahmad (Trinity College, Dublin), Mark Davies (Brigham Young University), Ken Hyland (then at University of London), Terttu Nevalainen (University of Helsinki) and Margaret Rogers (University of Surrey).

This volume is divided into nine Parts, each Part being further subdivided into chapters.

Part One NATIONAL CORPORA provides overviews of three national corpora.
First, Mark Davies (Brigham Young University) in his plenary paper Semantically-Based, Learner-Oriented Queries with the 400+ Million Word Corpus of Contemporary American English demonstrates the unique ways in which language learners can use the new Corpus of Contemporary American English to carry out and use semantically-based queries. František Čermák (Charles University, Prague) in The Case of the Czech National Corpus: Its Design and History introduces readers to the methodological issues of data acquisition and corpus design involved in creating the Czech National Corpus. The next two contributions are related to the National Corpus of Polish. Rafał L. Górski (Institute of Polish Language, Polish Academy of Sciences) in The Design of the National Corpus of Polish describes the proposed design of the National Corpus of Polish, arguing that the corpus should reflect the perception of the language by the Polish linguistic community. Finally, Adam Przepiórkowski (Polish Academy of Sciences) and Piotr Bański (University of Warsaw) in XML Text Interchange Format in the National Corpus of Polish describe and provide a rationale for employing the XML encoding of texts within the National Corpus of Polish.

Part Two CORPUS TOOLS, INFORMATION AND TERMINOLOGY EXTRACTION comprises eight contributions. The first paper of this part is a contribution from Natalia Kotsyba (Warsaw University), Andrij Mykulyak (A. Sołtan Institute for Nuclear Studies, Warsaw) and Igor V. Shevchenko (ULIF NANU, Kyiv), UGTag: Morphological Analyzer and Tagger for Ukrainian Language, in which the authors describe UGTag, a programme for morphological analysis and tagging of Ukrainian texts developed within the Polish-Ukrainian Parallel Corpus (PolUKR) project to support morphosyntactic annotation for the Ukrainian part of the corpus.
Seven further papers follow. Michal Křen and Martina Waclawičová (both Charles University, Prague), in their contribution Database Framework for a Distributed Spoken Data Collection Project, look at the main features of the database system used in the Czech National Corpus for collecting recordings and transcriptions of authentic spoken Czech used in informal situations. Adam Przepiórkowski (Institute of Computer Science, Polish Academy of Sciences) and Grzegorz Murzynowski, in Manual Annotation of the National Corpus of Polish with Anotatornia, present the procedure for the manual annotation of a 1-million-word subcorpus of the National Corpus of Polish using a purpose-built tool, Anotatornia. Danuta Karwańska (University of Warsaw) and Adam Przepiórkowski, in their paper On the Evaluation of Two Polish Taggers, discuss the results of a comparison of two Polish taggers and the implications they carry for future taggers of Polish, especially the tagger developed within the National Corpus of Polish. Tomas By (Centro de Linguística da Universidade Nova de Lisboa) in Additional Comments on the Prolog Version of the Tiger Dependency Bank updates information and provides more details on some of the methods used to verify that the word order disambiguation produces accurate results. Marcin Miłkowski (Polish Academy of Sciences) in his contribution Automating Rule Generation for Grammar Checkers describes several approaches to the automatic or semi-automatic development of symbolic rules for grammar checkers from the information contained in corpora. Piotr Pęzik (University of Łódź) in Providing Corpus Feedback for Translators with the PELCRA Search Engine for NKJP introduces the PELCRA search engine for the National Corpus of Polish (PSEN), focusing on the usefulness of the tool in verifying the phraseology of translated texts. Finally, Ewelina Kwiatek and Pius ten Hacken (Swansea University) in Evaluating the
Efficiency of MultiTerm Extract for Extraction of English and Polish Terms investigate the efficiency of MultiTerm Extract, a component of SDL Trados 2007, in extracting terms from English and Polish specialized corpora.

Part Three CORPUS-BASED LANGUAGE STUDIES comprises four papers. Janusz Badio (University of Łódź), in What’s in a “Duck”? A Corpus-based Study of Salience and Attention within Animal-Related, Denominal Verb, attempts to find evidence for the claim that the three processes of indexing, deriving affordances and meshing are used simultaneously in language understanding. Łukasz Grabowski (Opole University), in Application of Parallel Corpora in Typological Investigations: the Case of Using English-Russian Parallel Subcorpus of the National Russian Corpus in Typological Study of Motion Verbs, uses the English-Russian Parallel Subcorpus of the National Russian Corpus to explore the applications of this type of corpus in typological investigations. Milena Herbal-Jezierska (University of Warsaw) in Corpus-based Morphology in the Czech Language looks at various ways of using corpora in morphological (especially inflectional) research in Czech. Finally, María Pérez Blanco (University of León) in her contribution The Language of Evaluation in English and Spanish Editorials: A Corpus-based Study discusses some of the linguistic features of newspaper editorials in English and Spanish.

In Part Four COGNITIVE LINGUISTICS, Barbara Lewandowska-Tomaszczyk and Paul Wilson (both University of Łódź), in Culture Based Conceptions of Emotion in Polish and English, compare the emotional dimensional structure of Polish and English native speakers and discuss the cultural bases of the findings.

Part Five groups papers which adopt A CORPUS-ASSISTED PERSPECTIVE ON DISCOURSE ANALYSIS AND IDEOLOGY.
Cinzia Bevitori (University of Bologna at Forlì) in her paper The Meanings of Responsibility in the British and American Press on Climate Change: A Corpus-Assisted Discourse Analysis Perspective examines the ways in which the lemma ‘responsibility’ is used in a corpus of British and American press coverage of climate change in 2007. Mikołaj Deckert (University of Łódź) in Towards an Axiological Picture of the EU – Evidence from a Polish TV News Corpus uses Polish broadcast news data to explore the axiological constructions of the personified European Union and its metonymic discursive forms. Katarzyna Fronczak (University of Łódź) in her contribution Keywords and the Discourse of the Northern Ireland Peace Process, 1997–2007. A Case Study in the Election Manifestoes of the Democratic Unionist Party and Sinn Féin adopts a keyword approach to carry out a textual analysis of the language used in the election manifestoes, focusing on the development of, and changes in, the usage of words significant for the peace process. Joanna Kaim-Kerth (Jagiellonian University) in Correspondence Analysis in Discourse Studies employs multidimensional statistical analysis to demonstrate how such methods, particularly correspondence analysis, can be used in researching discourses focused on similar subjects, i.e. values and authorities as well as visions of the family as perceived by different parliamentary factions (understood as different discourses). Finally, Inga Massalina (Kaliningrad State Technical University) in her study Selected Cognitive and Discoursal Aspects of the LSP of the Navy touches upon the cognitive and discourse aspects of the LSP of the Navy.

ACADEMIC DISCOURSE is the topic of Part Six, which comprises three contributions.
The first paper is a plenary contribution by Ken Hyland (University of Hong Kong), Corpora and EAP: Specificity in Disciplinary Discourses, in which the author draws on his own work, conducted over several years, into student and research genres to show how some familiar conventions of academic writing are