Building Basic Vocabulary Across 40 Languages

Judit Ács, Katalin Pajkossy, András Kornai
HAS Computer and Automation Research Institute
H-1111 Kende u 13-17, Budapest
{judit.acs,pajkossy,kornai}@sztaki.mta.hu

Proceedings of the 6th Workshop on Building and Using Comparable Corpora, pages 52–58, Sofia, Bulgaria, August 8, 2013. © 2013 Association for Computational Linguistics

Abstract

The paper explores the options for building bilingual dictionaries by automated methods. We define the notion ‘basic vocabulary’ and investigate how well the conceptual units that make up this language-independent vocabulary are covered by language-specific bindings in 40 languages.

Introduction

Globalization increasingly brings languages in contact. At the time of the pioneering IBM work on the Hansard corpus (Brown et al., 1990), only two decades ago, there was no need for a Basque-Chinese dictionary, but today there is (Saralegi et al., 2012). While the methods for building dictionaries from parallel corpora are now mature (Melamed, 2000), there is a dearth of bilingual or even monolingual material (Zséder et al., 2012), hence the increased interest in comparable corpora.

Once we find bilingual speakers capable of carrying out a manual evaluation of representative samples, it is relatively easy to measure the precision of a dictionary built by automatic methods. But measuring recall remains a challenge, for if there existed a high quality machine-readable dictionary (MRD) to measure against, building a new one would largely be pointless, except perhaps as a means of engineering around copyright restrictions. We could measure recall against Wiktionary, but of course this is a moving target, and more importantly, the coverage across language pairs is extremely uneven.

What we need is a standardized vocabulary resource that is equally applicable to all language pairs. In this paper we describe our work toward creating such a resource by extending the 4lang conceptual dictionary (Kornai and Makrai, 2013) to the top 40 languages (by Wikipedia size) using a variety of methods. Since some of the resources studied here are not available for the initial list of 40 languages, we extended the original list to 50 languages so as to guarantee at least 40 languages for every method. Throughout the paper, results are provided for all 50 languages, indicating missing data as needed.

Section 1 outlines the approach taken toward defining the basic vocabulary and translational equivalence. Section 2 describes how Wiktionary itself measures up against the 4lang resource directly and after triangulation across language pairs. Section 2.3 and Section 2.4 deal with extraction from multiply parallel and near-parallel corpora, and Section 3 offers some conclusions.

1 Basic vocabulary

The idea that there is a basic vocabulary composed of a few hundred or at most a few thousand elements has a long history going back to the Renaissance; for a summary, see Eco (1995). The first modern efforts in this direction are Thorndike’s (1921) Word Book, based entirely on frequency counts (combining TF and DF measures), and Ogden’s (1944) Basic English, based primarily on considerations of definability. Both had lasting impact, with Thorndike’s approach forming the basis of much subsequent work on readability (Klare 1974, Kanungo and Orr 2009) and Ogden’s forming the basis of the Simple English Wikipedia (http://simple.wikipedia.org). An important landmark is the Swadesh (1950) list, which puts special emphasis on cross-linguistic definability, as its primary goal is to support glottochronological studies.

Until the advent of large MRDs, the frequency-based method was much easier to follow, and Thorndike himself extended his original list of ten thousand words to twenty thousand (Thorndike 1931) and thirty thousand (Thorndike and Lorge 1944). For a recent example see Davies and Gardner (2010); for a historical survey see McArthur (1998). The main problem with this approach is the lack of clear boundaries both at the top of the list, where function words dominate, and at the bottom, where it seems quite arbitrary to cut the list off after the top three hundred words (Diederich 1938), the top thousand, as is common in foreign language learning, or the top five thousand, especially as the frequency curves are generally in good agreement with Zipf’s law and thus show no obvious inflection point. The problem at the top is perhaps more significant, since any frequency-based listing will start with the function words of the language, characterizing more its grammar than its vocabulary. For this reason, the list is highly varied across languages, and what is a word (free form) in one language, like English the, often ends up as an affix (bound form) in another, like the Romanian suffix -ul. By choosing a frequency-based approach, we inevitably put the emphasis on comparing grammars and morphologies, instead of comparing vocabularies.

The definitional method is based on the assumption that dictionaries will attempt to define the more complex words by simpler ones. Therefore, starting with any word list L, the list D(L) obtained by collecting the words appearing on the right-hand side of the dictionary definitions will be simpler, the list D(D(L)) obtained by repeating the method will be yet simpler, and so on, until we arrive at an irreducible list of basic words that can no longer be further simplified. Modern MRDs, starting with the Longman Dictionary of Contemporary English (LDOCE), generally enforce a strict list of words and word senses that can appear in definitions, which guarantees that the basic list will be a subset of this defining vocabulary. This method, while still open to charges of arbitrariness at the high end, in regard to the separation of function words from basic words, creates a bright line at the low end: no word, no matter how frequent, needs to be included as long as it is not necessary for defining other words.

In creating the 4lang conceptual dictionary (Kornai and Makrai, 2013), we took advantage of the fact that the definitional method is robust in terms of choosing the seed list L, and built a seed of approximately 3,500 entries composed of the Longman Defining Vocabulary (2,200 entries), the most frequent 2,000 words according to the Google unigram count (Brants and Franz 2006) and the BNC, as well as the most frequent 2,000 words from Polish (Halácsy et al. 2004) and Hungarian (Kornai et al. 2006). Since Latin is one of the four languages supported by 4lang (the other three being English, Polish, and Hungarian), we added the classic Diederich (1938) list and Whitney’s (1885) Roots.

The basic list emerging from the iteration has 1104 elements (including two bound morphemes but excluding technical terms of the formal semantic model that have no obvious surface reflex). We will refer to this as the basic or uroboros set as it has the property that each of its members can be defined in terms of the others, and we reserve the name 4lang for the larger set of 3,345 elements from which it was obtained. Since 4lang words can be defined using only the uroboros vocabulary, and every word in the Longman Dictionary of Contemporary English can be defined using the 4lang vocabulary (since this is a superset of LDV), we have full confidence that every sense of every non-technical word can be defined by the uroboros vocabulary. In fact, the Simple English Wikipedia is an attempt to do this (Yasseri et al., 2012) based on Ogden’s Basic English, which overlaps with the uroboros set very significantly (Dice 0.527).

The lexicographic principles underlying 4lang have been discussed elsewhere (Kornai, 2012; Kornai and Makrai, 2013); here we just summarize the most salient points. First, the system is intended to capture everyday vocabulary. Once the boundaries of natural language are crossed, and goats are defined by their set of genes (rather than an old-fashioned taxonomic description involving cloven hooves and the like), or derivative is defined as $\lim_{\Delta \to 0} (f(x + \Delta) - f(x))/\Delta$, the uroboros vocabulary loses its grip. But for the non-technical vocabulary, and even the part of the technical vocabulary that rests on natural language (e.g. legal definitions or the definitions in philosophy and discursive prose in general), coverage of the uroboros set promises a strategy of gradually extending the vocabulary from the simple to the more complex. Thus, to define Jupiter as ‘the largest planet of the Sun’, we need to define planet, but not large as this item is already listed in the uroboros set. Since planet is defined as ‘a large body in space that moves around a star’, by substitution we will obtain for Jupiter the definition ‘the largest body in space that moves around the Sun’ where all the key items large, body, space, move, around are part of the uroboros set. Proper nouns like Sun are discussed further in (Kornai, 2010), but we note here that they constitute a very small proportion (less than 6%) of the basic vocabulary.

Second, the ultimate definitions of the uroboros ...

... pairs are only obtained indirectly, through the conceptual pivot, and thus do not amount to fully valid bilingual translation pairs. For example, he-goat in one language may just get mapped to the concept goat, and if billy-goat is present in another language, the strict translational equivalence between the gendered forms will be lost because of the poverty of the pivot.