Ontological Parsing of Encyclopedia Information¹

Victor Bocharov, Lidia Pivovarova, Valery Rubashkin, Boris Chuprin

St. Petersburg State University, Universitetskaya nab. 11, Saint-Petersburg, Russia
[email protected], [email protected], [email protected], [email protected]

Abstract. Semi-automatic ontology learning from an encyclopedia is presented, with the primary focus on syntactic and semantic analysis of definitions.

Keywords: Ontology Learning, Syntax Analysis, Relation Extraction, Encyclopedia, Wikipedia.

Introduction

Ontology learning is a rapidly expanding area of natural language processing. Many language technologies, from machine translation to speech recognition, should be supported by ontologies that provide a conceptual interpretation of the entire corpus vocabulary. However, a formal ontology sufficient to cover the entire lexis of even a narrow domain should include a few dozen thousand concepts. Manual development of an ontology is therefore a very time-consuming process that cannot reach the required level of completeness. This "bottleneck" problem is now considered the main obstacle to using ontologies [1]. The problem becomes even more complex if a universal knowledge base is needed instead of a domain ontology, which is why ontology learning technologies are quite popular now.

Ontology learning, generally understood as ontology development based on natural language, can draw on different sources, such as natural language texts, machine-readable dictionaries, semi-structured data, and knowledge bases (a complete survey is presented in [2]). However, parsing of machine-readable dictionaries seems to be more effective. The main difference between a natural language text and a dictionary is the form of knowledge representation: knowledge in a dictionary is more structured and compact than in free text.
In some cases, the structure is presented in dictionaries explicitly (as markup, tags, etc.); otherwise it is expressed only by syntax. Many efforts are currently underway in this area (e.g., [3], [4], [5], [6], [7], [8], [9]). Nevertheless, we are unaware of any comparable effort for Russian dictionaries, though certain approaches to ontology learning from Russian free texts are known (e.g., [10], [11], [12]).

¹ This paper is supported by Russian Foundation For Basic Research, project №09-06-00275-а

Problem Statement and Basic Algorithm

We present here ontology learning from a machine-readable version of the "Russian Encyclopedic Dictionary" [13]. We use the entire dictionary with the exception of toponyms and proper names. The portion of the dictionary taken into consideration includes 26,375 entries, which describe 21,782 different terms. The difference between these two figures is caused by the presence of disambiguated terms (e.g., there are five different definitions for "aberration" in such areas as biology, physics, etc.).

The learned ontology is a universal ontology developed primarily for semantic text analysis. The basic structure of this ontology is an attribute tree in which objects alternate with attributes [15]. A small fragment of this tree is presented as an example below:

• TRANSPORT
  o BY ENERGY SOURCE
    • ELECTRIC TRANSPORT
    • ATOMIC TRANSPORT
    • FUEL TRANSPORT
    • WIND-DRIVEN TRANSPORT
  o BY ENVIRONMENT TYPE
    • AIR TRANSPORT
    • WATER TRANSPORT
    • LAND TRANSPORT
    • SPACE TRANSPORT

This structure provides the most natural way to present different links, such as correspondence of a value to an attribute (*great color vs. great volume), correspondence of an attribute to an object class (SOLID –> SHAPE vs. *LIQUID –> SHAPE), or a complete set of extension relations between concepts (incompatibility, intersection, inclusion).
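The alternation of objects and attributes in the TRANSPORT fragment can be made concrete with a small data structure. The sketch below is only an illustration and not the authors' ontology format: the class names `ObjectNode` and `AttributeNode` and the helpers `find_path` and `is_included` are hypothetical. Of the extension relations mentioned above, only inclusion (a concept lying below another object in the tree) is modelled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class AttributeNode:
    """An attribute (e.g. BY ENERGY SOURCE); its children are object classes."""
    name: str
    values: List["ObjectNode"] = field(default_factory=list)


@dataclass
class ObjectNode:
    """An object class (e.g. TRANSPORT); its children are attributes."""
    name: str
    attributes: List[AttributeNode] = field(default_factory=list)


def build_transport_fragment() -> ObjectNode:
    """Rebuild the TRANSPORT fragment shown in the text."""
    return ObjectNode("TRANSPORT", [
        AttributeNode("BY ENERGY SOURCE", [
            ObjectNode("ELECTRIC TRANSPORT"),
            ObjectNode("ATOMIC TRANSPORT"),
            ObjectNode("FUEL TRANSPORT"),
            ObjectNode("WIND-DRIVEN TRANSPORT"),
        ]),
        AttributeNode("BY ENVIRONMENT TYPE", [
            ObjectNode("AIR TRANSPORT"),
            ObjectNode("WATER TRANSPORT"),
            ObjectNode("LAND TRANSPORT"),
            ObjectNode("SPACE TRANSPORT"),
        ]),
    ])


def find_path(node: ObjectNode, target: str,
              path: Tuple[str, ...] = ()) -> Optional[Tuple[str, ...]]:
    """Return the alternating object/attribute names from the root to target."""
    path = path + (node.name,)
    if node.name == target:
        return path
    for attr in node.attributes:
        for value in attr.values:
            found = find_path(value, target, path + (attr.name,))
            if found is not None:
                return found
    return None


def is_included(root: ObjectNode, concept: str, ancestor: str) -> bool:
    """True if `concept` lies strictly below the object `ancestor` in the tree."""
    path = find_path(root, concept)
    if path is None:
        return False
    objects = path[::2]  # even positions hold objects, odd ones attributes
    return ancestor in objects[:-1]
```

Calling `find_path(build_transport_fragment(), "AIR TRANSPORT")` returns the chain of names from TRANSPORT through BY ENVIRONMENT TYPE down to AIR TRANSPORT, which is the kind of object-attribute-object path the tree is meant to encode.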
The ontology also represents different associative relations, which are either unified (PART –> WHOLE, OBJECT –> LOCALIZATION, OBJECT –> FUNCTION, etc.) or specialized (COUNTRY –> CAPITAL, ORGANIZATION –> CHIEF, etc.).

The lexicon is an integral part of a working ontology; it connects the conceptual model with natural language units. Such a lexicon includes words and collocations that can be used to express various concepts. These words and collocations can represent standard terms (i.e., names of concepts used in the ontology) or their synonyms (we use the term "synonym" here in a broad sense, as any natural language expression that refers to the respective concept with reasonable probability).

At the ontology learning stage we use our own ontoeditor [13] with additional tools for importing encyclopedia information. Since the requirements for concept description in natural language processing are very strict, it is hardly possible to populate the ontology from our source in a fully automatic fashion. Therefore, ontology learning is broken down into two stages: first, the dictionary entries are pre-classified automatically; second, an ontology administrator is given an opportunity to approve, change or cancel each decision made by the program. We discuss here primarily the first stage of this process, the automatic linguistic analysis of encyclopedia entries.

This linguistic analysis is based on the following simple hypothesis: usually, the hyperonym of a dictionary term is the first noun in the nominative (subjective) case in its definition (referred to hereafter as the "basic word"). Several examples of typical dictionary entries that correspond to this hypothesis are shown below².

АГРАФ – нарядная заколка для волос, с помощью которой крепили в прическах перья, цветы, искусственные локоны и т. д.
HAIRPIN – a pin to hold the hair in place.

ПЕРИСТИЛЬ – прямоугольный двор, сад, площадь, окруженные с 4 сторон крытой колоннадой.
PERISTYLE – a colonnade surrounding a building or court.

ЯТАГАН – рубяще-колющее оружие (среднее между саблей и кинжалом) у народов Ближнего и Среднего Востока (известно с 16 в.).
YATAGHAN – a long knife or short saber that lacks a guard for the hand at the juncture of blade and hilt and that usually has a double curve to the edge and a nearly straight back.

As was demonstrated in a pilot study [17], the structure of most dictionary entries corresponds to our hypothesis; however, applying it directly occasionally yields incorrect results. A list of the most frequent basic words selected at the first step of the analysis [17] is shown in Table 1. A very simple lemmatizer was used to determine the first noun in each definition. A total of 4603 different first nouns were identified using this technique.

Table 1. List of the most frequently used basic words (according to pilot study [17])

Rank  Basic Word       Translation    Frequency
1     ИЗА              IZA            475
2     ЧАСТЬ            PART           415
3     СОВОКУПНОСТЬ     COMBINATION    406
4     НАЗВАНИЕ         NAME           389
5     СИСТЕМА          SYSTEM         347
6     РАЗДЕЛ           SECTION        336
7     ВИД              KIND           305
8     УСТРОЙСТВО       DEVICE         298
9     ПРИБОР           INSTRUMENT     286
10    МИНЕРАЛ          MINERAL        286
11    ЕДИНИЦА          UNIT           264
12    ФОРМА            FORM           232
13    ГРУППА           GROUP          212
14    ИНСТРУМЕНТ       TOOL           204
15    ВЕЩЕСТВО         SUBSTANCE      202
16    ЭЛЕМЕНТ          ELEMENT        198
17    МЕТОД            METHOD         194
18    ЗАБОЛЕВАНИЕ      DISEASE        186
19    ПРОЦЕСС          PROCESS        182
20    СПОСОБ           APPROACH       169
21    БОЛЕЗНЬ          ILLNESS        164
22    ##не выявлено##  ##undefined##  162
23    ЖИДКОСТЬ         LIQUID         154
24    СОЕДИНЕНИЕ       COMPOUND       153
25    КРИСТАЛЛ         CRYSTAL        153
26    ПОРОДА           BREED          141
27    НАПРАВЛЕНИЕ      DIRECTION      137
28    ОРГАН            ORGAN          134
29    НАУКА            DISCIPLINE     132
30    ТКАНЬ            TISSUE         132
31    ЛИЦО             PERSON         120
32    ОБЛАСТЬ          PROVINCE       116
33    ОТРАСЛЬ          BRANCH         116
34    КОМПЛЕКС         COMPLEX        109

The most frequent word here is Иза, a Russian female name. Its genitive plural form, Из, is a homonym of the very frequent Russian preposition из (from).
If this preposition occurs before any noun in the definition, the program selects it as the first noun. This situation and some similar cases make it necessary to use complete morphological information (grammemes) instead of simple lemmatization.

Then, there are such frequent words as part, complex, name, kind, sort, etc. These words cannot be used as basic words; they are rather links that mark a relationship between the dictionary term and the proper basic word. The high frequency of such words makes it necessary to apply additional logical-linguistic rules for extracting relations of different kinds.

² Relevant definitions taken from Webster dictionary (http://www.merriam-webster.com/) or English Wikipedia (http://en.wikipedia.org/) are shown here instead of translations of the respective Russian definitions.

Finally, some other words are noticeable in this list. For example, единица is a part of the Russian phrases единица измерения (unit of measurement) and денежная единица (monetary unit), which are very frequent in the encyclopedic dictionary. Similarly, such frequently used words as элемент (element) and лицо (person) are parts of such phrases as химический элемент (chemical element) and должностное лицо (official), respectively. This fact justifies extraction of noun groups (in addition to single nouns) as basic words, and, therefore, it becomes necessary to use certain elements of syntactic analysis.

The very frequent occurrence of undefined basic words can be explained in two different ways. First, it can be caused by certain errors, which are partly corrected herein. Second, it can indicate an unusual dictionary definition. For example:

МОРСКАЯ АРТИЛЛЕРИЯ – состоит на вооружении кораблей и береговых ракетно-артиллерийских войск (NAVAL ARTILLERY – is in service with naval ships or coastal defense troops)

No noun in the nominative (subjective) case is present in this definition.
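The difference between lemma lookup and grammeme-based selection can be illustrated with a short sketch. It is not the authors' program. The `Token` structure, the hand-assigned part-of-speech and case tags, the toy lexicon `TOY_NOUN_FORMS`, and the phrase сделанный из глины сосуд (a vessel made from clay, an invented example) are all assumptions made for illustration. In a real system the tags would come from a morphological analyzer with disambiguation. Link words are only flagged here, because the text does not spell out the rules for resolving them.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Token:
    text: str
    lemma: str
    pos: str                     # hand-assigned tag: "NOUN", "ADJ", "PREP", "VERB"
    case: Optional[str] = None   # "nom", "gen", "loc", ... for nominals


@dataclass(frozen=True)
class BasicWord:
    lemma: str
    is_link_word: bool           # True for words such as часть (part) or вид (kind)


# Toy lexicon: surface form -> noun lemma. The entry for "из" reproduces the
# homonymy with the genitive plural of the name Иза described in the text.
TOY_NOUN_FORMS: Dict[str, str] = {
    "из": "иза",
    "глины": "глина",
    "сосуд": "сосуд",
}

# Illustrative subset of link words taken from Table 1.
LINK_WORDS = {"часть", "вид", "название", "комплекс"}


def naive_first_noun(words: List[str]) -> Optional[str]:
    """Lemma lookup only: the first word that could be a noun wins."""
    for word in words:
        lemma = TOY_NOUN_FORMS.get(word.lower())
        if lemma is not None:
            return lemma
    return None


def first_nominative_noun(tokens: List[Token]) -> Optional[BasicWord]:
    """Grammeme-based selection: the first token tagged as a nominative noun."""
    for tok in tokens:
        if tok.pos == "NOUN" and tok.case == "nom":
            return BasicWord(tok.lemma, tok.lemma in LINK_WORDS)
    return None  # corresponds to the "undefined" row of Table 1


CLAY_VESSEL = [
    Token("сделанный", "сделанный", "ADJ", "nom"),
    Token("из", "из", "PREP"),
    Token("глины", "глина", "NOUN", "gen"),
    Token("сосуд", "сосуд", "NOUN", "nom"),
]

NAVAL_ARTILLERY = [
    Token("состоит", "состоять", "VERB"),
    Token("на", "на", "PREP"),
    Token("вооружении", "вооружение", "NOUN", "loc"),
    Token("кораблей", "корабль", "NOUN", "gen"),
]
```

With these inputs, the naive lookup stops at из and reports the name, whereas the tagged version skips the preposition and the genitive noun and selects сосуд. The naval artillery definition yields no basic word at all, matching the "undefined" case discussed above.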