Computational Challenges for Polysynthetic Languages
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Linguistics 203 - Languages of the World Language Study Guide
Linguistics 203 - Languages of the World Language Study Guide This exam may consist of anything discussed on or after October 20. Be sure to check out the slides on the class webpage for this date onward, as well as the syntax/morphology language notes. This guide discusses the aspects of specific languages we looked at, and you will want to look at the notes of specific languages in the ‘language notes: syntax/morphology’ section of the webpage for examples and basic explanations. This guide does not discuss the linguistic topics we discussed in a more general sense (i.e. where we didn’t focus on a specific language), but you are expected to understand these area as well. On the webpage, these areas are found in the ‘lecture notes/slides’ section. As you study, keep in mind that there are a number of ways I might pose a question to test your knowledge. If you understand the concept, you should be able to answer any of them. These include: Definitional If a language is considered an ergative-absolutive language, what does this mean? Contrastive (definitional) Explain how an ergative-absolutive language differs from a nominative- accusative language. Identificational (which may include a definitional question as well...) Looking at the example below, identify whether language A is an ergative- absolutive language or a nominative accusative language. How do you know? (example) Contrastive (identificational) Below are examples from two different languages. In terms of case marking, how is language A different from language B? (ex. of ergative-absolutive language) (ex. of nominative-accusative language) True / False question T F Basque is an ergative-absolutive language. -
What Are the Limits of Polysynthesis?
International Symposium on Polysynthesis in the World's Languages February 20-21, 2014 NINJAL, Tokyo What are the limits of Polysynthesis? Michael Fortescue, Marianne Mithun, and Nicholas Evans Of the various labels for morphological types currently in use by typologists ‘polysynthesis’ has proved to be the most difficult to pin down. For some it just represents an extreme on the dimension of synthesis (one of Sapir’s two major typological axes) while for others it is an independent category or parameter with far- reaching morphosyntactic ramifications. A recent characterization (Evans & Sasse 2002: 3f.) is the following: ‘Essentially, then, a prototypical polysynthetic language is one in which it is possible, in a single word, to use processes of morphological composition to encode information about both the predicate and all its arguments, for all major clause types [....] to a level of specificity, allowing this word to serve alone as a free-standing utterance without reliance on context.’ If the nub of polysynthesis is the packing of a lot of material into single verb forms that would be expressed as independent words in less synthetic languages, what exactly is the nature of and limitations on this ‘material’? The present paper investigates the limits – both upwards and downwards – of what the term is generally understood to cover. Reference Evans, N. & H.-J. Sasse (eds.). (2002). Problems of Polysynthesis. Berlin: Akademie Verlag. 1 2014-01-08 International Symposium on Polysynthesis in the World's Languages February 20-21, 2014 NINJAL, Tokyo Polysynthesis in Ainu Anna Bugaeva (National Institute for Japanese Language and Linguistics) Ainu is a typical polysynthetic language in the sense that a single complex verb can express what takes a whole sentence in most other languages. -
“Today's Morphology Is Yeatyerday's Syntax”
MOTHER TONGUE Journal of the Association for the Study of Language in Prehistory • Issue XXI • 2016 From North to North West: How North-West Caucasian Evolved from North Caucasian Viacheslav A. Chirikba Abkhaz State University, Sukhum, Republic of Abkhazia The comparison of (North-)West Caucasian with (North-)East Caucasian languages suggests that early Proto-West Caucasian underwent a fundamental reshaping of its phonological, morphological and syntactic structures, as a result of which it became analytical, with elementary inflection and main grammatical roles being expressed by lexical means, word order and probably also by tones. The subsequent development of compounding and incorporation resulted in a prefixing polypersonal polysynthetic agglutinative language type typical for modern West Caucasian languages. The main evolutionary line from a North Caucasian dialect close to East Caucasian to modern West Caucasian languages was thus from agglutinative to the analytical language-type, due to a near complete loss of inflection, and then to the agglutinative polysynthetic type. Although these changes blurred the genetic relationship between West Caucasian and East Caucasian languages, however, this can be proven by applying standard procedures of comparative-historical linguistics. 1. The West Caucasian languages.1 The West Caucasian (WC), or Abkhazo-Adyghean languages constitute a branch of the North Caucasian (NC) linguistic family, which consists of five languages: Abkhaz and Abaza (the Abkhaz sub-group), Adyghe and Kabardian (the Circassian sub-group), and Ubykh. The traditional habitat of these languages is the Western Caucasus, where they are still spoken, with the exception of the extinct Ubykh. Typologically, the WC languages represent a rather idiosyncratic linguistic type not occurring elsewhere in Eurasia. -
Stress Chapter
Word stress in the languages of the Caucasus1 Lena Borise 1. Introduction Languages of the Caucasus exhibit impressive diversity when it comes to word stress. This chapter provides a comprehensive overview of the stress systems in North-West Caucasian (henceforth NWC), Nakh-Dagestanian (ND), and Kartvelian languages, as well as the larger Indo-European (IE) languages of the area, Ossetic and (Eastern) Armenian. For most of these languages, stress facts have only been partially described and analyzed, which raises the question about whether the available data can be used in more theoretically-oriented studies; cf. de Lacy (2014). Instrumental studies are not numerous either. Therefore, the current chapter relies mainly on impressionistic observations, and reflects the state of the art in the study of stress in these languages: there are still more questions than answers. The hope is that the present summary of the existing research can serve as a starting point for future investigations. This chapter is structured as follows. Section 2 describes languages that have free stress placement – i.e., languages in which stress placement is not predicted by phonological or morphological factors. Section 3 describes languages with fixed stress. These categories are not mutually exclusive, however. The classification of stress systems is best thought of as a continuum, with fixed stress and free stress languages as the two extremes, and most languages falling in the space between them. Many languages with fixed stress allow for exceptions based on certain phonological and/or morphological factors, so that often no firm line can be drawn between, e.g., languages with fixed stress that contain numerous morphologically conditioned exceptions (cf. -
The Dative-Ergative Connection Miriam Butt
Empirical Issues in Syntax and Semantics 6 O. Bonami & P.Cabredo Hofherr (eds.) 2006, pp. 69–92 The Dative-Ergative Connection Miriam Butt 1 Introduction The classic division between structural vs. inherent/ lexical case proposed within Gov- ernment-Binding (Chomsky 1981) remains a very popular one, despite evidence to the contrary that the inner workings of case systems are far more complex than this simple division would suggest and that individual case markers generally make a systematic structural and semantic contribution that interacts in a generalizable manner with the lexical semantics of a predicate (see Butt 2006 for a survey of theories and data, Butt 2006:125 and Woolford 2006 for a proposed disctinction between inherent (general- izable) and lexical (idiosyncratic) case.).1 That is, the semantic contribution of case cannot (and should not) be relegated to the realm of lexical stipulation because there are systematic semantic generalizations to be captured. This fact has been recognized in more and more recent work. One prominent ex- ample is the work engaged in understanding the semantic generalizations underlying so-called object alternations, perhaps the most famous of which is the Finnish parti- tive alternation shown in (1)–(2). In Finnish, the accusative alternates with the parti- tive on objects. This alternation gives rise to readings of partitivity (1) and aspectual (un)boundedness (2).2 (1) a. Ostin leivän bought.1.Sg bread.Acc ‘I bought the bread.’ Finnish 1I would like to thank the organizers of the CSSP 2005 for inviting me to participate in the conference. I enjoyed the conference tremendously and the comments I received at the conference were extremely constructive, particularly those by Manfred Krifka. -
Nez Perce Verb Morphology Phillip Cash Cash University of Arizona, 2004
Nez Perce Verb Morphology Phillip Cash Cash University of Arizona, 2004 1.0 Introduction In this paper, I present an introduction to Nez Perce verb morphology. The goal of such a study is to describe the internal structure of Nez Perce verb form and meaning. It takes as its task identifying the constituent elements of words and examining the rules that govern their co-occurrence. The Nez Perce language is a polysynthetic language and, as such, it displays an enriched morphological system whereby complex propositions can be expressed at the level of a single word. Typologically, utterances of the polysynthetic type suggest that speakers of these languages employ a structural principle of dependent-head synthesis that treats the minimal units of meaning, that is, its morphemes, in ways different from other world languages. This is simply to say that the morphology plays a more prominent role at the clausal level than in synthetic languages like English. Consider a concrete example as in /hiwlé·ke•yke/ ‘He/she/it ran.’ When we examine the structure of a morphosyntactic word in Nez Perce, we are interested in i) identifying the pairing of each morpheme’s phonological form, often called its surface structure, with the content specified in its lexical entry, and ii) identifying how morphemes are organized and combined with respect to grammatical principles. First, we begin by examining a morphosyntactic word through its component parts. Four main representations of words are used in this analysis, these are i) the surface form, ii) the morphological form, iii) the morphological gloss, and iv) the free translation. -
٧٠٩ Morphological Typology
جامعة واسط العـــــــــــــــدد السابع والثﻻثون مجلــــــــة كليــــــــة التربيــــــة الجزء اﻷول / تشرين الثاني / 2019 Morphological Typology: A Comparative Study of Some Selected Languages Bushra Farhood Khudheyier AlA'amiri Prof. Dr. Abdulkareem Fadhil Jameel, Ph.D. Department of English, College of Education/ Ibn Rushud for Humanities Baghdad University, Baghdad, Iraq Abstract Morphology is a main part of English linguistics which deals with forms of words. Morphological typology organizes languages on the basis of these word forms. This organization of languages depends on structural features to mould morphological, patterns, typologising languages, assigning them to analytic, or synthetic types on the base of words segmentability and invariance, or measuring the number of morphemes per word. Morphological typology studies the universals in languages, the differences and similarities between languages in the structural patterns found in different languages, which occur within a restricted range. This paper aims at distinguishing the various types of several universal languages and comparing them with English. The comparison of languages are set according to the number of morphemes, the degree of being analytic ,or synthetic languages by given examples of each type. Accordingly, it is hypothesed that languages are either to be analytic, or synthetic according to the syntactic and morphological form of morphemes and their meaning relation. The analytical procedures consist of expressing the morphological types with some selected examples, then making the comparison between each type and English. The conclusions reached at to the point of the existence of similarity between these morphological types . English is Analytic , but it has some synythetic aspects, so it validated the first hypothesis and not entirely refuted the second one. -
LNGT 0250 Morphology and Syntax
3/22/2016 LNGT 0250 Announcements Morphology and Syntax • I’m extending the deadline for Assignment 3 until 8pm tomorrow Wednesday March 23. This should allow you to take advantage of my office hours tomorrow if you have questions. • Reactions to lecture on Menominee? • Answer the questions on the short handout, and give it back to me on Thursday. I’ll use that information in forming your Linguistic Lecture #11 Diversity Project groups. March 22nd, 2016 2 Neatist Vegetarian vs. Humanitarian 3 4 Today’s agenda Restrictions on productivity • Unfinished business: Constraints on the • ‐ee productivity of derivational morphemes. – draw drawee • Morphological typology: How do languages – pay payee differ? – free *freeee • Nominal and verbal categories of inflectional – accompany *accompanyee morphology. 5 6 1 3/22/2016 Spanish diminutive morpheme: ‐illo Restrictions on productivity • mesa mesillo ‘little table’ • private privatize • grupo grupillo ‘little group’ • capital capitalize • gallo *gallillo ‘little rooster’ • corrupt *corruptize • camello *camellillo ‘little camel’ • secure *securize 7 8 Restrictions on productivity Restrictions on productivity • combat combatant fight *fightant • escalate deescalate • brutal brutality brittle *brittality • assassinate *deassassinate • monster monstrous spinster *spinstrous • parent parental mother *motheral • But notice: murderous, thunderous 9 10 Index of synthesis: How many morphemes Morphological typology does your language have per word? • One aspect of morphological variation -
Peter Arkadiev & Yury Lander October 2018 Version. Submitted to Maria
THE NORTHWEST CAUCASIAN LANGUAGES Peter Arkadiev & Yury Lander October 2018 version. Submitted to Maria Polinsky (ed.), The Oxford Handbook of the Languages of the Caucasus. 1. The languages and their speakers The Northwest Caucasian (NWC) family, also known as West Caucasian or Abkhaz- Adyghe, comprises five languages, which are grouped into three branches: • Abkhaz-Abaza: o Abkhaz (including the Sadz, Ahchypsy, Bzyp, Tsabal, and Abzhywa dialects), o Abaza (nominally including the Tapanta and Ashkharywa dialects), • Ubykh, • Circassian: o West Circassian (also known as Adyghe, including the Bzhedugh, Shapsugh, Abzakh/Abadzekh, and Temirgoy dialects), o Kabardian (also known as East Circassian and including the Besleney, Baksan, Mozdok, Malka, Terek, and Kuban dialects). This conventional division is based on heterogeneous linguistic and sociolinguistic considerations and hence is a kind of simplification. For example, Abaza consists of two dialects, Tapanta and Ashkharywa, of which the latter is closer to Abkhaz than to Tapanta, West Circassian and Kabardian are often considered by their speakers to constitute a single Adyghe (or Circassian) language (despite the absence of mutual intelligibility), and some dialects of Abkhaz (e.g., Sadz) and West Circassian (e.g., Shapsugh) may be treated as separate languages. Nonetheless, below we use the language list as given above with a proviso that whenever it is possible we will try to overtly mark the variety referred to – with the exception of Abaza, whose examples always belong to the Tapanta dialect. During the centuries, the speakers of the languages of the family inhabited areas to the North and partly to the South of the Western part of the Caucasian Ridge including the Northeast coast of the Black Sea. -
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido a Thesis Submitted in Fulfilment O
Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages S. Ido A thesis submitted in fulfilment of the requirements for the degree of Doctor of Philosophy 2001 University of Sydney I declare that this thesis is all my own work. I have acknowledged in formal citation the sources of any reference I have made to the work of others. ____________________________ Shinji Ido ____________________________ Date Title: Towards an Alternative Description of Incomplete Sentences in Agglutinative Languages Abstract: This thesis analyses ‘incomplete sentences’ in languages which utilise distinctively agglutinative components in their morphology. In the grammars of the languages dealt with in this thesis, there are certain types of sentences which are variously referred to as ‘elliptical sentences’ (Turkish eksiltili cümleler), ‘incomplete sentences’ (Uzbek to‘liqsiz gaplar), ‘cut-off sentences’ (Turkish kesik cümleler), etc., for which the grammarians provide elaborated semantic and syntactic analyses. The current work attempts to present an alternative approach for the analysis of such sentences. The distribution of morphemes in incomplete sentences is examined closely, based on which a system of analysis that can handle a variety of incomplete sentences in an integrated manner is proposed from a morphological point of view. It aims to aid grammarians as well as researchers in area studies by providing a simple description of incomplete sentences in agglutinative languages. The linguistic data are taken from Turkish, Uzbek, -
The Word in Polysynthetic Languages
To appear in the Oxford Handbook of Polysynthesis e ‘word’ in polysynthetic languages: phonological and syntactic challenges* Balthasar Bickel¹ & Fernando Zúñiga² ¹University of Zürich ²University of Bern Polysynthesis presupposes the existence of ‘words’, a domain or unit of phonology and syntax that is extremely variable within and across languages: what behaves as a ‘word’ with respect to one phonological or syntactic rule or constraint may not behave as such with respect to other rules or constraints. Here we develop a system of variables that allows cataloguing all verb-based domains in a language in a boom-up fashion and then determining any potential convergence of domains in an empirical way. We apply the system to case studies of Mapudungung and Chintang. ese confirm earlier observations that polysynthetic languages do not operate with unified units of type ‘word’ in either phonology or syntax. Keywords: word, clitic, affix, incorporation, head 1 Introduction Consider the following data from Mapudungun (unclassified; Chile and Argentina), a canonical polysynthetic language:¹ (1) Mapudungun a. entu-soyüm-yaw-küle-i. remove-shrimp-PERAMB-PROG-IND ‘ey are going around gathering shrimp.’ (Salas 2006:179) * e first author thanks Sabine Stoll, Jekatarina Mažara, and Robert Schikowski for insightful discussion of some of the issues in Chintang. Both authors thank Rik Van Gijn and Nick Evans for helpful comments on a first dra. e research reported here was supported by Grant Nr. CRSII1_160739 (“Linguistic morphology in time and space”) from the Swiss National Science Foundation. ¹ For the Mapudungun data cited in this article, we follow the traditional Alfabeto Mapue Unificado convention still widely used by linguists in Chile, which is specific about segmental representation but, quite like other spelling conventions (like the one used in Argentina, or the Azümefi), not prescriptive in a principled way when it comes to word boundaries. -
Multi-Task Neural Model for Agglutinative Language Translation
Multi-Task Neural Model for Agglutinative Language Translation Yirong Pan1,2,3, Xiao Li1,2,3, Yating Yang1,2,3, and Rui Dong1,2,3 1 Xinjiang Technical Institute of Physics & Chemistry, Chinese Academy of Sciences, China 2 University of Chinese Academy of Sciences, China 3 Xinjiang Laboratory of Minority Speech and Language Information Processing, China [email protected] {xiaoli, yangyt, dongrui}@ms.xjb.ac.cn Abstract (Ablimit et al., 2010). Since the suffixes have many inflected and morphological variants, the Neural machine translation (NMT) has vocabulary size of an agglutinative language is achieved impressive performance recently considerable even in small-scale training data. by using large-scale parallel corpora. Moreover, many words have different morphemes However, it struggles in the low-resource and morphologically-rich scenarios of and meanings in different context, which leads to agglutinative language translation task. inaccurate translation results. Inspired by the finding that monolingual Recently, researchers show their great interest data can greatly improve the NMT in utilizing monolingual data to further improve performance, we propose a multi-task the NMT model performance (Cheng et al., 2016; neural model that jointly learns to perform Ramachandran et al., 2017; Currey et al., 2017). bi-directional translation and agglutinative Sennrich et al. (2016) pair the target-side language stemming. Our approach employs monolingual data with automatic back-translation the shared encoder and decoder to train a as additional training data to train the NMT model. single model without changing the standard Zhang and Zong (2016) use the source-side NMT architecture but instead adding a token before each source-side sentence to monolingual data and employ the multi-task specify the desired target outputs of the two learning framework for translation and source different tasks.