Enriching plWordNet with Morphology

Agnieszka Dziob and Wiktor Walentynowicz
G4.19 Research Group, Department of Computational Intelligence
Wrocław University of Technology, Wrocław, Poland
{agnieszka.dziob,wiktor.walentynowicz}@pwr.edu.pl

Abstract

In this paper, we present the process of adding morphological information to the Polish WordNet (plWordNet). We describe the reasons for this connection and the intuitions behind it. We also draw attention to the specificity of Polish morphology. We show in which tasks morphological information is important and how existing methods can be developed by extending them to include combined morphological information based on WordNet.

1 Introduction

plWordNet is a very large wordnet of Polish. Version 4.1, presented in Dziob et al. (2019), contains 192,495 lemmas, 290,366 lexical units (henceforth, LUs), and 224,179 synsets belonging to four parts of speech: verbs, adjectives, adverbs, and nouns. Work on connecting it to Princeton WordNet has been ongoing since 2012 (Rudnicka et al., 2012).

Synsets and LUs are described by means of lexico-semantic relations, a concept taken from Princeton WordNet (Fellbaum, 1998) and EuroWordNet (Vossen, 2002). In the course of that work, the need emerged to create new relations specific to the Polish language (Piasecki et al., 2012). This need follows from the necessity of describing derivational dependencies in a language with rich derivational morphology. Inflectional morphology was not dealt with in plWordNet. Following Miller (1995), we assumed that describing inflectional morphology is the task of a separate resource, namely the morphological dictionaries of Polish, which support the use of plWordNet.

In plWordNet, meaning is defined according to the assumptions of relational semantics (Lyons, 1977), that is, as the bundle of lexico-semantic relations an LU enters. Thus, LUs that have different relations in the language system cannot be treated as synonyms or belong to the same synset (Derwojedowa et al., 2008). Maziarz et al. (2013) formulated the concept of constitutive relations and constitutive features, i.e. those which differentiate meaning. They include all synset relations except fuzzynymy, and LU features such as stylistic register, aspect for verbs, and semantic classes for adjectives and verbs (Maziarz et al., 2016). Consequently, there are no theoretical or methodological assumptions that would allow inflectional features to be defined as distinguishing the meanings of an LU in plWordNet. At the same time, it has never been explicitly stated that morphological features cannot influence meaning.

Lexicographic work is ongoing and currently focuses on completing the structure of plWordNet with new LUs, increasing the density of lexico-semantic relations, and correcting errors resulting from manual work. The most recent work has consisted of connecting morphological descriptions from the Grammatical Dictionary of Polish (Pol. Słownik Gramatyczny Języka Polskiego, henceforth SGJP) (Saloni et al., 2007). This process has four practical objectives:

1) investigating the influence of the morphological characteristics of LUs on their semantic description;
2) verifying the part-of-speech membership of morphologically ambiguous LUs and lemmas;
3) completing the semantic description of existing lemmas with new senses, based on their morphological description in the SGJP;
4) building a practical resource, combining semantic and morphological description, for tasks related to the processing of the Polish language.

The result of this work is plWordNet combined with the morphological description from SGJP, created automatically with manual disambiguation of morphologically ambiguous lemmas, i.e. those which have more than one pattern of inflection.

The purpose of this paper is to present the results of combining the resources and to indicate their applications.

2 The Problem of Morphology Description

2.1 The Morphological Description in Wordnets

Princeton WordNet was conceived as a semantic description, i.e. one including derivational, not inflectional, morphology (Miller et al., 1990). Still, morphological data resources are developed as independent linguistic databases. One of them is CELEX, a lexical database for Dutch and English (Van der Wouden, 1990), extended in version 2.0 with German (Baayen et al., 1995). It contains orthographic, phonological, morphological (also inflectional), syntactic, and statistical information (frequency in text corpora), but it does not contain semantics.

The morphological description of derivation and inflection in CELEX offers great opportunities to enrich it with semantic information. Hathout (2002) describes combining the morphological resources of CELEX for English with Princeton WordNet to create a language-independent mechanism for obtaining semantic relational data (synonyms and derivatives). Similar studies with the use of CELEX and semantic-relational thesauri were conducted for Dutch (Kraaij and Pohlmann, 1996). This research is particularly applicable to the extraction of information and knowledge from text.

Inflectional morphology is less of a problem for languages with residual inflection, such as English, but a serious challenge for highly inflected languages. In Henrich et al. (2012), semantic data from Princeton WordNet and GermaNet (Hamp and Feldweg, 1997) and morphological data for these languages from Wiktionary were used to create a method for sense annotation and automatically annotated text corpora.

Slavic languages have an even more complicated inflection than Germanic ones. Pala and Hlaváčková (2007) present the results of adapting a mechanism of the Czech morphological analyzer Ajka (Sedláček and Smrž, 2001) to extend the Czech WordNet with derivational relations. For Slavic languages, research on derivational morphology has also been carried out on a larger scale for Bulgarian (Koeva, 2008).

The work closest to that presented in this paper was described in Obradović and Stanković (2007). The authors developed a tool for the creation of complex lexicographic data obtained from wordnets, morphological dictionaries, and text corpora.

Several important aspects of morphological description in wordnets can be highlighted:

1) on a wider scale, it describes derivational, not inflectional, morphology;
2) there is a regular relationship between inflection and derivation, but these two levels of description are not treated as equivalent;
3) the combination of semantic and morphological description (at both the inflectional and the derivational level) is useful, e.g. for tasks related to Word Sense Disambiguation and, in connection with it, the extraction of information from texts and the building of knowledge bases.

2.2 The Specificity of the Polish Morphology

As already mentioned, plWordNet describes four parts of speech: verbs, adjectives, adverbs, and nouns. In Polish, nouns and adjectives are inflected for seven cases in two numbers: singular and plural. Furthermore, adjectives are inflected for gender, while nouns are always lexically specified for grammatical gender.

There are five genders: three masculine (personal, animate, and inanimate), feminine, and neuter (Lewandowska-Tomaszczyk et al., 2012). For example, pies (zw) ‘dog’ and pies (os) ‘cop’ (depreciative and colloquial) differ in their genders and, consequently, in their inflection patterns. The gender of an adjective depends on the gender of the noun with which it enters into a syntactic relation in the text, e.g. szafa ‘wardrobe’ (feminine) and czerwon-a ‘red’ (adjective, feminine). Moreover, adjectives and adverbs have three degrees: positive, comparative, and superlative.

Verbs in Polish are inflected for two numbers and three persons in each of them, four tenses (including two future ones that differ from each other only grammatically), and three modes of expressing modality (indicative, conditional, imperative). They have an assigned aspect, which has grammatical and semantic functions (Dziob and Piasecki, 2018). They form gerunds and four types of participles, which are treated as forms of the verb. A semantic-syntactic feature of the verb (and, to a lesser extent, of other parts of speech) is valence, understood as the ability of predicates to attach arguments in specific forms and syntactic positions (Przepiórkowski et al., 2014). For example, the verb jeść ‘to eat’ opens three syntactic positions: for a subject, an object, and a circumstance. In this case, the object is expressed as a noun in the accusative case. The valence dictionary Walenty is connected at the semantic layer to plWordNet by using synsets to determine a semantic preference, e.g. ‘jeść’ + jedzenie 2 ‘food’.

Traditional Polish grammars, cf. (Grzegorczykowa et al., 1998), distinguish a fairly large group of numerals and pronouns. In syntactically oriented grammars, e.g. (Saloni, 2012), [...]

[...] Net contains, allows searching for new patterns in data.

3 Resources

3.1 Morfeusz and SGJP

SGJP (Saloni, 2012) aims to give the grammatical characteristics of Polish words. The main element of these characteristics is an open description of the inflection of the units taken into account, given by listing all their inflected forms and determining their grammatical functions. The dictionary does not contain lexeme senses. Morfeusz2 (Woliński, 2014) is a morphological analyzer that can use [...]