PoS, Morphology and Dependencies Annotation Guidelines for Arabic

Mohammed Attia, Tolga Kayadelen, Ryan McDonald, Slav Petrov
Google Inc.
May 2017

Table of Contents

1. Introduction
2. Tokenization
    Arabic Clitic Table
    Special Cases
3. POS Tagging
    POS Quick Table
    POS Tags
        JJ: Adjective
        JJR: Elative Adjective
        DT: The Arabic Determiner System
        PDT: Predeterminers
        RB: Adverbs
        ADP/IN: Adpositions
        PRP: Personal Pronouns
        WP: interrogative/adjectival pronouns
        VBN: active and passive participles
        VBG: masdar
        RP: Particle
        UH: Interjection or hesitation
        SYM: Symbol
    Specific Cases for POS
4. Morphological feature tagging
    Guiding Principle
    Intent vs Production
    Proper
    Specific Cases For Morphology
        Plurality and Numerals
        Pluralia Tantum
        Ambiguity
        Gender Representation
        Definiteness
        Personal Names
        Idafa vs Apposition
        Tagging Foreign Words
        Tagging Dialectical Words
        The Unspecified Tag
5. Dependencies
    5.1 Dependency Quick Table
    5.2 Dependency Labels
        5.2.1 Root
        5.2.2 Auxiliary
        5.2.3 Arguments
    5.3 Specific Issues with Dependency
        MWE List
        xcomp
        Prep / Mark
        Dates and Time
        Light verb constructions
        Quantifiers: predet vs. head
        Interrogative pronouns
        Multi-token subordinating conjunctions
        Range expressions
        Locutions: mwe
        Relative pronouns
        Nouns with omitted relative pronouns
        Headless relative clauses
        Parataxis vs. appos
        Adjuncts: choice of the head
        لن ولكي Phrases
        Symbols in Dependency
        Verbs with csubj: يمكن، يعجب، يكفي
        Subordinate sentences starting with المر الذي
        Definition of prepositional argument (CLR)
        Irregular Adjective Sequence
        Other functions of ليس
        Case for Nouns Modified by Numbers
        Case for Words of non-Arabic Origin