Screening Procedures Annotation? COMPOUND NOUN

Total Page:16

File Type:pdf, Size:1020Kb

Screening Procedures Annotation? COMPOUND NOUN ! ! ! ! ! Lexical Semantic Analysis in Natural Language Text Nathan Schneider CMU-LTI-14-001 ! Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave., Pittsburgh, PA 15213 www.lti.cs.cmu.edu! ! ! Thesis Committee:! Noah A. Smith (chair), Carnegie Mellon University Chris Dyer, Carnegie Mellon University Eduard Hovy, Carnegie Mellon University Lori Levin, Carnegie Mellon University Timothy Baldwin, University! of Melbourne ! ! Submitted in partial fulfillment of the requirements for the degree of Doctor of Philosophy In Language and Information! Technologies ! ! © 2014, Nathan Schneider Lexical Semantic Analysis in Natural Language Text Nathan Schneider Language Technologies Institute School of Computer Science Carnegie Mellon University ◇ September 30, 2014 Submitted in partial fulfillment of the requirements for the degree of doctor of philosophy in language and information technologies Abstract Computer programs that make inferences about natural language are easily fooled by the often haphazard relationship between words and their meanings. This thesis develops Lexical Semantic Analysis (LxSA), a general-purpose framework for describing word groupings and meanings in context. LxSA marries comprehensive linguistic annotation of corpora with engineering of statistical natural lan- guage processing tools. The framework does not require any lexical resource or syntactic parser, so it will be relatively simple to adapt to new languages and domains. The contributions of this thesis are: a formal representation of lexical segments and coarse semantic classes; a well-tested linguistic annotation scheme with detailed guidelines for identifying multi- word expressions and categorizing nouns, verbs, and prepositions; an English web corpus annotated with this scheme; and an open source NLP system that automates the analysis by statistical se- quence tagging. Finally, we motivate the applicability of lexical se- mantic information to sentence-level language technologies (such as semantic parsing and machine translation) and to corpus-based linguistic inquiry. i I, Nathan Schneider, do solemnly swear that this thesis is my own work, and that everything therein is, to the best of my knowledge, true, accurate, and properly cited, so help me Vinken. ∗ Except for the footnotes that are made up. If you cannot handle jocular asides, please∗ recompile this thesis with the \jnote macro disabled. ii To my parents, who gave me language. § In memory of Charles “Chuck” Fillmore, one of the great linguists. iii Acknowledgments As a fledgling graduate student, one is confronted with obligatory words of advice, admonitions, and horror stories from those who have already embarked down that path. At the end of the day,2 though, I am forced to admit that my grad school experience has been perversely happy. Some of that can no doubt be attributed to my own bizarre priorities in life, but mostly it speaks to the qualities of those around me at CMU, who have been uniformly brilliant, committed, enthusiastic, and generous in their work. First and foremost, I thank my Ph.D. advisor, Noah Smith, for serving as a researcher role model, for building and piloting a re- splendent research group (ARK), and for helping me steer my re- search while leaving me the freedom to develop my own style. I thank the full committee (Noah, Chris, Lori, Ed, and Tim), not just for their thoughtful consideration and feedback on the thesis itself, but for their teaching and insights over the years that have influ- enced how I think about language and its relationship to technology. It was my fellow ARK students that had the biggest impact on my day-to-day experiences at CMU. Mike Heilman, Shay Cohen, Dipanjan Das, Kevin Gimpel, André Martins, Tae Yano, Brendan O’Connor, Dani Yogatama, Jeff Flanigan, David Bamman, Yanchuan 2“day” being a euphemism for 6 years v Sim, Waleed Ammar, Sam Thomson, Lingpeng Kong, Jesse Dodge, Swabha Swayamdipta, Rohan Ramanath, Victor Chahuneau, Daniel Mills, and Dallas Card—it was an honor. My officemates in GHC 5713—Dipanjan, Tae, Dani, Waleed, and (briefly) Sam and Lingpeng— deserve a special shout-out for making it an interesting place to work. It was a privilege to be able to work with several different ARK postdocs—Behrang Mohit, (pre-professor) Chris Dyer, and Fei Liu—and always a pleasure to talk with Bryan Routledge at group meetings. But oh, the list doesn’t stop there. At CMU I enjoyed working with amazing annotators—Spencer Onuffer, Henrietta Conrad, Nora Ka- zour, Mike Mordowanec, and Emily Danchik—whose efforts made this thesis possible, and with star undergraduates including De- sai Chen and Naomi Saphra. I enjoyed intellectual exchanges (in courses, reading groups, collaborations, and hallway conversations) with Archna Bhatia, Yulia Tsvetkov, Dirk Hovy, Meghana Kshirsagar, Chu-Cheng Lin, Ben Lambert, Marietta Sionti, Seza Do˘gruöz, Jon Clark, Michael Denkowski, Matt Marge, Reza Bosagh Zadeh, Philip Gianfortoni, Bill McDowell, Kenneth Heafield, Narges Razavian, Nesra Yannier, Wang Ling, Manaal Faruqui, Jayant Krishnamurthy, Justin Cranshaw, Jesse Dunietz, Prasanna Kumar, Andreas Zollmann, Nora Presson, Colleen Davy, Brian MacWhinney, Charles Kemp, Ja- cob Eisenstein, Brian Murphy, Bob Frederking, and many others. And I was privileged to take courses with the likes of Noah, Lori, Tom Mitchell, William Cohen, Yasuhiro Shirai, and Leigh Ann Sudol, and to have participated in projects with Kemal Oflazer, Rebecca Hwa, Ric Crabbe, and Alan Black. I am especially indebted to Vivek Srikumar, Jena Hwang, and everyone else at CMU and elsewhere who has contributed to the preposition effort described in ch. 5. The summer of 2012 was spent basking in the intellectual warmth vi of USC’s Information Sciences Institute, where I did an internship under the supervision of Kevin Knight, and had the good fortune to cross paths with Taylor Berg-Kirkpatrick (and his cats), Daniel Bauer, Bevan Jones, Jacob Andreas, Karl Moritz Hermann, Christian Buck, Liane Guillou, Ada Wan, Yaqin Yang, Yang Gao, Dirk Hovy, Philipp Koehn, and Ulf Hermjakob. That summer was the beginning of my involvement in the AMR design team, which includes Kevin and Ulf as well as Martha Palmer, Claire Bonial, Kira Griffitt, and several others; I am happy to say that the collaboration continues to this day. The inspiration that pushed me to study computational linguis- tics and NLP in the first place came from teachers and mentors at UC Berkeley, especially Jerry Feldman and Dan Klein. I am happy to have remained connected to that community during my Ph.D., and to have learned a great deal about semantics from the likes of Collin Baker, Miriam Petruck, Nancy Chang, Michael Ellsworth, and Eve Sweetser. Finally, I thank my family for being there for me and putting up with my eccentricities. The same goes for my friends men- tioned above, as well as from the All University Orchestra (Smitha Prasadh, Tyson Price, Deepa Krishnaswamy), from Berkeley linguis- tics (Stephanie Shih, Andréa Davis, Kashmiri Stec, Cindy Lee), and from NLP conferences. This thesis would not have been possible without the support of research funding sources;3 all who have contributed to the smooth functioning of CMU SCS and LTI, especially Jaime Carbonell (our fearless leader) and all the LTI staff; as well as Pittsburgh eating 3The research at the core of this thesis was supported in part by NSF CAREER grant IIS-1054319, Google through the Reading is Believing project at CMU, DARPA grant FA8750-12-2-0342 funded under the DEFT program, and grant NPRP-08-485- 1-083 from the Qatar National Research Fund (a member of the Qatar Foundation). The statements made herein are solely the responsibility of the authors. vii establishments, especially Tazza d’Oro, the 61C Café, and Little Asia. There are undoubtedly others that I should have mentioned above, so I hereby acknowledge them. If in this document contains any errors, omissions, bloopers, dangling modifiers, wardrobe malfunctions, etc., it is because I screwed up. It is emphatically not the fault of my committee or reviewers if a bug in my code caused low numbers to round up, high numbers to round down, or expletives to be inserted in phonologi- cally suspect syllable positions. viii Contents Abstract i Acknowledgments v 1 Setting the Stage 1 1.1 Task Definition ....................... 3 1.2 Approach .......................... 5 1.3 Guiding Principles ..................... 6 1.4 Contributions ....................... 8 1.5 Organization ........................ 8 2 General Background: Computational Lexical Semantics 11 2.1 The Computational Semantics Landscape . 12 2.2 Lexical Semantic Categorization Schemes . 14 2.3 Lexicons and Word Sense Classification . 15 2.4 Named Entity Recognition and Sequence Models . 20 2.5 Conclusion ......................... 24 I Representation & Annotation 25 3 Multiword Expressions 27 ix 3.1 Introduction ........................ 28 3.2 Linguistic Characterization ................ 31 3.3 Existing Resources ..................... 39 3.4 Taking a Different Tack . 46 3.5 Formal Representation . 50 3.6 Annotation Scheme .................... 52 3.7 Annotation Process .................... 54 3.8 The Corpus ......................... 58 3.9 Conclusion ......................... 61 4 Noun and Verb Supersenses 63 4.1 Introduction ........................ 64 4.2 Background: Supersense Tags . 64 4.3 Supersense Annotation for Arabic . 67 4.4 Supersense Annotation for English . 72 4.5 Related Work: Copenhagen Supersense Data
Recommended publications
  • A Grammar Research Guide for Ngwi Languages
    Language and Culture DigitalResources Documentation and Description 33 A Grammar Research Guide for Ngwi Languages Eric B. Drewry A Grammar Research Guide for Ngwi Languages Eric B. Drewry Azusa Pacific University in cooperation with SIL International—East Asia Group SIL International 2016 SIL Language and Culture Documentation and Description 33 ©2016 SIL International® ISSN 1939-0785 Fair Use Policy Documents published in the Language and Culture Documentation and Description series are intended for scholarly research and educational use. You may make copies of these publications for research or instructional purposes (under fair use guidelines) free of charge and without further permission. Republication or commercial use of a Language and Culture Documentation and Description or the documents contained therein is expressly prohibited without the written consent of the copyright holder. Managing Editor Eric Kindberg Series Editor Lana Martens Content Editor Lynn Frank Copy Editor Sue McQuay Compositor Bonnie Waswick Abstract This grammar research guide describes the range of syntactic variety found in a representative group of well-described Ngwi languages. This overview of syntactic variety should make the guide useful for field linguists preparing to describe any of the forty-eight Ngwi languages that were recognized for the first time in the sixteenth edition of the Ethnologue (Lewis 2009). This is done by giving examples of where and how widely the languages in this group vary even within the typical categories of the Ngwi languages, including sentence introducers, conjunctions, noun types, compounding, derivation, noun particles, postnominal clausal particles, classifiers and numerals, negation, adjectives, pronouns, adverbs, verb types, verb concatenations, preverbal and postverbal slots, verb particles, clause-final and sentence- final particles, simple sentences, compound sentences, and complex sentences.
    [Show full text]
  • THE BASIC STRUCTURE of the ZAIWA NOUN PHRASE Mark Wannemacher Payap University Linguistics Institute 1. INTRODUCTION Zaiwa Is A
    Linguistics of the Tibeto-Burman Area Volume 33.2 ― October 2010 THE BASIC STRUCTURE OF THE ZAIWA NOUN PHRASE Mark Wannemacher Payap University Linguistics Institute Abstract This paper provides a basic description of major grammatical features of the Zaiwa noun phrase. It includes a review of Zaiwa phonology, a discussion of word classes and consituent order typology, and a description of the noun phrase including noun types, nominal modifiers, conjunctions and aspects of discourse grammar related to the noun phrase. Keywords grammar, noun phrase, word class, Zaiwa, Tibeto-Burman 1. INTRODUCTION Zaiwa is a language of approximately 150,000 speakers in the Northern Burmic Branch of the Tibeto-Burman language family. It is spoken in parts of Eastern Kachin State and Northern Shan State in Myanmar and in parts of Southwest Yunnan in China. The Zaiwa have a close cultural relationship with the Jinghpaw people and borrow extensively from the Jinghpaw language, as well as from Burmese, Dehong Dai and Yunnanese Chinese. This paper will give a brief introduction to Zaiwa phonology, word classes and constituent order typology, and then move on to the primary focus of describing the Zaiwa noun phrase. The description includes nouns types, pronouns, nominal modifiers including the relative clause, demonstratives, postpositions, conjunctions, and discourse markers. Data for this paper were collected while teaching at Payap University in Chiang Mai, Thailand. I would like to thank Payap University for their kind assistance in the study of the languages of Southeast Asia.1 The grammatical data in this paper is from speakers from Sadon and Kengtung, Myanmar, as well as Janshi, China.
    [Show full text]
  • Dissertation Supervisor: Prof. Louisa Sadler
    Progressivity Expressions in Hassawi Dialect Hamdah Mohammad Al-Abdullah Registration no:1500875 Dissertation supervisor: Prof. Louisa Sadler A dissertation submitted for the degree of Master in linguistic studies Department of Language and Linguistics University of Essex September, 2016 1 To my father… you are always in my heart 2 Acknowledgment I would like to express my sincerest appreciation to a number of people who were great company this year as they were supportive enough that I could reach this point. My deepest gratitude goes to Pro. Louisa Sadler, my dissertation supervisor for her support, time, significant comments and guidance. I am forever grateful to my father Mohammad Al- Abdullah may Allah bless his soul and my mother shirifah Al-Salim without whom this journey could not have been completed. I am also very thankful to all my family members and friends for their constant support and love. I am particularly grateful for my brother Abdullah who was a great escort and a right hand in this trip to the UK where I could expand my horizon in the University of Essex and where I could meet with great minds, my tutors to whom I am in debt forever, and my new friends whom I will keep in heart forever. 3 Table of contents Abstract List of tables……………………..……………..……………………………………………..6 List of abbreviations………………………………………………………………………….7 Chapter 1……………..……………...……………………………………………..………..10 1.1 Introduction…………………………..…………………………………………………11 1.2 Basic facts about Al-Ahsa and the Hasswi dialect…………………..….………….....12 Chapter 2……………………………………..……..………………………………………16 2. Review of the Literature…………….….…………………..…………………………...17 2.1 Progressive in Europe languages………………………………….…………………..17 2.1.1 Progressive in English………………………..………………………………………17 2.1.2 Blansitte’s classification of the morphosyntactic expressions of the Progressive in Europe languages…………………………………………………………………..……….18 2.2 Progressive in Modern Standard Arabic (MSA)…………….…………….…………19 2.3 Progressive in the dialects of colloquial Arabic………………………………………21 Chapter 3………..………………………………………………….……………………….27 3.
    [Show full text]
  • What Is a Particle? on the Use and Abuse of the Term Particle in East and Southeast Asian Languages
    What is a particle? On the use and abuse of the term particle in East and Southeast Asian Languages With some modest recommendations for improving a mildly lamentable situation 1 Keith W. Slater SIL International, East Asia Group ABSTRACT The term particle is commonly used by grammar writers, but seems to have little or no status in typological works. In this paper, I detail the results of a study of grammatical descriptions of languages spoken in East and Southeast Asia. These grammatical descriptions all use the term particle, but there is very little consistency in their usage of the term. Furthermore, hardly anyone actually defines the term, leaving us with a very unclear picture of how to compare its uses. The paper concludes with some observations about what do seem to be the most common understandings of the term particle , and makes some recommendations for improving upon the current lack of consistency across the grammatical descriptions written within different language families. CONTENTS 1 WHAT ARE PARTICLES IN THEORY ? 1.1 DEFINITIONS 1.2 PARTICLES AND CLITICS 2 WHAT ARE PARTICLES IN PRACTICE ? 2.1 ASPECT /M ODALITY /(T ENSE ) 2.2 MOOD /I LLOCUTIONARY FORCE 2.3 QUOTATION AND EVIDENTIALITY 2.4 DISCOURSE ORGANIZATION/INTERPROPOSITIONAL RELATIONSHIPS 2.5 FOCUS , EMPHASIS , TOPICALIZATION 2.6 NOMINALIZERS (MAY ALSO BE COMPLEMENTIZERS ) 2.7 NOUN PARTICLES 2.8 AND THE KITCHEN SINK 2.9 MORPHOLOGICALLY COMPLEX PARTICLES 2.10 MULTIFUNCTIONAL PARTICLES 1Thanks to Lynn Conver for helping me find some great material about particles. 1
    [Show full text]
  • 10 Word Formation
    10 Word Formation TARO KAGEYAMA 0 Introduction A long-standing debate in generative grammar concerns the Lexicalist Hypo- thesis, the strongest form of which demands complete separation of morphol- ogy from syntax, thereby disallowing active interactions of word formation and syntactic operations (Di Sciullo and Williams 1987). Such a hypothesis confronts serious challenges from an agglutinative language like Japanese, where one suffix after another is productively added to a verb stem to give rise to more and more complex predicates, as in tabe-hazime(-ru) “eat-begin” = “begin to eat,” tabe-hazime-sase(-ru) “eat-begin-cause” = “make (someone) begin to eat,” and tabe-hazime-sase-ta(-i) “eat-begin-cause-want” = “want to make (someone) begin to eat.” This chapter will review issues in Japanese word formation which directly pertain to the evaluation of the Lexicalist Hypothesis. Included in my discussion are Verb+Verb compounds, Noun+Verbal Noun compounds, and Verbal Noun+suru compounds. Space limitations prevent me from looking into other topics of theoretical interest in the realm of lexical morphology, such as N–N compounding (e.g. inu-goya “dog-house”), A–N compounding (e.g. aka- boo “redcap”), nominalization (e.g. ame-huri “rainfall”), lexical prefixation and suffixation (e.g. sai-kakunin “re-assure,” niga-mi “bitterness”), clipping (e.g. siritu- daigaku “private universities” → si-dai), and reduplication (e.g. (biiru-o) nomi- nomi “while drinking beer”). For the topics that are not included in this chapter as well as the basics of Japanese morphology, the reader is referred to Kageyama (1982), Shibatani (1990: chapter 10), and Tsujimura (1996b: chapter 4).
    [Show full text]
  • Monograph Series on Languages and Linguistics 17Th Annual Round Table
    Monograph Series on Languages and Linguistics Number 19, 1966 edited by F. P. Dinneen, SJ 17th Annual Round Table Georgetown University School of Languages and Linguistics REPORT OF THE SEVENTEENTH ANNUAL ROUND TABLE MEETING ON LINGUISTICS AND LANGUAGE STUDIES FRANCIS P. DINNEEN, S.J. EDITOR GEORGETOWN UNIVERSITY PRESS Washington, D.C. 20007 ©Copyright 1966 GEORGETOWN UNIVERSITY PRESS SCHOOL OF LANGUAGES AND LINGUISTICS GEORGETOWN UNIVERSITY Library of Congress Catalog Card Number 58-31607 Lithographed in U.SA. by EDWARDS BROTHERS, INC. Ann Arbor, Michigan TABLE OF CONTENTS Page Foreword v WELCOMING REMARKS Rev. Frank L. Fadner, S.J., Regent Institute of Languages and Linguistics vii Robert Lado, Dean Institute of Languages and Linguistics ix I. PROBLEMS IN SEMANTICS John L. Fischer Interrogatives in Ponapean: Some Semantic and Grammatical Aspects 1 Charles J. Fillmore A Proposal Concerning English Prepositions 19 Gerhard Nickel Operational Procedures in Semantics, with Special Reference to Medieval English 35 James B. Fraser Some Remarks on the Verb-Particle Construction in English 45 DISCUSSION 63 II. FIRST LUNCHEON ADDRESS George L. Trager Linguistics as Anthropology 71 IIL HISTORY OF LINGUISTICS Karl V. Teeter The History of Linguistics: New Lamps for Old 83 iv / TABLE OF CONTENTS Hugo Mueller On Re-Reading von Humboldt 97 John Viertel Concepts of Language Underlying the 18th Century Controversy about the Origin of Language 109 Geoffrey Bursill-Hall Aspects of Modistic Grammar 133 DISCUSSION 149 IV. LINGUISTICS AND ENGLISH Stanley Sapon Shaping Productive Verbal Behavior in a Non-Speaking Child: A Case Report 157 Paul M. Postal On So-Called 'Pronouns' in English 177 Terence E.
    [Show full text]
  • A Description of Preverb and Particle Usage In
    A DESCRIPTION OF PREVERB AND PARTICLE USAGE IN INNU-AIMÛN NARRATIVE by ©Jane Bannister Bachelor of Arts Memorial University of Newfoundland (2000) A thesis submitted to the School of Graduate Studies in partial fulfillment of the requirements for the degree of Master of Arts Department of Linguistics Memorial University of Newfoundland April 2004 St. John’s Newfoundland and Labrador, Canada ABSTRACT Sentences with multiple preverbs and/or particles are examined in this thesis. The data sentences were collected from the first 18 stories of the Labrador Innu Text Project. Chapter 1 is an introduction to Innu-aimun grammar, with sections on previous research into word ordering, especially preverb ordering. Chapter 2 describes the patterning, use and co-occurrence of the ten most common preverbs in the data sentences. Preverbs are subdivided into modal preverbs, temporal preverbs, aspectual preverbs and other preverbs. Chapter 3 discusses 28 common particles in the data. These particles are also divided into smaller groups, including complementizers, focus particles, negative particles, adverbs, temporal and aspectual particles, particles of speaker opinion and particles with changed forms. Both chapters 2 and 3 include discussion of regular patterns of ordering of preverbs or particles. Chapter 4 is an analysis of the use of the independent or conjunct orders following negative particles. Optimality Theory is used to explain Innu data, and sentences are analyzed based on Brittain (2001, 1997). A general thesis conclusion ends chapter 4. ii ACKNOWLEDGMENTS Thanks to the many people who have helped me create and finish this thesis. Thanks to my supervisor Phil Branigan, for sensible suggestions and a calming demeanor.
    [Show full text]
  • Annotation Manual for the NPCMJ
    Annotation manual for the NPCMJ Stephen Wright Horn, Iku Nagasaki, Alastair Butler, and Kei Yoshimoto Contents 1 Introduction 5 2 Tags 6 2.1 General principles . 6 2.2 Part-of-speech tags . 6 2.3 Syntactic tags . 7 2.4 Tag extensions to specify clause linkage . 8 2.5 Other tags . 9 3 General parsing principles 9 3.1 Overview . 9 3.2 The schema of constituents . 9 3.3 Terminal nodes: . 10 3.4 Flat phrase-structure . 10 3.5 Endocentric structure and exceptions thereto . 10 4 Segmentation and part-of-speech annotation 11 5 Basic clause structure 13 6 Null elements 15 6.1 Null elements without indexing . 15 6.1.1 Null expletive . 15 6.1.2 Zero pronoun with generic impersonal reference . 17 6.1.3 Other zero pronouns . 17 6.1.4 Traces in relative clauses . 18 6.2 Null elements that are always indexed . 19 6.3 The position of null elements . 21 7 Annotation of grammatical roles 26 7.1 Core grammatical roles . 26 7.1.1 Explicitly marked arguments . 27 7.1.2 Implicitly marked arguments . 29 7.1.3 Omitted arguments . 30 7.2 Peripheral grammatical roles . 31 7.2.1 Explicitly marked adjuncts . 32 7.2.2 Implicitly marked adjuncts . 32 7.2.3 Adjunct traces . 36 1 8 Complementizer phrases (CPs) 37 8.1 Projection for sentence final particle (CP-FINAL) . 37 8.2 Questions (CP-QUE) . 38 8.3 Exclamative utterances (CP-EXL) . 40 8.4 Imperative clauses (CP-IMP) . 41 9 Adnominal clauses 45 9.1 Adnominal clauses with traces (IP-REL) .
    [Show full text]
  • Jp Grammar Guide.Pdf
    Tae Kim's Japanese guide to learning Japanese grammar file:///C:/Documents%20and%20Settings/Administrator/%E3%83%8... A Japanese guide to Japanese grammar Outline 1. The problem with conventional textbooks 2. A Japanese guide to Japanese grammar 3. What is not covered in this guide? 4. Suggestions 5. Requirements The problem with conventional textbooks The problem with conventional textbooks is that they often have the following goals. 1. They want readers to be able to use functional and polite Japanese as quickly as possible. 2. They don't want to scare readers away with terrifying Japanese script and Chinese characters. 3. They want to teach you how to say English phrases in Japanese. Traditionally with romance languages such as Spanish, these goals presented no problems or were nonexistent due to the similarities to English. However, because Japanese is different in just about every way down to the fundamental ways of thinking, these goals create many of the confusing textbooks you see on the market today. They are usually filled with complicated rules and countless number of grammar for specific English phrases. They also contain almost no kanji and so when you finally arrive in Japan, lo and behold, you discover you can't read menus, maps, or essentially anything at all because the book decided you weren't smart enough to memorize Chinese characters. The root of this problem lies in the fact that these textbooks try to teach you Japanese with English. They want to teach you on the first page how to say, "Hi, my name is Smith," but they don't tell you about all the arbitrary decisions that were made behind your back.
    [Show full text]
  • Korean Romanization and Word Division
    Korean Romanization and Word Division Romanization 1. General Practice The Library of Congress will continue to follow the McCune-Reischauer system to romanize Korean with the exceptions noted in this document. See: Romanization of the Korean Language: Based upon its Phonetic Structure by G.M. McCune and E.O. Reischauer ([S.l.: s.n., 1939?), reprinted from the Transactions of the Korea Branch of the Royal Asiatic Society. Full text of the original document is available online from the National Library of Australia Web site: http://www.nla.gov.au/librariesaustralia/cjk/download/ras_1939.pdf Note: A romanization table appears as Appendix 7, at the end of this document. 2. Authorities The Library of Congress will designate certain standard dictionaries [see Appendix 1] as final authorities to resolve questions of contemporary pronunciation. A word will be considered to be pronounced as indicated in those dictionaries, and romanized in such a way as to represent its pronunciation most accurately. 3. Conflict between Romanization Rule and Pronunciation When romanization rules conflict with the pronunciation of a word, prefer to represent the pronunciation. 1 라면 ramyŏn 漢字 Hancha 令狀 yŏngchang 서울대 Sŏuldae 월드컵 Wǒldŭk’ŏp 길잡이 kiljabi 말갈족 Malgaljok 값어치 kabŏch’i 싫증 silchŭng Note: Some dictionaries represent a reinforced medial consonant with a double consonant: 의과 [_꽈], 실시 [_씨]. However, the romanization would not necessarily show a double consonant: ǔikwa, silsi. 평가 p’yŏngka 문법 munpŏp 4. Hyphens (a) When sounds would normally change, according to McCune-Reischauer rules, sound change is indicated preceding or following a hyphen in forenames or pseudonyms that are preceded by family names, and in generic terms used as jurisdictions.
    [Show full text]
  • Parts of Speech in Novamorf, a New Morphological Annotation of Czech
    DOI 10.2478/jazcas-2019-0065 Parts OF SPEECH IN NovaMORF, a NEW MORPHOLOGICAL Annotation OF CZECH VLADIMÍR PETKEVIČ1 ‒ Jaroslava Hlaváčová2 ‒ KLÁRA OSOLSOBě3 ‒ Martin SVÁŠEK ‒ JOSEF ŠIMANDL 1 Institute of Theoretical and Computational Linguistics, Faculty of Arts, Charles University, Czech Republic 2 Institute of Formal and Applied Linguistics, Faculty of Mathematics and Physics, Charles University, Czech Republic 3 Institute of the Czech Language, Faculty of Arts, Masaryk University, Czech Republic PETKEVIČ, Vladimír – HLAVÁČOVÁ, Jaroslava – OSOLSOBě, Klára – SVÁŠEK, Martin – ŠIMANDL, Josef: Parts of speech in NovaMorf, a new morphological annotation of Czech. Journal of Linguistics, 2019, Vol. 70, No 2, pp. 358 – 369. Abstract: A detailed morphological description of word forms in any language is a necessary condition for a successful automatic processing of linguistic data. The paper focuses on a new description of morphological categories, mainly on the subcategorization of parts of speech in Czech within the NovaMorf project. NovaMorf focuses on the description of morphological properties of Czech word forms in a more compact and consistent way and with a higher explicative power than approaches used so far. It also aims at the unification of diverse approaches to morphological annotation of Czech. NovaMorf approach will be reflected in a new morphological dictionary to be exploited for a new automatic morphological analysis (and disambiguation) of corpora of contemporary Czech. Keywords: NovaMorf, morphological annotation, parts of speech, morphological categories, subcategorization 1 INTRODUcTION We present a repertoire of morphological categories and, mainly, the parts of speech (POS) and their subcategorization distinguished in NovaMorf, the project of an innovated description of Czech morphology as a linguistic base for a new morphological analysis and subsequent disambiguation of Czech texts.
    [Show full text]
  • An Erzyan Orthography Compatible with Compound Words, Issues
    An Erzyan orthography compatible with compound words, issues Klementeva, E.F. & J.M. Rueter Congressus Duodecimus Internationalis Fenno-Ugristarum Oulu, Finland, August 17th – 21st, 2015 Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 1 An Erzyan orthography compatible with compound words, issues Compound words and their treatment in the Erzya orthography of 2012. Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 2 Compound words and their treatment in the Erzya orthography ● Searching for a methodology ● Ideal: Orthographies should have methodologies, so no orthographic dictionary is really necessary. ● A methodology might be based on categories inherent in the individual language Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 3 Orthography of 2012 ● Earlier Orthographic dictionaries: 1939, 1955, 1969, 1978 ● Terminology: adjective, adverb, affixoid, lexicalization, noun, particle, participle ● Principles: of Morphology, Phonetics and Tradition Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 4 Affixoid This term is not explained separately!! Чи, пуло, пель, пря, ланго, алкс, пе, кирда, мезе, ни, In practice this is used for 3 phenomena: (1) Semantic value not attested in independent word (2) Semantic value without independent wordform (3) Base form for use in grammatical case of so called postpositions [relative spatial nouns] Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 5 Morphology ● This is in fact a morphematic priniciple, whereby the parts of words show no variation in orthography, regardless of how they might be pronounced. ● Куз, кузонь, кузга hence: ● /kusne/ кузтнэ Aug 19, 2015 Jack Rueter, Ph.D., University of Helsinki 6 Morphology 2 ● Write compound words whose components have not been shortened.
    [Show full text]