Linguistic Issues in Encoding Sanskrit. Peter M. Scharf and Malcolm D. Hyman. 2010
Total Pages: 16
File Type: PDF, Size: 1020 KB
Recommended publications
75 Characters Maximum
Kannada Script LGR Proposal: Introduction, Current Analysis and Next Steps. Dr. U.B. Pavanaja, NBGP F2F Meeting, Colombo, 14 December 2017.
Agenda: 1 Introduction to Kannada Script; 2 Repertoire Analysis; 3 Within-Script Variants; 4 Cross-Script Variants; 5 WLE Rules; 6 Current Status and Next Steps for Completion.
Introduction to Kannada Script. Population: there are about 60 million speakers of the Kannada language, which uses the Kannada script. Geographical area: Kannada is spoken predominantly by the people of Karnataka State of India. It is also spoken by significant linguistic minorities in the states of Andhra Pradesh, Telangana, Tamil Nadu, Maharashtra, Kerala, Goa and abroad. Languages written in Kannada script: Kannada, Tulu, Kodava (Coorgi), Konkani, Havyaka, Sanketi, Beary (byaari), Arebaase, Koraga.
Classification of Characters. Swaras (vowels):
Letter: ಅ ಆ ಇ ಈ ಉ ಊ ಋ ಎ ಏ ಐ ಒ ಓ ಔ
Vowel sign/matra: N/A ಾ ಿ ೀ ು ೂ ೃ ೆ ೇ ೈ ೊ ೋ ೌ
Yogavahas: Anusvara ಅಂ, Visarga ಅಃ. In Kannada, all consonants (vyanjanas) when written as ಕ (ka), ಖ (kha), ಗ (ga), etc. actually have a built-in vowel sign (matra) of vowel ಅ (a) in them.
Classification of Characters. Vargeeya vyanjana (structured consonants), in the order voiceless, voiceless aspirate, voiced, voiced aspirate, nasal:
Velars: ಕ ಖ ಗ ಘ ಙ
Palatals: ಚ ಛ ಜ ಝ ಞ
Retroflex: ಟ ಠ ಡ ಢ ಣ
Dentals: ತ ಥ ದ ಧ ನ
Labials: ಪ ಫ ಬ ಭ ಮ
Avargeeya vyanjana (unstructured consonants): ಯ ರ ಱ (obsolete) ಲ ವ ಶ ಷ ಸ ಹ ಳ ೞ (obsolete)
Repertoire Included-1. Columns: Sr. No.; Unicode Code Point; Glyph; Character Name; Unicode General Category; Indic Syllabic Category; Ref; Widespread use? [Yes/No]
1 0C82 ಂ KANNADA SIGN ANUSVARA Mc Anusvara Yes
2 0C83 ಃ KANNADA SIGN VISARGA Mc Visarga Yes
3 0C85 ಅ KANNADA LETTER A Lo Vowel Yes
4 0C86 ಆ KANNADA LETTER AA Lo Vowel Yes
5 0C87 ಇ KANNADA LETTER I Lo Vowel Yes
6 0C88 ಈ KANNADA LETTER II Lo Vowel Yes
7 0C89 ಉ KANNADA LETTER U Lo Vowel Yes
8 0C8A ಊ KANNADA LETTER UU Lo Vowel Yes
9 0C8B ಋ KANNADA LETTER VOCALIC R Lo Vowel Yes
10 0C8E ಎ KANNADA LETTER E Lo Vowel Yes
Repertoire Included-2 Sr. -
Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs
Comparison, Selection and Use of Sentence Alignment Algorithms for New Language Pairs. Anil Kumar Singh and Samar Husain, LTRC, IIIT, Gachibowli, Hyderabad, India - 500019. Abstract: Several algorithms are available for sentence alignment, but there is a lack of systematic evaluation and comparison of these algorithms under different conditions. In most cases, the factors which can significantly affect the performance of a sentence alignment algorithm have not been considered while evaluating. We have used a method for evaluation that can give a better estimate about a sentence alignment algorithm's performance, so that the best one can be selected. We have compared four approaches using this method. These have mostly been tried on European language pairs. We have evaluated manually-checked and validated English-Hindi aligned parallel corpora under different conditions. […] than 95%, and usually 98 to 99% and above). The evaluation is performed in terms of precision, and sometimes also recall. The figures are given for one or (less frequently) more corpus sizes. While this does give an indication of the performance of an algorithm, the variation in performance under varying conditions has not been considered in most cases. Very little information is given about the conditions under which evaluation was performed. This gives the impression that the algorithm will perform with the reported precision and recall under all conditions. We have tested several algorithms under different conditions and our results show that the performance of a sentence alignment algorithm varies significantly, depending on the conditions of testing. Based on these results, we propose a method of evaluation that will give a better estimate of the performance of a sentence alignment algorithm and -
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Names of Countries, Their Capitals and Inhabitants
United Nations Group of Experts on Geographical Names (UNGEGN), East Central and South-East Europe Division (ECSEED). The Nineteenth Session of the East Central and South-East Europe Division of the UNGEGN, Zagreb, Croatia, 19 – 21 November 2008. Item 9 and 10 of the agenda. Document Symbol: ECSEED/Session.19/2008/10. Names of countries, their capitals and inhabitants. Submitted by Poland*. * Prepared by Maciej Zych, Commission on Standardization of Geographical Names Outside the Republic of Poland, Poland. In 1997 the Commission on Standardization of Geographical Names Outside the Republic of Poland published the first list of Names of countries, their capitals and inhabitants, comprising independent countries as well as non-self-governing and autonomous territories. The second edition appeared in 2003. Numerous changes occurred in geographical names in the six years since the previous list was published. New countries and new non-self-governing and autonomous territories appeared, some countries changed their names, others their capital or its name; the Polish names for several countries and their capitals also changed, as did the recommended principles for the Romanization of several languages using non-Roman systems of writing. The third edition of Names of countries, their capitals and inhabitants appeared at the end of 2007, the data it contained being updated for mid-July 2007. -
Headmost Accent Wins
Headmost Accent Wins: Head Dominance and Ideal Prosodic Form in Lexical Accent Systems. Proefschrift ter verkrijging van de graad van Doctor aan de Universiteit Leiden, op gezag van de Rector Magnificus Dr. W. A. Wagenaar, hoogleraar in de faculteit der Sociale Wetenschappen, volgens besluit van het College voor Promoties te verdedigen op woensdag 13 januari 1999 te klokke 16.15 uur door ANTHOULA REVITHIADOU, geboren te Katerini (Griekenland) in 1971. Promotiecommissie: promotor: Prof. dr. J. G. Kooij; co-promotor: Dr. H. G. van der Hulst; referent: Prof. dr. J. Itô, University of California, Santa Cruz; overige leden: Prof. dr. G. E. Booij, Vrije Universiteit Amsterdam; Prof. G. Drachman, Universität Salzburg; Prof. dr. M. van Oostendorp, Universiteit van Amsterdam; Dr. R. W. J. Kager, Rijksuniversiteit Utrecht. ISBN 90-5569-059-7 © 1999 by Anthi Revithiadou. All rights reserved. Printed in The Netherlands. *- ## ;J9 #/ )-# Martin Acknowledgments: I am afraid that it is impossible to include in a few lines the names of all of those who have contributed in some way or other to the production of this thesis. Ironically, the rules concerning the public defense at Leiden University prevent me from mentioning the names of those who helped me most. Therefore I want to thank collectively all members at HIL for creating a friendly and stimulating environment. I would like, however, to mention a few who contributed to my development as a linguist and as a person during the last four years. I consider myself extremely fortunate for being surrounded by a great group of people in Leiden. -
Slides for My Lecture for the Texperience 2010
DEVELOPMENT OF xındy SORT AND MERGE RULES FOR INDIC LANGUAGES. Zdeněk Wagner, Praha, Česká republika; Anshuman Pandey, Univ. Michigan, USA; Jaya Saraswati, Mumbai, India. Note on pronunciation of the x in xındy: Czech: ks. English: usually as z. Hindi: as kṣ; लक्ष्मी can be transliterated either Lakshmi or Laxmi. Chinese: as sh. Russian: хинди (meaning Hindi). xındy sorts Hindi. MakeIndex • version for English and German • CSIndex: version for Czech and Slovak • unpublished version for Sanskrit (Mark Csernel). Tables defining the sort algorithm are hard-wired in the program source code. Modification for other languages is difficult and leads rather to confusion than to development of a universal tool. International MakeIndex • Tables defining the sort algorithm present in external files. • Sort rules defined by regular expressions. -
Prakriya Documentation Release 0.0.7
prakriya Documentation, Release 0.0.7. Dr. Dhaval Patel. Dec 17, 2018. Contents: 1 prakriya (1.1 Features; 1.2 Support; 1.3 Credits); 2 Installation (2.1 Stable release; 2.2 From sources); 3 Usage; 4 Contributing (4.1 Types of Contributions; 4.2 Get Started!; 4.3 Pull Request Guidelines; 4.4 Tips); 5 Credits (5.1 Development Lead; 5.2 Contributors); 6 History (6.1 0.0.1 (2017-12-30); 6.2 0.0.2 (2018-01-01); 6.3 0.0.3 (2018-01-02); 6.4 0.0.4 (2018-01-03); 6.5 0.0.5 (2018-01-13); 6.6 0.0.6 (2018-01-16); 6.7 0.0.7 (2018-01-21); 6.8 0.1.0 (2018-12-17)); 7 Indices and tables; Python Module Index -
TUGboat, Volume 15 (1994), No. 4, p. 447: Indica, an Indic preprocessor
TUGboat, Volume 15 (1994), No. 4, p. 447. Indica, an Indic preprocessor for TeX: A Sinhalese TeX System. Yannis Haralambous. Abstract: In this paper a two-fold project is described: the first part is a generalized preprocessor for Indic scripts (scripts of languages currently spoken in India (except Urdu), Sanskrit and Tibetan), with several kinds of input (LaTeX commands, 7-bit ASCII, CSX, ISO/IEC 10646/Unicode) and TeX output. This utility is written in standard Flex (the GNU version of Lex), and hence can be painlessly compiled on any platform. The same input methods are used for all Indic languages, so that the user does not need to memorize different conventions and commands for each one of them. Moreover, the switch from one language to another can be done by use of user-defineable preprocessor directives. […] script, … in Malayalam, … in Sinhalese, etc. This justifies the choice of a common transliteration scheme for all Indic languages. But why is a preprocessor necessary, after all? A common characteristic of Indic languages is the fact that the short vowel ‘a’ is inherent to consonants. Vowels are written by adding diacritical marks (or smaller characters) to consonants. The beauty (and complexity) of these scripts comes from the fact that one needs a special way to denote the absence of vowel. There is a notorious diacritic, called “virāma”, present in all Indic languages, which is used for this reason. But it seems illogical to add a sign, to specify the absence of a sound. On the contrary, it seems much more logical to remove something, and what is done usually is that letters are -
2 the Theory of Lexical Accents
2 The Theory of Lexical Accents. 2.1. Introduction. This chapter develops a theory for the lexical specification of morphemes. Underlying metrical information is given the name ‘lexical accent’ or just ‘mark’ (see footnote 1). In this thesis I argue that a lexical accent is an autosegmental feature that does not provide any clues about its phonetic manifestation. A lexical accent is liable to the demands of the phonological constraints of the grammar and, if qualified, it will be phonetically assigned duration, pitch and intensity. This chapter starts with an outline of the theory of marking. Based on empirical evidence, I establish that it is better to view lexical accents as autosegmental features and not as prosodic roles (McCarthy and Prince 1995, McCarthy 1995, 1997). I further propose that the mapping between lexical accents and the vocalic peaks that sponsor them is established in terms of universal constraints and, more specifically, in terms of faithfulness constraints. The chapter continues with a brief review of other approaches on marking. There is little consensus in the literature on the nature of lexical stress. Three mainstream theories of lexical specification are examined. First, there are theories that argue that the mark is a prosodic constituent that is assigned to a morpheme in the lexicon. Two approaches are examined, Inkelas’s (1994) theory of exceptional stress in Turkish and Alderete’s (1997) theory of lexical […] Footnote 1: The terms ‘mark’, ‘marking’ and ‘markedness’ in this thesis refer to the lexical accent and the property of some morphemes to have accents in their lexical representation. This use of the terminology should not be confused with the role that markedness theory plays in OT. -
Contrastive Modelling of the Intonation of Recapitulatory Echo Interrogative Sentences in Modern American English and Cuban Spanish
Garcia Romero, Gilberto (2013) Contrastive modelling of the intonation of recapitulatory echo interrogative sentences in modern American English and Cuban Spanish. MA(Res) thesis, University of Nottingham. Access from the University of Nottingham repository: http://eprints.nottingham.ac.uk/13527/1/AN_INTONATIONAL_ANALYSIS_OF_THE_INTERROGATIVE_SYSTEMS_IN_MODERN_ENGLISH_AND_CUBAN_SPANISH_2012.pdf Copyright and reuse: The Nottingham ePrints service makes this work by researchers of the University of Nottingham available open access under the following conditions. · Copyright and all moral rights to the version of the paper presented here belong to the individual author(s) and/or other copyright owners. · To the extent reasonable and practicable the material made available in Nottingham ePrints has been checked for eligibility before being made available. · Copies of full items can be used for personal research or study, educational, or not-for-profit purposes without prior permission or charge provided that the authors, title and full bibliographic details are credited, a hyperlink and/or URL is given for the original metadata page and the content is not changed in any way. · Quotations or similar reproductions must be sufficiently acknowledged. Please see our full end user licence at: http://eprints.nottingham.ac.uk/end_user_agreement.pdf A note on versions: The version presented here may differ from the published version or from the version of record. If you wish to cite this item you are advised to consult the publisher’s version. Please see the repository url above for details on accessing the published version and note that access may require a subscription. -
Constraints on Structural Borrowing in a Multilingual Contact Situation
University of Pennsylvania ScholarlyCommons. IRCS Technical Reports Series, Institute for Research in Cognitive Science. 5-1-2005. Constraints on Structural Borrowing in a Multilingual Contact Situation. Tara S. Sanchez, University of Pennsylvania. Follow this and additional works at: https://repository.upenn.edu/ircs_reports Part of the Linguistics Commons. Sanchez, Tara S., "Constraints on Structural Borrowing in a Multilingual Contact Situation" (2005). IRCS Technical Reports Series. 4. https://repository.upenn.edu/ircs_reports/4 University of Pennsylvania Institute for Research in Cognitive Science Technical Report No. IRCS-05-01. Abstract: Many principles of structural borrowing have been proposed, all under qualitative theories. Some argue that linguistic conditions must be met for borrowing to occur (‘universals’); others argue that aspects of the socio-demographic situation are more relevant than linguistic considerations (e.g. Thomason and Kaufman 1988). This dissertation evaluates the roles of both linguistic and social factors in structural borrowing from a quantitative, variationist perspective via a diachronic and ethnographic examination of the language contact situation on Aruba, Bonaire, and Curaçao, where the Iberian creole, Papiamentu, is in contact with Spanish, Dutch, and English. Data are from texts (n=171) and sociolinguistic interviews (n=129). The progressive, the passive construction, and focus fronting are examined. In addition, variationist methods were applied in a novel way to the system of verbal morphology. The degree to which borrowed morphemes are integrated into Papiamentu was noted at several samplings over a 100-year time span. -
SOUTHERN LISU DICTIONARY Qaaaqrc Qbq[D @^J Hell Ebll Ell
STEDT Monograph Series, No. 4 James A. Matisoff, general editor SOUTHERN LISU DICTIONARY QaaaqRc Qbq[d @^j Hell Ebll ell David Bradley with Edward Reginald Hope, James Fish and Maya Bradley Sino-Tibetan Etymological Dictionary and Thesaurus Project Center for Southeast Asia Studies University of California, Berkeley 2006 © 2005 David Bradley All Rights Reserved ISBN 0-944613-43-8 Volume #4 in the STEDT Monograph Series Sino-Tibetan Etymological Dictionary and Thesaurus Project <http://stedt.berkeley.edu/> Department of Linguistics research unit in International and Area Studies University of California, Berkeley Sino-Tibetan Etymological Dictionary and Thesaurus Monograph Series General Editor JAMES A. MATISOFF University of California, Berkeley Previous Titles in the STEDT Monograph Series: STEDT MONOGRAPH NO. 1A: Bibliography of the International Conferences on Sino-Tibetan Languages and Linguistics I-XXV (second edition) STEDT MONOGRAPH NO. 2: Annotated Directory of Tibeto-Burman Languages and Dialects (revised) STEDT MONOGRAPH NO. 3: Phonological Inventories of Tibeto-Burman Languages Author’s Dedication: for my Lisu friends CONTENTS Series Editor’s Introduction vii Introduction xv The Lisu xv Lisu Phonology xviii Lisu Orthographies xxv Lisu Syntax xxviii Acknowledgements xxix References xxxi Hel Bck Ubl (Lisu Introduction) xxxiii List of Abbreviations xxxiv @ b 1 @\ bj 14 A p 17 A\ pj 31 B pæ 33 B\ pæj 42 C d 45 D t 56 E tæ 70 F g 80 G k 87 H kæ 101 I dÔ 112 J tΔ 121 K tΔæ 133 L dz 146 M ts 155 N tsæ 163 O m 173 O\ mj 194 P n 198