Annotating Sanskrit Corpus: Adapting IL-POSTS
Total Page:16
File Type:pdf, Size:1020Kb

Load more
Recommended publications
-
OSU WPL # 27 (1983) 140- 164. the Elimination of Ergative Patterns Of
OSU WPL # 27 (1983) 140- 164. The Elimination of Ergative Patterns of Case-Marking and Verbal Agreement in Modern Indic Languages Gregory T. Stump Introduction. As is well known, many of the modern Indic languages are partially ergative, showing accusative patterns of case- marking and verbal agreement in nonpast tenses, but ergative patterns in some or all past tenses. This partial ergativity is not at all stable in these languages, however; what I wish to show in the present paper, in fact, is that a large array of factors is contributing to the elimination of partial ergativity in the modern Indic languages. The forces leading to the decay of ergativity are diverse in nature; and any one of these may exert a profound influence on the syntactic development of one language but remain ineffectual in another. Before discussing this erosion of partial ergativity in Modern lndic, 1 would like to review the history of what the I ndian grammar- ians call the prayogas ('constructions') of a past tense verb with its subject and direct object arguments; the decay of Indic ergativity is, I believe, best envisioned as the effect of analogical develop- ments on or within the system of prayogas. There are three prayogas in early Modern lndic. The first of these is the kartariprayoga, or ' active construction' of intransitive verbs. In the kartariprayoga, the verb agrees (in number and p,ender) with its subject, which is in the nominative case--thus, in Vernacular HindOstani: (1) kartariprayoga: 'aurat chali. mard chala. woman (nom.) went (fern. sg.) man (nom.) went (masc. -
Recognition of Online Handwritten Gurmukhi Strokes Using Support Vector Machine a Thesis
Recognition of Online Handwritten Gurmukhi Strokes using Support Vector Machine A Thesis Submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology Submitted by Rahul Agrawal (Roll No. 601003022) Under the supervision of Dr. R. K. Sharma Professor School of Mathematics and Computer Applications Thapar University Patiala School of Mathematics and Computer Applications Thapar University Patiala – 147004 (Punjab), INDIA June 2012 (i) ABSTRACT Pen-based interfaces are becoming more and more popular and play an important role in human-computer interaction. This popularity of such interfaces has created interest of lot of researchers in online handwriting recognition. Online handwriting recognition contains both temporal stroke information and spatial shape information. Online handwriting recognition systems are expected to exhibit better performance than offline handwriting recognition systems. Our research work presented in this thesis is to recognize strokes written in Gurmukhi script using Support Vector Machine (SVM). The system developed here is a writer independent system. First chapter of this thesis report consist of a brief introduction to handwriting recognition system and some basic differences between offline and online handwriting systems. It also includes various issues that one can face during development during online handwriting recognition systems. A brief introduction about Gurmukhi script has also been given in this chapter In the last section detailed literature survey starting from the 1979 has also been given. Second chapter gives detailed information about stroke capturing, preprocessing of stroke and feature extraction. These phases are considered to be backbone of any online handwriting recognition system. Recognition techniques that have been used in this study are discussed in chapter three. -
Sanskrit Alphabet
Sounds Sanskrit Alphabet with sounds with other letters: eg's: Vowels: a* aa kaa short and long ◌ к I ii ◌ ◌ к kii u uu ◌ ◌ к kuu r also shows as a small backwards hook ri* rri* on top when it preceeds a letter (rpa) and a ◌ ◌ down/left bar when comes after (kra) lri lree ◌ ◌ к klri e ai ◌ ◌ к ke o au* ◌ ◌ к kau am: ah ◌ं ◌ः कः kah Consonants: к ka х kha ga gha na Ê ca cha ja jha* na ta tha Ú da dha na* ta tha Ú da dha na pa pha º ba bha ma Semivowels: ya ra la* va Sibilants: sa ш sa sa ha ksa** (**Compound Consonant. See next page) *Modern/ Hindi Versions a Other ऋ r ॠ rr La, Laa (retro) औ au aum (stylized) ◌ silences the vowel, eg: к kam झ jha Numero: ण na (retro) १ ५ ॰ la 1 2 3 4 5 6 7 8 9 0 @ Davidya.ca Page 1 Sounds Numero: 0 1 2 3 4 5 6 7 8 910 १॰ ॰ १ २ ३ ४ ६ ७ varient: ५ ८ (shoonya eka- dva- tri- catúr- pancha- sás- saptán- astá- návan- dásan- = empty) works like our Arabic numbers @ Davidya.ca Compound Consanants: When 2 or more consonants are together, they blend into a compound letter. The 12 most common: jna/ tra ttagya dya ddhya ksa kta kra hma hna hva examples: for a whole chart, see: http://www.omniglot.com/writing/devanagari_conjuncts.php that page includes a download link but note the site uses the modern form Page 2 Alphabet Devanagari Alphabet : к х Ê Ú Ú º ш @ Davidya.ca Page 3 Pronounce Vowels T pronounce Consonants pronounce Semivowels pronounce 1 a g Another 17 к ka v Kit 42 ya p Yoga 2 aa g fAther 18 х kha v blocKHead -
Tugboat, Volume 15 (1994), No. 4 447 Indica, an Indic Preprocessor
TUGboat, Volume 15 (1994), No. 4 447 HPXC , an Indic preprocessor for T X script, ...inMalayalam, Indica E 1 A Sinhalese TEXSystem hpx...inSinhalese,etc. This justifies the choice of a common translit- Yannis Haralambous eration scheme for all Indic languages. But why is Abstract a preprocessor necessary, after all? A common characteristic of Indic languages is In this paper a two-fold project is described: the first the fact that the short vowel ‘a’ is inherent to con- part is a generalized preprocessor for Indic scripts (scripts of languages currently spoken in India—except Urdu—, sonants. Vowels are written by adding diacritical Sanskrit and Tibetan), with several kinds of input (LATEX marks (or smaller characters) to consonants. The commands, 7-bit ascii, CSX, ISO/IEC 10646/unicode) beauty (and complexity) of these scripts comes from and TEX output. This utility is written in standard Flex the fact that one needs a special way to denote the (the gnu version of Lex), and hence can be painlessly absence of vowel. There is a notorious diacritic, compiled on any platform. The same input methods are called “vir¯ama”, present in all Indic languages, which used for all Indic languages, so that the user does not is used for this reason. But it seems illogical to add a need to memorize different conventions and commands sign, to specify the absence of a sound. On the con- for each one of them. Moreover, the switch from one lan- trary, it seems much more logical to remove some- guage to another can be done by use of user-defineable thing, and what is done usually is that letters are preprocessor directives. -
Inflectional Morphology Analyzer for Sanskrit
Inflectional Morphology Analyzer for Sanskrit Girish Nath Jha, Muktanand Agrawal, Subash, Sudhir K. Mishra, Diwakar Mani, Diwakar Mishra, Manji Bhadra, Surjit K. Singh [email protected] Special Centre for Sanskrit Studies Jawaharlal Nehru University New Delhi-110067 The paper describes a Sanskrit morphological analyzer that identifies and analyzes inflected noun- forms and verb-forms in any given sandhi-free text. The system which has been developed as java servlet RDBMS can be tested at http://sanskrit.jnu.ac.in (Language Processing Tools > Sanskrit Tinanta Analyzer/Subanta Analyzer) with Sanskrit data as unicode text. Subsequently, the separate systems of subanta and ti anta will be combined into a single system of sentence analysis with karaka interpretation. Currently, the system checks and labels each word as three basic POS categories - subanta, ti anta, and avyaya. Thereafter, each subanta is sent for subanta processing based on an example database and a rule database. The verbs are examined based on a database of verb roots and forms as well by reverse morphology based on Pāini an techniques. Future enhancements include plugging in the amarakosha ( http://sanskrit.jnu.ac.in/amara ) and other noun lexicons with the subanta system. The ti anta will be enhanced by the k danta analysis module being developed separately. 1. Introduction The authors in the present paper are describing the subanta and ti anta analysis systems for Sanskrit which are currently running at http://sanskrit.jnu.ac.in . Sanskrit is a heavily inflected language, and depends on nominal and verbal inflections for communication of meaning. A fully inflected unit is called pada . -
Devan¯Agar¯I for TEX Version 2.17.1
Devanagar¯ ¯ı for TEX Version 2.17.1 Anshuman Pandey 6 March 2019 Contents 1 Introduction 2 2 Project Information 3 3 Producing Devan¯agar¯ıText with TEX 3 3.1 Macros and Font Definition Files . 3 3.2 Text Delimiters . 4 3.3 Example Input Files . 4 4 Input Encoding 4 4.1 Supplemental Notes . 4 5 The Preprocessor 5 5.1 Preprocessor Directives . 7 5.2 Protecting Text from Conversion . 9 5.3 Embedding Roman Text within Devan¯agar¯ıText . 9 5.4 Breaking Pre-Defined Conjuncts . 9 5.5 Supported LATEX Commands . 9 5.6 Using Custom LATEX Commands . 10 6 Devan¯agar¯ıFonts 10 6.1 Bombay-Style Fonts . 11 6.2 Calcutta-Style Fonts . 11 6.3 Nepali-Style Fonts . 11 6.4 Devan¯agar¯ıPen Fonts . 11 6.5 Default Devan¯agar¯ıFont (LATEX Only) . 12 6.6 PostScript Type 1 . 12 7 Special Topics 12 7.1 Delimiter Scope . 12 7.2 Line Spacing . 13 7.3 Hyphenation . 13 7.4 Captions and Date Formats (LATEX only) . 13 7.5 Customizing the date and captions (LATEX only) . 14 7.6 Using dvnAgrF in Sections and References (LATEX only) . 15 7.7 Devan¯agar¯ıand Arabic Numerals . 15 7.8 Devan¯agar¯ıPage Numbers and Other Counters (LATEX only) . 15 1 7.9 Category Codes . 16 8 Using Devan¯agar¯ıin X E LATEXand luaLATEX 16 8.1 Using Hindi with Polyglossia . 17 9 Using Hindi with babel 18 9.1 Installation . 18 9.2 Usage . 18 9.3 Language attributes . 19 9.3.1 Attribute modernhindi . -
NMHS Fourth Quarterly Report-Jan to Dec 2020
Report National Mission on Himalayan Studies (NMHS) PERFORMA FOR THE QUARTERLY PROGRESS REPORT (Reporting Period from January to March 2020) 1. Project Information Project ID NMHS/2018-19/MG54/05 Project Title Improving capacity and strengthening wildlife conservation for sustainable livelihoods in Kashmir Himalaya Project Proponent Dr. Karthikeyan Vasudevan CSIR-Centre for Cellular and Molecular Biology (CCMB) Laboratory for Conservation of Endangered Species (LaCONES), Attapur Hyderguda, Hyderabad-500048 2. Objectives • To implement programs for the conservation of Hangul (Cervus elaphus hanglu) using assisted reproduction technologies. • To assess the carnivore population (leopards and black bear) using non-invasive DNA based methods by involving the management team and local people • To screen livestock and wild ungulates for diseases and develop protocols for surveillance and monitoring • Provide training to scientists, forest department staff and skills to the local institutions to serve as referral centres for wildlife forensics (both plant and animals) using DNA based technologies 3. General Conditions • The project proponent will train selected scientists from BNHS, SACON, ZSI, WII and WWF under capacity building part. • Training of the state officials from line agencies should also be attempted in consultation with NMHS-PMU, GBPNIHSED. • A report based on baseline data should be submitted by the project proponent in the 1st quarter of the project. • A Certificate should be provided that this work is not the repeat of earlier work (as a mandatory exercise). • The roles and responsibilities of each implementing partners should be delineated properly with their budget. The budget allocations to partners should be done in accordance with the MoEF & CC guidelines (Max. -
Download Download
Phonotactic frequencies in Marathi KELLY HARPER BERKSON1 and MAX NELSON Indiana University Bloomington, Indiana ABSTRACT: Breathy sonorants are cross-linguistically rare, occurring in just 1% of the languages indexed in the UCLA Phonological Segment Inventory Database (UPSID) and 0.2% of those in the PHOIBLE database (Moran, McCloy and Wright 2014). Prior work has shed some light on their acoustic properties, but little work has investigated the language-internal distribution of these sounds in a language where they do occur, such as Marathi (Indic, spoken mainly in Maharashtra, India). With this in mind, we present an overview of the phonotactic frequencies of consonants, vowels, and CV-bigrams in the Marathi portion of the EMILLE/CIIL corpus. Results of a descriptive analysis show that breathy sonorants are underrepresented, making up fewer than 1% of the consonants in the 2.2 million-word corpus, and that they are disfavored in back vowel contexts. 1 Introduction Languages are rife with gradience. While some sounds, morphological patterns, and syntactic structures occur with great frequency in a language, others—though legal—occur rarely. The same is observed crosslinguistically: some elements and structures are found in language after language, while others are quite rare. For at least the last several decades, cognitive and psycholinguistic investigation of language processing has revealed that language users are sensitive not only to categorical phonological knowledge—such as, for example, the sounds and sound combinations that are legal and illegal in one’s language— but also to this kind of gradient information about phonotactic probabilities. This has been demonstrated in a variety of experimental tasks (Ellis 2002; Frisch, Large, and Pisoni 2000; Storkel and Maekawa 2005; Vitevitch and Luce 2005), which indicate that gradience influences language production, comprehension, and processing phenomena (Ellis 2002; Vitevitch, Armbrüster, and Chu 2004) for infants as well as adults (Jusczyk and Luce 1994). -
Downloaded from Brill.Com09/24/2021 10:59:14PM Via Free Access 2 Chapter 1
chapter 1 Introduction: Towards a History of Indic Philology Philology was everywhere and nowhere in premodern India, and this is a problem that demands the attention of anyone interested in the global his- tory of this form of knowledge. That it was everywhere can be established by a broad set of criteria: among them, the evidence of manuscript production and reproduction; the millennia-long history of the disciplines of language science and hermeneutics; and a commonly-held set of textual and interpre- tive practices seen in the works of authors who lived and worked in disparate times, places, languages, and fields. That it was nowhere is equally appar- ent: the civilization of classical and medieval India—that time-deep cultural and social complex whose principal but not exclusive linguistic medium was Sanskrit—produced no self-conscious account of philology (indeed, it lacked a word for it altogether) and, compared to other Eurasian culture-areas like Western Europe, the Arabic ecumene, or the Sinitic world, never witnessed any sort of crisis of textual knowledge which would issue into a set of gen- eral theory of textual authenticity and reliability. On this view, Indic civi- lization produced literati and scholars in great abundance, but no philolo- gists. An anecdote perfectly captures this apparent asymmetry. When Georg Büh- ler, arguably the greatest Indologist of the Victorian period, was in the midst of his tour in search for manuscripts in the valley of Kashmir in 1875, he encoun- tered a “most objectionable habit,” in which manuscripts were “not unfre- quently [sic] ‘cooked,’ i.e. -
An Introduction to Indic Scripts
An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set. -
Direct and Indirect Speech Interrogative Sentences Rules
Direct And Indirect Speech Interrogative Sentences Rules Naif Noach renovates straitly, he unswathes his commandoes very doughtily. Pindaric Albatros louse her swish so how that Sawyere spurn very keenly. Glyphic Barr impassion his xylophages bemuse something. Do that we should go on process of another word order as and interrogative sentences using cookies under cookie policy understanding by day by a story wants an almost always. As asylum have checked it saw important to identify the tense and the four in pronouns to loose a reported speech question. He watched you playing football. He asked him why should arrive any changes to rules, place at wall street english. These lessons are omitted and with an important slides you there. She said she had been teaching English for seven years. Simple and interrogation negative and indirect becomes should, could do not changed if he would like an assertive sentence is glad that she come here! He cried out with sorrow that he was a great fool. Therefore l Rules for changing direct speech into indirect speech 1 Change. We paid our car keys. Direct and Indirect Speech Rules examples and exercises. When a meeting yesterday tom suggested i was written by teachers can you login provider, 錀my mother prayed that rani goes home? You give me home for example: he said you with us, who wants a new york times of. What catch the rules for interrogative sentences to reported. Examples: Jack encouraged me to look for a new job. Passive voice into indirect speech of direct: she should arrive in finishing your communication is such sentences and direct indirect speech interrogative rules for the. -
Types of Interrogative Sentences an Interrogative Sentence Has the Function of Trying to Get an Answer from the Listener
Types of Interrogative Sentences An interrogative sentence has the function of trying to get an answer from the listener, with the premise that the speaker is unable to make judgment on the proposition in question. Interrogative sentences can be classified from various points of view. The most basic approach to the classification of interrogative sentences is to sort out the reasons why the judgment is not attainable. Two main types are true-false questions and suppletive questions (interrogative-word questions). True-false questions are asked because whether the proposition is true or false is not known. Examples: Asu, gakkō ni ikimasu ka? ‘Are you going to school tomorrow?’; Anata wa gakusei? ‘Are you a student?’ One can answer a true-false question with a yes/no answer. The speaker asks a suppletive question because there is an unknown component in the proposition, and that the speaker is unable to make judgment. The speaker places an interrogative word in the place of this unknown component, asking the listener to replace the interrogative word with the information on the unknown component. Examples: Kyō wa nani o taberu? ‘What are we going to eat today?’; Itsu ano hito ni atta no? ‘When did you see him?” “Eating something” and “having met him” are presupposed, and the interrogative word expresses the focus of the question. To answer a suppletive question one provides the information on what the unknown component is. In addition to these two main types, there are alternative questions, which are placed in between the two main types as far as the characteristics are concerned.