Morphological Analyzer for Sanskrit Language

The Journey of Indian Languages: Perpectives on Culture and Society, ISBN: 978-81-938282-6-7
Volume: 3 | 2019, www.baou.edu.in, Page 198

Jaideepsinh K. Raulji
1 Lecturer, Ahmedabad University, Ahmedabad, Gujarat, India
2 Research Scholar, Dr. Babasaheb Ambedkar Open University, Ahmedabad, Gujarat, India
Email: [email protected]

Dr. Jatinderkumar R. Saini
1 Professor & I/C Director, Narmada College of Computer Application, Bharuch, Gujarat, India
2 Research Supervisor, Dr. Babasaheb Ambedkar Open University, Ahmedabad, Gujarat, India
Email: [email protected]

Word-level processing requires knowledge of the structure and formation of words. The branch of linguistics concerned with the systematic formation of words from morphemes is termed Morphology. Sanskrit is the predecessor of many Indian languages and a member of the Indo-European family. Knowledge of Sanskrit may help in understanding the structure and architecture of other Indo-Aryan languages spoken widely in the Indian sub-continent. Here, Sanskrit words are analysed morphologically through their postposition and preposition markers, a strong feature of the language. After a Sanskrit sentence is tokenized, the retrieved words are compared against an indeclinable database and a pronoun database. The remaining unmatched words are checked for postposition and preposition markers to identify first verb forms and then noun forms. The generated results form a basis for understanding the surface structure of a sentence and can be used to improve related systems such as Information Retrieval, Part of Speech taggers and Machine Translation.

Keywords : Morphology, Inflection, Declension, Subanta, Tinanta.

1. INTRODUCTION :
Sanskrit belongs to the Indo-European family of languages and is considered the primary language of the Vedic civilization. It is one of the 22 languages listed in the Eighth Schedule of the Constitution of India. The Ashtadhyayi, the treatise on Sanskrit grammar, is the work of the sage Panini. In linguistics, morphology is the branch that deals with word formation, analysis and generation. Computational Morphology (CM) is the application of morphological rules in the field of computational linguistics. Morphological analysis is vital for building any basic NLP application, and for an inflectionally rich language like Sanskrit it provides ample information about a word and the syntactic and semantic role it plays in a sentence. Grammatical information such as gender, number, person and tense is marked through the inflectional suffixes [4]. Computational Morphology deals with the processing of words in their graphemic and phonemic forms. Its most basic task can be defined as taking a string of characters or phonemes as input and delivering an analysis as output. Sanskrit is rich in inflections and has a two-fold morphology: inflection forms two kinds of padas, namely subanta padas (nominal forms) and tinanta padas (verbal forms). The Paninian analysis of Sanskrit places every usable word in one of these two categories (subanta and tinanta). With respect to inflection, no clear difference between nouns and adjectives is identified.

2. RELATED WORK :
Sanskrit is a free word order language, so its syntacto-semantic relations depend solely on word inflections. Hence, according to Pawan G, et al [1], dependency parsers are more suitable for analysing Sanskrit syntax. Because the language is not strictly positional, sentential discourse requires strong morphological analysis. To develop an algorithm for a Sanskrit parser, Shashank S and Raghav A [2] used a morphological analyzer, as Sanskrit words have rich case endings. They converted Devanagari format to ISCII format.
Using a DFA, the root word is retrieved along with its attributes. Each word is checked against the avyaya, pronoun, verb and noun trees in sequence. The whole analysis is done on the basis of a paradigm table. Amba K and Devanand S [3] graphically represented the Sanskrit morphology described by Panini and built an FST for analysing Sanskrit inflectional forms. Namrata T and Suresh J [5] built a rule-based POS tagger in which the rules are stored in a database and each word is compared to the database after suffix stripping. They also introduced parsing of Sanskrit sentences using Lexical Functional Grammar [6][8]. Akshar B, et al [8] built a morphological analyzer using a modular programming approach, with modules for Sandhi and Samasa analysis and formation and for Subanta, Tinanta and Kridanta analysis. A morphological and comparative study of Sanskrit and English was carried out by Promila B, et al [9] in their framework for English to Sanskrit MT. Sulabh B, et al [12] surveyed Sanskrit tagsets, Part of Speech tagging methods and techniques, and the issues in implementing statistical methods given the scarce availability of Sanskrit corpora. Sharadha A, et al [15] surveyed Sanskrit grammar, the models used for POS tagging and NLP analysis methods. Girish Nath Jha, et al [20] analysed inflectional morphology: avyayas are recognized with an avyaya database, verbs with a verb dictionary that stores the inflectional forms of the 450 most common verb roots, and subantas through database pattern matching [20].

3. SANSKRIT MORPHOLOGY :
Natural Language Processing is the scientific study of languages from a computational perspective [17]. Panini's grammar consists of nearly 4000 rules divided into 8 chapters. It describes the entire Sanskrit language, with the detailed structure of its grammar.
It is a peculiarity of Panini's word formation that he recognizes derivation by suffixes only. Panini's grammar even begins with the alphabet arranged on scientific principles. Morphology also refers to the grammatical information hidden in the word. Inflections are built into words, hence in most scenarios an auxiliary verb is not required. In Sanskrit, a fully inflected word carries its own grammatical information and does not depend on other words to express its grammatical role. As noted in the introduction, inflection forms two kinds of padas: subanta padas (nominal forms) and tinanta padas (verbal forms).

3.1 NOUN FORMS / SUBANTA PADA :
The inflectional forms, or declensions, of nouns, substantives and adjectives are considered subantas in Sanskrit. Morphologically, nouns and adjectives behave in a similar way. The basic form of a noun is called a Pratipadika. A noun has 3 genders, namely masculine, feminine and neuter, and 3 numbers, namely Singular (only one), Dual (only two) and Plural (more than two). There are 8 cases in each number, namely Nominative, Accusative, Instrumental, Dative, Ablative, Genitive, Locative and Vocative. The case markers for substantives and adjectives are almost the same as those for nouns. Case markers (Vibhakti) for a verb give information about Tense, Aspect and Modality (TAM). Vibhaktis are so important to noun endings that the meaning remains the same even if the word order is changed, whereas changing the case markings alters the semantics of the whole sentence. Hence the vibhaktis are crucial in determining semantic roles. Karaka defines the relationship between a nominal and the verbal root.

There are 3 persons in Sanskrit:
1. Uttam Purush (First Person) : It refers to the speaker, e.g. (अहम् गच्छामि) I am going.
2. Madhyam Purush (Second Person) : It refers to the person addressed, e.g. (त्वम् गच्छसि) You are going.
3. Pratham Purush (Third Person) : It refers to a third party, e.g. (सः गच्छति) He is going.

Following are the case markers (Karaka System) for noun forms.

Table 1: Nominal Case Markings for Masculine Gender
Vibhakti [Cases]: Masculine Singular, Dual, Plural
Nominative: ःः, ःा, ः ःः, ः , ः , ः , ःाःः, नः, ःः, ः ःः, ःान् यः, वः
Accusative: म् ः , ः , ः , ःान्, न् , ः न् , ः न्, ःः
Instrumental: ः न, ः ण , ःा , ःाभ्याम् , भ्याम् ः ःः , भ्यः , म ः ना , , याम्,
Dative: ःाय , ः , य ःाभ्याम् , भ्याम् ः भ्यः , भ्यः , , याम् यः
Ablative: ःाि ् , ःः ःाभ्याम् , ः भ्यः, भ्यः , , ः ःः , ः ःः भ्याम्, याम् यः
Genitive: य़ , ःः , ः ःः य ः , ः ःः , व ः ःानाम्, ःाणाम् , ः ःः , नः , , न ः , , णाम् , नाम् , ः नाम्, ःाम्
Locative: ः , िः , ः , य ः , ः ःः , व ः ः षु , षु , क्षु , िु तन , व , न ः

Table 2: Nominal Case Markings for Feminine Gender
Vibhakti [Cases]: Feminine Singular, Dual, Plural
Nominative: ःा , ः ःः , ः , य , ः , ः , ःः , यः , ठः , ः ःः , ः ःः , वः उः
Accusative: म् ः , य , ः , ः , ःः , ः ःः
Instrumental: या , याम्, वा , भ्याम्, याम् म ः , िःःः ःा
Dative: य , य , व , व भ्याम्, याम् भ्यः , ः
Ablative: याः , ः ःः , भ्याम्, भ्याम भ्यः ः ःः , वाः , न ः
Genitive: याः , वाः य ः , व ः नाम्, ःाम्
Locative: याम् , ःाम् , य ः , व ः , ः ःः िु , षु , क्षु िः

Table 3: Nominal Case Markings for Neuter Gender
Vibhakti [Cases]: Neuter Singular, Dual, Plural
Nominative: म् , िः , ठः ः , ः , िःण , ःातन , णण , , ःु ण , न रीणण , तन , िति , िः
Accusative: म्, िः , ःु , ः , िःण , न ःातन , णण , ः णण , तन , , िति , िः
Instrumental: ः न , णा , ना , भ्याम्, भ्याम ः ःः , म ः
Dative: ःाय , ण , न भ्याम्, भ्याम भ्यः
Ablative: ःाि ्, णणः , नः भ्याम् ः भ्यः , भ्यः
Genitive: य़ , णः , नः य ः , ण ः , न ः ःानाम् , नाम् , णाम्
Locative: ः , तन , णण य ः , ण ः , न ः षु , क्षु

3.2 PRONOUN
All categories of pronoun (Personal, Demonstrative, Relative, Interrogative, Reflexive, Indefinite, Correlative, Reciprocal, Possessive, Pronominal), with their cases, numbers and genders, are added directly to the pronoun database. Each tokenized word is first compared against this database; if it matches, the word is declared a pronoun and is not processed further by the algorithm.
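The lookup order described so far (indeclinable and pronoun databases first, then verb markers, then noun markers) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names, the tiny word sets and the short suffix lists are hypothetical stand-ins for the paper's databases and marker tables, chosen only from forms that appear in the examples above.

```python
import unicodedata

# Toy stand-ins for the paper's databases (illustrative assumptions only).
INDECLINABLES = {"च", "एव", "अपि"}
PRONOUNS = {"अहम्", "त्वम्", "सः"}
# A few endings taken from the person examples above (गच्छति, गच्छसि, गच्छामि).
VERB_SUFFIXES = ("ति", "सि", "मि")
# A few nominal endings; longer markers are listed first so they match first.
NOUN_SUFFIXES = ("भ्याम्", "भ्यः", "म्", "ः")


def tokenize(sentence):
    """Split a sentence on whitespace after removing the danda (।)."""
    normalized = unicodedata.normalize("NFC", sentence)
    return normalized.replace("।", " ").split()


def analyze_word(word):
    """Classify one token, stopping at the first stage that matches."""
    word = unicodedata.normalize("NFC", word)
    if word in INDECLINABLES:
        return "indeclinable"
    if word in PRONOUNS:
        return "pronoun"  # matched: the suffix checks below are skipped
    if word.endswith(VERB_SUFFIXES):
        return "verb"
    if word.endswith(NOUN_SUFFIXES):
        return "noun"
    return "unknown"


def analyze_sentence(sentence):
    """Return (token, category) pairs for every token in the sentence."""
    return [(token, analyze_word(token)) for token in tokenize(sentence)]


if __name__ == "__main__":
    for token, category in analyze_sentence("सः गच्छति ।"):
        print(token, category)
```

The order of the checks matters in this sketch: त्वम् ends in म्, so it would be labelled a noun by the suffix test if the pronoun lookup did not come first.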