The Journey of Indian Languages: Perpectives on Culture and Society ISBN : 978-81-938282-6-7

Morphological Analyzer for Sanskrit Language

Jaideepsinh K. Raulji
1 Lecturer, Ahmedabad University, Ahmedabad, Gujarat, India.
2 Research Scholar, Dr. Babasaheb Ambedkar Open University, Ahmedabad, Gujarat, India.
Email: [email protected]

Dr. Jatinderkumar R. Saini
1 Professor & I/C Director, Narmada College of Computer Application, Bharuch, Gujarat, India.
2 Research Supervisor, Dr. Babasaheb Ambedkar Open University, Ahmedabad, Gujarat, India.
Email: [email protected]

Word-level processing requires knowledge of the structure and formation of words. Morphology is the branch of linguistics concerned with how the words of a language are systematically formed from morphemes. Sanskrit is a predecessor of most Indian languages and belongs to the Indo-European family. Knowledge of Sanskrit may therefore help in understanding the structure of other Indo-Aryan languages spoken widely in the Indian subcontinent. In this paper, Sanskrit words are analyzed morphologically through their prominent postposition and preposition markers. After a Sanskrit sentence is tokenized, the resulting words are compared against indeclinable and pronoun databases. The remaining unmatched words are examined for postposition and preposition markers, first to identify verb forms and then noun forms. The results provide a basis for understanding the surface structure of a sentence and can be used to improve related systems such as information retrieval, part-of-speech taggers and machine translation.

Keywords : Morphology, Inflection, Declension, Subanta, Tinanta.

1. INTRODUCTION
Sanskrit belongs to the Indo-European family of languages and is considered the primary language of the Vedic civilization. It is one of the 22 languages listed in the Eighth Schedule of the Constitution of India.
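The processing order summarized in the abstract (tokenize the sentence, look words up in the indeclinable and pronoun databases, then try verb endings before noun endings) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the word lists, the ending tables and the functions `tokenize`, `match_ending` and `analyze` are hypothetical, only suffixes (postposition markers) are modeled, and prefixes such as verbal upasargas are ignored.

```python
# A minimal sketch of the lookup order described in the abstract.
# All word lists and ending tables are hypothetical miniatures.

INDECLINABLES = {"च", "न", "अपि", "इति"}  # avyayas: ca, na, api, iti
PRONOUNS = {"अहम्", "त्वम्", "सः"}        # aham, tvam, saḥ

# Present-tense verb endings (three of nine) -> (person, number).
VERB_ENDINGS = {
    "ामि": ("first", "singular"),
    "सि": ("second", "singular"),
    "ति": ("third", "singular"),
}

# Two endings of masculine a-stem nouns -> (case, number).
NOUN_ENDINGS = {
    "ः": ("nominative", "singular"),
    "म्": ("accusative", "singular"),
}


def tokenize(sentence):
    """Turn danda punctuation into spaces, then split on whitespace."""
    for mark in ("।", "॥"):
        sentence = sentence.replace(mark, " ")
    return sentence.split()


def match_ending(word, table):
    """Return the features of the longest ending in `table` that `word`
    ends with (leaving a non-empty stem), or None if nothing matches."""
    for ending in sorted(table, key=len, reverse=True):
        if word.endswith(ending) and len(word) > len(ending):
            return table[ending]
    return None


def analyze(sentence):
    """Label each token; earlier checks take priority over later ones."""
    results = []
    for word in tokenize(sentence):
        if word in INDECLINABLES:
            results.append((word, "indeclinable", None))
        elif word in PRONOUNS:
            results.append((word, "pronoun", None))
        elif (feats := match_ending(word, VERB_ENDINGS)) is not None:
            results.append((word, "verb", feats))
        elif (feats := match_ending(word, NOUN_ENDINGS)) is not None:
            results.append((word, "noun", feats))
        else:
            results.append((word, "unknown", None))
    return results


if __name__ == "__main__":
    for row in analyze("अहम् ग्रामम् गच्छामि ।"):
        print(row)
```

For अहम् ग्रामम् गच्छामि (I am going to the village), the sketch labels अहम् as a pronoun, ग्रामम् as an accusative singular noun and गच्छामि as a first person singular verb. A bare ending test like this can produce false matches; a realistic version would also check the remaining stem against a root or pratipadika lexicon.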
The Ashtadhyayi, a treatise on Sanskrit grammar, was composed by the sage Panini. In linguistics, morphology is the branch that deals with word formation, analysis and generation. Computational Morphology (CM) is the application of morphological rules in the field of computational linguistics. Morphological analysis is vital for building any basic NLP application, and for an inflectionally rich language like Sanskrit it provides ample information about a word and the syntactic and semantic role it plays in a sentence. Grammatical information such as gender, number, person and tense is marked through inflectional suffixes [4]. Computational morphology deals with the processing of words in their graphemic and phonemic forms. Its most basic task can be defined as taking a string of characters or phonemes as input and delivering an analysis as output. Sanskrit is rich in inflections and has a twofold morphology: nominal (subanta forms) and verbal (tinanta forms). Inflectional morphology thus produces two kinds of padas (words), viz. subanta padas (nominal words) and tinanta padas (verbal forms). The Paninian analysis of Sanskrit places every usable word in one of these two categories (subanta and tinanta). With respect to inflection, no clear difference between nouns and adjectives is identified.

Volume:3| 2019 www.baou.edu.in Page 198

2. RELATED WORK
Sanskrit is a free word order language, so its syntacto-semantic relations depend solely on word inflections. Hence, Pawan G et al. [1] consider dependency parsers more suitable for analyzing Sanskrit syntax. Because word order is not strictly positional, sentence-level analysis requires strong morphological analysis. To develop an algorithm for a Sanskrit parser, Shashank S and Raghav A [2] used a morphological analyzer, since Sanskrit words have rich case endings. They converted text from Devanagari format to ISCII format.
Using a DFA, the root word is retrieved along with its attributes. Each word is checked sequentially against the avyaya, pronoun, verb and noun trees, and the whole analysis is based on a paradigm table. Amba K and Devanand S [3] graphically represented the Sanskrit morphology described by Panini and built an FST for analyzing Sanskrit inflectional forms. Namrata T and Suresh J [5] built a rule-based POS tagger in which rules are stored in a database and each word is compared against the database after suffix stripping. They also introduced parsing of Sanskrit sentences using Lexical Functional Grammar [6][8]. Akshar B et al. [8] built a morphological analyzer using a modular programming approach, with modules for Sandhi and Samasa analysis and formation, and Subanta, Tinanta and Kridanta analyzers. A morphological and comparative study of Sanskrit and English was carried out by Promila B et al. [9] in their framework for English-to-Sanskrit MT. Sulabh B et al. [12] surveyed Sanskrit tagsets, part-of-speech tagging methods and techniques, and the issues in implementing statistical methods given the scarce availability of Sanskrit corpora. Sharadha A et al. [15] surveyed Sanskrit grammar, the models used for POS tagging, and NLP analysis methods. Girish Nath Jha et al. [20] carried out notable work on analyzing inflectional morphology: avyayas are recognized with the help of an avyaya database; verbs are recognized with a verb database in which the inflectional forms of the 450 most common verb roots are stored for matching; and subantas are recognized through database pattern matching [20].

3. SANSKRIT MORPHOLOGY
Natural Language Processing is the scientific study of languages from a computational perspective [17]. Panini's grammar consists of nearly 4000 rules divided into 8 chapters and describes the entire Sanskrit language in full grammatical detail.
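Several of the systems reviewed in Section 2 identify words by their endings, whether through a DFA [2] or through suffix stripping [5]. One common data structure for this, shown here only as a hedged sketch and not as the method of any cited system, is a trie built over reversed suffixes: reading a word from its last character walks the trie like a DFA and reports every stored ending in a single pass. The class `SuffixTrie` and the ending table are illustrative assumptions; the table holds the present-tense active (parasmaipada) endings as they appear after a stem such as गच्छ.

```python
class SuffixTrie:
    """Trie keyed on reversed suffixes.

    Reading a word from its last character walks the trie like a DFA and
    collects every stored suffix the word ends with in one pass.
    """

    _END = object()  # sentinel key: a stored suffix ends at this node

    def __init__(self):
        self.root = {}

    def add(self, suffix, label):
        node = self.root
        for ch in reversed(suffix):
            node = node.setdefault(ch, {})
        node.setdefault(self._END, []).append(label)

    def matches(self, word):
        """Return (suffix, label) pairs for all stored suffixes of `word`,
        shortest first."""
        found = []
        node = self.root
        for i, ch in enumerate(reversed(word)):
            node = node.get(ch)
            if node is None:
                break
            for label in node.get(self._END, []):
                found.append((word[len(word) - i - 1:], label))
        return found

    def longest(self, word):
        """Return the longest (suffix, label) match, or None."""
        found = self.matches(word)
        return found[-1] if found else None


# Present-tense active (parasmaipada) endings after a stem such as गच्छ;
# illustrative data mapping each ending to (person, number).
ENDINGS = {
    "ति": ("third", "singular"), "तः": ("third", "dual"), "न्ति": ("third", "plural"),
    "सि": ("second", "singular"), "थः": ("second", "dual"), "थ": ("second", "plural"),
    "ामि": ("first", "singular"), "ावः": ("first", "dual"), "ामः": ("first", "plural"),
}

trie = SuffixTrie()
for ending, features in ENDINGS.items():
    trie.add(ending, features)

if __name__ == "__main__":
    print(trie.matches("गच्छन्ति"))  # both ति and न्ति match
    print(trie.longest("गच्छन्ति"))
```

Because ति is itself a suffix of न्ति, गच्छन्ति matches both endings; preferring the longest match selects the third person plural reading, which is why ending-based analyzers typically try longer markers first.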
A peculiarity of Panini's word formation is that he recognizes derivation by suffixes only. Panini's grammar even begins with the alphabet arranged on scientific principles. Morphology also refers to the grammatical information encoded in a word. Because inflections are built into Sanskrit words, an auxiliary verb is usually not required: a fully inflected word expresses its grammatical categories on its own. Sanskrit is rich in inflections, and its inflectional morphology yields two kinds of padas: subanta padas (nominal words) and tinanta padas (verbal forms).

3.1 NOUN FORMS / SUBANTA PADA
The inflected (declined) forms of nouns, substantives and adjectives are called subantas in Sanskrit. Morphologically, nouns and adjectives behave in a similar way. The basic form of a noun is called a pratipadika. A noun has 3 genders, namely masculine, feminine and neuter, and 3 numbers, namely singular (exactly one), dual (exactly two) and plural (more than two). In each number there are 8 cases, namely nominative, accusative, instrumental, dative, ablative, genitive, locative and vocative. The case markers for substantives and adjectives are almost the same as those for nouns. For verbs, the corresponding markers (vibhakti) give information about tense, aspect and modality (TAM). Vibhaktis are so central to noun endings that the meaning remains the same even when the word order is changed, whereas changing the case markings alters the semantics of the whole sentence. Hence the vibhaktis are crucial in determining semantic roles. Karaka defines the relationship between a nominal and the verbal root.

There are 3 persons in Sanskrit:
1. Uttam Purush (First Person): refers to the speaker, e.g. अहम् गच्छामि (I am going).
2. Madhyam Purush (Second Person): refers to the person addressed, e.g. त्वम् गच्छसि (You are going).
3. Pratham Purush (Third Person): refers to anyone else (he, she, it or they), e.g. सः गच्छति (He is going).

The case markers (karaka system) for noun forms are given in the tables below.

Table 1: Nominal case markings for masculine gender
Vibhakti [Cases] (Masculine: Singular / Dual / Plural)
Nominative ःः, ःा, ः ःः, ः , ः , ः , ःाःः, नः, ःः, ः ःः, ःान् यः, वः
Accusative म् ः , ः , ः , ःान्, न् , ः न् , ः न्, ःः
Instrumental ः न, ः ण , ःा , ःाभ्याम् , भ्याम् ः ःः , भ्यः , म ः ना , , याम्,
Dative ःाय , ः , य ःाभ्याम् , भ्याम् ः भ्यः , भ्यः , , याम् यः
Ablative ःाि् , ःः ःाभ्याम् , ः भ्यः, भ्यः , , ः ःः , ः ःः भ्याम्, याम् यः
Genitive य़ , ःः , ः ःः य ः , ः ःः , व ः ःानाम्, ःाणाम् , ः ःः , नः , , न ः , , णाम् , नाम् , ः नाम्, ःाम्
Locative ः , िः , ः , य ः , ः ःः , व ः ः षु , षु , क्षु , िु तन , व , न ः

Table 2: Nominal case markings for feminine gender
Vibhakti [Cases] (Feminine: Singular / Dual / Plural)
Nominative ःा , ः ःः , ः , य , ः , ः , ःः , यः , ठः , ः ःः , ः ःः , वः उः
Accusative म् ः , य , ः , ः , ःः , ः ःः
Instrumental या , याम्, वा , भ्याम्, याम् म ः , िःःः ःा
Dative य , य , व , व भ्याम्, याम् भ्यः , ः
Ablative याः , ः ःः , भ्याम्, भ्याम भ्यः ः ःः , वाः , न ः
Genitive याः , वाः य ः , व ः नाम्, ःाम्
Locative याम् , ःाम् , य ः , व ः , ः ःः िु , षु , क्षु िः

Table 3: Nominal case markings for neuter gender
Vibhakti [Cases] (Neuter: Singular / Dual / Plural)
Nominative म् , िः , ठः ः , ः , िःण , ःातन , णण , , ःु ण , न रीणण , तन , िति , िः
Accusative म्, िः , ःु , ः , िःण , न ःातन , णण , ः णण , तन , , िति , िः
Instrumental ः न , णा , ना , भ्याम्, भ्याम ः ःः , म ः
Dative ःाय , ण , न भ्याम्, भ्याम भ्यः
Ablative ःाि्, णणः , नः भ्याम् ः भ्यः , भ्यः
Genitive य़ , णः , नः य ः , ण ः , न ः ःानाम् , नाम् , णाम्
Locative ः , तन , णण य ः , ण ः , न ः षु , क्षु

3.2 PRONOUN
All categories of pronoun (personal, demonstrative, relative, interrogative, reflexive, indefinite, correlative, reciprocal, possessive and pronominal), with their cases, numbers and genders, are added directly to the pronoun database. Each tokenized word is first compared against this database; if a match is found, the word is declared a pronoun and is not processed further by the algorithm.
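As a concrete instance of the subanta analysis described in Section 3.1, the sketch below stores the full paradigm of one declension class, masculine stems in -a such as देव (deva), and returns every (case, number) reading whose ending leaves a known pratipadika. The lexicon, the table and the functions `analyze_subanta` and `decline` are hypothetical illustrations, not the paper's database. Returning all readings rather than one reflects the ambiguity of many endings: देवौ, for example, can be nominative, accusative or vocative dual.

```python
CASES = ["nominative", "accusative", "instrumental", "dative",
         "ablative", "genitive", "locative", "vocative"]
NUMBERS = ["singular", "dual", "plural"]

# Endings of masculine a-stems (deva type), written as the Devanagari signs
# that attach directly to a stem ending in the inherent vowel a.
# One row per case in CASES order; columns follow NUMBERS.
A_STEM_MASCULINE = [
    ["ः", "ौ", "ाः"],            # देवः देवौ देवाः
    ["म्", "ौ", "ान्"],           # देवम् देवौ देवान्
    ["ेन", "ाभ्याम्", "ैः"],      # देवेन देवाभ्याम् देवैः
    ["ाय", "ाभ्याम्", "ेभ्यः"],   # देवाय देवाभ्याम् देवेभ्यः
    ["ात्", "ाभ्याम्", "ेभ्यः"],  # देवात् देवाभ्याम् देवेभ्यः
    ["स्य", "योः", "ानाम्"],      # देवस्य देवयोः देवानाम्
    ["े", "योः", "ेषु"],          # देवे देवयोः देवेषु
    ["", "ौ", "ाः"],             # vocative: bare stem देव in the singular
]

# Hypothetical pratipadika lexicon: stem -> gender.
LEXICON = {"देव": "masculine"}


def analyze_subanta(word):
    """Return every (stem, gender, case, number) reading of `word`."""
    readings = []
    for case, row in zip(CASES, A_STEM_MASCULINE):
        for number, ending in zip(NUMBERS, row):
            if not word.endswith(ending):
                continue
            # Slice by length so an empty ending keeps the whole word.
            stem = word[:len(word) - len(ending)]
            if stem in LEXICON:
                readings.append((stem, LEXICON[stem], case, number))
    return readings


def decline(stem):
    """Generate the 8 x 3 paradigm of an a-stem as a list of rows."""
    return [[stem + ending for ending in row] for row in A_STEM_MASCULINE]


if __name__ == "__main__":
    print(analyze_subanta("देवौ"))
    for case, forms in zip(CASES, decline("देव")):
        print(f"{case:<12}", " ".join(forms))
```

The table uses the न forms (देवेन, देवानाम्). Stems such as राम additionally need the ṇatva rule, which yields रामेण and रामाणाम्; this sketch omits it.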
