(IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 Computational Algorithms Based on the Paninian System to Process Euphonic Conjunctions for Word Searches

Kasmir Raja S. V. Rajitha V. Meenakshi Lakshmanan Dean – Research Department of Computer Science Department of Computer Science SRM University Meenakshi College for Women Meenakshi College for Women Chennai, India Chennai, India Chennai, India & [email protected] Research Scholar Mother Teresa Women’s University Kodaikanal, India

Abstract – Searching for words in Sanskrit E-text is a problem and hence finding actual occurrences of words is of great that is accompanied by complexities introduced by features of consequence to not only Sanskrit scholars but to also Sanskrit such as euphonic conjunctions or ‘sandhis’. A word researchers from various other disciplines ranging from could occur in an E-text in a transformed form owing to the philosophy, theology, the arts and the physical and life sciences operation of rules of sandhi. Simple word search would not yield to sociology, medicine and astronomy. these transformed forms of the word. Further, there is no search engine in the literature that can comprehensively search for II. THE PROBLEM words in Sanskrit E-texts taking euphonic conjunctions into As stated above, there are complexities involved in account. This work presents an optimal binary representational schema for letters of the Sanskrit alphabet along with algorithms searching comprehensively for words in a Sanskrit E-text. One to efficiently process the sandhi rules of Sanskrit grammar. The of the major contributors to this complexity is the operation of work further presents an algorithm that uses the sandhi euphonic conjunctions or ‘sandhis’. A sandhi is a point in a processing algorithm to perform a comprehensive word search word or between words, at which adjacent letters coalesce and on E-text. transform [3]. This is a common feature in many Indian languages as against European languages, and has far-reaching Keywords – Sanskrit; euphonic conjunction; sandhi; linguistics; consequences in Sanskrit. The transformation caused by the Panini; Sanskrit word search; E-text search. application of rules of sandhi in Sanskrit can be significant I. INTRODUCTION enough to alter the word itself to such a degree that the transformed word would not show up in a simple word search. Word search in Sanskrit E-texts is a problem that is beset with complexities, unlike in the case of English E-texts. The For example, the word ‘asamardhiḥ’ (meaning of problem assumes relevance in the context of the availability of unmatched affluence), can be transformed into ‘āsamardhiḥ’ rapidly increasing numbers of ancient Sanskrit texts [5-9] in the because of the operation of a euphonic conjunction with a word electronic format. The importance of n-gram analysis of ending in ‘a’ preceding it, or ‘āsamardhir’ or ‘āsamardhis’ in Sanskrit texts for scholars and the tremendous utility of combination with words occurring after it or ‘asamarddhiḥ’ or locating specific words in a variety of texts to aid the scholastic ‘asamardddhiḥ’ by internal transformation. Clearly, simply process can hardly be overemphasized. searching for the word asamardhiḥ would not yield the occurrences of the same word as ‘asamarddhir’, Dating of a text, fixing its authorship with certainty, and ‘āsamardddhir’, or other alternative forms. As such, a normal analysis of the writing style of an author of a text, are some of text-search using a Unicode text editor would not suffice. Other the areas in which n-grams assume criticality especially in the search engines currently used for Sanskrit [13] too do not context of ancient Sanskrit works. Quoting from authoritative provide for such comprehensive searching. texts is imperative in scholarly works, and word searches can provide crucial help in this regard. Locating the portion in a In order to achieve such an exhaustive search, all possible text or texts in which a particular usage or word is found is of forms of the word resulting from the euphonic conjunctions great importance to scholars who write explanatory treatises of that would become operative in its case must be generated and Sanskrit-based works in English and other languages. Semantic searched for in the given text. analysis and understanding of texts are facilitated by finding The authors have already presented a new schema for fast occurrences of words and studying them in different contexts. sandhi processing in earlier work [4]. The present work extends In fact, ancient Sanskrit works are universally acknowledged as the application of that schema to other sandhi rules including being mines of information on a whole spectrum of disciplines,

64 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 consonant-based and visarga-based sandhis as well as derivational and known for their mathematical precision in important rules with respect to exceptional cases. It further spite of dealing with the nuances of the language at various presents a complete computational algorithm to process all levels including morphology, syntax, semantics, phonology and sandhis, and an algorithm to apply this sandhi-processing pragmatics. Owing to the cryptic nature of the Aṣṭādhyāyī, one procedure to generate all word forms to enable comprehensive or more of the commentaries on it are required to get a clear searching. understanding of its contents. A. Language Representation The current work deals with Pāṇini’s sandhi-related The Unicode hexadecimal range 0900 - 097F is used to aphorisms with the help of the recognized commentaries, represent Sanskrit characters in Devanāgarī script. The Siddhānta-kaumudī [1] and Kāśikā [2]. Both these characters used to represent Sanskrit letters in English script commentaries are accepted by Sanskrit scholars as authoritative are found in the Basic Latin (0000-007F), Latin-1 Supplement works on Pāṇinian grammar. (0080-00FF), Latin Extended-A (0100-017F) and Latin Pāṇini’s statements of grammatical rules are expressed on Extended Additional (1E00 – 1EFF) Unicode ranges. the basis of the Māheśvara-sūtras, or the ‘aphorisms of The Latin character set has been employed in this work to Maheśvara’. These aphorisms provide a list of all the letters of represent Sanskrit letters as E-text. As such, the schema and the Sanskrit alphabet ordered in a specific sequence. The algorithms presented do not use Devanāgarī script. To use the Māheśvara aphorisms are given below: algorithms for text that is in Devanāgarī script, the text needs 1. a-i-u-ṇ to first be converted to Latin text. 2. ṛ-ḷ-k B. Terminology 3. e-o-ṅ 4. ai-au-c The terminology employed in this work for certain groups 5. --va--ṭ of letters of the Sanskrit alphabet is given in Table 1. 6. -ṇ Table 1: Terminology 7. ña-ma-ṅa-ṇa--m Term Description / Notation 8. jha-bha-ñ Vowel a,ā,i,ī,u,ū,ṛ,ṝ,ḷ,e,ai,o,au 9. gha-ḍha--ṣ Semi-vowel y, v, r, l 10. ---ḍa--ś Consonant k,kh,g,gh,ṅ,c,ch,j,jh,ñ,ṭ,ṭh,ḍ,ḍh,ṇ,t,th,d,dh, 11. kha-pha-cha-ṭha---ṭa--v n,p,ph,b,bh,m,ś,ṣ,s,h 12. --y Guttural k, kh, g, gh, ṅ 13. śa-ṣa--r Palatal c, ch, j, jh, ñ 14. ha-l Cerebral ṭ, ṭh, ḍ, ḍh, ṇ Dental t, th, d, dh, n The last letter in each of these aphorisms is only a place- Labial p, ph, b, bh, m holder. The first four aphorisms list only the short forms of all Nasal ṅ, ñ, ṇ, n, m the vowels, while the rest list the semi-vowels and consonants; Aspirate h the latter list has the vowel ‘a’ appended to each letter only to Sibilant ś, ṣ, s enable pronunciation of the aphorism. Column1 k, c, ṭ, t, p A. The Approach Column2 ṭ kh, ch, h, th, ph The present work is based on earlier work by the authors, Column3 ḍ g, j, , d, b which directly codifies Pāṇini’s rules in a novel way using Column4 ḍ gh, jh, h, d, bh binary representations [4]. The unique data representation ḥ Visarga devised by the authors has been further refined in this work and Anusvāra ṁ consonant-based, visarga-based sandhi rules, as well as some Hard consonant Column1, Column2, Sibilants special sandhi rules have been included in this work. Soft consonant Column3, Column4, Nasals, Aspirate Hard guttural k, kh Rule representation has been simplified to minimal binary Hard labial p, ph set-unset operations. Further, the sūtra ordering has been done Mutes Column1, Column2, Column3, Column4, after acquiring a thorough understanding of the operation of Nasals Pāṇini’s sandhi-related aphorisms. As such, this work presents Jihvāmūlīya ⋏ (pronounced as the end of ‘kah’) a significant extension, refinement and closure of the earlier Upadhmānīya ⋎(pronounced as the end of ‘paf’’) work of the authors. Moreover, it provides a clear understanding of the rules governing sandhi as laid down by III. THE BASIS OF THE WORK Pāṇini, in a comprehensive and simplified way, hitherto not The renowned ancient Sanskrit linguist, Pāṇini, codified the encountered in the literature. extant grammar of Sanskrit into terse aphorisms (‘sūtras’) and B. The Binary Schema organized these aphorisms into eight chapters. This work is the authoritative Aṣṭādhyāyī (literally meaning ‘work in eight The following is an extract from already published work by chapters’) and is universally acknowledged as the most the authors [4] and is included here for completeness of the comprehensive codification of the grammar of any language. presentation. The grammatical rules that make up Pāṇini’s Aṣṭādhyāyī are

65 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 A point of sandhi is denoted by # Letters 푥 + 푦 45 ' where 푥 and 푦 denote the sandhi letters and the symbol ‘+’ 46 # denotes adjacency. The variable 푋 denotes the sequence of In Table 2, x is the jihvāmūliyā and f is the upadhmānīya letters culminating in 푥; the variable 푌 denotes the sequence of mentioned in Table 1; the # symbol stands for ‘ru’ a special letters starting with 푦. The notations 푋 and 푌 are used to depict intermediary form of the semi-vowel r. special conditions that pertain to an entire word or sequence of Any letter of the alphabet is represented in two parts: Part 1 letters involved in the sandhi rule. The letter immediately denotes the category to which a letter belongs (zero-based preceding 푥 and the letter immediately succeeding 푦 are serial number in Table 2), and Part 2 denotes the zero-based denoted respectively as 푢 and 푤 respectively. term number within the series that the letter is or fits into. In The refined schematic developed in this work to represent any letter representation, Part 1 is a binary string of fixed letters of the Sanskrit alphabet is given in Table 2. length 46, in which the set bit denotes the category number, while Part 2 is a binary string of maximum length 6 in which Table 2: Binary representation scheme the set bit indicates which particular letter is being represented. # Letters It is clear that one letter has many representations under this 0 a,ā,i,ī,u,ū,ṛ,ṝ,ḷ,e,ai,o,au scheme. 1 y,r,l,v,yaṁ,vaṁ,laṁ 2 k,kh,g,gh,ṅ,c,ch,j,jh,ñ,ṭ,ṭh,ḍ,ḍh,ṇ,t,th,d,dh,n,p,ph,b,bh,m,ś,ṣ,s,h The first four shaded rows of Table 2 stand for overall 3 ṁ,ḥ,',#,x,f categories, viz. vowels, semi-vowels, consonants and special 4 a characters respectively. One of these four bits have to be set in 5 ā any letter representation. There is no corresponding Part 2 6 u,i value for the bits 0, 1, 2 and 3 of Part 1. 7 ū,ī For simplicity of presentation, rules use the 8 u,i,ṛ,ḷ,a sandhi following notation: 푥 (푛) = 1 indicates that the 푛th bit of Part 푖 9 ū,ī,ṝ,ṝ,ā 푖 of the variable 푥 is set, where 푖 = 1, 2. In the implementation, 10 u,i,ṛ,ḷ the checks for bit set can be done by simply using the XOR 11 ū,ī,ṝ operation. 12 ṛ,ṝ,ḷ 13 o,e IV. SANDHI PROCESSING UNDER THE PĀṆINIAN SYSTEM 14 au,ai 15 o,au,e,ai Each of the eight chapters of Pāṇini’s Aṣṭādhyāyī is divided 16 o,e,ar,al into four parts or pādas. Overall, the work is defined by Pāṇini 17 ār,ār,āl as consisting of two parts, the sapādasaptādhyāyī (the 18 av,āv,ay,āy aphorisms of Chapters 1.1 to 8.1), and the tripādī (the 19 ava aphorisms of Chapters 8.2 to 8.4). 20 v,y,r,l The order in which the rules should be visited was arrived 21 r at in this work after a thorough study of Pāṇini’s aphorisms 22 ṁv,ṁy,r,ṁl with respect to euphonic conjunctions. As a result of the study, 23 ṣ the set of sandhi rules has been split into two in this work: Set 24 s 1, having all the relevant aphorisms of the sapādasaptādhyāyī 25 ś as well as a few specific rules from the tripādī, and Set 2, 26 h having all the remaining relevant aphorisms of the tripādī. 27 ṅ,ṇ,n,ñ,m 28 ṅ,ṇ The order of parsing is as follows: rules in Set 1 are parsed 29 n in reverse order of their Aṣṭādhyāyī order; rules in Set 2 are 30 m parsed in the Aṣṭādhyāyī order itself. (The sūtra number as it 31 k,gh,kh,g,ṅ appears in the Aṣṭādhyāyī is indicated in the algorithm between 32 c,jh,ch,j,ñ,ś double pipe symbols given after the sūtra.)This parsing order is 33 ṭ,ḍh,ṭh,ḍ,ṇ,ṣ adopted so that no rule already parsed has to be parsed again. 34 t,dh,th,d,n,s As such, the flow of the program is just from top to bottom. 35 p,bh,ph,b,m There are exceptions to the above parsing orders in both sets 36 k,ṭ,t,c,p that arise because of certain overruling sūtras that appear 37 kh,ṭh,th,ch,ph earlier / later respectively in the two sets. The ordering is 38 g,ḍ,d,j,b changed to accommodate such rules in such a way as to parse 39 gh,ḍh,dh,jh,bh them before the main rule. 40 ch,ṭh,th,c,ṭ,t Assuming that the rules are ordered in the above manner in 41 k,kh,p,ph the two sets, the following general algorithm for parsing rules 42 x,x,f,f is presented. The is a list of the alternative word-pair 43 ṁ word_list outputs generated, and represents the output at the end of the 44 ḥ algorithm.

66 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 Algorithm SandhiRulesParser The algorithm SandhiProcessor processes the rules { pertaining to all the major sandhis in Sanskrit grammar, in while Set 1 rules have not been fully visited accordance with the processing scheme presented in Algorithm { SandhiRulesParser. Set 1 and Set2 sandhis have been Try the next rule in Set 1; incorporated here one below the other and the rules have been if the rule applies codified as per the schema presented above. The vowel sandhis { presented in [4] have been modified in accordance to the Apply the rule and store the output; reduced schema and included here for completeness. if the rule is optional When 푥푖, 푦푖, etc., for i = 1,2, are assigned new values by { setting bits, it is assumed that their initial values are first unset. Add the current word pair to the word_list; Also, if either part of a variable is not set, it is assumed to Continue checking from the next rule for all remain unchanged. Further, it is also assumed that a category word pairs stored in the word_list; change caused by a sandhi will cause an automatic change in } the first four bits of the letter representation and that all bit else representations for the changed letter will be generated { thereafter. Hence, these aspects are not explicitly stated for each rule in this algorithm. Process internal sandhi rules of Set 2; } The speed of processing is increased by going into a rule } only if overall conditions are satisfied. For instance, if the rule } is a vowel sandhi rule, where both x and y are required to be while Set 2 rules have not been fully visited vowels, then the check if 푥1(0) and 푦1(0) are 1 is first made. If { this bit-check is not true for the input words, then a whole set Try the next rule in Set 2; of vowel sandhis is omitted from the parse, thus increasing the if the rule applies efficiency of the algorithm. These overall checks have not been { shown in the algorithm presented below, in order to make the Apply the rule and transform the given words; presentation more simple. if the rule is optional Algorithm SandhiProcessor (푋,푌) { { Add the current word pair to the word_list; //1.svaujasamauṭchaṣṭābhyāṁbhisṅebhyāṁbhyasṅasi Continue checking from the next rule for all //bhyāṁbhyasṅasosāmṅyossup || 4.1.2 || word pairs stored in the word_list; //If there is a visarga (ḥ) at the end of 푋, then the visarga is } //changed to ‘s’. } 푖푓 푥1(44) } { } 푥1(24) = 1; There are a few exceptions that would apply to the general } processing order prescribed by the above algorithm. For example, in Set 1 the output produced by an optionally //2. sasajuṣo ruḥ || 8.2.66 || applying rule does, in certain cases, have to pass through a rule // Common name: visarga-rutva sandhi appearing below and undergo further transformation as a result, //If last letter of 푋 is , then s is replaced by ‘#’ which as it happens in Set 2. Also, it is found that a few rare rules of s Set 2 have to be processed before Set 1. Further, in Set 2, all //stands for the particle ‘ru’, interpreted as ‘r’. rules that form exceptions to a particular rule are stated after it //This rule is incorporated here though it belongs to the by Pāṇini, but clearly, have to be processed before the rule by //sapādasaptādhyāyī . the algorithm. 푖푓 푥1(24) { V. THE SANDHI PROCESSOR x1(46) = 1; The key to symbols used in rule coding and algorithm } specification is as follows: //3. avaṅ sphoṭāyanasya || 6.1.123 || (vowel sandhi)  // means single-line explanatory comment // Common name: avaṅādeśa sandhi  { } are block or set indicators //If the word go is followed by a vowel, then the o of go  ∧ denotes 푎푛푑 //is optionally replaced by ava.  ∨ denotes 표푟 푖푓 (푥1(15) ∧ 푥2(0)) ∧ (푢1(31) ∧ 푢2(3))  ¬ denotes 푛표푡 {  ⨁ denotes 푥표푟 Add X|Y to word_list;  | denotes word concatenation 푥1 (19) = 1; }

67 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 //4. haśi ca || 6.1.114 || //and y. //If # (ru) or r at the end of 푋 is preceded by the vowel a //i) If a or ā is followed by eti or edhati, then vṛddhi letter //and followed by aspirate, semi-vowel, nasal, Column3 or //ai replaces both //Column4, then last letter of 푋 is replaced with the vowel //ii) If the preposition pra is followed by eṣa or eṣy, then //‘u’ and shifted to Y to become the first letter of Y //vṛddhi letter ai replaces both 푖푓 (푥 (46) ∨ 푥 (21)) ∧ 푢 (4) ∧ (푦 (1) ∨ 푦 (26) ∨ //iii) If a or ā is followed by ūh, then vṛddhi letter au 1 1 1 1 1 //replaces both 푦 (27) ∨ 푦 (38) ∨ 푦 (39)) 1 1 1 //iv) If preposition is followed by ūḍ , then ṛ { pra h v ddhi //letter au replaces both x (6) = 1; 1 //v) If word sva is followed by īr, then vṛddhi letter ai 푥2 = 푢2; //replaces both Shift 푥 from the end of 푋 to the beginning of 푌; 푖푓 (푥1(4) ∨ 푥1(5)) //푥 is ‘a’ or ‘ā’ } {

푖푓 (푦1(13) ∧ 푦2(1)) //푦 is ‘e’ //5. ato roraplutādaplute || 6.1.113 || { //Common name: visarga-rutva sandhi 푖푓 푌 starts with {et, edhat} //(i) //If # (ru) or r at the end of 푋 is followed and preceded by { //a, then the # or r is replaced with the vowel ‘u’ and shifted delete 푥; //to Y to become the first letter of Y 푦1(14) = 1; 푖푓 (푥1(46) ∨ 푥1(21)) ∧ 푢1(4) ∧ 푦1(4) } { 푒푙푠푒푖푓 푋 = ‘pra’ ∧ 푌 starts with {eṣ, eṣy} //(ii) { 푥1(6) = 1; delete 푥; 푥2 = 푢2; Shift 푥 from the end of 푋 to the beginning of 푌; 푦1(14) = 1; } } } 푒푙푠푒푖푓 (푦 (7) ∧ 푦 (0)) //푦 is ‘ū’ //6. eṅaḥ padāntādati || 6.1.109 || (vowel sandhi) 1 2 { // Common name: pūrvarūpa sandhi 푖푓 푤 (26) // (iii) //If e or o at the end of a word is followed by a, then e or o 1 { //remains, and the avagraha (') replaces a. delete 푥; 푖푓 푥1(13) ∧ 푦1(4) { 푦1(14) = 1; } 푦1(45) = 1; } 푒푙푠푒푖푓 푋 = ‘pra’ ∧ 푌 starts with {ūḍh} //(iv) { delete 푥; //7. ḥ ṇe dīrghaḥ || 6.1.101 || (vowel ) aka savar sandhi 푦 (14) = 1; //Common name: ṇadīrgha sandhi 1 savar } //If one of ṛ or ḷ or their long equivalents ā, ī, ū and ṝ a, i, u, } //is followed by the short or long form of the same letter, 푒푙푠푒푖푓 푋 = ‘ ’ ∧ (푦 (7) ∧ 푦 (1)) ∧ 푤 (21)//(v) //then the corresponding long letter replaces both. sva 1 2 1 { 푖푓 (푥 (8) ∨ 푥 (9)) ∧ (푦 (8) ∨ 푦 (9)) ∧ ¬(푥 ⨁푦 ) 1 1 1 1 2 2 delete 푥; { 푦 (14) = 1; delete 푦; 1 } 푥 (9) = 1; 1 } return 푋|푌; } //10. eṅi pararūpaṁ || 6.1.94 || (vowel sandhi) Common name: pararūpa sandhi //8. omāṅośca || 6.1.95 || (vowel sandhi) // // Common name: pararūpa sandhi //If a or ā at the end of a preposition is followed by e or // , then the or replaces both. //If a or ā is followed by o of the word om or oṁ, then o o e o //replaces both. //Note: The prepositions that qualify are: pra, ava, apa, //upa, parā. 푖푓(푥 (4) ∨ 푥 (5)) ∧ 푌 ∈ {om, oṁ} 1 1 푖푓 푋 ∈ {pra, ava, apa, upa, parā} ∧ 푦 (13) { 1 { delete 푥; delete 푥; } }

//9. etyedhatyūṭhsu || 6.1.89 || (vowel sandhi) //11. upasargādṛti dhātau || 6.1.91 || (vowel sandhi) //Common name: vṛddhi sandhi //Common name: ṛ //For this rule, in all cases the resultant letter replaces x v ddhi sandhi //i) If a or ā at the end of a preposition is followed by ṛ,

68 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 //ṝ or ḷ, then vṛddhi letter ār, ār or āl respectively //16. che ca || 6.1.73 || //replaces both. Note: The prepositions that qualify are: //Common name: tugāgama sandhi //pra, parā, apa, ava, upa //If a short vowel is followed by the consonant ch, then //ii) If the word vatsara, kambala, vasana, daśa, ṛṇa is //t is added. //followed by the word ṛṇ , then ṛ letter ār a v ddhi 푖푓 푥1(8) ∧ (푦1(40) ∧ 푦2(0)) //replaces both. { //Note: This rule clashes with 6.1.87 (guṇa sandhi), and z (34) = 1; //takes precedence. 1 z = y ; 푖푓 푋 ∈ {pra, ava, apa, upa, parā} ∧ 푦 (12) 2 2 1 Add z to the end of 푋; { delete 푥; }

푦1(17) = 1; } //17. āṅmāṅośca || 6.1.74 || 푖푓 푋 ∈ {vatsara, kambala, vasana, daśa, ṛṇa} ∧ //Common name: tugāgama sandhi //If the particle ā or word mā is followed by ch, (푦1(12) ∧ 푦2(0)) ∧ 푌 = ‘ṛṇa’ { //then t is added. delete 푥; 푖푓 푋 ∈ {ā, mā} ∧ (푦1(40) ∧ 푦2(0)) 푦1(17) = 1; { } 푧1(34) = 1; z2 = y2; //12. vṛddhireci || 6.1.88 || (vowel sandhi) Add z to the end of 푋; // Common name: vṛddhi sandhi } //If a or ā is followed by e, o, ai or au, then the //corresponding vṛddhi letter ai or au replaces both. //18. dīrghāt || 6.1.75 || 푖푓 (푥1(4) ∨ 푥1(5)) ∧ ((푦1(13) ∨ 푦1(14)) // padāntādvā || 6.1.76 || { //Common name: tugāgama sandhi delete 푥; //If a long vowel is followed by ch, then t is added. 푦 (14) = 1; 1 푖푓 푥1(9) ∧ (푦1(40) ∧ 푦2(0)) } {

푧1(34) = 1; //13. ādguṇaḥ || 6.1.87 || (vowel sandhi) z2 = y2; // uraṇ raparaḥ || 1.1.51 || Add z to the end of 푋; // Common name: guṇa sandhi } //If a or ā is followed by i, ī, u, ū, ṛ, ṝ or ḷ, then the //corresponding guṇa letter e, o, ar or al replaces both. //19. saṁyogāntasya lopaḥ || 8.2.23 || 푖푓 (푥 (4) ∨ 푥 (5)) ∧ ((푦 (10) ∨ 푦 (11)) 1 1 1 1 //If the final consonant of 푋 is preceded by a { delete 푥; //consonant, then the last consonant is dropped. 푖푓 푥1(2) ∧ 푢1(2) 푦1(16) = 1; } { delete 푥; //14. ecoyavāyāvaḥ || 6.1.78 || (vowel sandhi) } // Common name: ayāyāvāvādeśa sandhi //If e, o, ai or au is followed by a vowel, then ay, av, āy, //20. jhalām jaśo'nte || 8.2.39 || //āv replace the first respectively. //Common name: jaśtva sandhi 푖푓 푥1(15) ∧ 푦1(0) //If 푥 is Column1, Column2, Column3, Column4, sibilant { //or aspirate, then 푥 is replaced by the corresponding 푥1(18) = 1; //Column3. } //Note: The rule sasajuṣo ruḥ || 8.2.66 || debars the //application of this rule for words ending in sibilants, and //15. iko yaṇaci || 6.1.77 || (vowel sandhi) //has been incorporated earlier in Set 1 rules itself. // Common name: yaṇādeśa sandhi 푖푓 푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39) //If i, ī, u, ū, ṛ, ṝ or ḷ is followed by a vowel, then the { //corresponding semi-vowel ( ) replaces the first. y, v, r, l 푥1(38) = 1; 푖푓 (푥1(10) ∨ 푥1(11)) ∧ 푦1(0) } { ( ) 푥1 20 = 1; //21. pumaḥ khayyampare ||8.3.6|| } // atrānunāsikaḥ pūrvasya tu vā || 8.3.2 ||

//If 푋 is the word ‘pum’ or ‘puṁ’ and is followed by

69 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 //Column1 or Column2, which is in turn followed by a } //vowel, semi-vowel or a nasal, then 푥 is replaced by # } //(ru) and the preceding vowel is made nasal using the //anusvāra. //25. kharavasānayorvisarjanīyaḥ || 8.3.15 || 푖푓 푋 ∈ {pum, puṁ} ∧ (y1(36) ∨ y1(37)) ∧ (w1(0) ∨ //If 푥 is # (ru) or r and is followed by a hard consonant, w1(1) ∨ w1(27)) //then 푥 is replaced with visarga. { 푖푓 (푥1(46) ∨ 푥1(21)) ∧ (푦1(36) ∨ 푦1(37) ∨ 푦1(23) ∨ 푥1(43) = 1; 푦1(24) ∨ 푦1(25)) 푧1(46) = 1; { 푥2(0) = 푧2(0) = 1; x1(44) = 1; } }

//22. naśchavyapraśān || 8.3.7 || //26. bhobhagoaghoapūrvasya yo’śi || 8.3.17 || // atrānunāsikaḥ pūrvasya tu vā || 8.3.2 || //If 푥 is # (ru) or r and is preceded by bho, bhago, agho, //Common name: satva sandhi //a or ā, and is followed by a vowel, semi-vowel or soft //If the final n of a word, except for the word praśān, //consonant, then 푥 is replaced by the consonant ‘y’. //followed by ch, ṭh, th, c, ṭ or t which is in turn 푖푓 (푥1(46) ∨ 푥1(21)) ∧ (X – {x}) in {bho, bhago, agho, a, //followed by a vowel, semi-vowel or nasal, then ā} ∧ (푦1(0) ∨ 푦1(1) ∨ 푦1(38) ∨ 푦1(39) ∨ 푦1(26) ∨ //n is replaced with # (ru) and the preceding vowel is 푦1(27)) //made nasal using the anusvāra. { 푖푓 ¬(푋 = ‘praśān’) ∧ (푥1(27) ∧ 푥2(2)) ∧ 푦1(40)) ∧ 푥1(20) = 1; (푤1(0) ∨ 푤1(1) ∨ 푤1(27)) 푥2(1) = 1; { u = x; } 푥1(43) = 1; 푧1(46) = 1; //27. lopaḥ śākalyasya || 8.3.19 || 푥2(0) = 푧2(0) = 1; //If the consonant ‘y’ or ‘v’ is preceded by a or ā and is } //followed by a vowel, semi-vowel or soft consonant, //then the ‘y’ or ‘v’ is dropped. //23. nṝnpe || 8.3.10 || 푖푓 (푥1(20) ∧ (푥2(0) ∨ 푥2(1))) ∧ (푢1(4) ∨ 푢1(5)) ∧ // atrānunāsikaḥ pūrvasya tu vā || 8.3.2 || (푦1(0) ∨ 푦1(1) ∨ 푦1(38) ∨ 푦1(39) ∨ 푦1(26) ∨ 푦1(27)) //If 푋 = ‘nṝn’ and 푦 is the letter ‘p’ then 푥 is { //optionally replaced with # (ru) and the preceding delete 푥; //vowel is made nasal using the anusvāra. } 푖푓 푋 = ‘nṝn’ ∧ (y1(35) ∧ y2(0)) { //28. oto gārgyasya || 8.3.20 || Add X|Y to word_list; //If the consonant ‘y’ is preceded by the vowel ‘o’ and u = x; //followed by a vowel, semi-vowel or soft consonant, 푥1(43) = 1; //then the ‘y’ is dropped. 푧1(46) = 1; 푖푓 (푥1(20) ∧ 푥2(1)) ∧ (푢1(13) ∧ 푢2(0)) ∧ (푦1(0) ∨ 푥2(0) = 푧2(0) = 1; 푦1(1) ∨ 푦1(38) ∨ 푦1(39) ∨ 푦1(26) ∨ 푦1(27)) } { delete 푥; //24. ro ri || 8.3.14 || } // ḍhralope pūrvasya dīrgho’ṇaḥ ||6.3.111|| //If r or # (ru) is followed by r and is preceded by a, i or u, //29. uñi ca pade || 8.3.21 || //then one r is dropped and the short vowel is made long. //If the consonant ‘y’ or ‘v’ is preceded by a or ā and is 푖푓 (푥1(46) ∨ 푥1(21)) ∧ 푦1(21) //followed by the word ‘u’, then the ‘y’ or ‘v’ is dropped. { 푖푓 (푥1(20) ∧ (푥2(0) ∨ 푥2(1))) ∧ (푢1(4) ∨ 푢1(5)) ∧ 푌 = delete 푥; ‘u’ 푖푓 푢1(6) { { delete 푥; 푢1(7) = 1; } } 푖푓푢1(4) //30. hali sarveṣāṁ || 8.3.22 || { //If the consonant ‘y’ is followed by a semi-vowel or 푢1(5) = 1; //consonant, then the ‘y’ is dropped.

70 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 푖푓 (푥1(20) ∧ 푥2(1)) ∧ (푦1(1) ∨ 푦1(2)) //35. mo'nusvāraḥ || 8.3.23 || { //Common Name: anusvāra sandhi delete 푥; //If m at the end of a word is followed by any consonant, } //then m is replaced by anusvāra. 푖푓 푥1(30) ∧ 푦1(2) //31. he mapare vā || 8.3.26 || { //i) If m is followed by h at the end of a word which is in 푥1(43) = 1; //turn followed by m, then the first m is optionally changed } //to anusvāra. //ii) If m is followed by h which is in turn followed by //36. ṅaṇoḥ kuk ṭuk śari || 8.3.28 || //consonants ‘y’, ‘l’, or ‘v’, then the m is optionally replaced //If ṅ or ṇ is followed by a sibilant, then k or ṭ is optionally //by the nasal forms of ‘y’, ‘l’, or ‘v’ respectively. //added respectively. 푖푓 푥1(30) ∧ 푦1(26) 푖푓 푥1(28) ∧ (푦1(23) ∨ 푦1(24) ∨ 푦1(25)) { { 푖푓 푤1(30) Add X|Y to word_list; { 푧1(36) = 1; Add X|Y to word_list; 푧2 = 푥2; 푥1(43) = 1; Add 푧 to the end of 푋; } } 푒푙푠푒푖푓 푤1(20) { //37. ḍaḥ si dhuṭ || 8.3.29 || 푥1(22) = 1; //Common name: dhuḍāgama sandhi 푥2 = 푤2; //If ḍ is followed by s, then dh is added optionally } 푖푓 (푥1(38) ∧ 푥2(1)) ∧ 푦1(24) } { Add X|Y to word_list; //32. napare naḥ || 8.3.27 || 푧1(34) = 1; //If m is followed by h at the end of a word which is in turn 푧2 = 푥2; //followed by n, then the m is optionally replaced by n. Add 푧 to the end of 푋; 푖푓 푥1(30) ∧ 푦1(26) ∧ (푤1(27) ∧ 푤2(2)) } { Add X|Y to word_list; //38. naśca || 8.3.30 || 푥 = 푤; //Common name: dhuḍāgama sandhi } //If n is followed by s, then dh is added optionally. 푖푓 (푥1(27) ∧ 푥2(2)) ∧ 푦1(24) //33. naścāpadāntasya jhali || 8.3.24 || { //If n is followed by Column1, Column2, Column3, Add X|Y to word_list; //Column4, sibilant or aspirate not at the end of a word, 푧1(39) = 1; //then the n is replaced by anusvāra. 푧2 = 푥2; 푓표푟 (each letter 푥 in a word and its succeeding letter 푦) Add 푧 to the end of 푋; { } 푖푓 (푥1(29)) ∧ (푦1(36) ∨ 푦1(37) ∨ 푦1(38) ∨ 푦1(39) ∨ 푦1(23) ∨ 푦1(24) ∨ 푦1(25) ∨ 푦1(26)) //39. śi tuk || 8.3.31 || { //Common name: tugāgama sandhi 푥1(43) = 1; //If n is followed by ś, then t is optionally added. } 푖푓 (푥1(27) ∧ 푥2(2)) ∧ 푦1(25) } { Add X|Y to word_list; //34. mo rāji samaḥ kvau || 8.3.25 || 푧1(36) = 1; //If m of the word ‘sam’ or ‘sām’ is followed by a word 푧2 = 푥2; //starting with ‘rāj’ or ‘rāṭ’ or ‘ṛāñ’, then the m remains Add 푧 to the end of 푋; //unchanged. } 푖푓 푋 ∈ {sam, sām} ∧ 푌 ∈ {rāj, rāṭ, ṛāñ} { //40. ṅamo hrasvādaci ṅamuṇnityaṁ ||8.3.32|| //Skip Rule 35 //Common name: ṅamuḍāgama sandhi Continue processing from Rule 36; //If ṅ, ṇ or n is preceded by a short vowel and succeeded by } //a vowel, then the ṅ, ṇ or n gets duplicated.

71 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 푖푓 (푥1(27) ∧ (푥2(0) ∨ 푥2(1) ∨ 푥2(2))) ∧ 푢1(8) ∧ 푦1(0) Add X|Y to word_list; { 푥1(24) = 1; Add another 푥 to the end of 푋; } } //47. idudupadhasya cā'pratyayasya || 8.3.41|| //41. śarpare visarjanīyaḥ || 8.3.35 || //If visarga is preceded by ‘i’ or ‘u’ and is at the end of any //If visarga is followed by a hard consonant which is in turn //of niḥ, duḥ, bahiḥ, āviḥ, catuḥ, prāduḥ and is followed by //followed by a sibilant, then the visarga is retained. //a hard guttural or hard labial, then the visarga is replaced 푖푓 푥1(44) ∧ (푦1(36) ∨ 푦1(37) ∨ 푦1(23) ∨ 푦1(24) ∨ //by ṣ. 푦1(25)) ∧ (푤1(23) ∨ 푤1(24) ∨ 푤1(25)) 푖푓 푋 ∈ {niḥ, duḥ, bahiḥ, āviḥ, catuḥ, prāduḥ} ∧ 푦1(41) { { Store result with no change 푥1(23) = 1; } }

//42. vā śari || 8.3.36 || //48. tiraso’nyatarasyām || 8.3.42 || //If visarga is followed by a sibilant, then the visarga is //If the word ‘tiraḥ’ is followed by a hard guttural or hard //optionally retained. //labial, then the visarga is optionally replaced by s. 푖푓 푥1(44) ∧ (푦1(23) ∨ 푦1(24) ∨ 푦1(25)) 푖푓 푋 = ‘tiraḥ’ ∧ 푦1(41) { { Add 푋|Y to word_list; Add 푋|Y to the word_list; delete x; x1(24) = 1; } }

//43. kupvoḥ ⋏k ⋎pau ca || 8.3.37 || //49. dvistriścaturiti kṛtvorthe || 8.3.43 || //If visarga is followed by a hard guttural or hard labial, //If the words dviḥ, triḥ or catuḥ are followed by a hard //then it is replaced optionally by ⋏ (pronounced as at the //guttural or hard labial, then the visarga is optionally //end of ‘kah’) or ⋎ (pronounced as at the end of ‘paf’) //replaced by ṣ. 푖푓 푥1(44) ∧ 푦1(41) 푖푓 푋 ∈ {dviḥ, triḥ, catuḥ} ∧ 푦1(41) { { Add 푋|푌 to word_list; Add 푋|Y to the word_list; 푥1(42) = 1; 푥1(23) = 1; 푥2 = 푦2; } Flag_kupvoh_sutra_fired = true //flag is set } //50. ataḥ kṛkamikaṁsakumbhapātrakuśā //karṇīṣvanavyayasya || 8.3.46 || //44. so’padādau || 8.3.38 || //If visarga is preceded by a and followed by a form of kṛ //If visarga is followed by pāśa, kalpa, ka or kāmya, then //or kam or by the words kaṁsa, kumbha, pātra, kuśā or //the visarga is replaced by s. //karṇī, then the visarga is replaced by s. 푖푓 푥1(44) ∧ 푌 begins with {pāśa, kalpa, ka, kāmya} 푖푓 푥1(44) ∧ 푢1(4) ∧ 푌 begins with {kṛ, kar, kur, kam, kām, { kaṁsa, kumbha, pātra, kuśā, karṇī} 푥1(24) = 1; { } 푥1(24) = 1; } //45. iṇaḥ ṣaḥ || 8.3.39 || //If visarga is preceded by ‘i’ or ‘u’ and followed by pāśa, //51. adhaḥ śirasī pade || 8.3.47 || //kalpa, ka or kāmya, then the visarga is replaced by ṣ. //If the word adhaḥ or śiraḥ is followed by the word ‘pada’, 푖푓 푥1(44) ∧ 푢1(6) ∧ 푌 begins with {pāśa, kalpa, ka, //then the visarga is replaced by s optionally kāmya} 푖푓 푋 ∈ {adhaḥ, śiraḥ} ∧ Y begins with {pad} { { 푥1(23) = 1; Add X|Y to word_list; } 푥1(24) = 1; } //46. namaspurasorgatyoḥ || 8.3.40 || //If ‘namaḥ’ or ‘puraḥ’ is followed by a hard guttural or //Rules 41 to 51 form exceptions to the following rule, Rule //hard labial, then the visarga is replaced by s optionally. //52, and hence have been handled before it. 푖푓 푋 ∈ {namaḥ, puraḥ} ∧ 푦1(41) {

72 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 //52. visarjanīyasya saḥ || 8.3.34 || //specific cerebral ṭ is followed by nām, navati or nagarī, or //If visarga is followed by a hard consonant, then s replaces //if a cerebral is followed by a dental, then the dental is //the visarga. //replaced by the corresponding cerebral. if Flag_kupvoh_sutra_fired = false //8.3.37 is not fired 푖푓 (푥1(34) ∧ ¬푥2(5)) ∧ (푦1(33) ) { { 푖푓 푥1(44) ∧ (푦1(36) ∨ 푦1(37) ∨ 푦1(23) ∨ 푦1(24) ∨ 푥1(33) = 1; 푦1(25)) } { 푒푙푠푒푖푓 (푥1(33) ∧ 푥2(0)) ∧ 푌 ∈ {nām, navat, nagar} 푥1(24) = 1; { } 푦1(33) = 1; } } 푒푙푠푒푖푓 푥1(33) ∧ 푦1(34) //53. raṣābhyāṁ no ṇaḥ samānapade || 8.4.1 || { // aṭkupvāṅnumvyavāye'pi || 8.4.2 || 푦1(33) = 1; // padāntasya || 8.4.37 || } //If n is preceded by ṛ, ṝ, r or ṣ within the same word, and a //palatal, cerebral, dental, l, ś or s does not lie between the //56. yaro'nunāsike’nunāsiko vā || 8.4.45 || //two, and n is not the last letter of the word, then n is //Common name: anunāsikā sandhi //replaced by ṇ. //If a consonant other than ‘h’ is followed by a nasal, then 푓표푟 (each letter 푦 in 푋 where 푦 is not the last letter) //the consonant is optionally replaced by the corresponding { //nasal. The rule is obligatory if the second word is 푖푓 (푦1(27) ∧ 푦2(2)) //푦 is ‘n’ //‘mayam’ or ‘mātram’. { 푖푓(푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39)) 푖푓 ∃ a letter 푥 in 푋 preceding 푦 where ((푥1(12) ∧ { ¬푥2(2)) ∨ 푥1(21) ∨ 푥1(23)) //푥 is ṛ, ṝ, r or ṣ 푖푓 푦1(27) { { 푖푓 ∄ any letter q between 푥 and 푦 where 푞1(32) Add 푋|Y to the word_list; ∨ 푞1(33) ∨ 푞1(34) ∨ (푞1(20) ∧ 푞2(3)) ∨ 푥1(27) = 1; 푞1(24) ∨ 푞1(25) } { 푒푙푠푒푖푓 푌 ∈ {maya, mātra} y2(1) = 1; { } 푥1(27) = 1; } } } } } //Repeat the above for Y //57. aco rahābhyāṁ dve || 8.4.46 || //If r or h is followed by any consonant other than h and //54. stoḥ ścunāḥ ścuḥ || 8.4.40 || //preceded by a vowel, then the consonant is duplicated // śāt || 8.4.44 || //within a word. //Common name: ścutva sandhi 푓표푟 (each set of consecutive letters u,x,y in X) //If a palatal other than ś is followed by a dental, or a dental { //is followed by a palatal, then the dental is replaced by the 푖푓 (푥1(21) ∨ 푥1(26)) ∧ (푦1(2) ∧ ¬푦1(26)) ∧ 푢1(0) //corresponding palatal. { 푖푓 (푥1(32) ∧ ¬푥2(5)) ∧ (푦1(34)) 푧 = 푦; { Add 푧 after 푥; 푦1(32) = 1; } } } 푒푙푠푒푖푓 푥1(34) ∧ 푦1(32) 푓표푟 (each set of consecutive letters u,x,y in Y) { { 푥1(32) = 1; 푖푓 (푥1(21) ∨ 푥1(26)) ∧ (푦1(2) ∧ ¬푦1(26)) ∧ 푢1(0) } { 푧 = 푦; //55. ṣṭunāḥ ṣṭuḥ || 8.4.41 || Add 푧 after 푥; // na padāntāṭṭoranāṁ || 8.4.42 || } // toḥ ṣi || 8.4.43 || } //Common name: ṣṭutva sandhi //If a dental is followed by a cerebral except ṣ, or if the

73 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 //58. anaci ca || 8.4.47 || //60. khari ca || 8.4.55 || // dīrghādācāryāṇām || 8.4.52 || //Common name: cartva sandhi //If any consonant other than h is preceded by a short vowel //If a non-nasal mute or sibilant is followed by a //and followed by anything other than a vowel, then the //hard consonant, then the first letter is replaced by the //consonant is doubled within a word. //corresponding Column1 letter. 푓표푟 (each set of consecutive letters x,y,w in X) 푖푓 (푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39) ∨ 푥1(23) ∨ { 푥1(24) ∨ 푥1(25)) ∧ (푦1(36) ∨ 푦1(37) ∨ 푦1(23) ∨ 푖푓 푥1(8) ∧ (푦1(2) ∧ ¬푦1(26)) ∧ (푤1(1) ∨ 푤1(2) ∨ 푦1(24) ∨ 푦1(25)) 푤1(3)) { { 푥1(36) = 1; 푧 = 푦; } Add 푧 after 푥; } //Check internally in each word too } 푓표푟 (each set of consecutive letters x, y in X) 푓표푟 (each set of consecutive letters x,y,w in Y) { { 푖푓 푦1(36) ∨ 푦1(37) ∨ 푦1(23) ∨ 푦1(24) ∨ 푦1(25) 푖푓 푥1(8) ∧ (푦1(2) ∧ ¬푦1(26)) ∧ (푤1(1) ∨ 푤1(2) ∨ { 푤1(3)) 푖푓 (푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39)) { { 푧 = 푦; 푥1(36) = 1; Add 푧 after 푥; } } 푒푙푠푒푖푓 푥1(23) ∨ 푥1(24) ∨ 푥1(25) } { 푥1(36) = 1; //59. jhalām jaś jhaśi || 8.4.53 || 푖푓 푥1(23) //Common name: jaśtva sandhi { //If a non-nasal mute, sibilant or aspirate is followed by 푥2(1) = 1; //Column3 or Column4, then the first letter is replaced by } //the corresponding Column3 letter. 푒푙푠푒푖푓 푥1(24) 푓표푟 (each set of consecutive letters x, y in X) { { 푥2(2) = 1; 푖푓 (푦1(38) ∨ 푦1(39)) } { 푒푙푠푒푖푓 푥1(25) 푖푓 푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39) ∨ 푥1(26) { { 푥2(3) = 1; 푥1(38) = 1; } } } 푒푙푠푒푖푓 푥1(23) ∨ 푥1(24) ∨ 푥1(25) } { } 푥1(38) = 1; //Repeat the above for Y 푖푓 푥1(23) { //61. anusvārasya yayi parasavarṇaḥ ||8.4.58|| 푥2(1) = 1; //Common name: parasavarṇa sandhi } //If anusvāra is followed by a semi-vowel or a mute, then 푒푙푠푒푖푓 푥1(24) //the anusvāra is replaced by the nasal equivalent of the { //second letter. 푥2(2) = 1; 푖푓 푥1(43) } { 푒푙푠푒푖푓 푥1(25) 푖푓 푦1(20) { { 푥2(3) = 1; 푥1(22) = 1; } 푥2 = 푦2; } } } 푒푙푠푒푖푓 푦1(36) ∨ 푦1(37) ∨ 푦1(38) ∨ 푦1(39) } { //Repeat the above for Y 푥1(27) = 1; 푥2 = 푦2;

74 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 } //dropped, within a word } 푓표푟 (each set of consecutive letters u,x,y in X) { //62. torli || 8.4.60 || 푖푓 푢1(1) ∨ 푢1(2) //Common name: parasavarṇa sandhi { //i) If n is followed by l, then n is replaced by nasal l. 푖푓 (푥1(36) ∨ 푥1(37) ∨ 푥1(38) ∨ 푥1(39) ∨ //ii) If a dental other than n and s is followed by l, then 푥1(23) ∨ 푥1(24) ∨ 푥1(25)) ∧ (푥1 == 푦1) //the dental is replaced by l. { 푖푓 (푥1(27) ∧ 푥2(2)) ∧ (푦1(20) ∧ 푦2(3)) Add 푋|푌 to the word_list; { delete 푥; 푥1(22) = 1; } 푥2 = 푦2; } } } 푒푙푠푒푖푓 (푥1(34) ∧ ¬(푥2(4) ∨ 푥2(5))) ∧ (푦1(20) ∧ 푦2(3)) } { // Repeat the above for Y 푥 = 푦; } C. The Search Engine Algorithm SandhiProcessor may be used to generate //63. jhayoho'nyatarasyām || 8.4.62 || alternative forms of a given search word. The following //Common name: pūrvasavarṇa sandhi algorithm is used to generate all possible alternative forms of a //If a non-nasal mute is followed by h, then h is optionally given word, by providing possible word forms before and after //replaced by the Column4 letter corresponding to the non- the word so that sandhi rules get triggered. All these word //nasal mute. forms are searched for in the E-text. 푖푓 (푥 (36) ∨ 푥 (37) ∨ 푥 (38) ∨ 푥 (39)) ∧ 푦 (26) 1 1 1 1 1 Algorithm GenerateAllWordForms (푍) { { Add 푋|Y to the word_list; //Z is the search word. 푦 (39) = 1; 1 //{WordForms} denotes the set of word forms generated by 푦2 = 푥2; //the algorithm and is initially the null set. } Add 푍 to {WordForms};

푋 = 푍; //64. śaścho'ṭ || 8.4.63 || i 푓표푟 (each 푦 in {vowels, semi-vowels, consonants}) //Common name: chatva sandhi { //If a non-nasal mute is followed by ś which is in turn Add SandhiProcessor(푋, 푌) to {WordForms}; //followed by a vowel, semi-vowel or nasal, then ś is } //optionally replaced by . ch 푖푓(푥 (36) ∨ 푥 (37) ∨ 푥 (38) ∨ 푥 (39)) ∧ 푦 (25) ∧ 1 1 1 1 1 푓표푟 (each 푌 ∈ {WordForms}) (푤 (0) ∨ 푤 (1) ∨ 푤 (27)) 1 1 1 { { 푓표푟 (each 푋 in {vowels, semi-vowels, consonants, ḥ, ṁ, Add 푋|Y to the word_list; #}) 푦 (40) = 1; 1 { } Add SandhiProcessor(푋, 푌) to {WordForms}; //65. halo yamāṁ ḥ || 8.4.64 || yami lopa } //If a semi-vowel or nasal is preceded by a consonant and } //followed by the same semi-vowel or nasal letter, then one } //of the duplicate letters is dropped. 푖푓 푢1(2) VI. CONCLUSION { The schema developed in this work presents a simple yet 푖푓 (푥1(20) ∨ 푥1(27)) ∧ 푦 == x unique and efficient method to process the sandhi aphorisms of { Pāṇini. The letter representation scheme is binary, and hence delete 푥; all the checks are implemented as bit-level operations and } simple bit-set and bit-unset operations suffice to carry out the } sandhi transformation. The efficiency is further enhanced by the division of a letter representation into two parts and the //66. jharo jhari savarṇe || 8.4.65 || consequent reduction of the transformation process to a shifting //If a non-nasal mute or sibilant is preceded by a consonant of category. Further, this pattern of solving the sandhi //or semi-vowel and followed by a homogeneous mute or construction problem is unprecedented in the literature. Thus, //sibilant, then one of the duplicate letters is optionally representation schema and the results of the sandhi-processing

75 http://sites.google.com/site/ijcsis/ ISSN 1947-5500 (IJCSIS) International Journal of Computer Science and Information Security, Vol. 12, No. 8, 2014 algorithm represent an efficient computational model to process Sanskrit euphonic conjunctions. It must be mentioned here that some rules such as those with regard to prakṛtibhāva sandhi (non-transformational sandhi) have not been presented above. However, it is clear that their implementation is only an extension of the algorithm presented in this work that does not require any new schema. The representational schema has been reduced further from [4] in this work, since many more sandhi rules have been incorporated here. The optimality of the schema is clear from the simplicity of the rule representation. The final algorithm presented in this work, which uses this sandhi processor for word searches in E-texts is the first of its kind in the literature with regard to Sanskrit. The use of the sandhi processor for searching ensures comprehensiveness of the search, while the efficiency of the sandhi processing method presented in this work ensures that search speeds are not compromised due to the increase in the number of words to be searched for. This was confirmed in the implementation of the algorithm. The algorithms presented in this work have been tested for use with Sanskrit E-text in Devanāgarī script after a conversion engine converted Devanāgarī Unicode to Latin Unicode E-text.

REFERENCES [1] Dīkṣita Bhaṭṭoji, Siddhānta-kaumudī, Translated by Śrīśa Candra Vasu, Volume 1, Motilal Banarsidas Publishers, Delhi, 1962. [2] Vāmana & Jayāditya, Kāśikā, with the subcommentaries of Jinendrabuddhi, Haradatta Miśra and Dr. Jaya Shankar Lal Tripathi, Tara Book Agency, Varanasi, 1984. [3] Rama N., Meenakshi Lakshmanan, A New Computational Schema for Euphonic Conjunctions in Sanskrit Processing, International Journal of Computer Science Issues, Vol. 5, 2009, pp 43-51 (ISSN print: 1694-0814, ISSN online: 1694-0784). [4] Kasmir Raja S. V., Rajitha V., Meenakshi Lakshmanan, A Binary Schema and Computational Algorithms to Process Vowel-based Euphonic Conjunctions for Word Searches, International Journal of Applied Engineering Research, Volume 9,Number 20(2014) pp. 7127-7141 (ISSN print: 0973-4562, ISSN online: 1087-1090). Websites [5] Göttingen Register of Electronic Texts in Indian Languages (GRETIL), gretil.sub.uni- goettingen.de/gretil.htm [6] Sanskrit Documents, sanskritdocuments.org [7] Indology: Resources for Indological Scholarship, indology.info/etexts/archive/ etext [8] TITUS, titus.uni-frankfurt.de/indexe.htm [9] Muktabodha Indological Text Collection & Search Engine, muktalib5.org/digital_library_secure_entry.htm

76 http://sites.google.com/site/ijcsis/ ISSN 1947-5500