
Formal Sanskrit Syntax: A Specification for Programming Language

K. Kabi Khanganba, Girish Nath Jha
School of Sanskrit and Indic Studies, Jawaharlal Nehru University, New Delhi, India
[email protected], [email protected]

Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 72–78, December 4-7, 2020. © 2020 Association for Computational Linguistics

Abstract

The paper discusses the syntax of the primary statements of Sanskritam, a programming language specification based on natural Sanskrit that is being developed under a doctoral thesis. By a statement, we mean a syntactic unit regardless of its computational operation, such as variable declaration, program execution or evaluation of Boolean expressions. We have selected six common primary statements: declaration, assignment, inline initialization, if-then-else, for loop and while loop. The specification partly overlaps with the ideas of natural language programming, Controlled Natural Language (Kunh, 2013), and natural language subsets. The practice and application of a structured natural language set in a discourse are deeply rooted in the theoretical text tradition of Sanskrit, as in the sūtra-based disciplines and the Navya-Nyāya (NN) formal language. This effort is a continuation and application of such traditions and their techniques in the modern field of Sanskrit NLP.

1 Introduction

The paper is based on a non-English-based programming language called Sanskritam, which is being developed under a doctoral thesis. Instead of adopting the syntax of programming languages that are heavily inspired by the symbolic language of logic and mathematics, we have adopted a natural language-based syntax like that of the generic Sanskrit used to write the Aṣṭādhyāyī¹ (AD), a grammar of natural Sanskrit itself. We have selected 6 common primary statements that any general programming language may have. By a statement, we refer to a syntactic unit regardless of its computational operation, such as variable declaration, program execution or evaluation of Boolean expressions.

The program statements are translated into the two basic types of natural language sentence, assertive and imperative, in the present tense and active voice. One of the core components of a program expression is the 'operator', which contributes to the semantic decision. Different keywords and signs are used to indicate the operation of a statement in semantic analysis. Throughout the paper, Sanskritam refers to the language being specified, while Sanskrit means natural Sanskrit itself.

India has had one of the oldest traditions of linguistics, philosophy, logic, mathematics, and many other disciplines, almost all of which were written in Sanskrit. The language of Western symbolic logic has been successfully adopted by a number of modern fields of study, including mathematics, computer science, computer programming languages, formal language grammars, physical science and biology (Bhattacharya, 1990). Indic logic and other disciplines follow a linguistic model for their systems instead of a symbolic model. The technique or logic of such a linguistic model, be it the Sūtra² or the NN (an Indic philosophical school) language, has also successfully gained popularity and share among other Indic fields of study (Bhattacharya, 2006). These models are widely accepted, used, and taught in schools and universities, and there is a good number of active communities. The artificial formal language known as Navya-Nyāya-Bhāṣā, or the NN language, is adopted by other Indic disciplines to define their domain-specific terms. Likewise, the Sūtra technique of learning and teaching Sanskrit is, unsurprisingly, still in use among the Sanskrit community even today. However, despite its relevance, usage and living community, the Indic linguistic model of logic has not been adapted to the modern technical disciplines, unlike Western symbolic language, which evolved into electronic machine languages.

Sanskrit computational linguistics (SCL) has an active international research community (Jha, 2010). More importantly, SCL is taught as a post-graduate course or subject at many Sanskrit departments in Indian universities and IITs. There are more than seventeen Sanskrit universities, including three central Sanskrit universities, and hundreds of Sanskrit departments in central and state universities in the country where SCL can be introduced and taught. Post-graduate Sanskrit students generally come from non-science backgrounds such as the traditional gurukula³ education system or the arts stream. When the students reach master's level and opt for SCL, they suddenly have to learn primary programming languages that use a number of punctuation marks, mathematical notations and English-based keywords. Because the education system is neither multidisciplinary nor interdisciplinary, apart from the introduction of multidisciplinary education in the National Education Policy 2020, the students are not familiar with the symbolic logic that is commonly used in mathematics, computer programming languages and the general sciences. However, by pre-master's level, Sanskrit students are well introduced to the formal linguistic model of logic of traditional Paninian grammar. Sanskrit students of the different traditional Sanskrit disciplines, such as sāhitya (literature and literary theory), darśana (philosophy) and jyotiṣa (astronomy and calendar), have to study Paninian grammar as a compulsory and primary subject. They should therefore adapt better to a high-level programming language based on such traditional formal Sanskrit of the sūtra or the NN artificial language.

¹ A 5th-century BCE Sanskrit grammar written in a semi-formal Sanskrit itself by Panini.
² A kind of logically arranged text that deals with a discipline or science. Literally it means a string or thread; an aphorism. Sometimes it also means an individual Sūtra, many of which constitute the whole Sūtra text.
³ Traditional Indian institutional system.

2 Related Work

This work is related to formal language specification based on natural language for the development of general programming languages. Inform 7, or Natural Inform, is a highly domain-specific programming language based on natural English for writers of interactive fiction⁴. Our work, however, focuses on the specification of equivalent natural Sanskrit-based formal statements for the common primary statements of general programming languages. Controlled Natural Languages (CNLs) are designed by selecting subsets of features such as vocabulary, morphology, syntax, semantics and pragmatics from a particular natural language of the designer's choice (Kunh, 2013). CNLs have been proposed for different applications such as knowledge representation, rule systems, query interfaces and formal specifications. This concept partly overlaps with the Sanskritam specification, which adapts primary features of natural Sanskrit such as syntax and morphology.

There is no existing specification of Sanskrit or of a Sanskrit-based programming language in which even the keywords are borrowed from Sanskrit. However, two Indic disciplines, Grammar (especially the Paninian School) and NN, influence the Sanskritam specification.

Two major aspects of Paninian grammar concern our specification: 1) the language of the grammar, a generic natural Sanskrit-based specification that is mostly formal (Huet, 2016; Kadvany, 2016), and 2) the logic of the grammar. A subset of standard classical Sanskrit is used for the AD⁵ grammar (Deshpande, 1991). The subset is a kind of bootstrapping used to define natural Sanskrit itself (Kadvany, 2016). This subset exhibits most of the primary features of natural Sanskrit syntax, morphology, Sandhi, and Samāsa (compound). It avoids finite verbs and frequently uses abstract and verbal nouns, as well as compounds with oblique case relations between their members. The parsing of Sandhi and Samāsa is still at its inception stage in Sanskrit NLP, so we skip this feature for the time being in our specification.

The AD is a list of rules called sūtra(s), logically arranged and related to each other. As in program source code, the order of the rules and their arrangement matter very much in rule interpretation. Out of 7 types⁶ of rules (Sharma, 2002), the sañjñā (definition) rule defines a term or variable before it is used elsewhere, which overlaps with the idea of a statically typed language. The vidhi (executive) rule performs all types of operations or executions. Huet (2016) highlights the possibility of reconsidering the AD as a high-level program compiled into some low-level machine code.

The NN language is a completely generic natural Sanskrit-based formal language that primarily deals with unambiguously presenting a concept of an object by fully exploiting the abstraction feature of natural Sanskrit and linguistics [...]

⁴ http://inform7.com/about/
⁵ AD as the primary text, along with other auxiliary texts and lexicons, constitutes the Paninian grammatical system.

[...] complement/s, an optionally omitted copula⁸, and finally terminates with a semicolon (;). The copula cannot be in a passive or impersonal⁹ construction. Since a complement represents a predefined term or data type, the subject is allowed to occur before or after its complement expression. For instance:

SAN: vṛddhih ād aic ;
ENG: "ād (vowel ā) aic (diphthongs āi and au) (are) Vṛddhi."

SAN: ad eṅ guṇaḥ ;
ENG: "ad (vowel a) eṅ (diphthongs e and o) (are) Guṇa."

The predicative expression Vṛddhi precedes its