<<

Formal Syntax: Specification for Programming Language

K. Kabi Khanganba, Girish Nath School of Sanskrit and Indic Studies, Jawaharlal Nehru University, New Delhi, India [email protected], [email protected]

Abstract selected 6 common primary statements that any general programming language may have. By a The paper discusses the syntax of the statement, we refer to a syntactic unit regardless of primary statements of the Sanskritam, a its computational operations of variable programming language specification based declarations, program executions or evaluations of on natural Sanskrit under a doctoral thesis. By a statement, we mean a syntactic unit Boolean expressions etc. regardless of its computational operations The program statements are translated to the two of variable declarations, program basic types of natural language sentences of executions or evaluations of Boolean assertive and imperative in the present tense, active expressions etc. We have selected six voice. One of the core components in a program common primary statements of declaration, expression is the ‘operator’, which contributes to assignment, inline initialization, if-then- the semantic decision. Different keywords and else, for loop and while loop. The signs are used to indicate the operation of a specification partly overlaps the ideas of statement in semantic analysis. In the whole paper, natural language programming, Controlled Sanskritam refers to the language being specified Natural Language (Kunh, 2013), and while Sanskrit means the very natural Sanskrit. Natural Language subset. The practice and application of structured natural language India has had one of the oldest traditions of set in a discourse are deeply rooted in the linguistics, philosophy, logic, mathematics, and theoretical text tradition of Sanskrit, like many other disciplines which were almost all the sūtra-based disciplines and Navya- written in Sanskrit. The language of Western Nyāya (NN) formal language, etc. The symbolic logic is successfully adopted by a number effort is a kind of continuation and of the modern field of studies, including application of such traditions and their mathematics, computer science, computer techniques in the modern field of Sanskrit programming languages, formal language NLP. grammars, physical science and biology (Bhattacharya, 1990). The Indic logic and other 1 Introduction disciplines follow a linguistic model for their The paper is based on a non-English-based systems instead of a symbolic model. The programming language, called Sanskritam being technique or logic of such a linguistic model, be it 2 developed under a doctoral thesis. Instead of Sūtra or NN (an Indic philosophical school) adopting the syntax of programming languages language, had successfully gained popularity and highly inspired by the symbolic language of logic share among other Indic fields of studies also and mathematics, we have adopted a natural (Bhattacharya, 2006). They are widely accepted, language-based syntax like those of generic used, and taught in school/universities, and there is Sanskrit used to write the Aṣṭādhyāyī 1 (AD), a a good number of active communities. The grammar of the natural Sanskrit itself. We have artificial formal language known as Navya-Nāya-

1 A 5th BCE Sanskrit grammar written in a semi-formal Sometimes it again means an individual Sūtra many of which Sanskrit itself by Panini. constitute the whole Sūtra text. 2 A kind of logically arranged text that deals a discipline or science. Literally it means a string or thread; an aphorism. 72 Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, pages 72–78 December 4 - 7, 2020. c 2020 Association for Computational Linguistics Bhāṣā or NN language is adopted by other Indic 2 Related Work disciplines to define their domain-specific terms. On the other hand, there is no wonder about the This work is related to formal language Sūtra technique of learning and teaching Sanskrit specification based on natural language for even today among the Sanskrit community. development of general programming languages. However, despite its relevance, usage and living The Inform 7 or Natural Inform is a highly domain community in the present, the Indic linguistic specific programming language based on natural 4 model of logic tool lacks adapting to the modern English for writers of interactive fiction . technical disciplines, unlike the evolution of However, our work focuses on specification of western symbolic language into electronic equivalent natural Sanskrit-based formal machine languages. statements for common primary statements of Sanskrit computational linguistics (SCL) has an general programming languages. Controlled active international research community (Jha, Natural Languages or CNLs are designed by 2010). Apart from this and more importantly, SCL selecting subsets of features like vocabulary, is taught as a post-graduate course or subject at morphology, syntax, semantics and pragmatics many Sanskrit departments in Indian universities from a particular natural language of the designer’s and IITs. There are more than seventeen Sanskrit choice (Kunh, 2013). CNLs have been proposed universities including three central Sanskrit for different applications like knowledge universities and hundreds of Sanskrit departments representations, rule systems, equerry interfaces in many central and state universities in the country and formal specifications. This concept partly where SCL can be introduced and taught. The post- overlaps with the Sanskritam specification that graduate Sanskrit students generally have different adapts the primary features of natural Sanskrit like non-science background like traditional gurukula3 syntax and morphology. education system or arts stream. When the students There is no such specification of Sanskrit or come to master level and opt for SCL, they Sanskrit-based programming language in which suddenly learn primary programming languages even the keywords are borrowed from Sanskrit. which use a number of marks, However, two Indic disciplines of Grammar mathematical notations and English-based (especially the Paninian School) and NN influence keywords etc. Due to non-multidisciplinary and the Sanskritam specification. inter-disciplinary mode of education system except Two major aspects of Paninian grammar an introduction (to multidisciplinary education concerned with our specification are 1) the system) in the National Education Policy 2020, the language of the grammar, a generic natural students are not familiar with symbolic logic which Sanskrit-based specification mostly formal (Huet, are commonly used in mathematics, computer 2016; Kadvany, 2016), and 2) the logic of the programming languages and general sciences. grammar. A subset out of the standard classical 5 However, by pre-master level, Sanskrit students Sanskrit is used for the AD grammar (Deshpande, are well introduced to formal linguistic model of 1991). The subset is a kind of bootstrapping used logic of traditional Paninian grammar etc. Sanskrit to define the natural Sanskrit itself (Kadvany, students of different traditional Sanskrit disciplines 2016). This subset exhibits most of the primary of sāhitya (literature and literary theory), darśaṇa features of the natural Sanskrit syntax, (philosophy), jyotiṣa (astronomy and calendar) etc. morphology, Sandhi, and Samāsa (compound). It have to study compulsorily and primarily the avoids using finite verbs, frequently uses abstract Panini grammar. They must better adapt a high and verbal nouns, and compounds with oblique level programming language based on such case relations between their members. The parsing traditional formal Sanskrit of sūtra or NN artificial of Sandhi and Samāsa is still in its inception stage language. in Sanskrit NLP, so we skip this feature for the time being in our specification. The AD is a list of rules called sūtra(s), logically arranged and related to each other. Like a program source code, the order of rules and their arrangement matter very much in

3 Traditional Indian institutional system. 5 AD as the primary text along with other auxiliary texts and 4 http://inform7.com/about/ lexicons constitute the Paninian grammatical system. 73

rule interpretation. Out of 7 types 6 of rules complement/s, an optionally omitted copula8, and (Sharma, 2002), the sañjñā (definition) rule finally terminates by a semicolon (;). The copula defines a term or variable before it is used cannot be in a passive or impersonal 9 elsewhere which overlaps the idea of statically construction. Since a complement represents a typed language. Vidhi (executive) rule does all predefined term or data type, it allows the subject types of operations or executions. Huet (2016) to occur before or after its complement highlights the possibility to reconsider the AD as a expression. For instance: high-level program compiled into some low level SAN: vṛddhih ād aic ; machine code. ENG: “ād ( ā) aic (diphthongs āi and au) The NN language is a completely generic (are) Vṛddhi.” natural Sanskrit-based formal language that SAN: ad eṅ guṇaḥ ; primarily deals with unambiguously presenting a ENG: “ad (vowel a) eṅ (diphthongs and ) concept of an object by fully exploiting the (are) Guṇa.” abstraction feature of natural Sanskrit and The predicative expression Vṛddhi precedes its linguistics theories of Paninian system subject ād (and) aic representing program (Bhattacharya, 2006). Since it is based on natural variables in the first statement while the subject language, a NN statement or sentence may be ad (and) eṅ precedes the complement in the practically cumbersome and resemble a long second statement. The copula is omitted in both version of a symbolic formula. This is more suited of the statements. for object oriented programming specification which we may apply later on. 3.1 The Declaration Statement 3 Formal Sanskrit Declaration statement is optionally a zero copular sentence that consists of a subject NP as the In this section, we will discuss the syntax of six identifier (the name of a variable) and the name of basic statements of Sanskritam specification. We a data type as the subject complement. The copula selected the primary statements of general agrees with the subject NP. The subject NP may programming languages, mostly inspired by Java include one or more singular number nouns (Schildt, 2019). We use the IAST 7 , a Latin separated by a comma to support declaration of alphabet-based transliteration for Sanskrit texts. To more than one variable of the specified type in a represent the formal syntactic structure, we use the single statement. Dual and plural paradigm of grammar notation of ANTLR metalanguage nouns are not presently defined and reserved to specification (Parr, 2013). The Sanskritam represent array variables. The conjunction grammar has its terminal symbols and tokens meaning and can also optionally join two or more defined by its lexical grammar, representing variables. The positions of the NP and the program keywords, the nominal and verbal roots, complement may be interchanged. indeclinable and other required word classes of the declaration : subj (‘,’ subj)+ complement copula? natural Sanskrit. ‘;’ ; ′ 푠푡푎푡푒푚푒푛푡 ∶ 푛푝 푐표푚푝푙푒푚푒푛푡 푐표푝푢푙푎? ; ′ where subj is a noun ending with nominative case ′ ′ | 푐표푚푝푙푒푚푒푛푡 푛푝 푐표푝푢푙푎? ; ; marker. asti, staḥ and santi – the three paradigms The tokens in lower case not closed by single of the copula ‘as’ meaning to be are defined as the inverted commas represent non-terminals. The keywords for the declaration of one, two, and left-hand side of the colon represents a rule name more variables respectively within the same which can have one or more alternate rules statement. separated by bar (|) and terminated by a colon. SAN: chātraḥ al ; A Sanskritam statement is a simple copular ENG: “chātra (is a) String.” sentence that constitutes the NPs (of subject) and SAN: chātraḥ al asti ;

6 The rules in AD are divided into 7 categories on the basis 8 Sanskrit has two basic copula roots called as and bhū. They of their operations. are inflected for tree tenses and seven moods like other 7 https://en.wikipedia.org/wiki/International_Alphabet_of_S common Sanskrit verbal roots. anskrit_Transliteration 9 Sanskrit verb has three inflectional voices: active, passive and impersonal 74

ENG: “chātra is (a) String.” Thus, the following two alternative constructions SAN: chātraḥ, viṣayaḥ al ; are possible: ENG: “chātra, viṣaya (are) String.” SAN: chātrasya sthāne nāma ; SAN: chātraḥ, viṣayaḥ al staḥ ; ENG: “nāma in the place of chātra.” SAN: chātrasya sthāne nāma syāt ; ENG: “chātra, viṣaya are String.” ENG: “nāma must be in the place of chātra.” SAN: chātraḥ, viṣayaḥ ca al ; A string literal defined within double quotes ENG: “chātra and viṣaya (are) String.” can take the role the subject NP followed by the SAN: chātraḥ, viṣayaḥ ca al staḥ ; direct marker iti. ENG: “chātra and viṣaya are String.” SAN: chātrasya sthāne “dīpakaḥ” iti ; ENG: “the “dīpaka” must be in the place of 3.2 Assignment Statement: Assigning a chātra.” Value to a Declared Variable This statement allocates a value to a variable. After the declaration of a variable, a value is The assignment statement constitutes two assigned to it with an assignment statement. In juxtapositional interchangeable NPs: a subject NP general programming languages the equal sign = (NPS) and an NP of locational complement is the key token used to indicate the assignment (NPLC). The imperative paradigms (syāt, syātām, operator. syuḥ) of the copula ‘as’ are optionally omitted. The copula occurs at the end of both of the NPs. 3.3 Declaration and Assignment: (Inline The noun in NLCP represents the assignee, a initialization statement) declared variable. assignment : subj nGenetive copImp? ‘;’ The inline initialization statement is an expression | nGenetive subj copImp? ‘;’ ; of both of the declarative and assignment nGenetive and copImp are a noun ending in statements within a single sentence. This genitive and copula in imperative mood statement is operationally a compound statement. respectively. It consists of two operations: 1. Declaration of a The assignment statement adapts the metarule variable, and 2. Assignment of an initial value to ṣaṣṭhī sthāneyogā 10 from AD. It defines that ṣaṣṭhī or the genitive marker indicates a sthānī or that variable. It contains two clauses separated by a locus, an element that holds another a comma, including one main clause followed by value/element. a relative clause in order. Both of these two SAN: nāma chātrasya ; //or clauses are derived from the very two main SAN: chātrasya nāma syāt ; //or clauses of the declarative and the assignment SAN: nāma chātrasya syāt ; statements. The main clause within an inline ENG: “nāma replaces chātra.” initialization statement is a declarative statement The NPLC can optionally have the keyword ‘sthāne’, locative paradigm of the ‘sthāna’ in itself. The relative clause represents an singular to express “in the place”. For instance, assignment statement introduced by a possessive the AD rule form of the relative pronoun ‘tat’ to meet the SAN: iko yaṇ aci genitive metarules. The antecedent is provided by ENG: “yan (must be) of ik (if) ac follows” the subject NP of the main clause. (literal) inlineInitialisation : assignment rePro subj ‘ca’ ENG: “yan must place ik if ac follows” cop? ; It has the alternative expression SAN: ikaḥ sthāne yaṇ syād11 The relative pronoun agrees with its antecedent in ENG: “yan must be in the place of ik if ac number and gender, the subject NP within the follows.” main clause. In the example below, the relative The Kāśikāvṛtti, a commentary on AD interprets pronoun tasya (of that) establishes an anaphoric that sthāna is synonymous with ‘prasaṅga’ relationship with the antecedent chātraḥ, which is meaning ‘context’. masculine and singular. 12 SAN: sthānaśabdaśca prasaṅgavācī SAN: chātraḥ al asti, tasya nāma ca syāt ; ENG: “sthāna means context”

10 Sthāna means prasaṅga or context. That which constitutes yogo yasyā meaning that which has the context - context is called sthāneyogā. The locative marker in the word prathamāvṛtti, AD 1.1.49. sthāneyogā appears exceptionally even after compounded. 11 Siddhāntakaumudī on AD 6.1.77. The un-compounding form of this compound is sthānena 12 Kāśikāvṛtti on AD 1.1.49. 75

ENG: “chātra is a string, and nāma must replace //conditional statement ------> that.” SAN: yadi 1 krāmāṅkam vyapnoti, chātrasya "raviḥ" ; 3.4 if-then-else Statement ENG: “if 1 pervades krāmāṅka, chātra (must The if-then-else statement is a complex sentence. It hold) “ravi”.” consists of two clauses: 1) the block of if SAN: yadi 1 krāmāṅkam vyāpnoti tarhi conditional clause and 2) the block of main clause. chātrasya "raviḥ"; The if-clause precedes the main clause. An if- ENG: “if 1 pervades krāmāṅka then chātra (must clause expresses and checks one or more hold) “ravi”.” conditions prior to the other one or more We have reserved the metarule of locative consequences. The if-clause begins with the marker which is specified in AD: conditional marking word yadi meaning if. The if- SAN: tasminniti nirdiṣṭe pūrvasya. clause contains one or more affirmative or negative ENG: “A word in the locative case indicates that or both of the copular sentences to express a an operation must happen on an immediate boolean condition or conditions. Two or more previous entity.” conditional sentences are separated by a comma This rule defines a metarule of a conditional and optionally closed by the conjunction ca (and) statement that checks the boolean condition of an just before the introduction of the following main element following another element. This rule clause. The if-clause is immediately followed by a restricts the locative marker to mark a following consequence, the main clause representing element. For instance, x-locative checks the consequences. The tarhi (then), which is the boolean condition of x that comes after a corresponding consequent marker of the yadi (of preceding element. Here is an example rule from the if-clasue) begins the main clause. the AD itself: ifThenElse : ifStatement elseIfStat* elseStat? ‘;’ ; SAN: iko yaṇ aci. ifStatement : ‘yadi’ conditions+ ‘tarhi’ statements* ENG: “yaṇ replaces ik (if) ac follows (ik).” ; The aci (ac-locative) is declined in locative. The conditions : conditon moreConditions* ; moreCondition : (‘,’ condition)+ ‘ca’ ; tasminniti nirdiṣṭe pūrvasya checks the boolean elseIfStat : ‘yadi’ ‘ca’ conditions+ ‘tarhi’ condition of occurrence of ac after ik. statements* ; 3.5 For Loop Statement elseStat : ‘anyaTA’ statements* ; An if phrase expresses that one action or For the for-loop statement, we adopt the basic situation must happen first before the other one syntax of header rule or Adhikāra rule from AD. A will/can happen. The action in the clause header rule in AD is a rule that initiates a scope expresses either: (1) a condition for a singular marking its boundary generally, until a next scope outcome to occur or (2) a recurring situation starts. Morphologically, a header rule constitutes the name of the scope declining in locative case "whenever" with a predictable outcome (in like in the form of x-locative. For instance, Kārake general). means Kāraka-locative meaning in (the scope of) If the condition/s in the first clause is true, then Kāraka (relational dependency). We selected the the statement in the second clause is executed key word Āmreḍite, the locative paradigm of the else, the second clause is bypassed. Here is a piece word Āmreḍita meaning repetition (Williams, of Sanskritam code to show the if-then-else 1872) to mark for-loop. Thus, the for-loop statements: statement consists of a header sentence followed SAN: chātraḥ al ; // al=string declaration by a list of statements and finally closed by the ENG: “chātra (is) string.” direct marker iti. SAN: chātrasya "dīpakaḥ" ; // assignment forStatement : (Amredite | Amreditasya ENG: “chātra (must hold) “dīpakaḥ”.” Avasthayam) forInitializer forTerminator SAN: kramāṅkaḥ saṅkhyā ; // saṅkhyā = int decl. Anupurva '-' statBlock Iti Amreditam? ';' ENG: “krāmāṅka (is an) int.” | forInitializer forTerminator Anupurva '-' SAN: krāmāṅkasya 1 ; // int = 1; statBlock Iti Amredaya ';' ; ENG: “krāmāṅka (must hold) 1.”

76

Symbols initialized by upper case are lexical rules 4 Conclusion and Future Work representing terminal symbols. Tokens within the single inverted commas are terminal symbols We have tried to describe the syntax of each which are to appear in the program exactly as selected Sanskritam statements separately by written. Three examples have some slightly giving the formal rules of each statement. The different optional keywords are given below: English translations of example source code may //for loop : 1st Syntax not match the exact syntax of Sanskritam SAN: āmreḍite vepakāt āvepitam anupūrvaśaḥ - specification. A complete parser of the Sanskritam ENG: “in āmredita from vepaka to vepita one by specification based on a computational formal one” grammar starting from its start symbol will be ………. discussed in future. Later on, we will discuss its “statemets….” implementation in Java environment. SAN: iti; Acknowledgments ENG: “end;” //for loop : 2nd Syntax We are very thankful to the Jawaharlal Nehru SAN: vepakāt āvepitam anupūrvaśaḥ - University for providing research infrastructure. ENG: “from vepaka to vepita one by one -” We specially thank those friends who helped in …………. understanding the traditional logic and Sanskrit “statements……” grammar. SAN: iti āmreḍaya; ENG: “thus repetit (them);” References //for loop : 3rd Syntax Bharati, A., Chaitanya, V., & Sangal, R. (2010). SAN: āmreḍitasya avasthāyāṃ vepakāt Natural Language Processing. New Delhi: PHI prākvepitāt anupūrveṇa – Learning Private Limited. ENG: “in the condition of āmreḍita from vepaka Bhattacharya, Kamaleswar (2006). On the Language to vepita one by one” of Navya-Nyāya: an Experiment with Precision ………….. through a Natural Language. The Journal of Indian “statements….” Philosophy, IIIIV(1/2), 5-13. SAN: iti āmreḍitam; Bhattacharya, Sibajiban (1990). Some Features of the ENG: “thus āmreḍita;” Technical Language of Navya-Nyāya. Philosophy East and West, XX(2), 129-149. 3.6 While Loop Statement Brigg, R. (1985). Knowledge Representation in The while loop also consists of two clauses, one Sanskrit and Artificial Intelligence. AI Magazine, subordinate and one main clause respectively 32-39. representing the two blocks of a common while Deshpande, Madhav M. (1991). Prototypes in Pāṇinian loop statement like in Java. The subordinate clause Syntax. Journal of the American Oriental Society, represents one or more conditional expressions and III(3), 465-480. main clause can consist of one more statements Huet, Gérard (2016). Sanskrit signs and Pāṇinian separated by commas representing the code block Scripts. Sanskrit Computational Linguistics, 53-76. to be evaluated. New Delhi: D.K. Publishers. whileStatement: Yavat conditionStatements Tavat Jha, Girish Nath (2010). Sanskrit Computational statBlock Iti';' ; Linguistics. Germany: Springer Verlag. The keyword Yavat meaning while begins the subordinate clause which is just immediately Kadvany, J. (2016). Pāṇini's Grammar and Modern Computation. History and Philosophy of Logic, followed by the keyword Tavat, a correlative of the 325-346. Yavat begins the main clause. SAN: Yāvat vepitaḥ vepakāt nyūnataraḥ Tāvat Kiparsky, P. (2002). On the Architecture of Pāṇini's ENG: “while vepita (is) lesser from vepaka then” Grammar. Hydrabad: Central Institute of English and Foreign Languages. SAN: vepitaḥ = vepitaḥ + 1; ENG: “vepita = vepita + 1;” Kuhn, Tobias (2013). A Principled Approach to SAN: iti; Grammars for Controlled Natural Languages and ENG: “thus;” Predictive Editors. Journal of Logic, Language, and Information, XXII(3), 33-70. 77

Kuhn, Tobias (2014). A Survey and Classification of Controlled Natural Languages. Computational Linguistics, XL(1). Parr, T. (2013). The Definitive ANTLR 4 Reference. Texas: Pragmatic Bookshelf. Schildt, Herbert (2019). Java: The Complete Reference (11th ed.). New York: McGraw-Hill Education. Sharma, R. N. (2002). The AD of Panini (2nd ed. Vol. 1). New Delhi: Munshiram Manoharlal Publishers. Williams, Monier (1872). A Sanskṛit-English dictionary etymologically and philologically arranged: With special reference to Greek, Latin, Gothic, German, Anglo-Saxon, and other cognate Indo-European languages. Oxford: The Clarendon Press.

78