ICON 2016 Workshop on bridging the gap between computational linguistics tools and management of Sanskrit digital libraries 17 – 20 December 2016, IIT-BHU

edited by AMBA KULKARNI,GERARD´ HUET SRINIVAS VARAKHEDI, and PAWAN GOYAL

28 November 2016 ii Contents

1 Sanskrit Library conventions 1 1 Character encoding ...... 2 2 TEI text markup ...... 6 3 TEI bibliography ...... 7 4 TEI dictionary ...... 8 5 Dictionary headword coordination ...... 9 6 Root lists ...... 11 7 Morphological and syntactic tagging ...... 12 7.1 Lexical tagging ...... 12 7.2 Inflectional tagging ...... 13 7.3 Syntactic tagging ...... 14 8 Meter recognition ...... 15

iii iv 0 P. SCHARF 1

Sanskrit Library conventions of digital representation and annotation of texts, lexica, and manuscripts

PETER M.SCHARF The Sanskrit Library [email protected]

The ICON 2016 workshop on bridging the gap between San- skrit computational linguistics tools and management of Sanskrit digital libraries aims to improve interoperability between software and corpora. The first and most important step in facilitating such interoperability is the explicit description and publication of con- ventions used. Hence in order to foster interoperability, the current paper surveys the best of the conventions developed and used at the Sanskrit Library and the efforts to coordinate them with colleagues in the Sanskrit Computational Linguistics Consortium. These con- ventions include character encoding, text markup, bibliographic annotation, morphological tagging, syntactic tagging, meter recog- nition, and the markup and lexical representation of dictionaries and root lists. This survey makes no claim to being comprehensive but only to contributing to productive interchange at the workshop.

1 2 P. SCHARF

1 Character encoding

Contemporary uses of information technology demand higher standards of encoding than the inherited systems prevalent today. Guided by visual factors, current encoding systems reproduce defi- ciencies inherent in traditional writing systems. The contemporary use of computers for the manipulation of linguistic and textual data demands more relevant information-processing principles. The fundamental issue in encoding natural language texts concerns the relation information selected bears to natural language structure. Selecting appropriate information to encode demands addressing whether to encode written characters or speech sounds, whether to encode segments or features, and what criteria to use to con- trast items selected for encoding. Scharf and Hyman (2011) and Scharf (2014) considered these issues in detail and provided ex- tensive bibliography. To prevent loss of knowledge requires determining what in- formation is essential and devising a lossless method of encod- ing that information while ignoring the extraneous. Efficiency in representing essential information implies the fundamental prin- ciple of character encoding, namely, to avoid ambiguity and re- dundancy. To avoid ambiguity and redundancy requires that an encoding system be characterized by a one-to-one correspondence between characters and items to be encoded, and that all encoded items be of the same kind (.g., phonemes or written characters). Most contemporary encoding systems for Sanskrit designed for use with the computer are based either upon the Indic script structure represented by Devanagar¯ ¯ı or on the Sanskrit Romaniza- tion. The encodings therefore retain features inherent in the scripts themselves and likewise reproduce the following five deficiencies inherent in these scripts: SANSKRIT LIBRARY CONVENTIONS 3

1. In the ¯ standards, there are separate charac- ters for vowels when they appear post-consonantally versus when not preceded by a consonant. In the Roman standards, a single character is used in all contexts.

2. In the Devanagari¯ standards, post-consonantal /a/ is implic- itly indicated by the graph of the preceding consonant, while its absence is explicitly represented by a sign indicating the cessation of speech (virama¯ ). In the Roman standards, the distribution of hai corresponds exactly to the distribution of the vowel /a/.

3. In the Roman standards, certain single sounds are repre- sented by digraphs: the aspirate stops (kh, gh, ch, jh,. th, d. h, th, dh, ph, bh) and the open diphthongs (, ). In the De- vanagari¯ standards, single characters represent each of these segments.

4. In both the Devanagari¯ and Roman standards the aspirate retroflex lateral flap / h/ is represented by a digraph: \h, .lh. 5. An additional discrepancy exists between the encoding of accent in Devanagari¯ script and the encoding in standard Romanization. The Romanization encodes lexical or post- prosodic high pitch and independent circumflex, or deep ac- cent. Devanagari¯ encodes manifest pitch or surface accent. The failure of scholars to recognize the difference has led to confused explanations of Devanagari¯ accentual systems and the obfuscation of genuinely different traditions of recitation and dialects (Scharf 2012).

In items (1), (3), and (4), a single sound is represented by more than one character, and in (2), a sound is inversely represented: that is, the presence of the sound is represented by the absence 4 P. SCHARF of a character, and the absence of the sound by the presence of a character. The departure from the principle of a one-to-one corre- spondence between what is to be represented and the representa- tion signals confusion concerning the principles of encoding. In the digraphs that represent aspirated consonants in Sanskrit, the hhi represents aspiration, a phonetic feature rather than a segment, while by itself hhi represents a phonetic segment. The Devanagari¯ representation of the aspirate retroflex lateral flap / h/ suffers the same fault. Similarly in the digraphs haii and haui that represent open diphthongs in Romanization, the graphs hai, hii, and hui rep- resent subsegments, while independently they represent indepen- dent segments. While the dual use of hhi (and hhi in Devanagari)¯ does not lead to ambiguity because the sequence of consonant plus /h/ does not occur in Sanskrit, the dual use of hii and hui in the Ro- manization is ambiguous. The identical sequence haii in taih. and mana¨ıccha¯, for example, does not itself indicate that it represents a diphthong in the former and a sequence of two independent vow- els in the latter. Similarly the sequence of characters haui repre- sents the sequence of two simple vowels in prauga¨ but represents a diphthong in praud. . Although one could disambiguate the Ro- manization by introducing a diaeresis over the second character to show that the two characters represent two vowels in sequence rather than a diphthong (thus mana¨ıccha¯, prauga¨ ), such a proce- dure introduces redundancy. There would then be two ways of representing the phonetic segments // and //, one with and one without diaeresis. In practical terms, someone searching for the occurrence of the word iccha¯ would have to search a second time for ¨ıccha¯ as well, or software developers would have to accommo- date the anomalous encoding in their search program. Introduction of the diaeresis to remove the ambiguity is a visual patch that re- veals deeper structural problems in the encoding. Digital archives and software that persist in the use of Unicode Devanagari or Unicode representation of the International Alpha- SANSKRIT LIBRARY CONVENTIONS 5 bet of Sanskrit Transliteration (IAST) perpetuate these structural problems. While encodings such as WX address them to a limited extent,1 the Sanskrit Library phonetic encoding schemes address them in a consistent and comprehensive manner. These encoding schemes are capable of showing all the contrastive information in the corpus of Sanskrit texts using a broad concept of phoneme. Un- like Devanagar¯ ¯ı script and the encoding schemes based upon it but like Romanis.ations, the Sanskrit Library basic encoding scheme (SLP1) represents vowels, including the vowel /a/, by unique char- acters, regardless of whether they follow a consonant or not. Un- like most Romanis.ations, SLP1 represents diphthongs and aspi- rated stops by a single character as well. Unlike both Devanagar¯ ¯ı and Romanis.ations, it represents the aspirated retroflex lateral flap by a single character too. Yet because SLP1 is limited to ASCII characters, accents, nasalization and other modifications of basic sound segments are represented by modifier characters that rep- resent features rather than segments. SLP2 is a strictly segmen- tal encoding that represents each segment by a single codepoint. SLP3 is a strictly featural encoding that represents only phonetic features; segments are represented by bundles of features rather than by single codepoints. The Sanskrit Library Phonetic encoding schemes simplify lin- guistic processing. Hence the Sanskrit Library utilizes these en- codings for the storage of a corpus of Sanskrit texts and for lin- guistic processing. We segregate these functions from the func- tions of data-entry and display and employ transcoding software to interface between the functions. Besides simplifying linguis- tic processing, the segregation of functions allows for a high de- gree of flexibility in data-entry and display options thereby permit-

1Like SLP1, WX usually represents a single segment by a single character. The scheme is, however, limited, since it does not have characters that designate long vocalic ¯l, the unaspirated and aspirated retroflex lateral flaps .l and .lh, any system of accents,˚ or other sounds peculiar to Vedic traditions. 6 P. SCHARF ting users to select the display and input method of their choice. Thus we transcode to a variety of Indic scripts and Romaniza- tion in Unicode for display purposes and employ various meta- transliterations, Indic Unicode, as well as clickable input key- boards for data input.

2 TEI text markup

The Text-Encoding Initiative (TEI) has emerged as the world stan- dard for text markup. TEI provides basic XML tags for marking hierarchical divisions, sentences or verses and their parts such as lines and smaller segments, and intermediate chunks of text such as paragraphs, and speeches. A broad array of XML elements and attributes provides the means of coordinating parallel texts, such as a continuous version (samhit˙ a¯), a version with interword phonetic changes analyzed (padapat¯.ha), and a translation; for coordinating text with images, and for specifying the nature of divisions. TEI likewise provides parameters for an elaborate header that can pre- cisely identify the digital edition, its editor, the source of it in a printed edition or manuscript, and specifications of encoding and markup. It is clear that TEI should be adopted as the world stan- dard for creating digital editions of Sanskrit texts. Over the past several decades however, many digital editions of texts have been produced in simple text files with little or no markup. Often markup is in the form of an idiosyncratic use of special characters and alphanumeric line sequence numbers. Texts continue to be produced in this manner due familiarity with prior procedures, ease, and lack of knowledge of TEI. Yet transform- ing texts in any digital format to a standard encoding with TEI markup is feasible with the thoughtful application of regular ex- pressions and simple software. In fact, data-entry and markup are regularly handled in separate steps in any case due to the ease of SANSKRIT LIBRARY CONVENTIONS 7 typing character markup compared with XML tags. Texts are an- alyzed in advance of data-entry to ensure that the use of special character sequences avoid ambiguity and then the resulting prod- uct is converted programmatically to TEI. While the Sanskrit Li- brary has produced a few selected texts with the desired measure of TEI markup, time and attention is required to convert a large archive. Nevertheless, our selected texts could provide a template for large-scale conversion and production.

3 TEI bibliography

Specification of the source of a text is absolutely essential to en- sure that it is a valid witness of the text of which it claims to be an edition. A scholarly critical edition regularly includes a critical introduction that identifies the manuscripts collated and consulted in the production of the edition. A scholarly translation regularly identifies the edition of the text translated. Obviously the editor and the translator are likewise identified. A digital edition should conform to the same high standards. A number of digital texts, however, including much of the large Gretil archive, contain no indication of the editor of the digital text or its source edition. As mentioned in the previous section, TEI provides conventions to supply a digital text with a header that includes a complete bibli- ography. The Sanskrit Library uses the TEI bibliographic markup for all its texts and dictionaries, even where the documents them- selves have yet to be converted to TEI. Scharf (2015) also used TEI bibliographic markup to produce the original version of its bibliography of Sanskrit Syntax. The original was then converted programmatically from TEI to both LaTeX and HTML. The San- skrit Library has produced templates for several different types of bibliographic entries which can easily be filled in and tailored to a specific work. 8 P. SCHARF

4 TEI dictionary

TEI provides dictionary markup conventions. Bringing source files into conformity with these conventions should be the aim of digital lexical resource development. However, prac- tical considerations, namely the sheer volume of information and the few people dedicated to the task, have severely lim- ited the degree of attention given to this end. The San- skrit Library has collaborated with the Cologne Digital San- skrit Dictionary project (CDSL) for about 20 years to upgrade the markup in its original digital editions of Monier-Williams (1899), Bohtlingk and Roth’s Sanskrit Worterbuch¨ and to ex- tend the archive to more than 35 lexical sources. These sources are available online at http://www.sanskrit-lexicon. uni-koeln.de/ and http://sanskritlibrary.org/ integratedDictionaries.html. The CDSL and Sanskrit Library lexical sources were created by data-entry from printed sources and character markup based upon visual characteristics in the printed sources, that is, by different scripts and typefaces used. These visual characteristics hardly capture all of the distinctions necessary to be made in the segments of data in a definition. In monolingual Sanskrit dictionaries in Devanagari, no distinctions in typeface distinguish the various parts of an entry at all, including definitions, examples, text titles, abbreviations, derivations, and citations. In Romanized dictionaries the use of Devanagari versus Roman, and the use of bold and italic distinguish some but not all of these segments of an entry. In general, programmatic conver- sion of character data based on the visual characteristics served to create the databases that serve as the source for HTML display. The initial character data of these editions were programmatically converted to the Sanskrit Library phonetic encoding. Character markup was converted to XML, and the XML file to mySQL and SqlLite databases which serve as the source for programmatic dis- SANSKRIT LIBRARY CONVENTIONS 9 play in HTML. The HTML display is therefore limited to distin- guishing data originally distinguished by the visual characteristics of the printed edition. The first two dictionaries mentioned were subjected to a great deal of additional work, particularly Monier Williams 1899. Cor- rections have systematically been made, supplements integrated, Greek cognates added, abbreviations expanded, compound head- words completed, corresponding feminine forms provided for ad- jectives, and links provided to inflectional software. Huet links his dictionary and Monier Williams to his parser, and the Sanskrit Library links words in sandhi-analyzed texts to a morphological analyzer linked to its integrated dictionary interface. This work required a great deal of systematic human effort coordinated with software development. Additional corrections and extensions to Monier-Williams (1899), Bohtlingk and Roth’s Sanskrit Worter-¨ buch, and Bhtlingk’s (1879–1889) Sanskrit-Wrterbuch¨ in Krzerer¨ Fassung, coordinated by Jim Funderburk, are underway.2

5 Dictionary headword coordination

A task long envisioned towards which we are slowly working is to integrate the dictionary headwords with each other. The task is not a simple one-to-one mapping. Different conventions of sandhi are used, such as with regard to doubling, and anusvara¯ versus the nasal homorganic with the following stop. While many dic- tionaries use stems as headwords, Apte and the Sabdakalpadruma use nominative singulars. Bohtlingk¨ and Roth use roots with gu-

2Corrections at https://github.com/sanskrit-lexicon/ CORRECTIONS/blob/master/dictionaries/MW/MW_ correctionform.txt and extensions at https://github.com/ sanskrit-lexicon/PWK/blob/master/pw_ls/pwbib/pwbib_ new.txt and https://github.com/sanskrit-lexicon/PWK/ issues/24 10 P. SCHARF n.a-grade ar while Monier Williams uses roots with zero-grade of vocalic r.Bohtlingk¨ and Roth includes roots with preverbs as subentries˚ of roots while Monier Williams lists them as separate main entries. Pronouns are handled variously. In order to coordi- nate the massive variation, we conceived of a comprehensive lexi- cal grammar table that coordinates the headwords of all dictionar- ies with a base headword set. Software derives variants from the base headword set using selected rules of derivation, inflection and sandhi. We initially explored using Monier Williams 1899 as the base headword set because of its comprehensive coverage and hi- erarchical organization that relates derivates to roots. However, we quickly dismissed the idea when confronted with several examples of complex handling clearly motivated merely by practical consid- erations. Monier Williams splits the pronoun tad, for example, including also sa (the masculine nominative singular), and lists as separate entries nih. , nis´, and nis. , in addition to the Pan¯ .inian nis, and nir. While as the last example demonstrates, Pan¯ .inian gram- mar also creates some duplication for practical considerations, I decided to build the base of the grammar table from the program- matic implementation of a formalization of Pan¯ .ini’s As..tadhy¯ ay¯ ¯ı. Such a base would carry the hierarchical relationship of forms to a much higher degree by relating derivates to their roots in accor- dance with their Pan¯ .iniian derivation. Rules to derive variant head- words would then be selected from the Pan¯ .ini ruleset. By the time of this workshop I will have completed my comprehensive XML formalization of the As..tadhy¯ ay¯ ¯ı and begun its implementation. Yet the principal has already been tried. Jim Funderburk and I used earlier versions of my XML implementations of Pan¯ .inian rules to coordinate headwords with preverbs in Monier Williams’s dictio- nary with our morphological analyzer and to produce normalized forms of roots from the forms of roots expected by Pan¯ .ini’s rules. SANSKRIT LIBRARY CONVENTIONS 11

6 Root lists

Root lists (dhatup¯ at¯.has), being lexical lists, share some of the issues discussed with regard to dictionary headword coordina- tion. Yet since they are also components of grammatical systems, they display additional modes of variation from each other. Well known, for instance, is the fact that the Pan¯ .inian dhatup¯ at¯.ha lists roots with initial retroflex nasal and sibilant. Although roots with penultimate nasal are uniformly listed with the nasal homorganic with the following stop, Pan¯ .ini requires roots with penultimate nasal to have a dental nasal. In our canonical index of the M˙ a-¯ dhav¯ıya Dhatuvr¯ tti, we derive the forms with normal sandhi from ˚ the canonical forms expected by Pan¯ .ini using a selected set of XML formalizations of Pan¯ .inian rules. In the introduction, Scharf (2009) thoroughly describes the role of various unusual phonetic segments, markers, and accents in the Pan¯ .inian dhatup¯ at¯.ha, the rules that expect them and the rules required to produce normal- ized forms or roots. Not only may roots themselves appear in different forms in various dhatup¯ at¯.has, but their accents, markers and marker ac- cents may differ, and they may appear in different sublists within the main list. Shailaja (2014) describes her comparison of dha-¯ tupat¯.ha commentaries. She and Amba Kulkarni have a web- based concordance at http://sanskrit.uohyd.ac.in/ scl/dhaatupaatha/ Modern root lists include Westergaard (1841), Whitney (1885), and Werba (1997). Scharf and Hyman (2004) produced a digital edition with completion of hyphenated forms and iden- tification of forms. Scharf aims to integrate the work of Werba (1997), Oberlies (2003), and Goto¯ (1990, 1991, 1993, 1997) and others with this in future projects. 12 P. SCHARF

7 Morphological and syntactic tagging

Several levels of categorization and tagging are distinguished. Morphology includes two categories predominantly distinguish- able as derivational morphology and inflectional morphology. In- flectional morphology is concerned with the conjugation of verbs and the declension of nominals while derivational morphology is concerned with the relationship of complex stems to fundamental roots. The two are not neatly distinguished because derivation in one system may be treated as inflection in another and vice versa. For example, the causative, desiderative, and intensive are treated as secondary conjugations of primary roots in European grammars of Sanskrit (see Whitney 1885 for example) while they are sim- ple conjugations of derived roots in Pan¯ .inian grammar. Moreover, a few forms, such as comparatives and superlatives of conjugated verbs (A. 5.3.56), and pronominal and indeclinable forms with the affix akac (A. 5.3.71), add a derivational affix to an inflected form. Syntactic relations, which usually refer to the relations between words, are not neatly distinguished from morphology. For mor- phological subsegments of some words in Sanskrit may be related to other words. Regularly, for instance, Pan¯ .inian cognitive scien- tists such as Kaun.d.abhat.t.a and Nage¯ sa´ relate the action denoted by the root of a verb, rather than the verb as a whole, to the karaka¯ denoted by a nominal while European dependency relations would relate the nominal to the verb as a whole.

7.1 Lexical tagging Aussant (2015b,a) describes schemes of the classification of parts of speech. Parts of speech are predominantly used in European dictionaries for the lexical classification of words. She compared various schemes of classification within Indian linguistic traditions with each other and with European schemes of classification. SANSKRIT LIBRARY CONVENTIONS 13

7.2 Inflectional tagging

The construction of forms in the Pan¯ .inian grammatical system in- troduces terms and affixes in a sequence to denote semantic condi- tions more central to the meaning of a word first. Gender is more intimate to the meaning of a nominal than its karaka¯ and number, since the former is inherent in the case of most nouns and fem- inine affixes precede inflectional terminations. Pan¯ .ini introduces terms for vibhaktis for affixes in sets of three while number distin- guishes individual terminations with these sets. Hence for nomi- nals, the logical order of classification is gender, case, number. In verbal derivation, the root, which denotes the action, is primary. Temporal and modal signs designate the time and attitude toward the action. Roots are classified in ten lists within the Dhatup¯ at¯.ha. L-affixes are conditioned by time and attitude as well as by agent (kartr), direct object (karman), and the action itself (bhava¯ ), which are collectively˚ referred to in modern parlance as prayoga. Only after Pan¯ .ini introduces l-affixes for time, mood, and prayoga, does he replace them with verbal terminations. Terms for verbal termi- nations are introduced in the following sequence: parasmaipada and atmanepada¯ for sets of nine, terms for purus.a in sets of three, and then terms for number within those sets. Hence the logical -¯ n.inian order of conjugational classification is time/mood, prayoga, class, l-affix, pada, person, number. Time/mood and prayoga are semantic and syntactic parameters not always inferable from the morphology alone. In European grammar, information regarding the secondary derivations of causative, desiderative, intensive and denominative forms is included in inflectional identification. In Pan¯ .inian grammar, secondary roots are derived from base roots, listed nominal stems, and padas using the affixes n. ic, san, yan˙, etc. While most of these could be considered derived lexical items rather than conjugational forms, the productivity of these derivates 14 P. SCHARF may make inclusion of this information in the parameters of in- flection in a Pan¯ .inian system practical. Amba Kulkarni, Huet, and I undertook to coordinate our schemes of inflectional tagging following the Semi- nar on Sanskrit syntax and discourse structures, and work- shop on computational Sanskrit syntax, in Paris, in June 2013 (http://sanskritlibrary.org/syntaxParis/ announcement.html). Scharf and Bunker created an XML document that coordinates various schemes of inflec- tional tagging from which is generated a comprehensive HTML table describing and displaying the various tags (http: //sanskritlibrary.org/helpmorphids.html). The XML document is extendable to include parallel tagging schemes in other languages. Using the coordination of tags, we devel- oped a semi-automated morphological tagger with an editing inter- face that worked with a summary of the results of Huet’s Sanskrit reader. A. Ajotikar and Ben-Dor then used this summary edition interface to tag approximately 3000 sentences in Purn¯ .abhadras Pa- nc˜ akhy¯ anaka¯ and prose sections of the Mahabh¯ arata¯ . Hellwig has tagged thousands of forms in the Mahabh¯ arata¯ .

7.3 Syntactic tagging Amba Kulkarni has worked closely with Krishnamacaryulu to de- velop a comprehensive set of dependency tags and has developed dependency tagging software. Goyal developed an interface to al- low dependency information (relation and target) to be added to our morphologically tagged database. A. Ajotikar then added de- pendency relations to a subset to create a small database of syn- tactically tagged sentences. These dependency tags were used in papers delivered at the Seminar on Sanskrit syntax and discourse structures and discussed at the workshop on computational San- skrit syntax. The tags have been refined since then. SANSKRIT LIBRARY CONVENTIONS 15

8 Meter recognition

Melnad, Goyal, and Scharf (2015b,a) review literature concerned with metrical patterns and present Web-based software that ana- lyzes Sanskrit metrical patterns and identifies meters. The soft- ware draws upon a database of more than 600 metrical patterns defined by regular expressions and recognizes full, half, or quar- ter verses as well as multiple verses and metrical patterns em- bedded in prose. An open source web archive of metrically re- lated software and data can be found at https://github. com/shreevatsa/sanskrit with an interface at http:// sanskritmetres.appspot.com/. The author and contrib- utors to this archive and data were unknown at the time and not in- cluded in our literature review. No description of the extent, com- prehensiveness, and effectiveness of the software has been found.

Aussant, Emilie.´ 2015a. “The beginning of descriptive grammati- cal activity: classification of words in some earlier systems of language description and their relation to the Pan¯ .inian classifi- cation of padas.” —. 2015b. “To classify words: European and Indian grammatical approaches,” pp. 219–35. Providence: The Sanskrit Library. Goto,¯ Toshifumi. 1990. “Materialien zu einer Liste altindischer Verbalformen: 1. ami, 2. ay/i, 3. as/s.” Bulletin of the National Museum of Ethnology 15.4: 986–1012. —. 1991. “Materialien zu einer Liste altindischer Verbalformen: 4. dogh/dugh/doh/duh, 5. sav/su, 6. 1savi/su¯, 7. 2(savi/)su¯.” Bul- letin of the National Museum of Ethnology 16.3: 680–707. —. 1993. “Materialien zu einer Liste altindischer Verbalformen: 8. ard/rd, 9. ¯ıs. , 10. uks. , 11. es. /is. , 12. es. i/is. i, 13. ok/oc/uc, 14. - ˚ n. , 15. vaks. /uks. .” Bulletin of the National Museum of Ethnology 18.1: 118–40. 16 P. SCHARF

—. 1997. “Materialien zu einer Liste altindischer Verbalformen: 16. chad, 17. chand/chad, 18. chard/chrd, 19. dagh/dhag, 20. ˚ dves. /dvis. , 21. bandh/badh, 22. 1man, 23. 2man, 24. mna¯, 25. 1yav/yu, 26. 2yav/yu, 27. sani, 28. star/atr, 29. stari/str.” Bul- letin of the National Museum of Ethnology˚ 22.4: 1000–59.˚ Melnad, Keshav, Pawan Goyal, and Peter M. Scharf. 2015a. “Up- dating Meter Identifying Tool (MIT).” —. 2015b. “Identification of meter in Sanskrit verse: selected pa- pers presented at the seminar on Sanskrit syntax and discourse structures, 13–15 June 2013, Universite´ Paris Diderot, with a bibliography of recent research by Hans Henrich Hock,” pp. 325–46. Providence: The Sanskrit Library. Oberlies, Thomas. 2003. A Grammar of Epic Sanskrit. Indian philology and South Asian studies 5. Berlin; New York: Walter de Gruyter. Scharf, Peter M. 2009. Madhav¯ ¯ıya Dhatuvr¯ tti canonical in- ˚ dex. Providence: The Sanskrit Library. URL: http : / / sanskritlibrary.org. —. 2012. “Vedic accent: underlying versus surface.” Deva- datt¯ıyam: Johannes Bronkhorst Felicitation Volume, ed. by Franc¸ois Voegeli, Vincent Eltschinger, Danielle Feller, Maria Piera Candotti, Bogdan Diaconescu, and Malhar Kulkarni, pp. 405–26. Bern: Peter Lang. —. 2014. “Linguistic issues and intelligent technological solutions in encoding Sanskrit.” Document numerique´ 16.3. —. 2015. Sanskrit syntax: selected papers presented at the semi- nar on Sanskrit syntax and discourse structures, 13–15 June 2013, Universite´ Paris Diderot, with a bibliography of recent research by Hans Henrich Hock. Providence: The Sanskrit Li- brary. Scharf, Peter M. and Malcolm D. Hyman, eds. 2004. The roots, verb forms and primary derivatives of the Sanskrit language: a supplement to his Sanskrit grammar. Providence: The San- SANSKRIT LIBRARY CONVENTIONS 17

skrit Library. URL: http://sanskritlibrary.org/ Sanskrit / whitney / index2 . html. [A digital edi- tion with completion of hyphenated forms and identification of forms.] —. 2011. Linguistic issues in encoding Sanskrit. Providence; Delhi: The Sanskrit Library; Motilal Banarsidass. Shailaja, N. 2014. “Comparison of Paninian Dhatuvr¯ ttis.” Ph.D. dissertation. Hyderabad: University of Hyderabad.˚ Werba, Chlodwig H. 1997. Verba Indoarica: die primaren¨ und sekundaren¨ Wurzeln der Sanskrit-Sprache. Vol. Part I. Vienna: Verlag der osterreichischen¨ Akademie der Wissenschaften. Westergaard, N. L. 1841. Radices linguæ sanscritæ: ad decreta grammaticorum definivit atque copia exemplorum exquisitito- rum illustravit. Bonn: H.B. Konig.¨ Whitney, William Dwight. 1885. The roots, verb forms and pri- mary derivatives of the Sanskrit language: a supplement to his Sanskrit grammar. 1st ed. Leipzig: Breitkopf, Hartel;¨ Lon- don: Trubner,¨ and Co. [2d. American Oriental Series 30. New Haven: American Oriental Society, 1945.]