Quick viewing(Text Mode)

3. Standardization

3. Standardization

3. Standardization

3.1 Revision of Unicode Standard-3.0 Code Chart for Devanagari Script Unicode Standards are widely being used by the Industry for the development of Multilingual Softwares. Indian scripts are also included in the Unicode Standards. Unicode consortium had taken the basic inputs for standardisation of Indian scripts from ISCII-1988 document. There are some deficiencies in the present Unicode Standards for Indian Scripts and need to be removed for proper representation of the Indian Scripts. MIT is the voting member of the Unicode Consortium. The ministry has collected inputs for each of the Indian scripts in order to make a single unified presentation of Indian scripts to the Unicode Consortium. A number of meetings starting from November 2000 have been organized with the concerned State Governments/ Organizations/ Experts and Industry on this subject. Based on the discussions / feedback draft Code charts for each of the script have been prepared along-with code details. The first draft proposal of the proposed changes in the existing Unicode Standards for Indian scripts was published in the May 2001 issue of the TDIL Newsletter- VishwaBharat@tdil to get the feedback from experts/ industry / users working in the area of Indian Language Software Development. The news letter was also sent to the members of the Unicode Consortium for their comments and initial response. The draft proposal was discussed in the Unicode Technical Committee (UTC) meeting held in USA in November 2001. The minutes of the UTC Meeting are available in document No. L2/01-430R and L2/ 01-431R of Unicode Consortium. The URL for viewing these documents are as given below: http://tdil.mit.gov.in/newsletter1.htm http://www.unicode.org/L2/L2001/01304- feedback.pdf After getting the feedback from state government / linguists / experts / industry the second draft proposal is also being prepared. The final draft of the Devanagari script is published here for your reference. The whole write-up is divided into three parts: 1. Code chart - For quick reference of the characters and their code value. 2. Code Details - For reference of the character names and annotations. 3. Devanagari:A brief review of the script – This write-up covers various aspects of the script.

Page 26 Devanagari Code Chart Details Consonants 0915 Eò DEVANAGARI LETTER Code Character Description 0916 JÉ DEVANAGARI LETTER Point 0917 MÉ DEVANAGARI LETTER 0901 #Ä DEVANAGARI SIGN 0918 PÉ DEVANAGARI LETTER CANDRABINDU 0919 Ró DEVANAGARI LETTER NGA = anunasika 091A SÉ DEVANAGARI LETTER .0310 combining 091B Uô DEVANAGARI LETTER candrabindu 091C VÉ DEVANAGARI LETTER 0902 DEVANAGARI SIGN 091D ZÉ DEVANAGARI LETTER 091E \É DEVANAGARI LETTER NYA = bindu 091F ]õ DEVANAGARI LETTER TTA 0903 #: DEVANAGARI SIGN 0920 B DEVANAGARI LETTER TTHA 0921 b÷ DEVANAGARI LETTER DDA 0922 fø DEVANAGARI LETTER DDHA Independent 0923 hÉ DEVANAGARI LETTER NNA 0905 + DEVANAGARI LETTER A 0924 iÉ DEVANAGARI LETTER TA 0906 +É DEVANAGARI LETTER AA 0925 lÉ DEVANAGARI LETTER THA 0907 < DEVANAGARI LETTER I 0926 nù DEVANAGARI LETTER DA 0908 <Ç DEVANAGARI LETTER II 0927 vÉ DEVANAGARI LETTER DHA 0909 = DEVANAGARI LETTER 0928 xÉ DEVANAGARI LETTER NA 090A >ð DEVANAGARI LETTER UU 0929 xÉà DEVANAGARI LETTER NNNA 090B @ñ DEVANAGARI LETTER • for transcribing Dravidian VOCALIC R alveolar n 090C Bô DEVANAGARI LETTER ≡ 0928 xÉ 093C #à VOCALIC L 092A {É DEVANAGARI LETTER 090D Bì DEVANAGARI LETTER 092B ¡ò DEVANAGARI LETTER CANDRA 092C ¤É DEVANAGARI LETTER 090E DEVANAGARI LETTER 092D ¦É DEVANAGARI LETTER SHORT E 092E ¨É DEVANAGARI LETTER • for transcribing Dravidian 092F ªÉ DEVANAGARI LETTER short e 0930 ®ú DEVANAGARI LETTER 090F B DEVANAGARI LETTER E 0931 .® DEVANAGARI LETTER RRA 0910 Bä DEVANAGARI LETTER • for transcribing Dravidian 0911 +Éì DEVANAGARI LETTER alveolar r CANDRA 0932 ±É DEVANAGARI LETTER 0912 +Éà DEVANAGARI LETTER 0933 ¤ý DEVANAGARI LETTER LLA SHORT O 0934 ¤Ã DEVANAGARI LETTER LLLA • for transcribing Dravidian • for transcribing Dravidian l short o ≡ 0933 ¤ý 093C #à 0913 +Éä DEVANAGARI LETTER O 0935 ´É DEVANAGARI LETTER 0914 +Éè DEVANAGARI LETTER 0936 ¶É DEVANAGARI LETTER SHA

Page 27 0937 ¹É DEVANAGARI LETTER SIGN CANDRA O SSA 094A DEVANAGARI SHORT O 0938 ºÉ DEVANAGARI LETTER SA • for transcribing Dravidian 0939 ½ DEVANAGARI LETTER vowels 094B #Éä DEVANAGARI 093A # DEVANAGARI INVISIBLE SIGN O LETTER 094C #Éè DEVANAGARI VOWEL Various signs SIGN AU 093C #à DEVANAGARI SIGN NUKTA Various signs • for extending the alphabet 094D # DEVANAGARI SIGN HAL to new letters • suppresses inherent vowel 093D % DEVANAGARI SIGN 094E 094F 093E #É DEVANAGARI VOWEL 0950 $ DEVANAGARI SIGN AA 0951 DEVANAGARI STRESS Dependent vowel signs SIGN UDATTA 093F Ê# DEVANAGARI VOWEL 0952 DEVANAGARI STRESS SIGN SIGN ANUDATTA • stands to the left of the 0953 DEVANAGARI GRAVE consonant ACCENT 0940 #Ò DEVANAGARI VOWEL 0954 DEVANAGARI ACUTE SIGN II ACCENT 0941 #Ö DEVANAGARI VOWEL 0955 DEVANAGARI ANUSWARA • SIGN U Used in Yajurveda 0942 #Ú DEVANAGARI VOWEL 0956 DEVANAGARI SIGN SIGN UU YAJURVEDIC ANUSWARA • 0943 #Þ DEVANAGARI VOWEL Used in Sanskrit SIGN VOCALIC R 0957 DEVANAGARI JIVHAMULIYA • 0944 DEVANAGARI VOWEL Used in Sanskrit SIGN VOCALIC RR Additional consonants (Their use should be avoided) 0945 #ì DEVANAGARI VOWEL 0958 EÃò DEVANAGARI LETTER QA SIGN CANDRA E ≡0915 Eò 093C #à = candra 0959 JÉà DEVANAGARI LETTER 0946 #à DEVANAGARI VOWEL KHHA SIGN SHORT E ≡0916 JÉ 093C #à • for transcribing Dravidian 095A MÉà DEVANAGARI LETTER vowels GHHA 0947 #ä DEVANAGARI VOWEL ≡0917 MÉ 093C #à SIGN E 095B VÃÉ DEVANAGARI LETTER ZA 0948 #è DEVANAGARI VOWEL ≡091C VÉ 093C #à SIGN AI 095C c÷ DEVANAGARI LETTER 0949 #Éì DEVANAGARI VOWEL DDDHA

Page 28 ≡0921 b÷ 093C #à RENCY SIGN 095D gø DEVANAGARI LETTER 0972 IÉ DEVANAGARI LETTER KSHA RHA 0973 YÉ DEVANAGARI LETTER ≡0922 fø 093C #à GNYA 095E ¡Ãà DEVANAGARI LETTER FA 0974 ¸É DEVANAGARI SIGN ≡092B ¡ò 093C #à SHRA 095F ªÉà DEVANAGARI LETTER YYA 0975 #Ç DEVANAGARI SIGN ≡092F ªÉ 093C #à REPH Generic additions 0976 ¨Ì LETTER SHA Used in 0960 DEVANAGARI LETTER Marathi VOCALIC RR • ¶É (0936) ≡ ¨Ì (0976) 0961 Cô DEVANAGARI LETTER 0977 ¡ô DEVANAGARI LETTER VOCALIC LL LA Used in Marathi 0962 #à DEVANAGARI VOWEL • ±É (0932) ≡ ¡ô (0977) SIGN VOCALIC L 0978 DEVANAGARI SIGN/ 0963 #Ä DEVANAGARI VOWEL SOFT RA/ SIGN VOCALIC LL •Used in MARATHI as 0964 * DEVANAGARI PURNA consonant modifier 0979 DEVANAGARI CONSO- = phrase separator NANT 0965 ** DEVANAGARI DEERGH • Used for SINDHI implo- VIRAM sive placed just below the Digits consonant 0966 0 DEVANAGARI DIGIT ZERO 097A DEVANAGARI CONSO- 0967 1 DEVANAGARI DIGIT ONE NANT • • Shape 1 is also used in Used for SINDHI implo- 0968 2 DEVANAGARI DIGIT TWO sive placed just below the 0969 3 DEVANAGARI DIGIT THREE consonant 096A 4 DEVANAGARI DIGIT FOUR 097B DEVANAGARI CONSO- 096B 5 DEVANAGARI DIGIT FIVE NANT • • Shape 5 is also used in Hindi Used for SINDHI implo- 096C 6 DEVANAGARI DIGIT SIX sive placed just below the 096D 7 DEVANAGARI DIGIT SEVEN consonant 096E 8 DEVANAGARI DIGIT EIGHT 097C DEVANAGARI CONSO- • Shape 8 is also used in Hindi NANT • 096F 9 DEVANAGARI DIGIT NINE Used for SINDHI implo- • Shape 9 is also used in Hindi sive placed just below the consonant Devanagari-specific additions 0970 0 DEVANAGARI ABBRE- VIATION SIGN 0971 ¯û0 DEVANAGARI CUR-

Page 29 Explanations for Revised Devanagari Code Chart in Indian Standard (IS)13194:1991. This new version partially modified the layout and repertoire Devanagari : U +0900-U +097F of the ISCII-1988 standard. Because of these events, The Devanagari script is used for writing classical the Unicode Standard does not precisely follow the Sanskrit and its modern historical derivative, Hindi. layout of the current version of ISCII. Nevertheless, Extensions to Devanagari are used to write other the Unicode Standard remains a superset of the related languages of India (such as Marathi, Konkani, ISCII-1991 repertoire except for a number of new Sindhi and Sanskrit) and of Nepal (Nepali). In addition, Vedic extension characters defined in IS 13194:1991 the Devanagari script is used to write the dialects of Annex G - Extended Character Set for Vedic. Hindi and various other regional & tribal languages. Modern, non-Vedic texts encoded with ISCII-1991 may be automatically converted to Unicode code All other Indic scripts including Nandi Nagari, as values and back to their original encoding without well as the Sinhala script of Sri Lanka, the Tibetan loss of information. script, and the Southeast Asian scripts (Thai, Lao, Khmer, and Myanmar), are historically connected Encoding Principles : The writing systems that with the Devanagari script as descendants of the employ Devanagari and other Indic scripts constitute ancient . The entire family of scripts a cross between syllabic writing systems and shares a large number of structural features. phonemic writing systems (alphabets). The effective unit of these writing systems is the orthographic The principles of the Indic scripts are covered in syllable, consisting of a consonant and vowel (CV) some detail in this introduction to the Devanagari core and, optionally, one or more preceding script. The remaining introductions to the Indic consonants, with a canonical structure of ((C) C) scripts are abbreviated but highlight any differences CV. The orthographic syllable need not correspond from Devanagari where appropriate. exactly with a phonological syllable, especially when Standards : The Devanagari block of the Unicode a consonant cluster is involved, but the writing Standard is based on ISCII-1988 (Indian Standard system is built on phonological principles and tends Code for Information Interchange). The ISCII to correspond quite closely to pronunciation. standard of 1988 differs from and is an update of The orthographic syllable is built up of alphabetic earlier ISCII standards issued in 1983 and 1986. pieces, the actual letters of the Devanagari script. The Unicode Standard encodes Devanagari These pieces consist of three distinct character types: characters in the same relative position as those coded consonant letters with inherent vowel /a/, pure in positions A0-F416 in the ISCII-1988 standard. consonant, independent vowels, and dependent The same character code layout is followed for eight vowel signs. In a text sequence, these characters are other Indic scripts in the Unicode Standard: Bengali, stored in logical (phonetic) order. Gurmukhi, Gujarati, Oriya, Tamil, Telugu, Kannada, and Malayalam. This parallel code layout Principles of the Script emphasizes the structural similarities of the Brahmi Rendering Devanagari Characters : Devanagari script and follows the stated intention of the Indian characters, like characters from many other scripts, coding standards to enable one-to-one mappings can combine or change shape depending on their between analogous coding positions in different context. A character’s appearance is affected by its scripts in the family. Sinhala, Thai, Lao, Khmer, and ordering with respect to other characters, the font Myanmar depart to a greater extent from the used to render the character, and the application or Devanagari structural pattern, so the Unicode system environment. These variables can cause the Standard does not attempt to provide any direct appearance of Devanagari characters to differ from mappings for these scripts to the Devanagari order. their nominal glyphs (used in the code charts). In November 1991, at the time The Unicode Additionally, a few Devanagari characters cause a Standard, Version 1.0, was published, the Bureau of change in the order of the displayed characters. This Indian Standards published a new version of ISCII reordering is not commonly seen in non-Indic scripts

Page 30 and occurs independently of any bidirectional depicted in combination with a base letterform. A character reordering that might be required. single consonant, or a consonant cluster, may have a dependent vowel applied to it to indicate the vowel Consonant Letters : Each consonant letter represents quality of the syllable, when it is different from the a single consonantal sound but also has the peculiarity inherent vowel. Explicit appearance of a dependent of having an inherent vowel, generally the short vowel in a syllable overrides the inherent vowel of a vowel /a/ in Devanagari and the other Indic scripts. single consonant letter. Thus U+0915 DEVANAGARI LETTER KA represents not just /k/ but also /ka/. In the presence The greatest variation among different Indic scripts of a dependent vowel, however, the inherent vowel is found in the way that the dependent vowels are associated with a consonant letter is overridden by applied to base letterforms. Devanagari has a the dependent vowel. collection of nonspacing dependent vowel signs that may appear above or below a consonant letter, as Consonant letters may also be rendered as half-forms, well as spacing dependent vowel signs that may occur which are presentation forms used to depict the initial to the right or to the left of a consonant letter or consonant in consonant clusters. These half-forms consonant cluster. Other Indic scripts generally have do not have an inherent vowel. Their rendered forms one or more of these forms, but what is a nonspacing in Devanagari often resemble the full consonant but mark in one script may be a spacing mark in another. are missing the vertical stem, which marks a syllabic Also, some of the Indic scripts have single dependent core. (The stem glyph is graphically and historically vowels that are indicated by two or more glyph related to the sign denoting the inherent /a/ vowel.) components and those glyph components may Some Devanagari consonant letters have alternative surround a consonant letter both to the left and right presentation forms whose choice depends upon or may occur both above and below it. neighboring consonants. This variability is especially The Devanagari script has only one character notable for U+0930 DEVANAGARI LETTER RA, denoting a left-side dependent vowel sign: U+093F which has numerous different forms, both as the DEVANAGARI VOWEL SIGN I. Other Indic initial element and as the final element of a consonant scripts either have no such vowel (Telugu and cluster. Only the nominal forms, rather than the Kannada) or include as many as three of these signs contextual alternatives, are depicted in the code chart. (Bengali, Tamil, Malayalam). The traditional Sanskrit / Devanagari alphabetic A one-to-one correspondence exists between the encoding order for consonants follows articulatory independent vowels and the dependent vowel signs. phonetic principles, starting with pre velar Independent vowels are sometimes represented by a consonants and moving forward to bilabial sequence consisting of the independent form of the consonants, followed by liquids, semi vowels and vowel /a/ followed by a dependent vowel sign. For then fricatives, sibilents (Ushma). ISCII and the example Figure 9.1 illustrates this relationship (see Unicode standard both observe this traditional order. the notation formally described in the “Rules for Independent Vowel Letters : The independent Rendering” later in this section). vowels in Devanagari are letters that stand on their Figure 9.1 : Dependent Versus Independent Vowels own. The writing system treats independent vowels as orthographic CV syllables in which the consonant /a/ + Dependent Vowel Independent Vowel is null. The independent vowel letters are used to An + Ivs Ivs + An =In write syllables that start with a vowel. + + Ê# Ê+ = <

Dependent Vowel Signs (Matras) : The dependent An + Uvs An + Uvs =Un vowels serve as the common manner of writing + + #Ö +Ö = = noninherent vowels and are generally referred to as vowel signs, or as Matras in Sanskrit. The dependent The combination of the independent form of the vowels do not stand alone; rather, they are visibly default vowel /a/ (in the Devanagari script, U+0905

Page 31 DEVANAGARI LETTER A) with a dependent Under normal circumstances, a consonant cluster is vowel sign may be viewed as an alternative spelling depicted with a conjunct glyph if such a glyph is of the phonetic information normally represented available in the current font(s). In the absence of a by an isolated independent vowel form. However, conjunct glyph, the one or more dead consonants these two representations should not be considered that form part of the cluster are depicted using half- equivalent for the purposes of rendering. Higher- form glyphs. In the absence of half-form glyphs, the level text processes may choose to consider these dead consonants are depicted using the nominal alternative spellings equivalent in terms of consonant forms combined with visible hal signs (see information content, but such an equivalence is not Figure 9.3). stipulated by this standard. Figure 9.3 : Conjunct Formations Hal Sign : Devanagari and other Indic scripts employ (1) GA + DHA GA + DHA a sign known as the hal sign (representing consonant), d I h n or vowel omission sign. A hal sign (for example, MÉ + vÉ MvÉ U+094D DEVANAGARI SIGN HAL) normally (2) KA + KA K.KA serves to cancel (or kill) the inherent vowel of the d I n consonant to which it is applied. The hal functions EÂò + Eò Gò as a combining character, with its shape varying from A number of types of conjunct formations appear in script to script. When a consonant has lost its these examples: (1) a half-form of GA in its inherent vowel by the application of hal, it is known combination with the full form of DHA; (2) a as a dead consonant; in contrast, a live consonant is vertical conjunct K.KA. one that retains its inherent vowel or is written with an explicit dependent vowel sign. In the Unicode A well-designed Indic script font may contain Standard, a dead, consonant is defined as a sequence hundreds of conjunct glyphs, but they are not consisting of a consonant letter followed by a hal encoded as Unicode characters because they are the sign. The default rendering for a dead consonant is result of ligation of distinct letters. Indic script to position the hal as a combining mark bound to rendering software must be able to map appropriate the consonant letterform. combinations of characters in context to the appropriate conjunct glyphs in fonts. For example, if Cn denotes the nominal form of Explicit Hal : Normally a hal sign serves to create dead consonant C and Cd denotes the dead consonant form, then a dead consonant is encoded as shown in consonants that are, in turn, combined with subsequent Figure 9.2. consonants to form conjuncts. This behavior usually results in a hal sign not being depicted visually. Figure 9.2 : Dead Consonants Occasionally, however, this default behavior is not desired when a dead consonant should be excluded from TAn + HALn TAd iÉ + # iÉ conjunct formation, in which case the hal sign is visibly rendered. To accomplish this goal, the Unicode Standard Consonant Conjuncts : The Indic scripts are noted adopts the convention of placing the character U+200C for a large number of consonant conjunct forms that ZERO WIDTH NON-JOINER immediately after the serve as orthographic abbreviations (ligatures) of two encoded dead consonant that is to be excluded from or more adjacent letterforms. This abbreviation takes conjunct formation. In this case, the hal sign is always place only in the context of a consonant cluster. An depicted as appropriate for the consonant to which it is orthographic consonant cluster is defined as a attached. sequence of characters that represents one or more KA + ZWNJ + SSHA KA + SSHA dead consonants (denoted C ) followed by a normal, d l d n d EÂò + ZWNJ + ¹É EÂò¹É live consonant letter (denoted Cl) or an independent vowel having an inherent vowel or a vowel sign Explicit Half-Consonants : When a dead consonant representing other vowel. participates in forming a conjunct, the dead

Page 32 consonant form is often absorbed into the conjunct Figure 9.7 : Consonant Forms form, such that it is no longer distinctly visible. In other contexts, however, the dead consonant may Eò Eò KAI remain visible as a half-consonant form. In general, Eò + #Â EÂò KAd a half-consonant form is distinguished from the Eò + #Â + ZWJ C KA nominal consonant form by the loss of its inherent h vowel stem, a vertical stem appearing to the right Rendering side of the consonant form. In other cases, the vertical Rules for Rendering : The following provides more stem remains but some part of its right-side geometry formal and detailed rules for minimal rendering of is missing. Devanagari as part of a plain text sequence. It In certain cases, it is desirable to prevent a dead describes the mapping between Unicode characters consonant from assuming full conjunct formation and the glyphs in a Devanagari font. It also describes yet still not appear with an explicit hal. In these cases, the combining and ordering of those glyphs. the half-form of the consonant is used. To explicitly These rules provide minimal requirements for legibly encode a half-consonant form, the Unicode Standard rendering interchanged Devanagari text. As with any adopts the convention of placing the character script, a more complex procedure can add rendering U+200D ZERO WIDTH JOINER immediately characteristics, depending on the font and after the encoded dead consonant. The ZERO application. WIDTH JOINER denotes a nonvisible letter that presents linking or cursive joining behavior on either It is important to emphasize that in a font that is side (that is, to the previous or following letter). capable of rendering Devanagari, the set of glyphs is Therefore, in the present context, the ZERO greater than the number of Devanagari Unicode WIDTH JOINER may be considered to present a characters. context to which a preceding dead consonant may Notation : In the next set of rules, the following join so as to create the half-form of the consonant. notation applies:

For example, if Ch denotes the half-form glyph of Cn Nominal glyph form of consonant C as it consonant C, then a half-consonant form is encoded appears in the code charts. as shown in Figure 9.5. C A live consonant, depicted identically to C . Figure 9.5 : Half-Consonants l n Cd Glyph depicting the dead consonant form of KAd + ZWJ + SSHAI KAh + SSHAn consonant C. EÂò + ZWJ + ¹É C¹É Ch Glyph depicting the half-consonant form of This encoding of half-consonant forms also applies consonant C. in the absence of a base letterform. That is, this Ln Nominal glyph form of a conjunct ligature technique may also be used to encode independent consisting of two or more component half-forms, as shown in Figure 9-6. consonants. A conjunct ligature composed of

Figure 9.6 : Independent Half-Forms two consonants X and Y is also denoted X.Yn.

GA + ZWJ GA RAsub A nonspacing combining mark glyph form d h of the U+0930 DEVANAGARI LETTER MÉÂ + ZWJ M RA positioned below or attached to the lower Consonant Forms. In summary, each consonant may part of a base glyph form. be encoded such that it denotes a live consonant, a Vvs Glyph depicting the dependent vowel sign dead consonant that may be absorbed into a form of a vowel V. conjunct, or the half-form of a dead consonant (see Figure 9.7). HAL The nominal glyph form nonspacing

Page 33 combining mark depicting U+094D R3 If a dead consonant (other than RAd) precedes

DEVANAGARI SIGN HAL. RAd then subsitution of RA for RAsub is performed as described above; however, the Hal • A HAL character is not always depicted; when that formed RA remains so as to form a dead it is depicted, it adopts this nonspacing mark d consonant conjunct form. form. TA +RA TA +RA +HAL T.RA Dead Consonant Rule : The following rule d d n sub n d logically precedes the application of any other rule iÉ + ®Â iÉ + #Å + # jÉ to form a dead consonant. Once formed, a A dead consonant conjunct form that contains dead consonant may be subject to other rules an absorbed RA may subsequently combine described next. d to form a multipart conjunct form. Rl When a consonant C precedes a Hal n n T.RA +YA T.R.YA it is considered to be a dead consonant d l n jÉ + ªÉ jªÉ Cd. A consonant Cn that does not

precede Haln is considered to be a live Modifier Mark Rules : In addition to vowel signs,

consonant CI. three other types of combining marks may be applied TA +Hal TA to a component of an orthographic (visual) syllable n n d or to the syllable as a whole: nukta, bindus, and + iÉ #Â iÉÂ svaras. Consonant RA Rules : The character U+0930 R4 The nukta sign, which modifies a consonant DEVANAGARI LETTER RA takes one of a number form, is placed immediately after the consonant of visual forms depending on its context in a in the memory representation and is attached consonant cluster. By default, this letter is depicted to that consonant in rendering. If the consonant with its nominal glyph form (as shown in the code represents a dead consonant, then NUKTA charts). In two contexts, it is depicted using a should precede Hal in the memory nonspacing glyph form that combines with a base representation. letterform.

KAn + NUKTAn +Hal QAd R1 Except for the dead consonant RAd, when a dead Eò + #Ã + #Â FÂò consonant Cd precedes the live consonant RAI, then C is replaced with its nominal form C d n R5 The other modifying marks, bindus and svaras, and RA is replaced by the subscript nonspacing apply to the orthographic syllable as a whole mark RA which is positioned so that it sub, and should follow (in the memory applies to C . n representation) all other characters that constitute the syllable. In particular, the bindus THAd +RAI THAn+RAsub Displayed Output should follow any vowel signs, and the svaras should come last. The relative placement of BÂ + ®ú B + #Å BÅ these marks is horizontal rather than vertical; the horizontal rendering order may vary R2 For certain consonants, the mark RAsub may graphically combine with the consonant to form according to typographic concerns.

a conjunct ligature form. These combinations, KAn +AAvs + CANDRABINDUn such as the one shown here, are further Eò + #É + #Ä EòÉÄ addressed by the ligature rules described shortly. Ligature Rules : Subsequent to the application of the rules just described, a set of rules governing PHAd +RAl PHAn +RAsub Displayed Output ligature formation apply. The precise application of ¡Âò + ®¡ò+#Å £ò these rules depends on the availability of glyphs in the current font(s) being used to display the text.

Page 34 R6 If a dead consonant immediately precedes by a vowel sign V in the memory representation. another dead consonant or a live consonant, This order is employed by the ISCII standard and then the first dead consonant may join the corresponds with both the phonetic and keying order subsequent element to form a two-part conjunct of textual data. ligature form. Rendering Order JA + NYA J.NYA d l n Character Order Glyphh Order VÉÂ + \É YÉ KAn + Ivs Ivs +KAn TTAd + TTHAl + TT.TTHAn ]Â + B _ Eò + Ê# ÊEò R7 A conjunct ligature form can itself behave as a Because Devanagari and other lndic scripts have some dead consonant and enter into further, more dependent vowels that must be depicted to the left complex ligatures. side of their consonant letter, the software that renders the Indic scripts must be able to reorder

SAd +TAd+RAn SAd+T.R.An+S.T.RAn elements in mapping from the logical (character) ºÉ + iÉ + ® ºÉ + jÉ + ºjÉ store to the presentational (glyph) rendering. For example, if C denotes the nominal form of A conjunct ligature form can also produce a half- n consonant C, and Vvs denotes a left-side dependent form. vowel sign form of vowel V, then a reordering of glyphs with respect to encoded characters occurs as T.R.Ad +YAl T.R.Ah +YAn jÉ + ªÉ jªÉ just shown. R10 When the dependent vowel I is used to R8 If a nominal consonant or conjunct ligature vs override the inherent vowel of a syllable, it is form precedes RA , then the consonant or sub always written to the extreme left of the ligature form may join with RA to form a sub orthographic syllable. If the orthographic multipart conjunct ligature. syllable contains a consonant cluster, then this

KAn +RAsub K.RAn vowel is always depicted to the left of that Eò + #Å Gò cluster. For example: PHA +RA PH.RA n sub n TAd +RAd+Ivs T.RAn+Ivs Ivs+T.RAd ¡ + #Å £ò iÉ + ®Âú + Ê# jÉ +Ê# ÊjÉ R9 In some cases, other combining marks will also Sample Half-Forms : Dev.1 shows examples of half- combine with a base consonant, either attaching consonant forms that are commonly used with the at a nonstandard location or changing shape. Devanagari script. These forms are glyphs, not In minimal rendering there are only two cases, characters. They may be encoded explicitly using

RAI with Uvs or UUvs . ZERO WIDTH JOINER as shown; in normal RA +U RU conjunct formation, they may be used spontaneously l vs n to depict a dead consonant in combination with + ® #Ö¯û subsequent consonant forms. RA +UU RUU l vs n Dev.1 : Sample Half Forms ®ú + #Ú°ü Eò #Â ZWJ C Memory Representation and Rendering Order The order for storage of plain text in Devanagari JÉ #Â ZWJ J and all other Indic scripts generally follows phonetic order; that is, a CV syllable with a dependent vowel MÉ #Â ZWJ M is always encoded as a consonant letter C followed

Page 35 SÉ # ZWJ S this script uses all of these forms; in particular, many of these forms are used only in writing Sanskrit texts. VÉ # ZWJ V Furthermore, individual fonts may provide fewer or more ligature forms than are depicted here. ZÉ # ZWJ Z Dev.2 : Sample Ligatures \É # ZWJ \ Eò # Eò Gò hÉ # ZWJ h Eò # iÉ Hò iÉ # ZWJ i Eò # ®ú Fêò RÂó # E Só lÉ # ZWJ l RÂó # JÉ Vó vÉ # ZWJ v RÂó # MÉ Wó RÂó # PÉ Xó xÉ # ZWJ x \É # VÉ gÌ {É # ZWJ { VÉ # \É YÉ nÂù # PÉ } ¡ò # ZWJ } nÂù # nù qù ¤É # ZWJ ¤ nÂù # vÉ rù nÂù # ¤É ‚ù ¦É # ZWJ ¦ nÂù # ¦É „ ¨É # ZWJ ¨ nÂù # ¨É s nÂù # ªÉ t ªÉ # ZWJ ª nÂù # ´É uù ±É # ZWJ ± ]Âõ # ]õ ^õ ]Âõ # B _ö ´É # ZWJ ´ B~ # B aö ºÉ # ZWJ º bÂ÷ # MÉ bÂ÷ # b÷ d÷ ¹É # ZWJ ¹ bÂ÷ # fø e÷ ºÉ # ZWJ º iÉ # iÉ kÉ iÉ # ®ú jÉ IÉ # ZWJ I xÉ # xÉ zÉ YÉ # ZWJ Y ¡Âò # ®ú £ò ½Âþ # ¨É À ¸É # ZWJ ¸ ½Âþ # ªÉ Á ½Âþ # ±É ½þ Sample Ligatures : Dev.2 shows examples of ½Âþ # ´É ¾ conjunct ligature forms that are commonly used with the Devanagari script. These forms are glyphs, not ½#Þ Àþ characters. Not every writing system that employs ® ú#Ö ¯û

Page 36 ®#Ú°ü and Symbols : U+0964 ºÉ jÉ ºjÉ DEVANAGARI PURNA VIRAM is similar to a full stop. Corresponding forms occur in many other Indic Sample Half - Ligature Forms : In addition to half- scripts. U+0965 DEVANAGARI DEERGH form glyphs of individual consonants, half-forms are VIRAM marks the end of a verse in traditional texts. also used to depict conjunct ligature forms. A sample of such forms is shown in Dev.3. These forms are Many modern languages written in the Devanagari glyphs, not characters. They may be encoded script intersperse punctuation derived from the Latin explicitly using ZERO WIDTH JOINER as shown; script. Thus U+002C COMMA and U+00E FULL in normal conjunct formation, they may be used STOP are freely used in writing Hindi, and the spontaneously to depict a conjunct ligature in ‘PURNA VIRAM (danda) is usually restricted to combination with subsequent consonant forms. more traditional texts.

Dev.3 : Sample Half-Ligature Forms Encoding Structure : The Unicode Standard iÉ #Â iÉ #Â k organizes the nine principal Indic scripts in blocks iÉ #Â ®ú #Â j of 128 encoding points each. The first six columns in each script are isomorphic with the ISCII-1988 Combining Marks : Devanagari and other Indic encoding, except that the last 11 positions (U+0955 scripts have a number of combining marks that could .. U+095F in Devanagari, for example), which are be considered . One class of these marks, unassigned or undefined in ISCII-1988, are used in known as bindus, is represented by U+0901 the Unicode encoding. DEVANAGARI SIGN CANDRABINDU and The seventh column in each of these scripts, along U+0902 DEVANAGARI SIGN ANUSVARA. The with the last 11 positions in the sixth column, first mark indicates nasalization of a vowel and the represent additional character assignments in the second mark represent a nasal consonant occurring Unicode Standard that are matched across all nine after a vowel or final nasal closure of a syllable. scripts. For example, positions U+xx66 ... U+xx6F U+093C DEVANAGARI SIGN NUKTA is a true and U+xxE6 ... U+xxEF code the Indic script digits diacritic. It is used to extend the basic set of for each script. consonant letters by modifying them (with a subscript dot in Devanagari) to create new letters. The eighth column for each script is reserved for U+0951..U+0957 are a set of combining marks used script-specific additions that do not correspond from in transcription of Sanskrit texts. one Indic script to the next.

Digits : Each Indic script has a distinct set of digits (The above revision is based on detailed discussions appropriate to that script. These digits may or may with National & State level Institutions/Directorates not be used in ordinary text in that script. The dealing with Devanagari based languages - Sanskrit, international form of Indian Digits (Hindsa) have Hindi, Marathi, Nepali, Konkani & Sindhi) displaced the Indic script forms in modern usage in Contact : [email protected] many of the scripts. Some Indic scripts-notably [email protected] Tamil-lack a distinct digit for zero.

Page 37