
8.3 The Perso-Arabic Standard for Information Interchange

The standard proposed by C-DAC GIST is an extension to standard 8-bit ASCII. It complements the Latin symbol set by adding the symbols of the Perso-Arabic scripts. The standard supports storage for Perso-Arabic languages such as Urdu, Persian, Sindhi, Kashmiri, and Arabic.

Characteristics

i. It is an 8-bit standard.
ii. Supports letters for Urdu, Arabic, Sindhi, and Kashmiri.
iii. Defines the Perso-Arabic alphabets in the upper ASCII range. (This leaves the lower ASCII range free; it can be used for the English alphabet, e.g. to give bilingual font support.)
iv. Defines numerals other than the ASCII numerals (48 to 57). (This may help in supporting both the Arabic numerals 0-9 and language-specific numerals.)
v. Maintains the order of the alphabets for the Perso-Arabic languages.
vi. Alphabets / letters are placed in ascending order. Letters like "bhey" are not provided for Urdu but are kept for languages like Sindhi. Urdu may use the digraph "be" and "choTi-" for that.
vii. Minimal erabs are provided. Tanveen, for example do-zabar, can be formed with the help of a double zabar.
viii. Unicode compatibility can be achieved with a PASCII-to-Unicode (and vice versa) converter.

Superscripts

i. Place for superscripts like khaRa-alif is provided.
ii. Place for superscripts for Arabic is provided.
iii. Place for superscripts like "re-", "ain", etc. is provided.
iv. Numerals are placed after the erabs and superscripts. (These are provided only to support display of language-specific numerals; the standard numerals, i.e. the ASCII numerals, remain available.)

Standardization of Perso-Arabic Fonts

Characteristics of Perso-Arabic

Perso-Arabic languages are written in the Nastaliq and Naskh scripts. Urdu and Kashmiri are traditionally written in the Nastaliq script, while Sindhi is written in the Naskh script. Although the script employs the basic letters of the language, the rendering of these letters in a word is extremely complex. The reason for this complexity is that text has traditionally been composed through calligraphy, a medium whose precepts are based on the aesthetic sense of the calligrapher rather than on any formula. The variation in calligraphy is so great that it is often difficult to recognize the letters in a constituent word. This is because, in their calligraphed form, the individual letters are partially or completely fused into each other, thereby losing their identity. A degree of fusion is purposely introduced to make the resulting fused glyph visually appealing.

Another characteristic of the Perso-Arabic languages is the use of diacritics. Diacritics, although sparingly used, help in the proper pronunciation of the constituent word. They appear above or below a character to specify a vowel or emphasize a particular sound. They are essential for removing ambiguities, for natural language processing, and for speech synthesis.

Standardization of Glyph Set

The following were taken into consideration while designing fonts for the Perso-Arabic languages. Given the complexities of the script, it was not possible to accommodate all the glyphs / ligatures in an 8-bit code space. Hence a 16-bit font code space was considered.

1. Alphabet

58 October 2002

2. Numerals
3. Special characters
4. Diacritics
5. Religious and linguistic symbols
6. Control characters

The 16-bit Nastaliq font for Urdu & Kashmiri

The fonts developed by C-DAC for Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can serve other purposes (it can be used to support English, for example).

• Includes all the basic shapes
• Includes all the starting shapes and variations
• Includes all the middle shapes and variations
• Includes all the ending shapes and variations
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

The 16-bit Naskh font for Sindhi, Urdu & Kashmiri

The font developed by C-DAC for Sindhi, Urdu & Kashmiri is 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can serve other purposes (it can be used to support English, for example).

• Includes all the basic shapes
• Includes all the starting shapes
• Includes all the middle shapes
• Includes all the ending shapes
• Includes levels for erabs (short vowels)
• Includes Complete ligatures
• Includes Beginning ligatures
• Includes Middle ligatures
• Includes Ending ligatures
• Includes dotted circle glyph

India is a paradise at the foot of the great Himalayas at its northern end and lies cocooned by huge oceans on the other three sides. While the Arabian Sea borders the southwest side, the southeast is lulled by the Bay of Bengal, and the southern tip, Kanya Kumari (Cape Comorin), is washed by the Indian Ocean. Protected by such natural barriers as mountains and water, it is separated from the rest of Asia. For geographers, it lies to the north of the equator between 8.4 and 37.6 degrees north latitude and 68.7 and 97.25 degrees east longitude. India measures 3214 km from north to south and 2933 km from east to west. It has a land frontier of 15,200 km and a coastline of 7516.5 km.

India shares its political borders with Pakistan and Afghanistan on the west; Bangladesh and Myanmar in the east; Nepal, China, Tibet and Bhutan in the north. The capital of India is New Delhi.

Languages

India has 18 officially recognized languages among about 200 languages enumerated in the census.

Names of Languages

The following languages are listed in the 8th Schedule of the Constitution (given in Devanagari order):

• Assamese
• Urdu
• Oriya
• Kannada
• Kashmiri
• Konkani
• Gujarati
• Tamil

• Telugu
• Nepali
• Punjabi
• Bengali
• Manipuri
• Marathi
• Malayalam
• Sanskrit
• Sindhi
• Hindi

Urdu Design Guide: General Information

Introduction

This document provides general information about the Urdu language and some conventions of its usage in India.

The information presented in this document is intended to assist in understanding the nature and problems of Urdu implementation in the digital medium. It contains a generic description of Urdu.

Urdu is one of the official languages of India. It is the official language of Pakistan and is spoken in various countries around the world.

Language Description

Urdu belongs to the Indo-Aryan subgroup of the Indo-European family of languages. It has developed under the heavy influence of the Arabic, Persian and Turkish languages. The Urdu writing system is a superset of Arabic and Persian and contains 39 characters. Urdu is written from right to left. Unlike English, the characters do not have upper and lower cases. Further, the shape assumed by a character in a word is context-sensitive, i.e. the shape differs depending on whether the character is at the beginning, in the middle or at the end of the constituent word.

Urdu is traditionally written in Nastaliq, a script rich in calligraphic content. Owing to the complexities of rendering, a number of alternate shapes are possible for a single letter, depending on its position in the word and the letter next to it. This property of Nastaliq increases the size of the glyph set for the language.

The characters of Urdu also need diacritics to help in the proper pronunciation of the constituent word. There are a number of diacritics, the common ones being Zabar, Zer, and Pesh.

History of Urdu language

The word Urdu means 'Lashkar', derived from the Turkish language, meaning 'armies'. In the south of India it flourished under the name of Dakhani and in the southwest as Gurjari, while in Delhi its name changed from Hindi to Hindavi and Hindustani.

Alternate names of Urdu are DAKHINI (DAKANI, DECCAN, DESIA, MIRGAN), PINJARI, REKHTA (REKHTI).

Population using the Urdu Language

48,062,000 in India (1997 IMA);
10,719,000 in Pakistan (1993), or 7.57% of the population;
600,000 in Bangladesh;
64,000 in Mauritius (1993 Johnstone);
170,000 in South Africa (1987);
18,500 in Bahrain (1979 WA);
17,800 in Oman (1980 WA);
15,400 in Qatar;
382,000 in Saudi Arabia;
3,562 in Fiji (1980 WA);
23,000 in Germany;
14,000 in Norway.

Totals:
60,290,000 or more in all countries;
104,000,000 including second language users (1999 WA).
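The Language Description above notes that a character's shape depends on whether it stands at the beginning, in the middle or at the end of a word. A minimal sketch of that position rule follows. The function name and the shape-class labels (basic, starting, middle, ending, borrowed from the font shape groups listed earlier) are illustrative assumptions, and the sketch deliberately ignores the influence of the neighbouring letter, which also affects the shape in Nastaliq.

```python
# Hypothetical sketch: pick a positional shape class for each letter of a word.
# Labels follow the font shape groups (basic, starting, middle, ending);
# the function name and the letter spellings are assumptions for illustration.

def positional_shapes(letters):
    """Return (letter, shape_class) pairs for a word given in logical order."""
    count = len(letters)
    shapes = []
    for index, letter in enumerate(letters):
        if count == 1:
            shape = "basic"       # a letter standing alone
        elif index == 0:
            shape = "starting"    # first letter of the word
        elif index == count - 1:
            shape = "ending"      # last letter of the word
        else:
            shape = "middle"      # any letter in between
        shapes.append((letter, shape))
    return shapes


word = ["kaaf", "te", "alif", "be"]  # letters written by name, not as glyphs
for letter, shape in positional_shapes(word):
    print(letter, shape)
```

A real renderer would refine this rule with per-letter joining behaviour and ligature lookup, which is why the fonts carry so many alternate shapes.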

PASCII (Perso-Arabic Standard for Information Interchange) Version 1.0

[Code chart for code points 128 to 255: eight columns headed 128, 144, 160, 176, 192, 208, 224, 240 (hex 8, 9, A, B, C, D, E, F) and sixteen rows 0 to F. The glyph cells did not survive extraction; the cells labelled in text are Kasheeda, ATR and Reserved.]
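The chart is laid out with one column per block of sixteen code points (headed 128 to 240, hex 8 to F) and one row per low hex digit (0 to F), so a code point is 16 times the column digit plus the row digit. The sketch below shows that arithmetic; the function names are illustrative assumptions.

```python
# Sketch of the chart arithmetic: code point = 16 * column + row.
# Function names are assumptions made for this example.

def chart_cell(code):
    """Split an upper-range code point (128-255) into (column, row) hex digits."""
    if not 128 <= code <= 255:
        raise ValueError("the chart covers code points 128 to 255 only")
    column, row = divmod(code, 16)
    return format(column, "X"), format(row, "X")


def code_point(column, row):
    """Inverse of chart_cell: rebuild the code point from two hex digits."""
    return int(column, 16) * 16 + int(row, 16)


print(chart_cell(129))       # ('8', '1')
print(code_point("D", "C"))  # 220
```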

Code Chart Details of PASCII Storage Standard

Code  Description                         Languages
129   Kasheeda Indicator (used to stretch character)
130   LETTER ALIF                         Urdu, Sindhi, Kashmiri
131   LETTER ALIF WITH MADD               Urdu, Sindhi, Kashmiri
132   LETTER BE                           Urdu, Sindhi, Kashmiri
133   LETTER BBE                          Sindhi
134   LETTER BHE                          Sindhi
135   LETTER [name missing]               Urdu, Sindhi, Kashmiri
136   LETTER PHE                          Sindhi
137   LETTER TE                           Urdu, Sindhi, Kashmiri
138   LETTER TE MARBUTA                   Urdu
139   LETTER THE                          Sindhi
140   LETTER TEY                          Urdu/Sindhi
141   LETTER TTE                          Sindhi
142   LETTER SE                           Urdu, Sindhi
143   LETTER JEEM                         Urdu, Sindhi, Kashmiri
144   LETTER JJE                          Sindhi
145   LETTER JNE                          Sindhi
146   LETTER [name missing]               Urdu, Sindhi, Kashmiri
147   LETTER CHHE                         Sindhi
148   LETTER HAY                          Urdu, Sindhi, Kashmiri
149   LETTER KHAY                         Urdu, Sindhi, Kashmiri
150   LETTER DAAL                         Urdu, Sindhi, Kashmiri
151   LETTER DHAAL                        Sindhi
152   LETTER DAAL (retroflex)             Urdu, Kashmiri/Sindhi
153   LETTER DDAAL (implosive)            Sindhi
154   LETTER DHAAL (retroflex)            Sindhi
155   LETTER ZAAL                         Urdu, Sindhi, Kashmiri
156   LETTER RE                           Urdu, Sindhi, Kashmiri
157   LETTER REY                          Urdu, Kashmiri/Sindhi
158   LETTER RHEY                         Sindhi
159   LETTER ZE                           Urdu, Sindhi, Kashmiri
160   LETTER ZHAY                         Urdu, Kashmiri
161   LETTER SEEN                         Urdu, Sindhi, Kashmiri
162   LETTER SHEEN                        Urdu, Sindhi, Kashmiri

163   LETTER SUAD                         Urdu, Sindhi, Kashmiri
164   LETTER ZUAD                         Urdu, Sindhi, Kashmiri
165   LETTER TOE                          Urdu, Sindhi, Kashmiri
166   LETTER ZOE                          Urdu, Sindhi, Kashmiri
167   LETTER AIN                          Urdu, Sindhi, Kashmiri
168   LETTER GHAIN                        Urdu, Sindhi, Kashmiri
169   LETTER FE                           Urdu, Sindhi, Kashmiri
170   LETTER QAAF                         Urdu, Sindhi, Kashmiri
171   LETTER KAAF                         Urdu, Kashmiri/Sindhi
172   LETTER KHE                          Sindhi
173   LETTER GAAF                         Urdu, Sindhi, Kashmiri
174   LETTER GGE                          Sindhi
175   LETTER NGE                          Sindhi
176   LETTER LAAM                         Urdu, Sindhi, Kashmiri
177   LETTER MEEM                         Urdu, Sindhi, Kashmiri
178   LETTER NOON                         Urdu, Sindhi, Kashmiri
179   LETTER NOON GHUNNA                  Urdu
180   LETTER NNOONN (retroflex)           Sindhi
181   LETTER VAO                          Urdu, Sindhi, Kashmiri
182   LETTER VAO with Ring                Kashmiri
183   LETTER HE                           Urdu, Sindhi, Kashmiri
184   LETTER DOCHASHMI HE                 Urdu, Sindhi, Kashmiri
185   LETTER [name missing]               Urdu, Sindhi
186   LETTER YE                           Urdu/Sindhi
187   LETTER YE (CIRCLE BELOW)            Kashmiri
188   LETTER BARI YE (HALF CIRCLE ABOVE)  Kashmiri
189   LETTER BARI YE                      Urdu, Kashmiri
190   Diacritic Mark (Zabar)
191   Diacritic Mark (Zer)
192   Diacritic Mark (Pesh)
193   Diacritic Mark (Ulta Pesh)
194   Diacritic Mark (Hamza Above)
195   Diacritic Mark (Hamza Below)
196   Diacritic Mark (Hamza Above)        Kashmiri
197   Diacritic Mark (Hamza Below)        Kashmiri
198   Diacritic Mark (Tashdeed)
199   Diacritic Mark (Madd)
200   Diacritic Mark (Jazm)
201   Diacritic Mark (KhaRa Alif Above)

202   Diacritic Mark (KhaRa Alif Below)
203   Diacritic Mark (Wasl)
204   Punctuation (Tafsiliya)
205   Punctuation (Batt)
206   Punctuation (Comma)
207   Superscript (Suad)
208   Superscript (Re-zuad)
209   Superscript (Re-he)
210   Superscript (Ain)
211   Superscript (Qaf)                   Arabic
212   Superscript (Kaaf)                  Arabic
213   Superscript (Laam-alif)             Arabic
214   Superscript (Jeem)                  Arabic
215   Superscript (Meem)                  Arabic
216   Year symbol (Gregorian)
217   Punctuation (Percentage symbol)
218   Number & Text separator symbol
219   Decimal point (Asharya)
220   DIGIT 0
221   DIGIT 1
222   DIGIT 2
223   DIGIT 3
224   DIGIT 4
225   DIGIT 5
226   DIGIT 6
227   DIGIT 7
228   DIGIT 8
229   DIGIT 9
230   Exclamation symbol
231   Open double quote
232   Close double quote
233   Open single quote
234   Close single quote
235   Open bracket
236   Close bracket
237   Star symbol (*)
238   Plus sign
239   ATR (Attribute)
240   Minus sign
241   Forward slash
242   Semicolon
243   Colon
244   Question mark
245   Equal sign
246   Sentence dash
247   Ayat end (Arabic)
248   Filled circle
249   Thousand separator
250   Reserved (Control char for DM)
251   Reserved (Control char for LB)
252   Empty circle
253   [name missing] symbol
254   Reserved
255   Reserved
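The characteristics list mentions that Unicode compatibility can be achieved with a PASCII-to-Unicode (and vice versa) converter. The following is a minimal sketch of such a converter, not C-DAC's implementation. The PASCII code points are taken from the code chart details, but every Unicode target in the table is an assumption chosen for illustration, the table covers only a handful of letters, and the function names are hypothetical. Bytes below 128 pass through unchanged, reflecting the point that the lower ASCII range stays free for English.

```python
# Partial, illustrative PASCII -> Unicode table. The PASCII code points come
# from the code chart details; every Unicode target below is an ASSUMPTION.
PASCII_TO_UNICODE = {
    130: "\u0627",  # LETTER ALIF -> ARABIC LETTER ALEF
    132: "\u0628",  # LETTER BE   -> ARABIC LETTER BEH
    137: "\u062A",  # LETTER TE   -> ARABIC LETTER TEH
    143: "\u062C",  # LETTER JEEM -> ARABIC LETTER JEEM
    176: "\u0644",  # LETTER LAAM -> ARABIC LETTER LAM
    177: "\u0645",  # LETTER MEEM -> ARABIC LETTER MEEM
    178: "\u0646",  # LETTER NOON -> ARABIC LETTER NOON
}
# DIGIT 0..9 sit at 220..229; mapping them to U+06F0..U+06F9 is an assumption.
for digit in range(10):
    PASCII_TO_UNICODE[220 + digit] = chr(0x06F0 + digit)

UNICODE_TO_PASCII = {char: code for code, char in PASCII_TO_UNICODE.items()}


def pascii_to_unicode(data):
    """Decode PASCII bytes; unmapped upper-range bytes become U+FFFD."""
    out = []
    for byte in data:
        if byte < 128:
            out.append(chr(byte))  # lower ASCII passes through unchanged
        else:
            out.append(PASCII_TO_UNICODE.get(byte, "\uFFFD"))
    return "".join(out)


def unicode_to_pascii(text):
    """Encode text to PASCII bytes; raises KeyError for unmapped characters."""
    out = bytearray()
    for char in text:
        if ord(char) < 128:
            out.append(ord(char))
        else:
            out.append(UNICODE_TO_PASCII[char])
    return bytes(out)


sample = bytes([65, 32, 132, 130, 132, 32, 221, 222])  # "A", BE ALIF BE, digits 1 2
decoded = pascii_to_unicode(sample)
assert unicode_to_pascii(decoded) == sample  # round trip for mapped codes
```

A complete converter would need an entry for every assigned code point, plus a policy for the Reserved and ATR positions, which this sketch leaves unmapped.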