8.3 The Perso-Arabic Standard for Information Interchange
October 2002

The standard proposed by C-DAC GIST is an extension to the standard 8-bit ASCII. It complements the symbol set of the Latin script by adding the symbols of the Perso-Arabic scripts. The standard supports storage for Perso-Arabic languages such as Urdu, Persian, Sindhi, Kashmiri, and Arabic.

Characteristics

i. It is an 8-bit standard.
ii. Supports letters for Urdu, Arabic, Sindhi, and Kashmiri.
iii. Defines the Perso-Arabic alphabets in the upper ASCII range (128 to 255). (This leaves the lower ASCII range free; it can be used for the English alphabet, e.g. to give bilingual font support.)
iv. Defines numerals other than the ASCII numerals (48 to 57). (This may help in supporting both the Arabic numerals 0-9 and language-specific numerals.)
v. Maintains the order of the alphabets for Perso-Arabic languages.
vi. Alphabets / letters are placed in ascending order. Letters like "bhey" are not provided for Urdu but are kept for languages like Sindhi. Urdu may use the digraph "be" and "choTi-he" instead.
vii. Minimal erabs are provided. Tanveen, for example do-zabar, can be formed with the help of a double zabar.
viii. Unicode compatibility can be achieved with a PASCII-to-Unicode (and vice versa) converter.

Superscripts

i. A place for superscripts like khaRa-alif is provided.
ii. A place for superscripts for Arabic is provided.
iii. A place for superscripts like "re-ze", "ain", etc. is provided.
iv. Numerals are placed after erabs and superscripts. (This is provided only to support display of language-specific numerals; the standard numerals, i.e. the ASCII numerals, remain available.)

Standardization of Perso-Arabic Fonts

Characteristics of Perso-Arabic languages

Perso-Arabic languages are written in the Naskh and Nastaliq scripts. Urdu and Kashmiri are traditionally written in the Nastaliq script, while Sindhi is written in the Naskh script. Although the script employs the basic letters of the language, the rendering of these letters in a word is extremely complex. The reason for this complexity is that the text has traditionally been composed through calligraphy, a medium whose precepts are based on the aesthetic sense of the calligrapher rather than on any formula. So great is the variation in calligraphy that it is often difficult to recognize the letters in a constituent word. This is because, in their calligraphed form, the individual letters are partially or completely fused into each other, thereby losing their identity. A degree of fusion is purposely introduced to make the resulting fused glyph visually appealing.

Another characteristic of the Perso-Arabic languages is the use of diacritics. Diacritics, although sparingly used, help in the proper pronunciation of the constituent word. The diacritics appear above or below a character to specify a vowel or emphasize a particular sound. They are essential for the removal of ambiguities, for natural language processing, and for speech synthesis.

Standardization of Glyph Set

The following were taken into consideration while designing fonts for the Perso-Arabic languages. Considering the complexities of the script, it was not possible to accommodate all the glyphs / ligatures in an 8-bit code space. Hence a 16-bit font code space was considered.

1. Alphabet
2. Numerals
3. Special characters
4. Diacritics
5. Religious and linguistic symbols
6. Control characters

The 16-bit Nastaliq font for Urdu & Kashmiri

Fonts developed by C-DAC for Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (it can be used to support English, for example).

• Includes all the basic shapes
• Includes all the starting shapes and variations
• Includes all the middle shapes and variations
• Includes all the ending shapes and variations
• Includes levels for erabs (short vowels)
• Includes complete ligatures
• Includes beginning ligatures
• Includes middle ligatures
• Includes ending ligatures
• Includes dotted circle glyph

The 16-bit Naskh font for Sindhi, Urdu & Kashmiri

Fonts developed by C-DAC for Sindhi, Urdu & Kashmiri are 16-bit. The glyphs are defined in the User Area of the Unicode range. The ASCII range is not used and can be used for different purposes (it can be used to support English, for example).

• Includes all the basic shapes
• Includes all the starting shapes
• Includes all the middle shapes
• Includes all the ending shapes
• Includes levels for erabs (short vowels)
• Includes complete ligatures
• Includes beginning ligatures
• Includes middle ligatures
• Includes ending ligatures
• Includes dotted circle glyph

India

India is a paradise at the foot of the great Himalayas at its northern end and lies cocooned by huge oceans on the other three sides. While the Arabian Sea borders the southwest side, the southeast is lulled by the Bay of Bengal, and the southern tip, Kanya Kumari (Cape Comorin), is washed by the Indian Ocean. Protected by natural barriers such as mountains and water, it is separated from the rest of Asia. For geographers, it lies to the north of the equator between 8.4 and 37.6 degrees north latitude and 68.7 and 97.25 degrees east longitude. India measures 3214 km from north to south and 2933 km from east to west. It has a land frontier of 15,200 km and a coastline of 7516.5 km. India shares its political borders with Pakistan and Afghanistan on the west; Bangladesh and Myanmar in the east; Nepal, China, Tibet and Bhutan in the north. The capital of India is New Delhi.

Languages

India has 18 officially recognized languages among about 200 languages as enumerated in the census.

Names of Languages

The following languages are listed in the 8th schedule of the Constitution (given in Devanagari order):

• Assamese
• Urdu
• Oriya
• Kannada
• Kashmiri
• Konkani
• Gujarati
• Tamil
• Telugu
• Nepali
• Punjabi
• Bengali
• Manipuri
• Marathi
• Malayalam
• Sanskrit
• Sindhi
• Hindi

Urdu Design Guide : General Information

Introduction

This document provides general information about the Urdu language and some conventions of its usage in India.

The information presented in this document is intended to assist in understanding the nature and problems of Urdu implementation in the digital medium. It contains the generic description of Urdu.

Urdu is one of the official languages of India. It is the official language of Pakistan and is spoken in various countries around the world.

Language Description

Urdu belongs to the Indo-Aryan subgroup of the Indo-European family of languages. It has developed under the heavy influence of the Arabic, Persian and Turkish languages. The Urdu writing system is a superset of Arabic and Persian and contains 39 characters. Urdu is written from right to left. Unlike English, the characters do not have upper and lower cases. Further, the shape assumed by a character in a word is context-sensitive, i.e. the shape differs depending on whether the character is at the beginning, in the middle or at the end of the constituent word.

Urdu is traditionally written in Nastaliq, a script rich in calligraphic content. Owing to the complexities of rendering, a number of alternate shapes are possible for a single letter, depending on its position in the word and the letter next to it. This nature of Nastaliq increases the glyph set for the language.

The characters of Urdu also need diacritics to help in the proper pronunciation of the constituent word. There are a number of diacritics, the common ones being Zabar, Zer, and Pesh.

History of Urdu language

The word Urdu, derived from the Turkish language, means 'Lashkar' ('armies'). In the south of India it flourished under the name of Dakhani Urdu and in the southwest as Gurjari, while in Delhi its name changed from Hindi to Hindavi and Hindustani.

Alternate names of Urdu are DAKHINI (DAKANI, DECCAN, DESIA, MIRGAN), PINJARI, REKHTA (REKHTI).

Population using the Urdu Language

48,062,000 in India (1997 IMA);
10,719,000 in Pakistan (1993), or 7.57% of the population;
600,000 in Bangladesh;
64,000 in Mauritius (1993 Johnstone);
170,000 in South Africa (1987);
18,500 in Bahrain (1979 WA);
17,800 in Oman (1980 WA);
15,400 in Qatar;
382,000 in Saudi Arabia;
3,562 in Fiji (1980 WA);
23,000 in Germany;
14,000 in Norway.

Totals:
60,290,000 or more in all countries
104,000,000 including second language users (1999 WA).

PASCII (Perso-Arabic Standard for Information Interchange) Version 1.0

[Code chart: columns 128, 144, 160, 176, 192, 208, 224, 240 (high hex digit 8 to F); rows 0 to F (low hex digit). The glyph cells are not legible in this copy. Legible cell labels: Kasheeda, Reserved, ATR.]

Code Chart Details of PASCII Storage Standard

Code Point   Description                                         Languages
129          Kasheeda indicator (used to stretch a character)
130          LETTER ALIF                                         Urdu, Sindhi, Kashmiri
131          LETTER ALIF WITH MADD                               Urdu, Sindhi, Kashmiri
132          LETTER BE                                           Urdu, Sindhi, Kashmiri
133          LETTER BBE                                          Sindhi
134          LETTER BHE                                          Sindhi
135          LETTER PE                                           Urdu, Sindhi, Kashmiri
136          LETTER PHE                                          Sindhi
137          LETTER TE                                           Urdu, Sindhi, Kashmiri
145          LETTER JNE                                          Sindhi
146          LETTER CHE                                          Urdu, Sindhi, Kashmiri
147          LETTER CHHE                                         Sindhi
148          LETTER HAY                                          Urdu, Sindhi, Kashmiri
149          LETTER KHAY                                         Urdu, Sindhi, Kashmiri
150          LETTER DAAL                                         Urdu, Sindhi, Kashmiri
151          LETTER DHAAL                                        Sindhi
152          LETTER DAAL (retroflex)                             Urdu, Kashmiri / Sindhi
153          LETTER DDAAL (implosive)                            Sindhi
154          LETTER DHAAL (retroflex)                            Sindhi
155          LETTER ZAAL
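
The PASCII-to-Unicode converter mentioned under Characteristics could be sketched along the following lines. This is only an illustration, not C-DAC's converter: the PASCII code points are taken from the code chart details, but the Unicode values are an assumption made here by matching letter names to the Unicode Arabic block, and only a handful of letters are covered. The function and table names are hypothetical.

```python
# Hypothetical, partial PASCII <-> Unicode mapping.
# PASCII codes come from the code chart details; the Unicode code points
# are assumed here by letter name and are NOT given in the PASCII chart.
PASCII_TO_UNICODE = {
    129: "\u0640",  # Kasheeda            -> ARABIC TATWEEL
    130: "\u0627",  # LETTER ALIF         -> ARABIC LETTER ALEF
    131: "\u0622",  # ALIF WITH MADD      -> ALEF WITH MADDA ABOVE
    132: "\u0628",  # LETTER BE           -> ARABIC LETTER BEH
    135: "\u067e",  # LETTER PE           -> ARABIC LETTER PEH
    137: "\u062a",  # LETTER TE           -> ARABIC LETTER TEH
}
UNICODE_TO_PASCII = {ch: code for code, ch in PASCII_TO_UNICODE.items()}

REPLACEMENT = "\ufffd"  # used for PASCII bytes this sketch does not cover


def pascii_to_unicode(data: bytes) -> str:
    """Decode PASCII bytes: the lower half is ASCII, the upper half is looked up."""
    chars = []
    for byte in data:
        if byte < 128:
            chars.append(chr(byte))  # lower ASCII is left untouched
        else:
            chars.append(PASCII_TO_UNICODE.get(byte, REPLACEMENT))
    return "".join(chars)


def unicode_to_pascii(text: str) -> bytes:
    """Encode text to PASCII bytes; fail loudly on letters not in the table."""
    out = bytearray()
    for ch in text:
        if ord(ch) < 128:
            out.append(ord(ch))
        elif ch in UNICODE_TO_PASCII:
            out.append(UNICODE_TO_PASCII[ch])
        else:
            raise ValueError(f"no PASCII code known for U+{ord(ch):04X}")
    return bytes(out)
```

Because the Perso-Arabic letters sit only in the upper half of the 8-bit range, bytes below 128 pass through unchanged, which is what allows English and Perso-Arabic text to share one byte stream. A full converter would need the complete chart and a decision on how to handle the Reserved positions.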