TUGboat, Volume 19 (1998), No. 2, p. 115

Fonts

An Overview of Indic Fonts for TeX

Anshuman Pandey

1 Introduction

Many scholars and students in the humanities have preferred TeX over other "word processors" or document preparation systems because of the ease TeX provides them in typesetting non-Roman scripts, the availability of TeX fonts of interest to them, and the ability TeX has in producing well-structured documents.

However, this is not the case amongst Indologists. The lack of Indic fonts for TeX and the perceived difficulty of typesetting them have often turned Indologists away from using TeX. Little do they realize that TeX is the foremost tool for developing Indic language/script documents. With an increase over the past few years in the development and availability of Indic language and font packages, the introduction of other fonts and style packages, the flexibility of the LaTeX2ε system, and the creation of TUGIndia (which may revolutionize the typesetting of Indic scripts), there is now even more reason for Indologists to implement TeX in their work.

There are roughly thirteen major Indic scripts (Tibetan is included in this list) which are used throughout South Asia to write the major languages and dialects of the region. As of this article all of these major scripts can be typeset with TeX, the exception being Assamese (see Section 6).

Not only is it fascinating that the major scripts of South Asia can be typeset with TeX, but the ease with which such a task can be accomplished is itself an amazing feat. Anyone who has ever tried writing a document with multiple non-Roman scripts and diacritic text in an environment other than TeX understands the complexity of such a task. TeX takes the user beyond such difficulties by facilitating the implementation of multiple scripts without the hassle of worrying about various fonts and their encodings, manual font switching, and other such hindrances to productivity caused by common "word processors".

TeX enables the incorporation of several non-Roman scripts within a single document through transliterated input of the scripts. Indic scripts are based on the phonetic template of the languages they represent, a template which is uniform in both the Indo-Aryan and Dravidian language families of India. Such uniformity in phonetics is reflected in orthography, which in turn enables all scripts to be transliterated through a single scheme. This uniformity has subsequently been reflected in the transliteration schemes of the Indic language/script packages.

Most packages have their own transliteration scheme, but these schemes are essentially variations on a single scheme, differing merely in the coding of a few vowel, nasal, and retroflex letters. Most of these packages accept input in one of the two primary 7-bit transliteration schemes, ITRANS or Velthuis, or a derivative of one of them. There is also an 8-bit format called CS/CSX which a few of these packages support. CS/CSX is described in further detail in Section 3.

2 The Fonts and Packages

Figure 1 shows examples of the various fonts described in this article. Table 1 lists the sites from which all of the fonts and packages described in this article are available.

[Figure 1: Example of Indic Fonts. Samples shown: Devnag, Devnac, Devnag Pen, Sanskrit, ItxBengali, ItxGujarati, Punjabi, Gurmukhi, Washington Tamil, Telugu, Malayalam, Sinhala, Konark, Cuttack, AAI Kannada, Tibetan, Naskh, Washington Brahmi.]

3 CS/CSX

CS/CSX (Classical Sanskrit/Classical Sanskrit eXtended) is the closest thing to an accepted standardization of 8-bit transliteration of Indic scripts. Adopted in 1990 at the 8th World Sanskrit Conference in Vienna, CS/CSX enables Indologists to exchange electronic data in a variety of platform-independent media.

CS/CSX is an encoding convention based on IBM Code Page 437. CS is a basic inventory of diacritic letters which are traditionally used to transliterate Sanskrit written in the Devanagari script. CSX is an extension of this basic inventory to include accented and other characters. Contrary to what the name indicates, the inventory of CS/CSX characters is not limited to Sanskrit, and may be used to transliterate other Indic languages.

Introductory information on CS/CSX is found in an article by Dominik Wujastyk titled Standardization of Sanskrit for Electronic Data Transfer and Screen Representation [1]. This document, as well as supporting screen fonts and drivers for DOS-based machines, is available from the INDOLOGY site as well as from CTAN.

Various fonts and packages have been developed which enable TeX to typeset documents encoded in the CS/CSX convention. These are enumerated below:

cp437csx: The file cp437csx.def is an input encoding definition file for LaTeX2ε which enables CS/CSX encoded documents to be typeset in LaTeX without need for conversion. The file is available from CTAN as part of the csx package.

csxtimes: John Smith has made available the font metrics and virtual fonts of the commonly-used PostScript fonts re-encoded with the CSX encoding. The use of these fonts enables CS/CSX documents to be typeset directly by TeX without the need for any conversions. This is facilitated through the csx.def file which provides the CS/CSX input encoding definitions for standard LaTeX and for the standard TeX fonts.

cscharter: Dominik Wujastyk produced an extension and re-encoding of the Bitstream Charter font according to the CS/CSX convention called 'CS Charter'. Users should note that 'CS Charter' supports only the characters of the Classical Sanskrit encoding and does not support the Extended encoding. This font is available from the INDOLOGY site and is also bundled with the ITRANS package.

csutopia: Dominik Wujastyk produced an extension and re-encoding of the Adobe Utopia font according to the CS/CSX convention called 'CS Utopia'. Users should note that 'CS Utopia' supports only the characters of the Classical Sanskrit encoding and does not support the Extended encoding. This font is available from the INDOLOGY site and is also bundled with the ITRANS package.

wnri: Thomas Ridgeway developed a package called 'Washington Roman Indic' which contained a family of fonts based on Computer Modern and which were encoded with the CS/CSX and other supplementary conventions. I recently revised the package for use with LaTeX2ε and added a style file and input encoding definition file which does away with the need for the fonts. Ridgeway also developed screen fonts and drivers for wnri for DOS-based machines. The updated package is available from CTAN.

4 babel

Recently Jun Takashima introduced two Indic language modules to the babel fold. These packages enable support for Romanized Sanskrit and for Kannada in both the original and Roman scripts. Please refer to Section 5.5 for a description of Takashima's Kannada package.

skthyph: This module provides the hyphenation patterns for Romanized Sanskrit. As of this article these files are not distributed with the current version of babel but will be included in the next release. This module is presently available only from the developer's FTP site.

5 ITRANS

The ITRANS package developed by Avinash Chopde is the primary component of an on-going project to make the typesetting of all Indic scripts possible by means of a single tool. As of this article, ITRANS supports the Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Tamil, and Telugu scripts. It also supports the 'CS Utopia' diacritic Roman font.

In addition to the default TeX output, ITRANS can produce direct HTML and PostScript output from the input file. ITRANS versions for both DOS and UNIX systems are available from the developer's website.

5.1 Bengali

arosgaon: The AroSgaon package was developed by Muhammad Masroor Ali as an extension to the 'SonarGaon' HP Laserjet softfont designed [...]

[...] contains the characters required to typeset Sanskrit, Hindi, Marathi, and any other languages which use the Devanagari script. The font 'Devnag Pen' developed by Thomas Ridgeway is a variation on 'Devnag' which resembles Devanagari written with an ordinary pen and is bundled with devnag.

Dominik Wujastyk, John Smith, myself, and a few others have recently upgraded devnag for use with LaTeX2ε. The package is now NFSS-compliant.

sanskrit: The Sanskrit for LaTeX2ε package developed by Charles Wikner is an extensive package which enables the typesetting of Devanagari text with Vedic accents and other special characters not supported by the devnag package. Numerous options may be set in regard to transliteration, alternate characters, inter-character spacing, and other preferences. Only support for the Sanskrit language is available. The font 'Sanskrit', also developed by Wikner, is bundled with the package. It is a rather complete font in that it contains many complex ligatures and variants which enable excellent typesetting of Devanagari. This package is available from CTAN.

itrans: Four fonts provide Devanagari support in ITRANS: the 'Devnag' and 'Devanagari Pen' fonts described above and two more called 'Devnac' and 'Xdvng'. The 'Devnac' font is a PostScript Type 3 font developed by Avinash Chopde for the ITRANS package. 'Devnac' was developed to enable users unfamiliar with TeX to still produce texts in Devanagari through the "dumb textual interface" mode of ITRANS.
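As an illustration of the CS/CSX input encoding file described in Section 3, the following is a minimal sketch of how cp437csx.def might be loaded in a LaTeX2ε document. It assumes that the file is installed where LaTeX can find it and that it is selected through the standard inputenc package, which loads an encoding definition file by its base name. The document body is a placeholder and not taken from the article.

```latex
% Sketch only: assumes cp437csx.def (from the csx package on CTAN) is installed
% and that this source file itself is saved in the CS/CSX 8-bit encoding.
\documentclass{article}
\usepackage[cp437csx]{inputenc} % inputenc reads the file cp437csx.def
\begin{document}
% CS/CSX-encoded transliterated text would be typed directly here,
% with no separate conversion step before running LaTeX.
Placeholder for CS/CSX-encoded text.
\end{document}
```

The same pattern would presumably apply to csx.def with the option name csx, though the exact option names should be checked against the documentation shipped with each package.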