Punjabi Machine Transliteration Muhammad Ghulam Abbas Malik


To cite this version: Muhammad Ghulam Abbas Malik. Punjabi Machine Transliteration. 21st International Conference on Computational Linguistics (COLING) and the 44th Annual Meeting of the ACL, Jul 2006, Sydney, Australia. pp.1137-1144. hal-01002160

HAL Id: hal-01002160
https://hal.archives-ouvertes.fr/hal-01002160
Submitted on 15 Jan 2018

Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the ACL, pages 1137–1144, Sydney, July 2006. © 2006 Association for Computational Linguistics

M. G. Abbas Malik
Department of Linguistics
Denis Diderot, University of Paris 7
Paris, France
[email protected]

Abstract

Machine Transliteration is to transcribe a word written in a script with approximate phonetic equivalence in another language. It is useful for machine translation, cross-lingual information retrieval, multilingual text and speech processing. Punjabi Machine Transliteration (PMT) is a special case of machine transliteration and is a process of converting a word from Shahmukhi (based on Arabic script) to Gurmukhi (derivation of Landa, Sharda and Takri, old scripts of the Indian subcontinent), two scripts of Punjabi, irrespective of the type of word.

The Punjabi Machine Transliteration System uses transliteration rules (character mappings and dependency rules) for transliteration of Shahmukhi words into Gurmukhi. The PMT system can transliterate every word written in Shahmukhi.

1 Introduction

Punjabi is the mother tongue of more than 110 million people of Pakistan (66 million), India (44 million) and many millions in America, Canada and Europe. It has been written in two mutually incomprehensible scripts, Shahmukhi and Gurmukhi, for centuries. Punjabis from Pakistan are unable to comprehend Punjabi written in Gurmukhi and Punjabis from India are unable to comprehend Punjabi written in Shahmukhi. In contrast, they have no problem understanding each other's speech. The Punjabi Machine Transliteration (PMT) system is an effort to bridge the written communication gap between the two scripts for the benefit of the millions of Punjabis around the globe.

Transliteration refers to phonetic translation across two languages with different writing systems (Knight & Graehl, 1998), such as Arabic to English (Nasreen & Leah, 2003). Most prior work has been done for Machine Translation (MT) (Knight & Leah, 97; Paola & Sanjeev, 2003; Knight & Stall, 1998) from English to other major languages of the world like Arabic, Chinese, etc., for cross-lingual information retrieval (Pirkola et al, 2003), for the development of multilingual resources (Yan et al, 2003; Kang & Kim, 2000) and for the development of cross-lingual applications.

PMT is a special kind of machine transliteration. It converts a Shahmukhi word into a Gurmukhi word irrespective of the type constraints of the word. It not only preserves the phonetics of the transliterated word but, in contrast to usual transliteration, also preserves the meaning.

The two scripts are discussed and compared. Based on this comparison and analysis, character mappings between Shahmukhi and Gurmukhi are drawn and transliteration rules are discussed. Finally, the architecture and process of the PMT system are discussed. The system was applied to Punjabi Unicode encoded text especially designed for testing, and the results were compiled and analyzed. The PMT system will provide a basis for Cross-Scriptural Information Retrieval (CSIR) and Cross-Scriptural Application Development (CSAD).

2 Punjabi Machine Transliteration

According to Paola (2003), “When writing a foreign name in one’s native language, one tries to preserve the way it sounds, i.e. one uses an orthographic representation which, when read aloud by the native speaker of the language, sounds as it would when spoken by a speaker of the foreign language – a process referred to as Transliteration”. Usually, transliteration refers to the phonetic translation of a word of some specific type (proper nouns, technical terms, etc) across languages with different writing systems. Native speakers may not understand the meaning of the transliterated word.

PMT is a special type of Machine Transliteration in which a word is transliterated across two different writing systems used for the same language. It is independent of the type constraint of the word. It preserves both the phonetics and the meaning of the transliterated word.

3 Scripts of Punjabi

3.1 Shahmukhi

Shahmukhi derives its character set from the Arabic alphabet. It is a right-to-left script and the shape assumed by a character in a word is context sensitive, i.e. the shape of a character differs depending on whether the character is at the beginning, in the middle or at the end of the word. Normally, it is written in Nastalique, a highly complex writing system that is cursive and context-sensitive. A sentence illustrating Shahmukhi is given below:

[Shahmukhi example sentence not recoverable from this copy]

It has 49 consonants, 16 diacritical marks and 16 vowels, etc. (Malik 2005)

3.2 Gurmukhi

Gurmukhi derives its character set from old scripts of the Indian Sub-continent i.e. Landa (script of North West), Sharda (script of Kashmir) and Takri (script of western Himalaya). It is a left-to-right syllabic script. A sentence illustrating Gurmukhi is given below:

ਪੰਜਾਬੀ ਮੇਰੀ ਮਾਣ ਜੋਗੀ ਮਾਂ ਬੋਲੀ ਏ.

It has 38 consonants, 10 vowel characters, 9 vowel symbols, 2 symbols for nasal sounds and 1 symbol that duplicates the sound of a consonant. (Bhatia 2003, Malik 2005)

4 Analysis and PMT Rules

Punjabi is written in two completely different scripts. One script is right-to-left and the other is left-to-right. One is Arabic based cursive and the other is syllabic. But both of them represent the phonetic repository of Punjabi. These phonetic sounds are used to determine the relation between the characters of the two scripts. On the basis of this idea, character mappings are determined.

For the analysis and comparison, both scripts are subdivided into different groups on the basis of types of characters e.g. consonants, vowels, diacritical marks, etc.

4.1 Consonant Mapping

Consonants can be further subdivided into two groups:

Aspirated Consonants: There are sixteen aspirated consonants in Punjabi (Malik, 2005). Ten of these aspirated consonants (بھ [bʰ], پھ [pʰ], تھ [ṱʰ], ٹھ [ʈʰ], جھ [ʤʰ], چھ [ʧʰ], دھ [ḓʰ], ڈھ [ɖʰ], کھ [kʰ], گھ [gʰ]) are very frequently used in Punjabi as compared to the remaining six aspirates (رھ [rʰ], ڑھ [ɽʰ], لھ [lʰ], مھ [mʰ], نھ [nʰ], وھ [vʰ]). In Shahmukhi, aspirated consonants are represented by the combination of a consonant (to be aspirated) and HEH-DOACHASHMEE (ھ). For example ب [b] + ھ [h] = بھ [bʰ] and ج [ʤ] + ھ [h] = جھ [ʤʰ].

In Gurmukhi, each frequently used aspirated consonant is represented by a unique character. But less frequent aspirated consonants are represented by the combination of a consonant (to be aspirated) and sub-joined PAIREEN HAAHAA e.g. ਲ [l] + ◌੍ + ਹ [h] = ਲ੍ਹ (لھ) [lʰ] and ਵ [v] + ◌੍ + ਹ [h] = ਵ੍ਹ (وھ) [vʰ], where ◌੍ is the sub-joiner. The sub-joiner character (◌੍) tells that the following ਹ [h] is going to change to the sub-joined shape, PAIREEN HAAHAA.

The mapping of ten frequently used aspirated consonants is given in Table 1.

Sr.   Shahmukhi    Gurmukhi     Sr.   Shahmukhi    Gurmukhi
1     بھ [bʰ]      ਭ            6     چھ [ʧʰ]      ਛ
2     پھ [pʰ]      ਫ            7     دھ [ḓʰ]      ਧ
3     تھ [ṱʰ]      ਥ            8     ڈھ [ɖʰ]      ਢ
4     ٹھ [ʈʰ]      ਠ            9     کھ [kʰ]      ਖ
5     جھ [ʤʰ]      ਝ            10    گھ [gʰ]      ਘ

Table 1: Aspirated Consonants Mapping

The mapping for the remaining six aspirates is covered under non-aspirated consonants.

Non-Aspirated Consonants: In case of non-aspirated consonants, Shahmukhi has more consonants than Gurmukhi, which follows the one symbol for one sound principle. On the other hand, there is more than one character for a single sound in Shahmukhi. For example, Seh (ث), Seen (س) and Sad (ص) represent [s] and [s] has one equivalent in Gurmukhi i.e. Sassaa (ਸ). Similarly other characters like ਅ [a], ਤ [ṱ], ਹ [h] and ਜ਼ [z] have multiple equivalents in Shahmukhi. Non-aspirated consonants mapping is given in Table 2.

Sr.   Shahmukhi    Gurmukhi     Sr.   Shahmukhi    Gurmukhi
1     ب [b]        ਬ            21    ط [ṱ]        ਤ
2     پ [p]        ਪ            22    ظ [z]        ਜ਼
3     ت [ṱ]        ਤ            23    ع [ʔ]        ਅ
4     ٹ [ʈ]        ਟ            24    غ [ɤ]        ਗ਼
5     ث [s]        ਸ            25    ف [f]        ਫ਼
6     ج [ʤ]        ਜ            26    ق [q]
7     چ [ʧ]        ਚ            27    ک [k]        ਕ
8     ح [h]        ਹ            28    گ [g]        ਗ
9     خ [x]        ਖ਼            29    ل [l]        ਲ
10    د [ḓ]        ਦ            30    لؕ [ɭ]        ਲ਼

Hamza (ء) is a special character and always comes between two vowel sounds as a place holder. For example, in آسائِش [ɑsɑɪʃ] (comfort), Hamza (ء) is separating two vowel sounds Alef (ا) and Zer (◌ِ); in آؤ [ɑo] (come), Hamza (ء) is separating two vowel sounds Alef Madda (آ) [ɑ] and Vav (و) [o], etc. In the first example, Hamza separates Alef (ا) and Zer (◌ِ), but normally Zer (◌ِ) is dropped by common people. So Hamza (ء) is mapped on ਇ [ɪ] when it is followed by a consonant.

In Gurmukhi, vowels are represented by ten independent vowel characters (ਅ, ਆ, ਇ, ਈ, ਉ, ਊ, ਏ, ਐ, ਓ, ਔ) and nine dependent vowel signs (◌ਾ, ਿ◌, ◌ੀ, ◌ੁ, ◌ੂ, ◌ੇ, ◌ੈ, ◌ੋ, ◌ੌ).
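
The consonant rules of Section 4.1 can be illustrated with a minimal Python sketch. It is not the PMT system's implementation: the function name `map_consonants`, the two dictionaries and the choice of consonants are assumptions made for illustration, and vowels, diacritical marks and Hamza handling are deliberately left out. The sketch only shows the two aspiration cases (a dedicated Gurmukhi letter for the ten frequent aspirates, consonant + sub-joiner + ਹ for the rest) and the many-to-one mapping of Seh, Seen and Sad onto Sassaa.

```python
# Illustrative sketch only; names and table subsets are assumptions,
# not the PMT system's actual data structures.

HEH_DOACHASHMEE = "\u06BE"  # Shahmukhi aspiration marker
SUB_JOINER = "\u0A4D"       # Gurmukhi sign virama (sub-joiner)
HAHAA = "\u0A39"            # Gurmukhi letter HA

# Table 1: frequent aspirates have a dedicated Gurmukhi letter.
ASPIRATED = {
    "\u0628": "\u0A2D",  # BEH + HEH DOACHASHMEE -> BHA
    "\u067E": "\u0A2B",  # PEH   -> PHA
    "\u062A": "\u0A25",  # TEH   -> THA
    "\u0679": "\u0A20",  # TTEH  -> TTHA
    "\u062C": "\u0A1D",  # JEEM  -> JHA
    "\u0686": "\u0A1B",  # TCHEH -> CHA
    "\u062F": "\u0A27",  # DAL   -> DHA
    "\u0688": "\u0A22",  # DDAL  -> DDHA
    "\u06A9": "\u0A16",  # KEHEH -> KHA
    "\u06AF": "\u0A18",  # GAF   -> GHA
}

# A small subset of the non-aspirated mapping (Table 2).
PLAIN = {
    "\u0628": "\u0A2C",  # BEH  -> BA
    "\u067E": "\u0A2A",  # PEH  -> PA
    "\u062B": "\u0A38",  # SEH  -> SA
    "\u0633": "\u0A38",  # SEEN -> SA
    "\u0635": "\u0A38",  # SAD  -> SA
    "\u0644": "\u0A32",  # LAM  -> LA
    "\u0648": "\u0A35",  # WAW  -> VA (consonantal use only)
}


def map_consonants(word: str) -> str:
    """Map Shahmukhi consonants to Gurmukhi, character by character."""
    out = []
    i = 0
    while i < len(word):
        ch = word[i]
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if nxt == HEH_DOACHASHMEE and ch in ASPIRATED:
            # Frequent aspirate: one dedicated Gurmukhi letter.
            out.append(ASPIRATED[ch])
            i += 2
        elif nxt == HEH_DOACHASHMEE and ch in PLAIN:
            # Less frequent aspirate: consonant + sub-joiner + HA.
            out.append(PLAIN[ch] + SUB_JOINER + HAHAA)
            i += 2
        else:
            # Plain consonant; unknown characters pass through unchanged.
            out.append(PLAIN.get(ch, ch))
            i += 1
    return "".join(out)
```

Because the output is appended in logical (reading) order, the right-to-left versus left-to-right difference between the scripts needs no special handling at this level; it is a matter of display, not of the stored character sequence.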