Ancient Geez Script Recognition Using Deep Convolutional Neural Network


ANCIENT GEEZ SCRIPT RECOGNITION USING DEEP CONVOLUTIONAL NEURAL NETWORK

A THESIS SUBMITTED TO THE GRADUATE SCHOOL OF APPLIED SCIENCES OF NEAR EAST UNIVERSITY

By FITEHALEW ASHAGRIE DEMILEW

In Partial Fulfillment of the Requirements for the Degree of Master of Sciences in Software Engineering

NICOSIA, 2019

Fitehalew Ashagrie DEMILEW: ANCIENT GEEZ SCRIPT RECOGNITION USING DEEP CONVOLUTIONAL NEURAL NETWORK

Approval of Director of Graduate School of Applied Sciences
Prof. Dr. Nadire CAVUS

We certify this thesis is satisfactory for the award of the degree of Master of Sciences in Software Engineering

Examining committee in charge:
Assoc. Prof. Dr. Kamil DİMİLİLER, Department of Automotive Engineering, NEU
Assoc. Prof. Dr. Yöney KIRSAL EVER, Department of Software Engineering, NEU
Assist. Prof. Dr. Boran ŞEKEROĞLU, Supervisor, Department of Information Systems Engineering, NEU

I hereby declare that all information in this document has been obtained and presented in accordance with academic rules and ethical conduct. I also declare that, as required by these rules and conduct, I have fully cited and referenced all material and results that are not original to this work.

Name, Last name: Fitehalew Ashagrie Demilew
Signature:
Date:

ACKNOWLEDGMENT

My deepest gratitude is to my advisor, Assist. Prof. Dr. Boran ŞEKEROĞLU, for his encouragement, guidance, support, and enthusiasm.
I am indeed grateful for his continuous support in finalizing this project and for his patience, motivation, and immense knowledge. His guidance helped me throughout the research and the writing of this thesis; without him, none of it would have been possible.

I would also like to thank my mother Tirngo Assefaw, my father Ashagrie Demilew, my brother Baharu Ashagrie, and the rest of my family for their support, encouragement, and ideas. Without you, everything would have been far more difficult for me.

Dedicated to my mother Tirngo Assefaw...

ABSTRACT

In this research, we present the design and development of an optical character recognition system for ancient handwritten Geez documents. The Geez alphabet contains 26 base characters and more than 150 derived characters, which are extensions of the base characters. These derived characters are formed by adding different kinds of strokes to the base characters. This work, however, focuses on the 26 base characters and 3 of the punctuation marks of the Geez alphabet. The proposed character recognition system comprises all of the steps required for developing an efficient recognition system: preprocessing, segmentation, feature extraction, and classification. The preprocessing stage includes grayscale conversion, noise reduction, binarization, and skew correction. Many languages require word segmentation; the Geez language does not, so the segmentation stage encompasses only line and character segmentation. Among the different character classification techniques, this work adopts a deep convolutional neural network (CNN) approach, in which a deep CNN is used for both feature extraction and character classification. Furthermore, we prepared a dataset containing a total of 22,913 characters, of which 70% (16,038) were used for training, 20% (4,583) for testing, and 10% (2,292) for validation.
A total of 208 pages were collected from EOTC and other places in order to prepare the dataset. To show that the proposed model is effective and efficient, we also designed a deep neural network architecture with 3 different hidden layers. Both models were trained with the same training dataset, and the results show that the deep CNN model is better in every case. The deep neural model with 2 hidden layers achieves an accuracy of 98.128% with a model loss of 0.095, whereas the proposed deep CNN model obtains an accuracy of 99.389% with a model loss of 0.044. Thus, the deep CNN architecture yields better recognition accuracy for ancient Geez document recognition.

Keywords: Ancient Document Recognition; Geez Document Recognition; Ethiopic Document Recognition; Deep Convolutional Neural Network

ÖZET

In this research paper, we presented the design and development of an optical character recognition system for ancient handwritten Geez documents. The Geez alphabet contains 26 base characters and more than 150 derived characters that are an extension of the base alphabet. These derived characters are formed by adding different kinds of strokes to the base characters. However, this paper focuses on the 26 base characters and 3 of the punctuation marks of the Geez alphabet. The proposed character recognition system carries out all of the steps required to develop an efficient recognition system. The designed system includes processes such as preprocessing, segmentation, feature extraction, and classification. The preprocessing stage includes steps such as grayscale conversion, noise reduction, binarization, and skew correction. Many languages require word segmentation, but the Geez language does not, so the segmentation stage covers only line and character segmentation. Among the different character classification techniques, this paper presents a deep convolutional neural network approach. A deep CNN is used for feature extraction and character classification.

Furthermore, we prepared a dataset containing a total of 22,913 characters, with 70% (16,038) used for training, 20% (4,583) for testing, and 10% (2,292) for validation. A total of 208 pages were collected from EOTC and other places to prepare the dataset. To show that the proposed model is effective and efficient, we designed a deep neural network architecture with 3 different layers. Both of the designed models were trained with the same training dataset, and the results show that the deep CNN model is better in every case. The deep neural model with 2 hidden layers reaches an accuracy of 98.128% with a model loss of 0.095, whereas the proposed deep CNN model obtained an accuracy of 99.389% with a model loss of 0.044. Thus, the deep CNN architecture yields better recognition accuracy for ancient Geez document recognition.

Keywords: Ancient Document Recognition; Geez Document Recognition; Ethiopic Document Recognition; Deep Convolutional Neural Network

TABLE OF CONTENTS

ACKNOWLEDGMENT
ABSTRACT
ÖZET
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF ABBREVIATIONS

CHAPTER 1: INTRODUCTION
1.1. Background
1.2. Overview of Artificial Neural Network
1.2.1. Supervised learning
1.2.2. Unsupervised learning
1.2.3. Reinforcement learning
1.3. Statement of the Problem
1.4. The Significance of the Study
1.5. Objectives of the Study
1.5.1. General objective
1.5.2. Specific objectives
1.6. Methodology
1.6.1. Literature review
1.6.2. Data acquisition techniques
1.6.3. Preprocessing methods
1.6.4. System modeling and implementation