Multilingual Fonts for Arabic Script Jamil Khan* Department of Computer Science, University of Peshawar, Pakistan
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Inpage Urdu Manual
http://www.axiscomputers.com [email protected] Concept SOFTWARE .. . . (version) . 1 Horizontal . CharacterScript Letters & VowelsLigature . . Windows 95, 3.11 Full stop 5.0 (Decorative Work) (Spot Color Separation) . . 2 . . 3 Spell Check Kerning ME NT 4 6 .......................................................................................... ......................................................................................... .................................................................................. ..................................................................................... .............................................................................................. ......................................................................................... ................................................................................................. ..................................................................................... ................................................................................ ................................................................................. .......................................................................................... .................................................................................................. .............................................................................................. ....................................................................................................... -
ARTICLE Development of a Gold-Standard Pashto Dataset and a Segmentation App Yan Han and Marek Rychlik
ARTICLE Development of a Gold-standard Pashto Dataset and a Segmentation App Yan Han and Marek Rychlik ABSTRACT The article aims to introduce a gold-standard Pashto dataset and a segmentation app. The Pashto dataset consists of 300 line images and corresponding Pashto text from three selected books. A line image is simply an image consisting of one text line from a scanned page. To our knowledge, this is one of the first open access datasets which directly maps line images to their corresponding text in the Pashto language. We also introduce the development of a segmentation app using textbox expanding algorithms, a different approach to OCR segmentation. The authors discuss the steps to build a Pashto dataset and develop our unique approach to segmentation. The article starts with the nature of the Pashto alphabet and its unique diacritics which require special considerations for segmentation. Needs for datasets and a few available Pashto datasets are reviewed. Criteria of selection of data sources are discussed and three books were selected by our language specialist from the Afghan Digital Repository. The authors review previous segmentation methods and introduce a new approach to segmentation for Pashto content. The segmentation app and results are discussed to show readers how to adjust variables for different books. Our unique segmentation approach uses an expanding textbox method which performs very well given the nature of the Pashto scripts. The app can also be used for Persian and other languages using the Arabic writing system. The dataset can be used for OCR training, OCR testing, and machine learning applications related to content in Pashto. -
New Language Resources for the Pashto Language
New language resources for the Pashto language Djamel Mostefa 1 , Khalid Choukri 1 , Sylvie Brunessaux 2 , Karim Boudahmane 3 1 Evaluation and Language resources Distribution Agency, France 2 CASSIDIAN, France 3 Direction Générale de l'Armement, France E-mail: [email protected], [email protected], [email protected], [email protected] Abstract This paper reports on the development of new language resources for the Pashto language, a very low-resource language spoken in Afghanistan and Pakistan. In the scope of a multilingual data collection project, three large corpora are collected for Pashto. Firstly a monolingual text corpus of 100 million words is produced. Secondly a 100 hours speech database is recorded and manually transcribed. Finally a bilingual Pashto-French parallel corpus of around 2 million is produced by translating Pashto texts into French. These resources will be used to develop Human Language Technology systems for Pashto with a special focus on Machine Translation. Keywords: Pashto, low-resource language, speech corpus, monolingual and multilingual text corpora, web crawling. other one being Dari) and one regional language in 1. Introduction Pakistan. There are very few corpora and Human Language The code assigned to the language by the ISO 639-3 Technology (HLT) services available for Pashto. No standard is [pus]. language resources for Pashto can be found in the According to the Ethnologue.com website, it is spoken by catalogues of LDC1 and ELRA2. around 20 million people and three main dialects are to be Pashto is a very low-resource language. Google doesn't considered: support Pashto in its search engine or translation services. -
Developing an Arabic Typography Course for Visual Communication Design
Developing an Arabic Typography course for Visual Communication Design Students in the Middle East and North African Region A thesis submitted to the School of Visual Communication Design, College of Communication and Information of Kent State University in partial fulfillment of the requirements for the degree of Master of Fine Arts by Basma Almusallam May, 2014 Thesis written by Basma Almusallam B.F.A, Kuwait University, 2008 M.F.A, Kent State University, 2014 Approved by ___________________________ Jillian Coorey, M.F.A., Advisor ___________________________ AnnMarie LeBlanc, M.F.A., Director, School of Visual Communication Design ___________________________ Stanley T. Wearden, Ph.D., Dean, College of Communication and Information Table of Contents TABLE OF CONTENTS………………………………………………………………...... iii LIST OF FIGURES……………………………………………………………………….. v PREFACE………………………………………………………………………………..... vi CHAPTER I. INTRODUCTION…………………………………………………………. 1 The Current Issue………………………………………………….. 1 Core Objectives……………………………………………………. 3 II. THE HISTORY OF THE ARABIC WRITING SYSTEM, CALLIGRAPHY AND TYPOGRAPHY………………………………………....………….. 4 The Arabic Writing System……………………………………….. 4 Arabic Calligraphy………………………………………………… 5 The Undocumented Art of Arabic Calligraphy……………….…… 6 The Shift Towards Typography and the Digital Era………………. 7 The Pressing Issue of the Present………………………………….. 8 A NOTE ON THE PROCESS…………………………………………………………….. 10 Applying a Framework for Research Documentation…………….. 11 Mental Model……………………………………………………… 12 Proposed User Testing……………………………………………. -
Language Culture Type 2.Indd
Arabic script and typography is alphabetical; the direction of writing is from right to left; within a word, most letters form connected groups. AOne expects an alphabet to consist of a few dozen letters repre- senting one unique sound each. The Arabic alphabet evolved somewhat away from this ideal: although most letters correspond to a sound, a few letters are ambivalent between two or more sounds. Some letters don’t represent sound at all: they have only a grammatical function. For modern office use there are 28 basic letters, eight of them only differentiated from other letters by diacritics, and six optional letters for representing vowels. Older spellings made less use of diacritics for dif- ferentiating; on the other hand, to facilitate Qur’an recitation, additional vowel signs occur, along with elaborate cantillation marks. To acknowl- edge slight variations of the received text, some Qur’an editions have ad- ditional diacritics, discretely adding or eliminating consonant letters. As the Arabic script evolved into a connected script, it developed an elaborate system of assimilations and dissimilations between adjacent letters. Outside a small group of connoisseurs and calligraphers who study the principles established by the Ottoman letter artists, surpris- ingly little is understood of the efficiency and subtlety of this system,¹ and modern industrial type designs follow the approach found in el- ementary Western teaching materials.² There, beginners in Arabic script are given a maximally simplified scheme. While simplification is totally sound from a pedagogical perspective, it provides too narrow a basis for the development of professional typography. The Arabic script stems from the same source as the Latin, Greek, and Hebrew writing systems: Phoenician (figure 1).³ The underlying proto- alphabet had some two dozen characters; there were no vowels. -
MICROSOFT OEM *** ( (Special Prices) Taxes RTP(Rs.) SP (Rs.) Support : [email protected] / 1800-111100 / Airtel: 18001021100 / +91 80 40103000
SOFTMART SOLUTIONS - RESELLER PRICE LIST - September 8, 2021 The rates on the pricelist are valid only for the date of the pricelist. Please reconfirm / request for the current or extended validity prices by email : [email protected] & [email protected] before quoting to your customers or before placing any orders to us. PRICELIST CODE : PL-20210908 SAC CODE OF *SOFTWARE LICENCE : 997331 SAC CODE OF *SOFTWARE SUBSCRIPTION : 998434 HSN CODE OF SOFTWARE WITH MEDIA: 85238020 Customer Licence Information Form required for all licence orders / Deal Registration requests : www.softmartonline.com/LForm.doc and www.softmartonline.com/DealReg.doc The following details are mandatory for processing for ESD Purchase Orders : Name or organisation / Complete Physical Location Address / City / State / PINCODE / Contact Person Full Name / Contact Person Company Email-id / Tel. Number / Copy of Enduser PO required. ALL LICENCE ORDERS ARE NON-CANCELLABLE WITHOUT EXCEPTION. When you send us any enquiry for software not listed, to enable us to reply asap, please include the maximum details possible. Category of Enduser (Government / Educational / Charity/ NGO / Commercial), Software Name & Software Website, Operating System (Win 2012 / 16 / 19 Server, Win 10 / 8 / 7 Pro / Home / Linux / Macintosh), Number of users / devices and the objective which the customer plans to use the software. Enduser details are required to get ensure partner transfer rates. ESD : Electronic Software Delivery (Software to be downloaded from weblink / website ). It is mandatory -
Designing Software to Meet the Needs of International Users
DESIGNING SOFTWARE TO MEET THE NEEDS OF INTERNATIONAL USERS A Project Report Presented to The Faculty of the Department of Anthropology San José State University In Partial Fulfillment of the Requirements for the Degree Master of Arts by David Mohr December 2013 © 2013 David Mohr ALL RIGHTS RESERVED SAN JOSÉ STATE UNIVERSITY The Undersigned Graduate Committee Approves the Project Report Titled DESIGNING SOFTWARE TO MEET THE NEEDS OF INTERNATIONAL USERS by David Mohr APPROVED FOR THE DEPARTMENT OF ANTHROPOLOGY _____________________________________________________________________________ Dr. Chuck Darrah, Department of Anthropology Date _____________________________________________________________________________ Dr. Roberto Gonzalez, Department of Anthropology Date _____________________________________________________________________________ Professor Awad Awad, Middle Eastern Studies, Department of Humanities, Date Arabic Coordinator, Department of World Languages and Literatures Abstract Most software applications are designed and built in North America, catering to English- speaking, North American users. Ethnography is commonly used to ensure the product addresses the needs of domestic customers, but there is no analogous research with regard to foreign consumers. As a result, often times the features in these applications fail to sufficiently address the needs, expectations, and workflow of non-English-speaking users. Although software engineering manuals minutely detail the technical aspects of bringing an English application to -
Pashto Alphabets
LEARNING PASHTO Intensive Elementary & Secondary Pashto for Military and other Professionals by Dawood Azami Visiting Scholar Email: [email protected] The Middle East Studies Center (MESC) The Ohio State University, Columbus August 2009 1 Aims of the Course: *To provide a thorough introductory course in basic Pashto with the accent on practical spoken Pashto, coverage of grammar, familiarity with Pashto pronunciation, and essential vocabulary. *Ability to communicate within a range of situations and to handle simple survival situations (e.g. finding lodging, food, transportation etc.) *Ability to read the simple Pashto texts dealing with a variety of social and basic needs. In addition to author’s own command and expertise, a number of sources (books, both published and unpublished, journals, websites, etc.) have been consulted while preparing this material. Word of thanks: The author would like to thank Dr. Alam Payind, Director, Middle East Studies Center (MESC), and Melinda McClimans, Assistant Director, MESC. Their cooperation and assistance certainly made my stay in Columbus easier and enjoyable. Copy Right: This course material is for teaching of Pashto language at The Ohio State University. The author holds the copy right for any other use. The author intends to publish a modified version of the material as a book in the future. Please contact the author for more information. (Email: [email protected] ) 2 Contents I. Pashto Alphabet .................................................................................................................. 5 A. Pashto Sounds Similar to English ............................................................................... 7 B. Pashto Sounds Different from English ........................................................................ 7 C. Two letters pronounced differently in major Pashto dialects: ...................................... 8 D. Arabic Letters/ Sounds in Pashto .................................................................................. 8 II. -
Naila Habib Khan
LIGATURE RECOGNITION SYSTEM FOR PRINTED URDU SCRIPT USING GENETIC ALGORITHM BASED HIERARCHICAL CLUSTERING A Thesis Submitted to the Faculty of the Institute of Management Sciences, Peshawar in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY COMPUTER SCIENCE By NAILA HABIB KHAN DEPARTMENT OF COMPUTER SCIENCE INSTITUTE OF MANAGEMENT SCIENCES PESHAWAR, PAKISTAN SESSION 2014-2017 This is to certify that the research work presented in this thesis entitled “Ligature Recognition System for Printed Urdu Script Using Genetic Algorithm Based Hierarchical Clustering” was conducted by Naila Habib Khan under the supervision of Dr. Awais Adnan, Institute of Management Sciences, Peshawar, Pakistan. No part of this thesis has been submitted anywhere else for any other degree. This thesis is submitted to the Institute of Management Sciences, Peshawar in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the field of Computer Science. Student Name: Naila Habib Khan Signature: ___________________________ Examination Committee: a) External Foreign Examiner 1: Dr. Yue Cao School of Computing and Communications, Lancaster University, UK Signature: ___________________________ b) External Foreign Examiner 2: Prof. Dr. Ibrahim A. Hameed Deputy Head of Research and Innovation Department of ICT and Natural Sciences, Norwegian University of Science and Technology, UK Signature: ___________________________ c) External Local Examiner: Dr. Saeeda Naz Head of Department/ Assistant Professor Govt. Girls Postgraduate College, Abbotabad, Pakistan Signature: ___________________________ ii d) Internal Local Examiner: Dr. Imran Ahmed Mughal Assistant Professor Institute of Management Sciences, Peshawar, Pakistan Signature: ___________________________ Supervisor: Dr. Awais Adnan Assistant Professor Institute of Management Sciences, Peshawar, Pakistan Signature: ___________________________ Director: Dr. -
Desktop Publishing 45, Anurag Nagar, Behind Press Complex, Indore
B.Com 1st Year (Plain) Subject- Desktop Publishing SYLLABUS Class – B.Com. I Year Subject – Desktop Publishing UNIT – I Importance and Advantages of DTP, DTP Software and Hardware, Commercial DTP Packages, Page Layout programs, Introduction to Word Processing, Commercial DTP Packages, Difference between DTP Software and word Processing. UNIT – II Types of Graphics, Uses of Computer Graphics Introduction to Graphics Programs, Font and Typeface, Types of Fonts, Creation of Fonts (Photographer), Anatomy of Typefaces, Printers, Types of Printers used in DTP, Plotter, Scanner. UNIT – III History and Versions of PageMaker, Creating a New Page, Document Setup Dialog Box, Paper size, Page Orientation, Margins, Different Methods of Placing text and graphics in a document, master Page, Story Editor, Formatting of Text, Indent, Leading, Hyphenation, Spelling Check, Creating Index, Text Wrap, Position (Superscript/Subscript), Control Palette. UNIT – IV History of Multimedia Elements, Text, Images, Sound, Animation and Video, Text, concept of Plain Text and Formatted Text, RTF& HTML Text, Image, Importance of Graphics in Multimedia, Image Capturing Methods, Scanner, Digital Camera, Sound0 Sound and its effect in Multimedia, Analog and Digital sound, Animation, Basics, Principles and use of Animation, video, Basics of Video, Analog and Digital Video. UNIT – V Features Of Multimedia, Overview of Multimedia, Multimedia Software Tools, Multimedia Authoring- Production and Presentation, Graphics File Formats, MIDI-Overviews, Concepts, Structure of MIDI, MIDI Devices, MIDI Messages. 45, Anurag Nagar, Behind Press Complex, Indore (M.P.) Ph.: 4262100, www.rccmindore.com 1 B.Com 1st Year (Plain) Subject- Desktop Publishing UNIT I 1.1 Introduction to Desktop Publishing Desktop Publishing (DTP) is the creation of electronic forms of information documents using page layout skills on a personal computer primarily for print. -
Medieval Hebrew Texts and European River Names Ephraim Nissan London [email protected]
ONOMÀSTICA 5 (2019): 187–203 | RECEPCIÓ 8.3.2019 | ACCEPTACIÓ 18.9.2019 Medieval Hebrew texts and European river names Ephraim Nissan London [email protected] Abstract: The first section of theBook of Yosippon (tenth-century Italy) maps the Table of Nations (Genesis 10) onto contemporary peoples and places and this text, replete with tantalizing onomastics, also includes many European river names. An extract can be found in Elijah Capsali’s chronicle of the Ottomans 1517. The Yosippon also includes a myth of Italic antiquities and mentions a mysterious Foce Magna, apparently an estuarine city located in the region of Ostia. The article also examines an onomastically rich passage from the medieval travelogue of Benjamin of Tudela, and the association he makes between the river Gihon (a name otherwise known in relation to the Earthly Paradise or Jerusalem) and the Gurganin or the Georgians, a people from the Caspian Sea. The river Gihon is apparently what Edmund Spenser intended by Guyon in his Faerie Queene. The problems of relating the Hebrew spellings of European river names to their pronunciation are illustrated in the case of the river Rhine. Key words: river names (of the Seine, Loire, Rhine, Danube, Volga, Dnieper, Po, Ticino, Tiber, Arno, Era, Gihon, Guyon), Kiev, medieval Hebrew texts, Book of Yosippon, Table of Nations (Genesis 10), historia gentium, mythical Foce Magna city, Benjamin of Tudela, Elijah Capsali, Edmund Spenser Textos hebreus medievals i noms de rius europeus Resum: L’inici del Llibre de Yossippon (Itàlia, segle X) relaciona la «taula de les nacions» de Gènesi 10 amb pobles i llocs contemporanis, i aquest text, ple de propostes onomàstiques temptadores, també inclou noms fluvials europeus. -
Pashto Five Reasons Why You Should Pashto Learn More About Pashtuns سالم
SOME USEFUL PHRASES IN PASHTO FIVE REASONS WHY YOU SHOULD PASHTO LEARN MORE ABOUT PASHTUNS سﻻم. زما نوم جان دئ. [saˈlɔːm zmɔː num ʤɔːn dǝɪ] Uzbeks are the most numerous Turkic /salām. zmā num jān dǝy./ AND THEIR LANGUAGE Hi. My name is John. 1. Pashto is spoken as a first or second language by people in Central Asia. They predomi- over 40 million people worldwide, but the highest nantly mostly live in Uzbekistan, a land- population of speakers are located in Afghanistan ستاسو نوم څه دئ؟ [ˈstɔːso num ʦǝ dǝɪ] and Pakistan, with smaller populations in other locked country of Central Asia that shares /stāso num ʦǝ dǝy?/ Central Asian and Middle Eastern countries such borders with Kazakhstan to the west and What is your name? as Tajikistan and Iran. north, Kyrgyzstan and Tajikistan to the ,A member of the Indo-Iranian language family .2 تاسې څنګه یاست؟ زه ښه یم، مننه. [ˈʦǝŋgɒ jɔːst zǝ ʂɒ jǝm mɒˈnǝnɒ] Pashto shares many structural similarities to east, and Afghanistan and Turkmenistan /ʦǝnga yāst? zǝ ṣ̆ a yǝm, manǝna./ languages such as Dari, Farsi, and Tajiki. to the south. Many Uzbeks can also be How are you? I’m fine, thanks. 3. Because of US involvement with Afghanistan over found in Afghanistan, Kazakhstan, Kyr- the past decade, those who study Pashto can find careers in a variety of fields including translation gyzstan, Tajikistan, Turkmenistan and the ستاسو دلیدو څخه خوشحاله شوم. [ˈstɔːso dǝˈliːdo ˈʦǝχa χoʃˈhɔːla ʃwǝm] and interpreting, consulting, foreign service and Xinjiang Uyghur Autonomous Region of /stāso dǝ-lido ʦǝxa xoš-hāla šwǝm./ intelligence, journalism, and many others.