Survey of Arabic Checker Techniques

Total Page:16

File Type:pdf, Size:1020Kb

Survey of Arabic Checker Techniques SUST Journal of Engineering and Computer Sciences (JECS), Vol. 21, No. 1, 2020 Survey of Arabic Checker Techniques Ahmed Abdalrhman Saty1, Karim Bouzoubaa2, Aouragh Si Lhoussain3 1 Sudan University of Science and Technology, Sudan 2,3 Mohammed V University in Rabat, Morocco [email protected] Received:28/09/2019 Accepted:06/11/2019 ABSTRACT- It is known that the importance of spell checking, which increases with the expanding of technologies, using the Internet and the local dialects, in addition to non-awareness of linguistic language. So, this importance increases with the Arabic language, which has many complexities and specificities that differ from other languages. This paper explains these specificities and presents the existing works based on techniques categories that are used, as well as explores these techniques. Besides, it gives directions for future work. Keywords: spell checking, rule-based, morphology, n-gram, radix-search tree, levenshtein distance, jaro- winkler distance المستخلص - من المعمهم أىمية التدقيق اﻹمﻻئي، والتي تزداد مع تهسع التقنيات، استخدام اﻻنترنت والميجات المحمية، اضافة إلى عدم اﻹلمام بقهاعد المغة. وبالتالي تزداد أىميتو أكثر مع المغة العربية ندبة ﻷنيا تحتهي عمى بعض التعقيدات والخصائص التي ت ميزىا عن المغات اﻻخرى. ىذه الهرقة تدتعرض بعض خصائص المغة العربية، كما تقهم بعرض اﻷعمال المهجهدة في ىذا المجال بناء عمى التقنيات المدتخدمة ومن ثم شرحيا. عﻻوة الى ذلك تعطي بعض اﻻتجاىات لﻷعمال المدتقبمية. INTRODUCTION specificity of many linguistic phenomenon that With the increased usage of computers and increase the probability of user mistakes such as smart devices in the processing of various multiplicity of local dialects and the non- languages, comes the need for correcting errors awareness of Arabic linguistic rules. introduced at different stages. Texts of any An Arabic spell checker behaves exactly the language can be generated from different same as an English one. For example, for the alzalad), the checker detects it as an) "الزلد" sources either by humans as document typing text and emailing software, or by machine such as erroneous word and suggests a list of close ,azzad) ”الزاد ,الولد ,الصلد ,الزبد ,الزلط“ optical character recognition (OCR) and words such as machine translation (MT). These produced texts alwalad, assalad, azzabad, azzlad). may have typing mistakes that need to be spell In the context of Arabic spell checking, many checked and corrected. Spell checking approaches and methods have been studied. constitutes one of the major areas in the field of Multiple systems with different designs already Natural Languages Processing (NLP) and has exist. Some of them exploit dictionaries while been the subject of different research studies others use morphological analysis [2]. A lot of since 1960 [1]. Spell checking mainly consists of them use also similarity among words [3, 4], and a verifying that some typed words are not few use the context [5] or mix between these accepted in the used language and suggests a list techniques [6, 7]. of close words to the erroneous word. This paper surveys the existing Arabic spell Accordingly, numerous approaches have been checkers with broad coverage of their explored to correct spelling errors in texts using advantages and disadvantages and consequently NLP tools and resources. sheds light on new opportunities in order to Some languages, such as English, developed improve these existing works. The remainder of advanced detection and spell checking systems. the paper is organized as follows. Section 2 For the case of Arabic, such systems are double- explains the specificities of the Arabic language needed with the rapid growth of the Arabic needed in the context of spell checking. Section digital content and users (it is reported that, for 3 introduces the classification of errors, explains 2017, 43.8% of the whole Arabic populations the meaning of datasets with their types are Internet users [37]) and because of the alongside used techniques in existing works, and 34 SUST Journal of Engineering and Computer Sciences (JECS), Vol. 21, No. 1, 2020 presents our notices on these works. Finally, we dialect-pronunciation instead of the original conclude the paper and make propositions about letter. new ideas for improving the existing works for According to the above Arabic language future work. specificities (alphabet and lexicon), the origins Arabic Language Uniqueness of typing errors include: Nowadays more than 400 million people in the 1. Changing of shape according to the position Middle East and North Africa speak the Arabic of letters in the word. language [38]. Arabic is also used as a religious 2. Similarity of pronunciation and shape of language in the Islamic Word. Therefore, it is some letters (see Figure1). learned by various levels of proficiency, as a 3. The use of dialects. venerated, liturgical language[8] by many Muslims mainly in Asia (e.g., Pakistan, Malaysia, China) and Africa (e.g., Senegal) [39]. Arabic has its own alphabet, its own lexicon, its own morphology, and its syntactic rules. The alphabet is 28 letters and the morphology is very rich. As a Semitic language, Arabic is derivative Figure1: Arabic Keyboard and has a flexible syntax allowing, for example, [9, 11] both verbal and nominal sentences . Whatever the origins are, to assist Arabic users However, it is the alphabet and lexicon during their typing, many Arabic spell-checking specificities of the Arabic language that have a approaches and systems have been developed. In direct impact of producing errors while typing the next section, we review these works. and that necessitate the use of spell checking Arabic Spell-checking Issues systems. In this section, we review the most important The alphabet is written from right to left in a works about Arabic spell checking. Firstly, we cursive style. Some letters have specific rules for introduce the classification of errors. Secondly, writing leading their shapes depending on their we explain the meaning of datasets used in the context of spell checking with their different ”ك“ position in the word. For example, the letter is written at the beginning and middle of the kinds. Thirdly, we focus on the most used word different than the last. The hamza letter is approaches and techniques which are in turn written with 5 possible shapes depending on its divided into five categories, which are: rule- position in the word as well as the diacritic of based approach, distance similarity techniques, the previous and the next letter. These rules are techniques exploiting morphological analysis, not accurately known by all users causing them techniques relying on phonetics and finally to make typing errors. hybrid ones, which combined more than one of In addition, some letters have very close the existing techniques. Classification of errors .("ض" ,”د“) pronunciations such as Consequently, people who are typing on There are two main types of typing errors; the computers are interchangeably using one letter first one is isolated-words and the second one is instead of the other thinking that is the right way context-sensitive [12, 13]. The first type detects to spell the corresponding word. that the misspelling words do not exist in the On the other hand, and due to its derivative lexicon. To deal with this type, the researchers aspect, the Arabic lexicon is extremely large introduced many analyzing and statistical studies. (about 12 million entries). However, Arabic Based on the user knowledge, the misspelling citizens do not make use of this entire lexicon errors can be divided into three categories[11]: the and have over the history "squeezed-minimized" first one is the typographic error; where the user the Arabic language and use a reduced lexicon is familiar with how to write the word but (s)he that does not exceed almost 20,000 entries. makes the typing error. The second category is Nowadays, from region to region Arabic citizens the cognitive error; where the user is not familiar use their own dialects but with a change in the with how to write the words, due to the non- pronunciation of some letters. For instance, awareness of the Arabic rules. The third category is the phonetic error that takes place (“ق”) Egyptians replace the pronunciation of with when there is a replacement of letters, due to the (“ق“) while Sudanese replace (“أ”) with In these cases, proximity of the sound. In general, the typing .(“ز“) with (“ذ“) and replace (“غ”) users type the word using the letter with its errors (approximately 80%) occur due to one of 35 SUST Journal of Engineering and Computer Sciences (JECS), Vol. 21, No. 1, 2020 the following reasons [14]: letter insertion, letter approach is considerably used to handle deletion, letter substitution and transposition of common spelling and typographic errors. For two adjacent letters. example, authors of [19] proposed a system that The second type is context-sensitive and is also has a mechanism for automatic correction of called real-words error. In this type, the written common errors in Arabic based on rules such as word is correct but its position is incorrect and the dealing of hamza errors since there is a taa ,”ذ“ and zah”د“ leads to a wrong meaning [5, 15, 16]. Few Arabic confusion between the dah The mentioned errors .”ي“ and yaa ”ة“ spell-checking researchers addressed real-word marbuta errors. Among these works, the researchers of [5] are treated by applying regular expressions and proposed a spell checker with a large corpus word replacement list. Moreover, the works of [7, collected from three topics (sport, health, and 20, 21] captured also various kinds of common economics), as well as 28 confusion sets, that errors. It is also noted that the use of rule-based were collected from commonly confused words. techniques gives more satisfactory corrections. Later on, the authors of [16] proposed a system In addition, the rule-based approach can also be that deals with context errors by applying n- used to rank the candidate words by aggregating gram and machine learning instead of predefined the probabilities of applied rules [12].
Recommended publications
  • DIGITAL TYPOGRAPHY USING LATEX Springer New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo Apostolos Syropoulos Antonis Tsolomitis Nick Sofroniou
    DIGITAL TYPOGRAPHY USING LATEX Springer New York Berlin Heidelberg Hong Kong London Milan Paris Tokyo Apostolos Syropoulos Antonis Tsolomitis Nick Sofroniou DIGITAL TYPOGRAPHY USING LATEX With 68 Illustrations Apostolos Syropoulos Antonis Tsolomitis 366, 28th October St. Dept. of Mathematics GR-671 00 Xanthi University of the Aegean GREECE GR-832 00 Karlobasi, Samos [email protected] GREECE [email protected] Nick Sofroniou Educational Research Centre St. Patrick’s College Drumcondra, Dublin 9 IRELAND [email protected] Library of Congress Cataloging-in-Publication Data Syropoulos, Apostolos. Digital typography using LaTeX / Apostolos Syropoulos, Antonis Tsolomitis, Nick Sofroniou. p. cm. Includes bibliographical references and indexes. ISBN 0-387-95217-9 (acid-free paper) 1. LaTeX (Computer file) 2. Computerized typesetting. I. Tsolomitis, Antonis. II. Sofroniou, Nick. III. Title. Z253.4.L38 S97 2002 686.2´2544—dc21 2002070557 ACM Computing Classification (1998): H.5.2, I.7.2, I.7.4, K.8.1 ISBN 0-387-95217-9 (alk. paper) Printed on acid-free paper. Printed on acid-free paper. © 2003 Springer-Verlag New York, Inc. All rights reserved. This work may not be translated or copied in whole or in part without the written permission of the publisher (Springer-Verlag New York, Inc., 175 Fifth Avenue, New York, NY 10010, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection with any form of information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed is forbidden. The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are not identified as such, is not to be taken as an expression of opinion as to whether they are subject to proprietary rights.
    [Show full text]
  • Arabic Alphabet - Wikipedia, the Free Encyclopedia Arabic Alphabet from Wikipedia, the Free Encyclopedia
    2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia Arabic alphabet From Wikipedia, the free encyclopedia َأﺑْ َﺠ ِﺪﯾﱠﺔ َﻋ َﺮﺑِﯿﱠﺔ :The Arabic alphabet (Arabic ’abjadiyyah ‘arabiyyah) or Arabic abjad is Arabic abjad the Arabic script as it is codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually[1] stand for consonants, it is classified as an abjad. Type Abjad Languages Arabic Time 400 to the present period Parent Proto-Sinaitic systems Phoenician Aramaic Syriac Nabataean Arabic abjad Child N'Ko alphabet systems ISO 15924 Arab, 160 Direction Right-to-left Unicode Arabic alias Unicode U+0600 to U+06FF range (http://www.unicode.org/charts/PDF/U0600.pdf) U+0750 to U+077F (http://www.unicode.org/charts/PDF/U0750.pdf) U+08A0 to U+08FF (http://www.unicode.org/charts/PDF/U08A0.pdf) U+FB50 to U+FDFF (http://www.unicode.org/charts/PDF/UFB50.pdf) U+FE70 to U+FEFF (http://www.unicode.org/charts/PDF/UFE70.pdf) U+1EE00 to U+1EEFF (http://www.unicode.org/charts/PDF/U1EE00.pdf) Note: This page may contain IPA phonetic symbols. Arabic alphabet ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع en.wikipedia.org/wiki/Arabic_alphabet 1/20 2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia غ ف ق ك ل م ن ه و ي History · Transliteration ء Diacritics · Hamza Numerals · Numeration V · T · E (//en.wikipedia.org/w/index.php?title=Template:Arabic_alphabet&action=edit) Contents 1 Consonants 1.1 Alphabetical order 1.2 Letter forms 1.2.1 Table of basic letters 1.2.2 Further notes
    [Show full text]
  • A Sociolinguistic Analysis of the Use of Arabizi in Social Media Among Saudi Arabians
    International Journal of English Linguistics; Vol. 9, No. 6; 2019 ISSN 1923-869X E-ISSN 1923-8703 Published by Canadian Center of Science and Education A Sociolinguistic Analysis of the Use of Arabizi in Social Media Among Saudi Arabians Ashwaq Alsulami1 1 School of Languages, Literatures, and Linguistics, Bangor University, Wales, United Kingdom Correspondence: Ashwaq Alsulami, School of Languages, Literatures, and Linguistics, Bangor University, Wales, United Kingdom. E-mail: [email protected] Received: August 27, 2019 Accepted: September 20, 2019 Online Published: October 28, 2019 doi:10.5539/ijel.v9n6p257 URL: https://doi.org/10.5539/ijel.v9n6p257 Abstract The aim of this sociolinguistically-oriented study is to explore the Arabizi phenomenon which is characterized by spelling Arabic words using the Latin script. It is prevalent in the text-based computer-mediated communications among Saudi Arabians. The study focuses on why Arabizi is used, how, particularly in respect to with whom and in which topics, it is used, the attitudes of its users toward its use and the perceived advantages and disadvantages of its use. Using an online survey, data were collected from 241 participants, 72 of which were users of Arabizi. The findings revealed that the primary reasons for using Arabizi were its being a communication code among youths and a compensation for the lack of Arabic keyboard from technological devices as well as being more expressive than Arabic language. It was also found that Arabizi was primarily used to communicate with friends and individuals of the same age, but not with parents and older people or in formal relationships.
    [Show full text]
  • An Improved Arabic Keyboard Layout
    Sci.Int.(Lahore),33(1),5-15,2021 ISSN 1013-5316; CODEN: SINTE 8 5 AN IMPROVED ARABIC KEYBOARD LAYOUT 1Amjad Qtaish, 2Jalawi Alshudukhi, 3Badiea Alshaibani, 4Yosef Saleh, 5Salam Bazrawi College of Computer Science and Engineering, University of Ha'il, Ha'il, Saudi Arabia. [email protected], [email protected], [email protected], [email protected], [email protected] ABSTRACT: One of the most important human–machine interaction (HMI) systems is the computer keyboard. The keyboard layout (KL) dictates how a person interacts with a physical keyboard through the way in which the letters, numbers, punctuation marks, and symbols are mapped and arranged on the keyboard. Mapping letters onto the keys of a keyboard is complex because many issues need to be taken into considerations, such as the nature of the language, finger fatigue, hand balance, typing speed, and distance traveled by fingers during typing and finger movements. There are two main kinds of KL: English and Arabic. Although numerous research studies have proposed different layouts for the English keyboard, there is a lack of research studies that focus on the Arabic KL. To address this lack, this study analyzed and clarified the limitations of the standard legacy Arabic KL. Then an efficient Arabic KL was proposed to overcome the limitations of the current KL. The frequency of Arabic letters and bi-gram probabilities were measured on a large Arabic corpus in order to assess the current KL and to design the improved Arabic KL. The improved Arabic KL was then evaluated and compared against the current KL in terms of letter frequency, finger-travel distance, hand and finger balance, bi-gram frequency, row distribution, and most frequent words.
    [Show full text]
  • Arabizi – Help Or Harm?
    ARABIZI – HELP OR HARM? ANALYSIS OF THE IMPACTS OF ARABIZI -THREAT OR BENEFIT TO THE WRITTEN ARABIC LANGUAGE? Thesis Submitted to The College of Arts and Sciences of the UNIVERSITY OF DAYTON In Partial Fulfillment of the Requirements for The Degree of Master of Arts in English By Abdurazag Ahmed Saide M.S. Dayton, Ohio December 2019 ARABIZI – HELP OR HARM? AN ANALYSIS OF THE IMPACTS OF ARABIZI - THREAT OR BENEFIT TO THE WRITTEN ARABIC LANGUAGE? Name: Saide, Abdurazag Ahmed APPROVED BY: ____________________________________ Bryan Bardine, Ph.D. Faculty Advisor. ____________________________________ Thomas Wendorf, Ph.D. Faculty Advisor ____________________________________ Kirsten Mendoza, Ph.D. Faculty Advisor ii © Copyright by Abdurazag Saide All rights reserved 2019 iii ABSTRACT ARABIZI – HELP OR HARM? AN ANALYSIS OF THE IMPACTS OF ARABIZI - THREAT OR BENEFIT TO THE WRITTEN ARABIC LANGUAGE? Name: Saide, Abdurazag Ahmed University of Dayton Advisor: Dr. Bryan Bardine In the 21st century, the Arab world has undergone significant change, which has had profound impacts in the lives of Arab people throughout many occupations, fields, and locations. New technological advancements, communication innovations, and globalization efforts have been the core drivers to open the Arab world to Western civilization. The Arab world is characterized by unique cultural traditions, moral principles, and religious beliefs that differ greatly from those of the West. However, the globalization movement has had multiple impacts on Arab culture and the Arabic language. When comparing the Arabic language with the languages of Latin origin, it becomes clear that there exists multiple distinctions, especially in their writings and language. These distinctions did not play such a serious role several centuries ago, but with the increased cooperation between Arab nations and the Western world, these language dissimilarities have become well-recognized.
    [Show full text]
  • Arabic Alphabet 1 Arabic Alphabet
    Arabic alphabet 1 Arabic alphabet Arabic abjad Type Abjad Languages Arabic Time period 400 to the present Parent systems Proto-Sinaitic • Phoenician • Aramaic • Syriac • Nabataean • Arabic abjad Child systems N'Ko alphabet ISO 15924 Arab, 160 Direction Right-to-left Unicode alias Arabic Unicode range [1] U+0600 to U+06FF [2] U+0750 to U+077F [3] U+08A0 to U+08FF [4] U+FB50 to U+FDFF [5] U+FE70 to U+FEFF [6] U+1EE00 to U+1EEFF the Arabic alphabet of the Arabic script ﻍ ﻉ ﻅ ﻁ ﺽ ﺹ ﺵ ﺱ ﺯ ﺭ ﺫ ﺩ ﺥ ﺡ ﺝ ﺙ ﺕ ﺏ ﺍ ﻱ ﻭ ﻩ ﻥ ﻡ ﻝ ﻙ ﻕ ﻑ • history • diacritics • hamza • numerals • numeration abjadiyyah ‘arabiyyah) or Arabic abjad is the Arabic script as it is’ ﺃَﺑْﺠَﺪِﻳَّﺔ ﻋَﺮَﺑِﻴَّﺔ :The Arabic alphabet (Arabic codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually[7] stand for consonants, it is classified as an abjad. Arabic alphabet 2 Consonants The basic Arabic alphabet contains 28 letters. Adaptations of the Arabic script for other languages added and removed some letters, such as Persian, Ottoman, Sindhi, Urdu, Malay, Pashto, and Arabi Malayalam have additional letters, shown below. There are no distinct upper and lower case letter forms. Many letters look similar but are distinguished from one another by dots (’i‘jām) above or below their central part, called rasm. These dots are an integral part of a letter, since they distinguish between letters that represent different sounds.
    [Show full text]
  • The Unicode Standard, Version 3.0, Issued by the Unicode Consor- Tium and Published by Addison-Wesley
    The Unicode Standard Version 3.0 The Unicode Consortium ADDISON–WESLEY An Imprint of Addison Wesley Longman, Inc. Reading, Massachusetts · Harlow, England · Menlo Park, California Berkeley, California · Don Mills, Ontario · Sydney Bonn · Amsterdam · Tokyo · Mexico City Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. If these files have been purchased on computer-readable media, the sole remedy for any claim will be exchange of defective media within ninety days of receipt. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten. ISBN 0-201-61633-5 Copyright © 1991-2000 by Unicode, Inc. All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, electronic, mechanical, photocopying, recording or other- wise, without the prior written permission of the publisher or Unicode, Inc.
    [Show full text]
  • Arabic Diacritics 1 Arabic Diacritics
    Arabic diacritics 1 Arabic diacritics Arabic alphabet ﻱ ﻭ ﻩ ﻥ ﻡ ﻝ ﻙ ﻕ ﻑ ﻍ ﻉ ﻅ ﻁ ﺽ ﺹ ﺵ ﺱ ﺯ ﺭ ﺫ ﺩ ﺥ ﺡ ﺝ ﺙ ﺕ ﺏ ﺍ Arabic script • History • Transliteration • Diacritics • Hamza • Numerals • Numeration • v • t [1] • e 〈ﺗَﺸْﻜِﻴﻞ〉 i‘jām, consonant pointing), and tashkil) 〈ﺇِﻋْﺠَﺎﻡ〉 The Arabic script has numerous diacritics, including i'jam .(〈ﺣَﺮَﻛَﺔ〉 vowel marks; singular: ḥarakah) 〈ﺣَﺮَﻛَﺎﺕ〉 tashkīl, supplementary diacritics). The latter include the ḥarakāt) The Arabic script is an impure abjad, where short consonants and long vowels are represented by letters but short vowels and consonant length are not generally indicated in writing. Tashkīl is optional to represent missing vowels and consonant length. Modern Arabic is nearly always written with consonant pointing, but occasionally unpointed texts are still seen. Early texts such as the Qur'an were initially written without pointing, and pointing was added later to determine the expected readings and interpretations. Tashkil (marks used as phonetic guides) The literal meaning of tashkīl is 'forming'. As the normal Arabic text does not provide enough information about the correct pronunciation, the main purpose of tashkīl (and ḥarakāt) is to provide a phonetic guide or a phonetic aid; i.e. show the correct pronunciation. It serves the same purpose as furigana (also called "ruby") in Japanese or pinyin or zhuyin in Mandarin Chinese for children who are learning to read or foreign learners. The bulk of Arabic script is written without ḥarakāt (or short vowels). However, they are commonly used in some al-Qur’ān). It is not) 〈ﺍﻟْﻘُﺮْﺁﻥ〉 religious texts that demand strict adherence to pronunciation rules such as Qur'an al-ḥadīth; plural: aḥādīth) as well.
    [Show full text]
  • (ISO)'S Arabic Transliteration Scheme an Improvement Over Library Of
    Is the Organization for Standardization (ISO)’s Arabic Transliteration Scheme an Improvement over Library of Congress’? Blair Kuntz University of Toronto he use of library transliteration of foreign languages has always T generated controversy and debate. For those opposed to translit- eration, especially in an age where computerization has introduced Uni- code in which native scripts can be displayed, entered, and searched in library catalogues, the practice is wholly unsatisfactory, serves no-one, and should probably be abolished. The critics point out that, especially for native users of languages with non-Roman scripts, searching for data in transliterated script is time-consuming and frustrating. Bilingual and multi-lingual catalogues, they note, have rendered transliteration unnecessary and obsolete. Why would a native speaker even bother searching for an item using transliteration, when searching using the original script is so much more reliable and efficient? For opponents of transliteration, transliteration is unreliable and serves neither librari- ans, bibliographers nor users of bibliographic systems. The time spent transliterating text in a record, when it could simply be entered in its native script, is wasteful and unproductive. Those who support library transliteration, even with the adoption of Unicode, however, argue that there are many reasons to continue the practice, inefficient as it is. Most of the arguments in favor of transliter- ation assert that while it is useless for native speakers to use translitera- tion, many other people need to search for records in any particular lan- guage employing non-Roman script. For example, transliteration is the only realistic way for Western-speaking librarians to maintain control over non-Roman materials.
    [Show full text]
  • Jan Just Witkam & A.G.P. Janson, Multi-Lingual
    Multi-Lingual Scholar'^ and Font Scholàr'^, a word-processorand a tool for font design for the student of Orientallanguages A testfu JanJast Vitkam dy A.G.P. Janson* l. Murrr-LTNGUALScHOLAR''. rne Wono-PnocessoR up to l5 printer fonts (5 Roman, 4 Hebrew, 2 Greek.2 Cyrillic and 2 Arabic fonts); thus with these five When comparedto other word-processors,what extras alphabetsone can write, edit and print in dozens of doesthis program offer its usersand what is neededto different languages.The program works in a purely operateit? The most conspicuousextra feature of this graphicalway. so that it offersnumerous other features program is that it enabiesthe userto work on the same which are seldomavailable in other word-processing documentconcurrently with up to five different alpha- software.One manifestexample of suchan extra featu- bets of one'sown choice- provided.of course.that re is the option ol uorking eitheruith largeon-screen they areinstalled. These alphabets can be mixedfreely'. characters(in ,10columns) or nith smallercharacters irrespectiveof the direction of u'riting. These Ín'e (in the usual80 columns).Workin-s *'ith largecharac- alphabetsinclude all sortsof accentsand vow'elsigns. tershas the adranta-qethat all the rouels. accentsand for both Europeanand non-Europeanlanguages. and the iike can be typed and seen on the screen.Using eachhas multiple font sizesand characterstyles. theo- eitherof thesescreen fonts has no consequencesfor the retically up to 30 fonts per document and in practice, way the document will be printed. since the program with the fonts that are suppliedas standardprocedure, recalculatesevery page for printing according to the user'sspecifications. * Multi-Lingual Scholar'' reviewedby Jan Just Witkam; To operateMLS, one needsto have at leastan reN,ÍPC Font Scholar''reviewed by' A.G.P.
    [Show full text]
  • Inhaltsverzeichnis Use of the Arabic Language and Script in Labdoo Project, Setting up Ubuntu Laptop
    How to install and how use „non latin letters“ language and fonts in the Labdoo project (Arabic, Urdu, Farsi, Hindi, Russian, Tamil, Thai etc.) Inhaltsverzeichnis Use of the Arabic language and script in Labdoo project, setting up ubuntu laptop...........1 Enabling Arabic support..............................................................................................................2 Keyboard layout switching.........................................................................................................5 Changing the text direction.......................................................................................................6 Linux programs with Arabic language support.......................................................................6 Arabic language as a system wide set...........................................................................................8 Keyboard layout Arabic (Algeria, Morocco, Tunisia):..............................................................9 Keyboard layout Arabic (Bahrain, Egypt, Jordan, Kuwait, Lebanon, Oman, Qatar, Saudi Arabia, Syrai, U.A.E, Yemen):.......................................................................................................9 Keyboard layout Urdu (Pakistan):............................................................................................10 Keyboard layout Farsi (Iran):.....................................................................................................10 Keyboard layout Hindi (India):..................................................................................................11
    [Show full text]
  • The Digital Orientalist Keyboard Layouts by L.W.C
    The Digital Orientalist Keyboard Layouts By L.W.C. van Lit *VERSION MARCH 9, 2014* ATTENTION: discrepancies exist between the Mac OSX and Microsoft Windows Arabic keyboard layouts. They are negligible and should become obvious upon usage. This document has been made using the Mac OSX layout. The Latin keyboard layout is only available for Mac OSX. Introduction This document explains how to install and use two keyboard lay-outs useful to all students, scholars and others interested in the Arabic and Persian language, whose primary language is a Western- European one. One will !nd that once installed, these layouts increase productivity tremendously. The Latin layout takes care of characters necessary for proper transliteration of Arabic words and accommodates all major transliteration systems. The Arabic layout allows the user to type Arabic in near-transliterary style, which is more intuitive for Orientalists. Pleaseًًًٛ note that Mac OS X Lion-users do not necessarily need to install the Latin keyboard-layout but only need to modify the regular English keyboard. However, the Latin keyboard-layout proofs to be faster and is therefore recommended. Installation for Windows • Run the Installer !le • On the taskbar, click on the language icon and select ‘add keyboard’ • Look up and select the keyboard • Delete other active Arabic keyboards Installation for Mac OSX Modifying the regular English keyboard Put the Keyboard-en.plist !le in the following folder: /System/Library/Input Methods/PressAndHold.app/Contents/Resources The computer will ask for permission and will ask what you want to do since a !le with the same name already exists.
    [Show full text]