Romanization System, Persian Text, Natural Language Processing, Written Manuscript
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Technical Reference Manual for the Standardization of Geographical Names United Nations Group of Experts on Geographical Names
ST/ESA/STAT/SER.M/87 Department of Economic and Social Affairs Statistics Division Technical reference manual for the standardization of geographical names United Nations Group of Experts on Geographical Names United Nations New York, 2007 The Department of Economic and Social Affairs of the United Nations Secretariat is a vital interface between global policies in the economic, social and environmental spheres and national action. The Department works in three main interlinked areas: (i) it compiles, generates and analyses a wide range of economic, social and environmental data and information on which Member States of the United Nations draw to review common problems and to take stock of policy options; (ii) it facilitates the negotiations of Member States in many intergovernmental bodies on joint courses of action to address ongoing or emerging global challenges; and (iii) it advises interested Governments on the ways and means of translating policy frameworks developed in United Nations conferences and summits into programmes at the country level and, through technical assistance, helps build national capacities. NOTE The designations employed and the presentation of material in the present publication do not imply the expression of any opinion whatsoever on the part of the Secretariat of the United Nations concerning the legal status of any country, territory, city or area or of its authorities, or concerning the delimitation of its frontiers or boundaries. The term “country” as used in the text of this publication also refers, as appropriate, to territories or areas. Symbols of United Nations documents are composed of capital letters combined with figures. ST/ESA/STAT/SER.M/87 UNITED NATIONS PUBLICATION Sales No. -
Arabic Alphabet - Wikipedia, the Free Encyclopedia Arabic Alphabet from Wikipedia, the Free Encyclopedia
2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia Arabic alphabet From Wikipedia, the free encyclopedia َأﺑْ َﺠ ِﺪﯾﱠﺔ َﻋ َﺮﺑِﯿﱠﺔ :The Arabic alphabet (Arabic ’abjadiyyah ‘arabiyyah) or Arabic abjad is Arabic abjad the Arabic script as it is codified for writing the Arabic language. It is written from right to left, in a cursive style, and includes 28 letters. Because letters usually[1] stand for consonants, it is classified as an abjad. Type Abjad Languages Arabic Time 400 to the present period Parent Proto-Sinaitic systems Phoenician Aramaic Syriac Nabataean Arabic abjad Child N'Ko alphabet systems ISO 15924 Arab, 160 Direction Right-to-left Unicode Arabic alias Unicode U+0600 to U+06FF range (http://www.unicode.org/charts/PDF/U0600.pdf) U+0750 to U+077F (http://www.unicode.org/charts/PDF/U0750.pdf) U+08A0 to U+08FF (http://www.unicode.org/charts/PDF/U08A0.pdf) U+FB50 to U+FDFF (http://www.unicode.org/charts/PDF/UFB50.pdf) U+FE70 to U+FEFF (http://www.unicode.org/charts/PDF/UFE70.pdf) U+1EE00 to U+1EEFF (http://www.unicode.org/charts/PDF/U1EE00.pdf) Note: This page may contain IPA phonetic symbols. Arabic alphabet ا ب ت ث ج ح خ د ذ ر ز س ش ص ض ط ظ ع en.wikipedia.org/wiki/Arabic_alphabet 1/20 2/14/13 Arabic alphabet - Wikipedia, the free encyclopedia غ ف ق ك ل م ن ه و ي History · Transliteration ء Diacritics · Hamza Numerals · Numeration V · T · E (//en.wikipedia.org/w/index.php?title=Template:Arabic_alphabet&action=edit) Contents 1 Consonants 1.1 Alphabetical order 1.2 Letter forms 1.2.1 Table of basic letters 1.2.2 Further notes -
DRAFT Arabtex a System for Typesetting Arabic User Manual Version 4.00
DRAFT ArabTEX a System for Typesetting Arabic User Manual Version 4.00 12 Klaus Lagally May 25, 1999 1Report Nr. 1998/09, Universit¨at Stuttgart, Fakult¨at Informatik, Breitwiesenstraße 20–22, 70565 Stuttgart, Germany 2This Report supersedes Reports Nr. 1992/06 and 1993/11 Overview ArabTEX is a package extending the capabilities of TEX/LATEX to generate the Perso-Arabic writing from an ASCII transliteration for texts in several languages using the Arabic script. It consists of a TEX macro package and an Arabic font in several sizes, presently only available in the Naskhi style. ArabTEX will run with Plain TEXandalsowithLATEX2e. It is compatible with Babel, CJK, the EDMAC package, and PicTEX (with some restrictions); other additions to TEX have not been tried. ArabTEX is primarily intended for generating the Arabic writing, but the stan- dard scientific transliteration can also be easily produced. For languages other than Arabic that are customarily written in extensions of the Perso-Arabic script some limited support is available. ArabTEX defines its own input notation which is both machine, and human, readable, and suited for electronic transmission and E-Mail communication. However, texts in many of the Arabic standard encodings can also be processed. Starting with Version 3.02, ArabTEX also provides support for fully vowelized Hebrew, both in its private ASCII input notation and in several other popular encodings. ArabTEX is copyrighted, but free use for scientific, experimental and other strictly private, noncommercial purposes is granted. Offprints of scientific publi- cations using ArabTEX are welcome. Using ArabTEX otherwise requires a license agreement. There is no warranty of any kind, either expressed or implied. -
5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721
Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License. -
D Igital Text
Digital text The joys of character sets Contents Storing text ¡ General problems ¡ Legacy character encodings ¡ Unicode ¡ Markup languages Using text ¡ Processing and display ¡ Programming languages A little bit about writing systems Overview Latin Cyrillic Devanagari − − − − − − Tibetan \ / / Gujarati | \ / − Armenian / Bengali SOGDIAN − Mongolian \ / / Gurumukhi SCRIPT Greek − Georgian / Oriya Chinese | / / | / Telugu / PHOENICIAN BRAHMI − − Kannada SINITIC − Japanese SCRIPT \ SCRIPT Malayalam SCRIPT \ / | \ \ Tamil \ Hebrew | Arabic \ Korean | \ \ − − Sinhala | \ \ | \ \ _ _ Burmese | \ \ Khmer | \ \ Ethiopic Thaana \ _ _ Thai Lao The easy ones Latin is the alphabet and writing system used in the West and some other places Greek and Cyrillic (Russian) are very similar, they just use different characters Armenian and Georgian are also relatively similar More difficult Hebrew is written from right−to−left, but numbers go left−to−right... Arabic has the same rules, but also requires variant selection depending on context and ligature forming The far east Chinese uses two ’alphabets’: hanzi ideographs and zhuyin syllables Japanese mixes four alphabets: kanji ideographs, katakana and hiragana syllables and romaji (latin) letters and numbers Korean uses hangul ideographs, combined from jamo components Vietnamese uses latin letters... The Indic languages Based on syllabic alphabets Require complex ligature forming Letters are not written in logical order, but require a strange ’circular’ ordering In addition, a single line consists of separate -
Romanization Examples
Romanization examples Each title of a language or a writing system is followed by a note on the appropriate romanization system used (UN = United Nations, BGN/PCGN = US Board on Geographic Names and Permanent Committee on Geographical Names for British Official Use) Amharic [UN 1967, I/17] Lao [national 1966] ኢትዮጵያ Ityop’ya [ Ethiopia ], አዲስ አበባ Addis Abe ̱ ba ລາວ Lao [ Laos ], ວງຈັ ນ Viangchan Arabic [UN 1972, II/8] Macedonian Cyrillic [UN 1977, III/11] Jaz īrat al-‘Arab [ Arabian Peninsula ] Скопје Skopje, Битола Bitola ز رة ارب Armenian [BGN/PCGN 1981] Malayalam [UN 1972, II/11; 1977, III/12] Հայաստան Hayastan [ Armenia ], Երևան Yerevan Kera ḷaṁ, Tiruvanantapura ṁ Assamese [UN 1972, II/11; 1977, III/12] Maldivian [national 1987] Asam [ Assam ], Dichhapura [ Dispur ] ޖ އ ރ ހ ވ ދ Dhivehi Raajje [ Maldives ], ލ މ Maale Bengali [UN 1972, II/11; 1977, III/12] Marathi [UN 1972, II/11; 1977, III/12] Bāṁ lādesh, Dhaka महारा Mah ārāṣhṭra, मुंबई Mu ṁba ī Bulgarian [UN 1977, III/10] Mongolian (Cyrillic) [BGN/PCGN 1964] Република България Republika B ǎlgarija Монгол улс Mongol uls, Улаанбаатар Ulaanbaatar Burmese [BGN/PCGN 1970] Nepalese [UN 1972, II/11; 1977, III/12] ြမန်မာ Myanma, ရန်ကန် Yangôn नेपाल Nepāl, काठमाड Kāṭhm āḍau ṁ [Kathmandu ] Byelorussian [national 2007] Беларусь Bielaru ś, Минск Minsk Oriya [UN 1972, II/11; 1977, III/12] Chinese [UN 1977, III/8] Oṙish ā, Bhubaneshbar 中国 Zhongguo, 北京 Beijing Pashto [BGN/PCGN 1968] XQY Kābulل ,Afgh ānist ān اQRSTQUVن [Dzongkha [national 1997 འག་ལ Drukyuel [Bhutan ], ཐིམ་ Thimphu Persian -
United States RBDS Standard Specification of the Radio Broadcast Data System (RBDS) April, 2005
NRSC STANDARD NATIONAL RADIO SYSTEMS COMMITTEE NRSC-4-A United States RBDS Standard Specification of the radio broadcast data system (RBDS) April, 2005 Part II - Annexes NAB: 1771 N Street, N.W. CEA: 1919 South Eads Street Washington, DC 20036 Arlington, VA 22202 Tel: (202) 429-5356 Fax: (202) 775-4981 Tel: (703) 907-7660 Fax: (703) 907-8113 Co-sponsored by the Consumer Electronics Association and the National Association of Broadcasters http://www.nrscstandards.org NOTICE NRSC Standards, Bulletins and other technical publications are designed to serve the public interest through eliminating misunderstandings between manufacturers and purchasers, facilitating interchangeability and improvement of products, and assisting the purchaser in selecting and obtaining with minimum delay the proper product for his particular need. Existence of such Standards, Bulletins and other technical publications shall not in any respect preclude any member or nonmember of the Consumer Electronics Association (CEA) or the National Association of Broadcasters (NAB) from manufacturing or selling products not conforming to such Standards, Bulletins or other technical publications, nor shall the existence of such Standards, Bulletins and other technical publications preclude their voluntary use by those other than CEA or NAB members, whether the standard is to be used either domestically or internationally. Standards, Bulletins and other technical publications are adopted by the NRSC in accordance with the NRSC patent policy. By such action, CEA and NAB do not assume any liability to any patent owner, nor do they assume any obligation whatever to parties adopting the Standard, Bulletin or other technical publication. Note: The user's attention is called to the possibility that compliance with this standard may require use of an invention covered by patent rights. -
“Muda” System in Facilitating Writing Arabic Romanization in Word Processor; Standardization of Transliteration and Transcription
European Journal of Molecular & Clinical Medicine ISSN 2515-8260 Volume 8, Issue 2, 2021 “MUDA” SYSTEM IN FACILITATING WRITING ARABIC ROMANIZATION IN WORD PROCESSOR; STANDARDIZATION OF TRANSLITERATION AND TRANSCRIPTION Nurulasyikin “MUDA”, PhD1, Hazmi Dahlan2 1Pusat Penataran Ilmu dan Bahasa, Universiti Malaysia Sabah, Kampus Antarabangsa Labuan 2Fakulti Kewangan Antarabangsa Labuan, Universiti Malaysia Sabah, Kampus Antarabangsa Labuan Abstract This study aims to propose a new system in Arabic romanization to simplify the typing process by using the QWERTY keyboard. The proposed system will be named as “MUDA” as the researcher’s surname and the name itself is nearly similar to the word “MUDA”H in Malay which mean easy or simple that represent the concept of the system. This is to facilitate the needs of the writers, and to decrease the gap of ambiguity for the systems created previously with the one that can increase the product of academic writing when dealing with Arabic romanization. The data of this study is analysed by finding the similarity within the previous transliteration systems and the phonetic system in English. Both systems then will be analysed by comparing them with the characters on the typing keyboard in terms of similarity. This study found that there are 23 keys will be used in 3 ways as follow; (i) single character typing excluding shift; (ii) shift + character (T,S,D,Z,”); and (iii) combining character (th,sh,gh,kh,dz). This explains that no more complicated system of typing process will occur. Hence the speed of completing task in typing Arabic romanization will become faster. Key Words: Arabic Romanization, typing keyboard, word processing Introduction Arabic has its own character as what Japanese, Russian, Tamil and others do. -
Contents Origins Transliteration
Ayin , ע Ayin (also ayn or ain; transliterated ⟨ʿ⟩) is the sixteenth letter of the Semitic abjads, including Phoenician ʿayin , Hebrew ʿayin ← Samekh Ayin Pe → [where it is sixteenth in abjadi order only).[1) ع Aramaic ʿē , Syriac ʿē , and Arabic ʿayn Phoenician Hebrew Aramaic Syriac Arabic The letter represents or is used to represent a voiced pharyngeal fricative (/ʕ/) or a similarly articulated consonant. In some Semitic ع ע languages and dialects, the phonetic value of the letter has changed, or the phoneme has been lost altogether (thus, in Modern Hebrew it is reduced to a glottal stop or is omitted entirely). Phonemic ʕ The Phoenician letter is the origin of the Greek, Latin and Cyrillic letterO . representation Position in 16 alphabet Contents Numerical 70 value Origins (no numeric value in Transliteration Maltese) Unicode Alphabetic derivatives of the Arabic ʿayn Pronunciation Phoenician Hebrew Ayin Greek Latin Cyrillic Phonetic representation Ο O О Significance Character encodings References External links Origins The letter name is derived from Proto-Semitic *ʿayn- "eye", and the Phoenician letter had the shape of a circle or oval, clearly representing an eye, perhaps ultimately (via Proto-Sinaitic) derived from the ır͗ hieroglyph (Gardiner D4).[2] The Phoenician letter gave rise to theGreek Ο, Latin O, and Cyrillic О, all representing vowels. The sound represented by ayin is common to much of theAfroasiatic language family, such as in the Egyptian language, the Cushitic languages and the Semitic languages. Transliteration Depending on typography, this could look similar .( ﻋَ َﺮب In Semitic philology, there is a long-standing tradition of rendering Semitic ayin with Greek rough breathing the mark ̔〉 (e.g. -
Arabic in Romanization
Transliteration of Arabic 1/6 ARABIC Arabic script* DIN 31635 ISO 233 ISO/R 233 UN ALA-LC EI 1982(1.0) 1984(2.0) 1961(3.0) 1972(4.0) 1997(5.0) 1960(6.0) iso ini med !n Consonants! " 01 # $% &% ! " — (3.1)(3.2) — (4.1) — — 02 ' ( ) , * ! " #, $ (2.1) —, ’ (3.3) %, — (4.2) —, ’ (5.1) " 03 + , - . b b b b b b 04 / 0 1 2 t t t t t t 05 3 4 5 6 & & & th th th 06 7 8 9 : ' ' ' j j dj 07 ; < = > ( ( ( ) ( ( 08 ? @ * + + kh kh kh 09 A B d d d d d d 10 C D , , , dh dh dh 11 E F r r r r r r 12 G H I J z z z z z z 13 K L M N s s s s s s 14 O P Q R - - - sh sh sh 15 S T U V . / . 16 W X Y Z 0 0 0 d 1 0 0 17 [ \ ] ^ 2 2 2 3 2 2 18 _ ` a b 4 4 4 z 1 4 4 19 c d e f 5 5 5 6 6 5 20 g h i j 7 7 8 gh gh gh 21 k l m n f f f f f f 22 o p q r q q q q q 9 23 s t u v k k k k k k 24 w x y z l l l l l l 25 { | } ~ m m m m m m 26 • € • n n n n n n 27 h h h h h h 28 … " h, t (1.1) : ;, <(3.4) h, t (4.3) h, t (5.2) a, at (6.1) 29 w w w w w w 30 y y y y y y 31 ! = — y y ! • 32 s! l! la" l! l! l! l! 33 # al- (1.2) "#al (2.2) al- (3.5) al- (4.4) al- (5.3) al-, %l- (6.2) Thomas T. -
The Arabi System — TEX Writes in Arabic and Farsi
The Arabi system | ] ¨r` [ A\ TEX writes in Arabic and Farsi Youssef Jabri Ecole´ Nationale des Sciences Appliqu´ees, Oujda, Morocco yjabri (at) ensa dot univ-oujda dot ac dot ma Abstract In this paper, we will present a newly arrived package on CTAN that provides Arabic script support for TEX without the need for an external pre-processor. The Arabi package adds one of the last major multilingual typesetting capabilities to Babel by adding support for the Arabic ¨r and Farsi ¨FCA languages. Other languages using the Arabic script should also be more or less easily imple- mentable. Arabi comes with many good quality free fonts, Arabic and Farsi, and may also use commercial fonts. It supports many 8-bit input encodings (namely, CP-1256, ISO-8859-6 and Unicode UTF-8) and can typeset classical Arabic poetry. The package is distributed under the LATEX Project Public License (LPPL), and has the LPPL maintenance status \author-maintained". It can be used freely (including commercially) to produce beautiful texts that mix Arabic, Farsi and Latin (or other) characters. Pl Y ¾Abn Tn ®¤ Tr` ¤r Am`tF TAk t§ A\ ¨r` TEC .¤r fOt TEX > < A\ Am`tFA d¤ dnts ¨ n A\ (¨FCA ¤ ¨r)tl Am`tF TAk S ¨r` TEC , T¤rm rb Cdq tmt§¤ ¯m ¢k zymt§ A\n @h , T§db @n¤ Y At§ ¯ ¢ Y TAR . AARn £EA A \` Am`tF® A ¢± ¾AO ¾AA ¨r` dq§ . Tmlk ¨ ¤r AkJ d§dt ¨CA A` © ¨ ¨t ªW d Am`tF ¢nkm§ Am Am`tF¯ r ªW Tmm ¯¤ ¨A ¨r` , A\n TbsnA A w¡ Am . -
Writing Arabizi: Orthographic Variation in Romanized
WRITING ARABIZI: ORTHOGRAPHIC VARIATION IN ROMANIZED LEBANESE ARABIC ON TWITTER ! ! ! ! Natalie!Sullivan! ! ! ! TC!660H!! Plan!II!Honors!Program! The!University!of!Texas!at!Austin! ! ! ! ! May!4,!2017! ! ! ! ! ! ! ! _______________________________________________________! Barbara!Bullock,!Ph.D.! Department!of!French!&!Italian! Supervising!Professor! ! ! ! ! _______________________________________________________! John!Huehnergard,!Ph.D.! Department!of!Middle!Eastern!Studies! Second!Reader!! ii ABSTRACT Author: Natalie Sullivan Title: Writing Arabizi: Orthographic Variation in Romanized Lebanese Arabic on Twitter Supervising Professors: Dr. Barbara Bullock, Dr. John Huehnergard How does technology influence the script in which a language is written? Over the past few decades, a new form of writing has emerged across the Arab world. Known as Arabizi, it is a type of Romanized Arabic that uses Latin characters instead of Arabic script. It is mainly used by youth in technology-related contexts such as social media and texting, and has made many older Arabic speakers fear that more standard forms of Arabic may be in danger because of its use. Prior work on Arabizi suggests that although it is used frequently on social media, its orthography is not yet standardized (Palfreyman and Khalil, 2003; Abdel-Ghaffar et al., 2011). Therefore, this thesis aimed to examine orthographic variation in Romanized Lebanese Arabic, which has rarely been studied as a Romanized dialect. It was interested in how often Arabizi is used on Twitter in Lebanon and the extent of its orthographic variation. Using Twitter data collected from Beirut, tweets were analyzed to discover the most common orthographic variants in Arabizi for each Arabic letter, as well as the overall rate of Arabizi use. Results show that Arabizi was not used as frequently as hypothesized on Twitter, probably because of its low prestige and increased globalization.