P229A140007 Trustees of Indiana University
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
ARTICLE Development of a Gold-Standard Pashto Dataset and a Segmentation App Yan Han and Marek Rychlik
ARTICLE Development of a Gold-standard Pashto Dataset and a Segmentation App Yan Han and Marek Rychlik ABSTRACT The article aims to introduce a gold-standard Pashto dataset and a segmentation app. The Pashto dataset consists of 300 line images and corresponding Pashto text from three selected books. A line image is simply an image consisting of one text line from a scanned page. To our knowledge, this is one of the first open access datasets which directly maps line images to their corresponding text in the Pashto language. We also introduce the development of a segmentation app using textbox expanding algorithms, a different approach to OCR segmentation. The authors discuss the steps to build a Pashto dataset and develop our unique approach to segmentation. The article starts with the nature of the Pashto alphabet and its unique diacritics which require special considerations for segmentation. Needs for datasets and a few available Pashto datasets are reviewed. Criteria of selection of data sources are discussed and three books were selected by our language specialist from the Afghan Digital Repository. The authors review previous segmentation methods and introduce a new approach to segmentation for Pashto content. The segmentation app and results are discussed to show readers how to adjust variables for different books. Our unique segmentation approach uses an expanding textbox method which performs very well given the nature of the Pashto scripts. The app can also be used for Persian and other languages using the Arabic writing system. The dataset can be used for OCR training, OCR testing, and machine learning applications related to content in Pashto. -
A Minimalist Analysis of Uyghur Genitives Stephen Politzer-Ahles
A Minimalist analysis of Uyghur genitives Stephen Politzer-Ahles University of Kansas Abstract This paper investigates the syntactic structure of so-called genitive-possessive DPs in Uyghur, a Turkic language. Uyghur genitive-possessives bear suffixes on both the “possessing” entity (comparable to the Saxon genitive 's in English) and the “possessed” one. The suffix on the “possessor”, -ning, is considered a genitive case marker; the suffix on the “possessed” has multiple allomorphs and is considered an agreement marker that agrees in person and number with the “possessor”. Based on the multiplicity of semantic roles that the “possessing” object may bear, and the observation that it may be dropped from the DP, an analogy is made between genitive-possessive DPs and finite TPs. It is proposed that “possessors” behave in a manner parallel to that of subjects of TPs: they are introduced by a quasi-functional head n or within a gerund, and raise to [Spec,DP] to receive genitive case from D. The agreement suffix, on the other hand, is treated as the phonological realization of an Agr head that is introduced with unvalued phi-features, features which are valued when the “possessing” entity passes through the specifier of AgrP. Adopting this structure can explain data on the realization of definiteness in genitive and non-genitive DPs, and the distribution of adverbials within gerunds. Introduction One of the key components to a theory of noun phrase structure is an explanation of how possessive marking is carried out within the DP. For example, a theory of English DPs owes an explanation of where the 's comes from in phrases like “John’s book”, and how case-checking is done in such a phrase. -
New Language Resources for the Pashto Language
New language resources for the Pashto language Djamel Mostefa 1 , Khalid Choukri 1 , Sylvie Brunessaux 2 , Karim Boudahmane 3 1 Evaluation and Language resources Distribution Agency, France 2 CASSIDIAN, France 3 Direction Générale de l'Armement, France E-mail: [email protected], [email protected], [email protected], [email protected] Abstract This paper reports on the development of new language resources for the Pashto language, a very low-resource language spoken in Afghanistan and Pakistan. In the scope of a multilingual data collection project, three large corpora are collected for Pashto. Firstly a monolingual text corpus of 100 million words is produced. Secondly a 100 hours speech database is recorded and manually transcribed. Finally a bilingual Pashto-French parallel corpus of around 2 million is produced by translating Pashto texts into French. These resources will be used to develop Human Language Technology systems for Pashto with a special focus on Machine Translation. Keywords: Pashto, low-resource language, speech corpus, monolingual and multilingual text corpora, web crawling. other one being Dari) and one regional language in 1. Introduction Pakistan. There are very few corpora and Human Language The code assigned to the language by the ISO 639-3 Technology (HLT) services available for Pashto. No standard is [pus]. language resources for Pashto can be found in the According to the Ethnologue.com website, it is spoken by catalogues of LDC1 and ELRA2. around 20 million people and three main dialects are to be Pashto is a very low-resource language. Google doesn't considered: support Pashto in its search engine or translation services. -
Central Eurasian Studies Society Fourth Annual Conference October
Central Eurasian Studies Society Fourth Annual Conference October 2- 5, 2003 Hosted by: Program on Central Asia and the Caucasus Davis Center for Russian and Eurasian Studies Harvard University Cambridge, Mass., USA Table of Contents Conference Schedule ................................................................................... 1 Film Program ......................................................................................... 2 Panel Grids ................................................................................................ 3 List of Panels .............................................................................................. 5 Schedule of Panels ...................................................................................... 7 Friday • Session I • 9:00 am-10:45 am ..................................................... 7 Friday • Session II • 11:00 am-12:45 pm .................................................. 9 Friday • Session III • 2:00 pm-3:45 pm .................................................. 11 Friday • Session IV • 4:00 pm-5:45 pm .................................................. 12 Saturday • Session I • 9:00 am-10:45 am ............................................... 14 Saturday • Session II • 11:00 am-12:45 pm ............................................ 16 Saturday • Session III • 2:00 pm-3:45 pm .............................................. 18 Saturday • Session IV • 4:00 pm-6:30 pm .............................................. 20 Sunday • Session I • 9:00 am-10:45 am ................................................ -
Usage and Chinese Translation of Uyghur Auxiliary Verb “Çiq-”*
Advances in Social Science, Education and Humanities Research, volume 378 6th International Conference on Education, Language, Art and Inter-cultural Communication (ICELAIC 2019) Usage and Chinese Translation of Uyghur Auxiliary Verb “Çiq-”* Rehmitulla Tudaji School of Uyghur Language and Culture Northwest Minzu University Lanzhou, China 730030 Guzalnur Tursun Osman Juma** School of Uyghur Language and Culture School of Uyghur Language and Culture Northwest Minzu University Northwest Minzu University Lanzhou, China 730030 Lanzhou, China 730030 **Corresponding Author Abstract—Uyghur "çiq" is a kind of auxiliary verb, which attaches to the coverb and loses its lexical meaning completely or II. RELEVANT RESEARCHES ON UYGUR AUXILIARY VERBS partly, expressing the aspect or modal meaning. Uyghur Uyghur auxiliary verbs have rich functions. Although they auxiliaries are a kind of auxiliary verbs, which completely or are only a kind of auxiliary verbs, when they are combined partially lose their lexical meaning and attach to the coverb to with co-verbs, they are the carriers of the whole morpheme express the aspect or modal meaning. They are only a kind of change. Therefore, the understanding of Uyghur auxiliary auxiliary verbs; but when they are combined with coverbs, they verbs is relatively unified. The main research results are as are the carriers of the whole morpheme change. In Uyghur, auxiliary verbs play an important role in enriching language, as follows: it can make the expression of thoughts and feelings more delicate and improve communication efficiency. In addition, it can make A. Study on the Meaning of Uyghur Auxiliary Verbs the language succinct and lively and enable the thoughts and Experts and scholars have a relatively unified feelings to be expressed accurately and appropriately. -
Afghanistan's Alternatives for Peace, Governance and Development
The Afghanistan Papers | No. 2, August 2009 Afghanistan’s Alternatives for Peace, Governance and Development: Transforming Subjects to Citizens & Rulers to Civil Servants M. Nazif Shahrani This paper is a co-publication of Addressing International Governance Challenges THE CENTRE FOR INTERNATIONAL GOVERNANCE INNOVATION THE AFGHANISTAN PAPERS ABstract Letter from the Executive Director The policies of the United States and its international partners in Afghanistan during the past eight years have proven wrong-headed and ineffective in delivering the On behalf of The Centre for International Governance promised peace, stability and democratic governance. Innovation (CIGI), it gives me great pleasure to introduce This paper critically examines the underlying assumptions our Afghanistan Papers, a signature product of CIGI’s behind these failing policies and explores alternative major research program on Afghanistan. CIGI is an approaches to rescue Afghanistan’s war-to-peace transition. independent, nonpartisan think tank that addresses Faulty assumptions on the part of key US government international governance challenges. Led by a group of advisors, decision makers and many of their Afghan and experienced practitioners and distinguished academics, Pakistani clients have contributed to the resurgence of the CIGI supports research, forms networks, advances Taliban and a crisis of trust for the Karzai government and policy debate, builds capacity and generates ideas for the internationally supported state-building process. The multilateral governance improvements. Conducting Obama administration must discard the misguided policies an active agenda of research, events and publications, of the past and adopt a historically informed and culturally CIGI’s interdisciplinary work includes collaboration sensitive strategy aimed at fundamentally changing the with policy, business and academic communities around governance system in Afghanistan, rather than simply the world. -
Pashto Alphabets
LEARNING PASHTO Intensive Elementary & Secondary Pashto for Military and other Professionals by Dawood Azami Visiting Scholar Email: [email protected] The Middle East Studies Center (MESC) The Ohio State University, Columbus August 2009 1 Aims of the Course: *To provide a thorough introductory course in basic Pashto with the accent on practical spoken Pashto, coverage of grammar, familiarity with Pashto pronunciation, and essential vocabulary. *Ability to communicate within a range of situations and to handle simple survival situations (e.g. finding lodging, food, transportation etc.) *Ability to read the simple Pashto texts dealing with a variety of social and basic needs. In addition to author’s own command and expertise, a number of sources (books, both published and unpublished, journals, websites, etc.) have been consulted while preparing this material. Word of thanks: The author would like to thank Dr. Alam Payind, Director, Middle East Studies Center (MESC), and Melinda McClimans, Assistant Director, MESC. Their cooperation and assistance certainly made my stay in Columbus easier and enjoyable. Copy Right: This course material is for teaching of Pashto language at The Ohio State University. The author holds the copy right for any other use. The author intends to publish a modified version of the material as a book in the future. Please contact the author for more information. (Email: [email protected] ) 2 Contents I. Pashto Alphabet .................................................................................................................. 5 A. Pashto Sounds Similar to English ............................................................................... 7 B. Pashto Sounds Different from English ........................................................................ 7 C. Two letters pronounced differently in major Pashto dialects: ...................................... 8 D. Arabic Letters/ Sounds in Pashto .................................................................................. 8 II. -
Naila Habib Khan
LIGATURE RECOGNITION SYSTEM FOR PRINTED URDU SCRIPT USING GENETIC ALGORITHM BASED HIERARCHICAL CLUSTERING A Thesis Submitted to the Faculty of the Institute of Management Sciences, Peshawar in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY COMPUTER SCIENCE By NAILA HABIB KHAN DEPARTMENT OF COMPUTER SCIENCE INSTITUTE OF MANAGEMENT SCIENCES PESHAWAR, PAKISTAN SESSION 2014-2017 This is to certify that the research work presented in this thesis entitled “Ligature Recognition System for Printed Urdu Script Using Genetic Algorithm Based Hierarchical Clustering” was conducted by Naila Habib Khan under the supervision of Dr. Awais Adnan, Institute of Management Sciences, Peshawar, Pakistan. No part of this thesis has been submitted anywhere else for any other degree. This thesis is submitted to the Institute of Management Sciences, Peshawar in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the field of Computer Science. Student Name: Naila Habib Khan Signature: ___________________________ Examination Committee: a) External Foreign Examiner 1: Dr. Yue Cao School of Computing and Communications, Lancaster University, UK Signature: ___________________________ b) External Foreign Examiner 2: Prof. Dr. Ibrahim A. Hameed Deputy Head of Research and Innovation Department of ICT and Natural Sciences, Norwegian University of Science and Technology, UK Signature: ___________________________ c) External Local Examiner: Dr. Saeeda Naz Head of Department/ Assistant Professor Govt. Girls Postgraduate College, Abbotabad, Pakistan Signature: ___________________________ ii d) Internal Local Examiner: Dr. Imran Ahmed Mughal Assistant Professor Institute of Management Sciences, Peshawar, Pakistan Signature: ___________________________ Supervisor: Dr. Awais Adnan Assistant Professor Institute of Management Sciences, Peshawar, Pakistan Signature: ___________________________ Director: Dr. -
Medieval Hebrew Texts and European River Names Ephraim Nissan London [email protected]
ONOMÀSTICA 5 (2019): 187–203 | RECEPCIÓ 8.3.2019 | ACCEPTACIÓ 18.9.2019 Medieval Hebrew texts and European river names Ephraim Nissan London [email protected] Abstract: The first section of theBook of Yosippon (tenth-century Italy) maps the Table of Nations (Genesis 10) onto contemporary peoples and places and this text, replete with tantalizing onomastics, also includes many European river names. An extract can be found in Elijah Capsali’s chronicle of the Ottomans 1517. The Yosippon also includes a myth of Italic antiquities and mentions a mysterious Foce Magna, apparently an estuarine city located in the region of Ostia. The article also examines an onomastically rich passage from the medieval travelogue of Benjamin of Tudela, and the association he makes between the river Gihon (a name otherwise known in relation to the Earthly Paradise or Jerusalem) and the Gurganin or the Georgians, a people from the Caspian Sea. The river Gihon is apparently what Edmund Spenser intended by Guyon in his Faerie Queene. The problems of relating the Hebrew spellings of European river names to their pronunciation are illustrated in the case of the river Rhine. Key words: river names (of the Seine, Loire, Rhine, Danube, Volga, Dnieper, Po, Ticino, Tiber, Arno, Era, Gihon, Guyon), Kiev, medieval Hebrew texts, Book of Yosippon, Table of Nations (Genesis 10), historia gentium, mythical Foce Magna city, Benjamin of Tudela, Elijah Capsali, Edmund Spenser Textos hebreus medievals i noms de rius europeus Resum: L’inici del Llibre de Yossippon (Itàlia, segle X) relaciona la «taula de les nacions» de Gènesi 10 amb pobles i llocs contemporanis, i aquest text, ple de propostes onomàstiques temptadores, també inclou noms fluvials europeus. -
Pashto Five Reasons Why You Should Pashto Learn More About Pashtuns سالم
SOME USEFUL PHRASES IN PASHTO FIVE REASONS WHY YOU SHOULD PASHTO LEARN MORE ABOUT PASHTUNS سﻻم. زما نوم جان دئ. [saˈlɔːm zmɔː num ʤɔːn dǝɪ] Uzbeks are the most numerous Turkic /salām. zmā num jān dǝy./ AND THEIR LANGUAGE Hi. My name is John. 1. Pashto is spoken as a first or second language by people in Central Asia. They predomi- over 40 million people worldwide, but the highest nantly mostly live in Uzbekistan, a land- population of speakers are located in Afghanistan ستاسو نوم څه دئ؟ [ˈstɔːso num ʦǝ dǝɪ] and Pakistan, with smaller populations in other locked country of Central Asia that shares /stāso num ʦǝ dǝy?/ Central Asian and Middle Eastern countries such borders with Kazakhstan to the west and What is your name? as Tajikistan and Iran. north, Kyrgyzstan and Tajikistan to the ,A member of the Indo-Iranian language family .2 تاسې څنګه یاست؟ زه ښه یم، مننه. [ˈʦǝŋgɒ jɔːst zǝ ʂɒ jǝm mɒˈnǝnɒ] Pashto shares many structural similarities to east, and Afghanistan and Turkmenistan /ʦǝnga yāst? zǝ ṣ̆ a yǝm, manǝna./ languages such as Dari, Farsi, and Tajiki. to the south. Many Uzbeks can also be How are you? I’m fine, thanks. 3. Because of US involvement with Afghanistan over found in Afghanistan, Kazakhstan, Kyr- the past decade, those who study Pashto can find careers in a variety of fields including translation gyzstan, Tajikistan, Turkmenistan and the ستاسو دلیدو څخه خوشحاله شوم. [ˈstɔːso dǝˈliːdo ˈʦǝχa χoʃˈhɔːla ʃwǝm] and interpreting, consulting, foreign service and Xinjiang Uyghur Autonomous Region of /stāso dǝ-lido ʦǝxa xoš-hāla šwǝm./ intelligence, journalism, and many others. -
Doctor of Philosophy
DOCTOR OF PHILOSOPHY Linguistic identifiers of L1 Persian speakers writing in English NLID for authorship analysis Ria Perkins 2014 Aston University Some pages of this thesis may have been removed for copyright restrictions. If you have discovered material in AURA which is unlawful e.g. breaches copyright, (either yours or that of a third party) or any other law, including but not limited to those relating to patent, trademark, confidentiality, data protection, obscenity, defamation, libel, then please read our Takedown Policy and contact the service immediately Linguistic Identifiers of L1 Persian speakers writing in English. NLID for Authorship Analysis. Ria Charlotte Perkins, M.A. Centre for Forensic Linguistics School of Languages and Social Sciences, Aston University A thesis submitted for the fulfilment of the degree of Doctor of Philosophy December, 2012 ©Ria Perkins, 2012 Ria Perkins asserts her moral right to be identified as the author of this thesis. This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognise that its copyright rests with its author and that no quotation from the thesis and no information derived from it may be published without proper acknowledgement. Thesis Summary Institution: Aston University Title: Linguistic Identifiers of L1 Persian speakers writing in English. NLID for Authorship Analysis. Name: Ria Charlotte Perkins Degree: Doctor of Philosophy Year of Submission: 2012 Synopsis: This research focuses on Native Language Identification (NLID), and in particular, on the linguistic identifiers of L1 Persian speakers writing in English. This project comprises three sub-studies; the first study devises a coding system to account for interlingual features present in a corpus of L1 Persian speakers blogging in English, and a corpus of L1 English blogs. -
On Uyghur Relative Clauses
On Uyghur relative clauses Éva Á. Csató & Muzappar Abdurusul Uchturpani Csató, Éva Á. & Uchturpani, Muzappar Abdurusul 2010. On Uyghur relative clauses. Turkic Languages 14, 69-93. Two types of relative clauses are used in modern Uyghur: one in which the subject is in the Nominative and the other in which the subject is in the Genitive and the head noun bears possessive agreement. The article gives a concise account of the main characteris- tics of these and some functionally related constructions. The aim is to pave the way for more research on the issues involved. Éva Á. Csató & Muzappar Abdurusul Uchturpani, Department of Linguistics and Philol- ogy, Uppsala University, Box 635, SE-751 26 Uppsala, Sweden. E-mail: [email protected] and [email protected] Turkic relative constructions Turkic relative clauses are typically non-finite clauses based on a participle. The dominating type of relative clause is not marked for subject-predicate agreement (Csató 1996 and references given there). See Ex. 1. Ex. 1 Karachay-Balkar Non-marked relative clause konaχ kel-üčü üy guest come-PART room ‘a room where guests stay’ > ‘guestroom’ Some Turkic languages have developed relative constructions in which subject- predicate agreement is marked by a possessive suffix on the participle. In such con- structions the Genitive can be assigned to the subject. See the following examples. Ex. 2 Turkish Genitive relative clause with subject-predicate agreement kız-ıni uyu-duğ-ui oda girl-GEN sleep-DIK.PART room ‘a /the room where the girl sleeps’ Certain languages, for instance Turkmen, have a type of relative construction in which agreement between the Genitive subject of the relative clause and the head noun is marked; see Ex.