Naila Habib Khan

LIGATURE RECOGNITION SYSTEM FOR PRINTED URDU SCRIPT USING GENETIC ALGORITHM BASED HIERARCHICAL CLUSTERING

A Thesis Submitted to the Faculty of the Institute of Management Sciences, Peshawar in Partial Fulfillment of the Requirements for the Degree of DOCTOR OF PHILOSOPHY COMPUTER SCIENCE

By NAILA HABIB KHAN

DEPARTMENT OF COMPUTER SCIENCE
INSTITUTE OF MANAGEMENT SCIENCES
PESHAWAR, PAKISTAN
SESSION 2014-2017

This is to certify that the research work presented in this thesis entitled “Ligature Recognition System for Printed Urdu Script Using Genetic Algorithm Based Hierarchical Clustering” was conducted by Naila Habib Khan under the supervision of Dr. Awais Adnan, Institute of Management Sciences, Peshawar, Pakistan. No part of this thesis has been submitted anywhere else for any other degree. This thesis is submitted to the Institute of Management Sciences, Peshawar in partial fulfilment of the requirements for the degree of Doctor of Philosophy in the field of Computer Science.

Student Name: Naila Habib Khan
Signature: ___________________________

Examination Committee:

a) External Foreign Examiner 1: Dr. Yue Cao, School of Computing and Communications, Lancaster University, UK
Signature: ___________________________

b) External Foreign Examiner 2: Prof. Dr. Ibrahim A. Hameed, Deputy Head of Research and Innovation, Department of ICT and Natural Sciences, Norwegian University of Science and Technology, UK
Signature: ___________________________

c) External Local Examiner: Dr. Saeeda Naz, Head of Department/Assistant Professor, Govt. Girls Postgraduate College, Abbotabad, Pakistan
Signature: ___________________________

d) Internal Local Examiner: Dr. Imran Ahmed Mughal, Assistant Professor, Institute of Management Sciences, Peshawar, Pakistan
Signature: ___________________________

Supervisor: Dr. Awais Adnan, Assistant Professor, Institute of Management Sciences, Peshawar, Pakistan
Signature: ___________________________

Director: Dr. Muhammad Mohsin Khan, Institute of Management Sciences, Peshawar, Pakistan
Signature: ___________________________

I, Naila Habib Khan, hereby declare that my Ph.D. thesis entitled “Ligature Recognition System for Printed Urdu Script Using Genetic Algorithm Based Hierarchical Clustering”, submitted by me to the Research and Development Department (R&DD), is my own original work. I am aware that if my work is found to be plagiarized or not genuine, R&DD has the full authority to cancel my research work and I am liable to penal action.

Naila Habib Khan
July 5, 2019

I solemnly declare that the research work presented in the thesis entitled “Ligature Recognition System for Printed Urdu Script Using Genetic Algorithm Based Hierarchical Clustering” is solely my research work, with no significant contribution from any other person. Small contributions, wherever taken, have been duly acknowledged, and the complete thesis has been written by me. I understand the zero-tolerance policy of the HEC and the Institute of Management Sciences, Peshawar, towards plagiarism. Therefore, as the author of the above-mentioned thesis, I declare that no portion of my thesis has been plagiarised and that any material used as a reference has been properly cited. I understand that if I am found guilty of any form of plagiarism in the above-mentioned thesis, even after the award of the Ph.D. degree, the institute reserves the right to withdraw or revoke my Ph.D. degree, and that the HEC and the Institute have the right to publish my name on the HEC/Institute website on which the names of students who submitted plagiarised theses are placed.

Author’s Signature: _______________
Naila Habib Khan

This research is dedicated to my beloved parents. They give me strength when I am weak; they never let me fall and hold me up; they see the best that is in me; they are always there for me and stand by me. They have been an inspiration and a blessing for me.
I am everything I am because I am loved by them.

Firstly, all glory is to Allah Almighty, Who blessed me with a strong will and the determination to complete this research. I express my deepest gratitude to my supervisor, Dr. Awais Adnan, for his kind guidance, constant help and constructive feedback on my research. I especially thank him for his patient support throughout the research phase. I really appreciate his input on my research, as it would not have been possible without his advice and assistance. I am also grateful to all the faculty members and colleagues of the Department of Computer Science, Institute of Management Sciences, for their support, inspiration and encouragement during my PhD research work. Thank you to my friend, Sadia Basar, for always being there and encouraging me with my work throughout this tough route to the PhD. Last but not least, enormous thanks to my beloved parents, my dearest sisters, Asma Habib Khan and Nazma Habib Khan, and my dearest brothers, Imran Khan and Asif Khan, whose warm wishes, all-embracing backing, patience and prayers made the completion of this PhD research possible. I owe my gratitude to my elder sister Nazma Habib Khan for her constant advice and motivation. I would also like to thank my family members Iftikhar Anjum and Jane Agna Khan for their immense support.

In this dissertation, a method is presented for ligature-based recognition of printed Urdu Nastalique script. The proposed recognition system uses a genetic algorithm based hierarchical clustering approach for the recognition of Urdu ligatures. The overall proposed Urdu ligature recognition system has been divided into six phases: pre-processing, segmentation, feature extraction, hierarchical clustering, classification rules, genetic algorithm optimization and recognition. In the first phase, the Urdu text line images are read one by one from the dataset. Next, in pre-processing, the images are thresholded and noise is removed.
Subsequently, an efficient holistic algorithm is developed for segmenting the Urdu text lines into their constituent ligatures. The proposed ligature segmentation algorithm is novel in that it is one of the first algorithms that does not use baseline information for ligature segmentation of Urdu script. Following this, a unique set of fifteen hand-engineered features is extracted from the segmented ligature images. Of these fifteen features, two are geometric features, nine are first-order statistical features and four are second-order statistical features. To reduce the distribution of data points, the features are hierarchically clustered and a total of 3645 classification rules are generated using simple IF-THEN statements. Since the rules are at an initial stage, genetic algorithm optimization is used for further refinement of the hierarchical clustering. The proposed genetic algorithm phase consists of population initialization, chromosome encoding, parent selection, crossover, mutation, fitness function and termination stages. Experiments conducted on the benchmark UPTI dataset for the proposed Urdu Nastalique ligature recognition system yield promising results. The proposed ligature segmentation algorithm achieves an accuracy of 99.86%, whereas the genetic algorithm based hierarchical clustering approach achieves a ligature recognition rate of 96.72%.

Table of Contents
Certificate of Approval
Author’s Declaration
Plagiarism Undertaking
Dedication
Acknowledgments
Abstract
List of Figures
List of Tables
List of Abbreviations
Chapter 1. Introduction
1.1 Overview
1.2 Motivation
1.3 Problem Statement
1.3.1 Problem Description
1.4 Goal and Objectives
1.5 Research Contributions
1.6 Thesis Structure
1.7 Summary
Chapter 2. Background
2.1 History of OCR
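
The genetic algorithm stages named in the abstract (population initialization, chromosome encoding, parent selection, crossover, mutation, fitness function and termination) can be illustrated with a generic sketch. This is not the thesis's implementation: the binary chromosome encoding, tournament selection, single-point crossover, bit-flip mutation, elitism, the function name `run_ga` and every parameter default are assumptions chosen only to show how the stages fit together.

```python
import random


def run_ga(fitness, n_genes, pop_size=20, generations=50,
           crossover_rate=0.9, mutation_rate=0.02, target=None, seed=0):
    """Toy genetic algorithm over binary chromosomes (illustrative only).

    `fitness` maps a list of 0/1 genes to a number; higher is better.
    Requires n_genes >= 2 and pop_size >= 2.
    """
    rng = random.Random(seed)
    # Population initialization with an assumed binary chromosome encoding.
    population = [[rng.randint(0, 1) for _ in range(n_genes)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        # Termination: stop early once an optional target fitness is reached.
        if target is not None and fitness(best) >= target:
            break
        # Elitism (an assumption): carry the best chromosome forward unchanged.
        next_gen = [best[:]]
        while len(next_gen) < pop_size:
            # Parent selection: binary tournament.
            p1 = max(rng.sample(population, 2), key=fitness)
            p2 = max(rng.sample(population, 2), key=fitness)
            # Crossover: single cut point, applied with some probability.
            if rng.random() < crossover_rate:
                cut = rng.randint(1, n_genes - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # Mutation: flip each gene with a small probability.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            next_gen.append(child)
        population = next_gen
        best = max(population, key=fitness)
    return best
```

As a usage example, `run_ga(sum, n_genes=16)` maximizes the number of ones in a 16-bit chromosome, a placeholder fitness function. In a system like the one described, the fitness function would presumably score how well a candidate clustering or rule set recognizes ligatures, but that mapping is not specified in the text above.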