Methods and Tools for Weak Problems of Translation Muhammad Ghulam Abbas Malik

Total Page:16

File Type:pdf, Size:1020Kb

Methods and Tools for Weak Problems of Translation Muhammad Ghulam Abbas Malik Methods and Tools for Weak Problems of Translation Muhammad Ghulam Abbas Malik To cite this version: Muhammad Ghulam Abbas Malik. Methods and Tools for Weak Problems of Translation. Computer Science [cs]. Université Joseph-Fourier - Grenoble I, 2010. English. tel-00502192 HAL Id: tel-00502192 https://tel.archives-ouvertes.fr/tel-00502192 Submitted on 13 Jul 2010 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Université de Grenoble N° attribué par la bibliothèque /_ /_ /_ /_ /_ /_ /_ /_ /_ / THÈSE pour obtenir le grade de DOCTEUR DE L’UNIVERSITÉ DE GRENOBLE Spécialité: “INFORMATIQUE” (Computer Science) dans le cadre de l’École Doctorale “Mathématiques, Sciences et Technologie de l’Information, Informatique” (Doctoral School “Mathematics, Information Science and Technology, Computer Science”) présentée et soutenue publiquement par Muhammad Ghulam Abbas MALIK le 9 juillet 2010 Méthodes et outils pour les problèmes faibles de traduction (Methods and Tools for Weak Problems of Translation) Thèse dirigée par M. Christian BOITET Directeur de thèse M. Pushpak BHATTACHARYYA Codirecteur de thèse JURY M. Arthur SOUCEMARIANADIN Président Mme Violaine PRINCE Rapporteur M. Patrice POGNAN Rapporteur M. Eric WEHRLI Rapporteur M. Vincent BERMENT Examinateur M. Alain DÉSOULIÈRES Examinateur M. Christian BOITET Directeur M. Pushpak BHATTACHARYYA Codirecteur Préparée au laboratoire GETALP–LIG (CNRS–INPG–UJF–UPMF) Table of Contents Introduction ..................................................................................................................... 1 Part I ...................................................................................................................... 7 Scriptural Translation and other Weak Translation Problems ...................... 7 Chapter 1. Scriptural Translation ................................................................................. 9 1.1. Scriptural Translation, Transliteration and Transcription ................................................... 9 1.2. Machine Transliteration in Natural Language Processing (NLP) ....................................... 9 1.2.1. Grapheme-based models ................................................................................................ 10 1.2.2. Phoneme-based models .................................................................................................. 11 1.2.3. Hybrid and correspondence-based models ..................................................................... 11 1.3. Scriptural Translation ........................................................................................................ 12 1.3.1. Challenges and barriers within the same language ........................................................ 14 1.3.1.1. Scriptural divide .......................................................................................................... 14 1.3.1.2. Under-resourced and under-written languages ........................................................... 15 1.3.1.3. Absence of necessary information .............................................................................. 15 1.3.1.4. Different spelling conventions .................................................................................... 16 1.3.1.5. Transliterational or transcriptional ambiguities .......................................................... 17 1.3.2. Challenges and barriers within dialects .......................................................................... 17 1.3.2.1. Distinctive sound inventories ...................................................................................... 17 1.3.2.2. Lexical divergence and translational ambiguities ....................................................... 18 1.3.3. Challenges and barriers between closely related languages ........................................... 19 1.3.3.1. Characteristics ............................................................................................................. 19 1.3.3.2. Under-resourced dialect or language pairs .................................................................. 19 1.4. Approaches for Scriptural Translation .............................................................................. 20 1.4.1. Direct programming approaches .................................................................................... 20 1.4.2. Finite-state approaches ................................................................................................... 21 1.4.3. Empirical, machine learning and SMT approaches ....................................................... 21 1.4.4. Hybrid approaches ......................................................................................................... 21 1.4.5. Interactive approaches .................................................................................................... 22 Chapter 2. Scriptural Translation Using FSTs and a Pivot UIT .......................... 23 2.1. Scriptural Translation for Indo-Pak Languages ................................................................ 24 2.1.1. Scripts of Indo-Pak languages ........................................................................................ 24 2.1.1.1. Scripts based on the Persio-Arabic script .................................................................... 24 2.1.1.2. Indic scripts ................................................................................................................. 25 2.1.2. Analysis of Indo-Pak scripts for scriptural translation ................................................... 26 2.1.2.1. Vowel analysis ............................................................................................................ 26 2.1.2.2. Consonant analysis ...................................................................................................... 27 2.1.2.2.1. Aspirated consonants ............................................................................................... 27 2.1.2.2.2. Non-aspirated consonants ........................................................................................ 28 2.1.2.3. Diacritics analysis ....................................................................................................... 28 2.1.2.4. Punctuations, digits and special characters analysis ................................................... 29 2.1.3. Indo-Pak scriptural translation problems ....................................................................... 29 2.2. Universal Intermediate Transcription (UIT) ..................................................................... 29 2.2.1. What is UIT? .................................................................................................................. 30 2.2.2. Principles of UIT ............................................................................................................ 31 2.2.3. UIT mappings for Indo-Pak languages .......................................................................... 31 2.3. Finite-state Scriptural Translation model .......................................................................... 32 i 2.3.1. Finite-state transducers ................................................................................................... 32 2.3.1.1. Non-deterministic transducers ..................................................................................... 33 2.3.1.1.1. Character mappings .................................................................................................. 33 2.3.1.1.2. Contextual mappings ................................................................................................ 37 2.3.1.2. Deterministic transducers ............................................................................................ 37 2.3.2. Finite-state model architecture ....................................................................................... 38 2.4. Experiments and results .................................................................................................... 39 2.4.1. Test data ......................................................................................................................... 39 2.4.2. Results and discussion .................................................................................................... 41 2.5. Conclusion ......................................................................................................................... 47 Part II .................................................................................................................. 49 Statistical, Hybrid and Interactive Models for Scriptural Translation ........ 49 Chapter 3. Empirical and Hybrid Methods for Scriptural Translation ............... 51 3.1. SMT Model for Scriptural Translation .............................................................................. 51 3.1.1. Training data .................................................................................................................. 52 3.1.2. Data alignments .............................................................................................................
Recommended publications
  • Urdu Alphabet 1 Urdu Alphabet
    Urdu alphabet 1 Urdu alphabet Urdu alphabet ﺍﺭﺩﻭ ﺗﮩﺠﯽ Example of writing in the Urdu alphabet: Urdu Type Abjad Languages Urdu, Balti, Burushaski, others Parent systems Proto-Sinaitic • Phoenician • Aramaic • Nabataean • Arabic • Perso-Arabic • Urdu alphabet ﺍﺭﺩﻭ ﺗﮩﺠﯽ [1] Unicode range U+0600 to U+06FF [2] U+0750 to U+077F [3] U+FB50 to U+FDFF [4] U+FE70 to U+FEFF Urdu alphabet ﮮ ﯼ ء ﮪ ﻩ ﻭ ﻥ ﻡ ﻝ ﮒ ﮎ ﻕ ﻑ ﻍ ﻉ ﻅ ﻁ ﺽ ﺹ ﺵ ﺱ ﮊ ﺯ ﮌ ﺭ ﺫ ﮈ ﺩ ﺥ ﺡ ﭺ ﺝ ﺙ ﭦ ﺕ ﭖ ﺏ ﺍ Extended Perso-Arabic script • History • Diacritics • Hamza • Numerals • Numeration The Urdu alphabet is the right-to-left alphabet used for the Urdu language. It is a modification of the Persian alphabet, which is itself a derivative of the Arabic alphabet. With 38 letters and no distinct letter cases, the Urdu alphabet is typically written in the calligraphic Nasta'liq script, whereas Arabic is more commonly in the Naskh style. Usually, bare transliterations of Urdu into Roman letters (called Roman Urdu) omit many phonemic elements that have no equivalent in English or other languages commonly written in the Latin script. The National Language Authority of Pakistan has developed a number of systems with specific notations to signify non-English sounds, but ﺥ ﻍ ﻁ these can only be properly read by someone already familiar with Urdu, Persian, or Arabic for letters such as Urdu alphabet 2 [citation needed].ﮌ and Hindi for letters such as ﻕ or ﺹ ﺡ ﻉ ﻅ ﺽ History The Urdu language emerged as a distinct register of Hindustani well before the Partition of India, and it is distinguished most by its extensive Persian influences (Persian having been the official language of the Mughal government and the most prominent lingua franca of the Indian subcontinent for several centuries prior to the solidification of British colonial rule during the 19th century).
    [Show full text]
  • Grammatical Analysis of Nastalique Writing Style of Urdu
    Grammatical Analysis of Nastalique Writing Style of Urdu 1 Historical Note Nastalique is one of the most intricate styles used for Arabic script, which makes it both beautiful and complex to model. This analysis has been conducted as part of the development process of Nafees Nastalique Font. The work has been conducted in 2002 and is being released for the general development of Nastalique writing style. The work has been supported by APDIP UNDP and IDRC. Authors Sarmad Hussain Shafiq ur Rahman Aamir Wali Atif Gulzar Syed Jamil ur Rahman December, 2002 2 Table of Contents HISTORICAL NOTE ............................................................................................................................................. 2 1. THE NASTALIQUE STYLE: AN INTRODUCTION .................................................................................... 5 2. URDU SCRIPT .................................................................................................................................................... 5 2.1. THE URDU ALPHABET .................................................................................................................................... 5 2.2. MAPPING BETWEEN NASTALIQUE AND URDU CHARACTERS ........................................................................... 6 2.3. BUILDING BLOCKS FOR AN URDU FONT ......................................................................................................... 7 2.3.1 Urdu Characters ....................................................................................................................................
    [Show full text]
  • 16-Sanskrit-In-JAPAN.Pdf
    A rich literary treasure of Sanskrit literature consisting of dharanis, tantras, sutras and other texts has been kept in Japan for nearly 1400 years. Entry of Sanskrit Buddhist scriptures into Japan was their identification with the central axis of human advance. Buddhism opened up unfathomed spheres of thought as soon as it reached Japan officially in AD 552. Prince Shotoku Taishi himself wrote commentaries and lectured on Saddharmapundarika-sutra, Srimala- devi-simhanada-sutra and Vimala-kirt-nirdesa-sutra. They can be heard in the daily recitation of the Japanese up to the day. Palmleaf manuscripts kept at different temples since olden times comprise of texts which carry immeasurable importance from the viewpoint of Sanskrit philology although some of them are incomplete Sanskrit manuscripts crossed the boundaries of India along with the expansion of Buddhist philosophy, art and thought and reached Japan via Central Asia and China. Thousands of Sanskrit texts were translated into Khotanese, Tokharian, Uigur and Sogdian in Central Asia, on their way to China. With destruction of monastic libraries, most of the Sanskrit literature perished leaving behind a large number of fragments which are discovered by the great explorers who went from Germany, Russia, British India, Sweden and Japan. These excavations have uncovered vast quantities of manuscripts in Sanskrit. Only those manuscripts and texts have survived which were taken to Nepal and Tibet or other parts of Asia. Their translations into Tibetan, Chinese and Mongolian fill the gap, but partly. A number of ancient Sanskrit manuscripts are strewn in the monasteries nestling among high mountains and waterless deserts.
    [Show full text]
  • Languages of New York State Is Designed As a Resource for All Education Professionals, but with Particular Consideration to Those Who Work with Bilingual1 Students
    TTHE LLANGUAGES OF NNEW YYORK SSTATE:: A CUNY-NYSIEB GUIDE FOR EDUCATORS LUISANGELYN MOLINA, GRADE 9 ALEXANDER FFUNK This guide was developed by CUNY-NYSIEB, a collaborative project of the Research Institute for the Study of Language in Urban Society (RISLUS) and the Ph.D. Program in Urban Education at the Graduate Center, The City University of New York, and funded by the New York State Education Department. The guide was written under the direction of CUNY-NYSIEB's Project Director, Nelson Flores, and the Principal Investigators of the project: Ricardo Otheguy, Ofelia García and Kate Menken. For more information about CUNY-NYSIEB, visit www.cuny-nysieb.org. Published in 2012 by CUNY-NYSIEB, The Graduate Center, The City University of New York, 365 Fifth Avenue, NY, NY 10016. [email protected]. ABOUT THE AUTHOR Alexander Funk has a Bachelor of Arts in music and English from Yale University, and is a doctoral student in linguistics at the CUNY Graduate Center, where his theoretical research focuses on the semantics and syntax of a phenomenon known as ‘non-intersective modification.’ He has taught for several years in the Department of English at Hunter College and the Department of Linguistics and Communications Disorders at Queens College, and has served on the research staff for the Long-Term English Language Learner Project headed by Kate Menken, as well as on the development team for CUNY’s nascent Institute for Language Education in Transcultural Context. Prior to his graduate studies, Mr. Funk worked for nearly a decade in education: as an ESL instructor and teacher trainer in New York City, and as a gym, math and English teacher in Barcelona.
    [Show full text]
  • Pronunciation of Alphabets: Educate the Children Scientifically M
    11459 M. Imran Qadir/ Elixir Edu. Tech. 52 (2012) 11459-11460 Available online at www.elixirpublishers.com (Elixir International Journal) Educational Technology Elixir Edu. Tech. 52 (2012) 11459-11460 Pronunciation of alphabets: educate the children scientifically M. Imran Qadir College of Pharmacy, GC University, Faisalabad, Pakistan ARTICLE INFO ABSTRACT Article history: Education may be regarded both as a science and an art since it consists of theoretical as Received: 18 September 2012; well as practical knowledge and skills derived through various artistic and scientific Received in revised form: methods. Like other science subjects, education also needs logic. So educate the children 15 November 2012; scientifically by teaching them the exact pronunciations of the alphabets. An English Accepted: 15 November 2012; alphabet should be pronounced exactly as it is pronounced in a word known as “Qadir’s Pronunciation” of Alphabet. Similarly, Urdu Hija should also be pronounced exactly as it is Keywords pronounced in a word known as “Qadir’s Pronunciation” of Urdu Hija. Qadir’s Pronunciation, © 2012 Elixir All rights reserved. English alphabets, Urdu Hijay. Introduction Similarly, he will pronounce “go” as “geeo”; “so” as Science is a systematic and precise body of knowledge in a “aeso”; “no” as “en-o”; and “ant” as “a-en-tee”, etc. particular field of the world. It seeks to discover the general laws He once again has to learn the exact pronunciations of all regulating the phenomena in that field through observation and the alphabets. experiments. As per this definition, education must be taken as a And waste time!! science since it is a systematic body of knowledge accumulated But if he memorizes the exact pronunciations, known as through ages by observation and experiments.
    [Show full text]
  • Title: Exploring Historical Letterforms to Design Unique Assamese Typeface for Digital Devices: Experimenting Possibilities
    Experimental Typography http://www.typoday.in Title: Exploring historical letterforms to design unique Assamese typeface for digital devices: Experimenting possibilities. Abhijit Padun, Central Institute of Technology Kokrajhar, Assam, India, [email protected] Prof. Amarendra Kumar Das, Indian Institute of Technology Guwahati, India, [email protected] Abstract: Assamese script has a long history of evolution. In its path to modernization from the early printing era to current digital era, a considerable numbers of typefaces were developed. But all these typefaces have their root from Bengali typeface design due to its similarity in structure and availability of resources for Bengali typeface. However historical evidences claimed that both Assamese and Bengali scripts possess differences in writing styles which is still unexplored. This article discusses about designing a typeface for Assamese script which could show a clear connection with historical Assamese writing style. An experimentation has been carried out to study the possibility of this connection by taking a medieval Assamese writing style called “Garhgaya” as reference. As Garhgaya style was initiated and extensively implemented by Ahom dynasty who ruled Assam for six hundred years, hence this style has been considered for reference apart from having characteristics like simplicity, legibility and symmetry in letter formation. The objective of doing this experiment is to design a typeface which could be recognized as a unique Assamese typeface created for digital devices, that has a trace from historical Assamese writing style. Key words: Experimental typeface design, Assamese digital typeface, Assamese typeface for digital devices, Assamese script, Assamese letterform design. Typography Day 2019 1 1.
    [Show full text]
  • Korean Hankul and the Þp'ags-Pa Script
    Writine in the Altaic World studia orientalia 87, Helsinki 1999, pp. 79-t00 Korean Hankul and the þP'ags-pa script Roger Finch From the annals of the reign period of King Sejong of Korea, the fourth king of the Yi Dynasty, who ruled from 1418 to 1450, we learn that in the l2th month of the 25th year of his reign (1443), the king invented the Korean alphabet. This alphabet, now known as Hankul, was called Hunmin Ceng'im,'Right Sounds for Instructing the People'l at the time it was introduced, and later simply En-mun'Vernacular Script'. According to popular tradition, the shapes of the letters, most of which are rectilinear or angular, were inspired by the shapes in the fretwork on Korean windows. This writing system is a true alphabet, and not a syllabary, though signs for discrete sounds are organized into single blocks of signs, each representing a syllable in the way that Chinese characters are complexes made up of various strokes organized into single blocks, each of which represent.s a syllable; a syllabary is a writing system in which each combination of consonants with a syllabic core is repre- sented by a unique sign as, for example, the Japanese Kana (Hiragana and Katakana). Almost a century earlier, in 1260, in the year of his election to the position of Great Khan of the Mongols and Emperor of China,2 Khubilai Khan appointed National Preceptor a Tibetan monk named hP'ags-pa he had met seven years earlier, when hP'ags-pa was only nineteen, and commissioned him to create a new Mongolian script.
    [Show full text]
  • World Writing Systems Other European and Middle Eastern Scripts a Large
    Contemporary Linguistics: An Introduction, 5th edition, Chapter 16: World writing systems, 1 World writing systems Other European and Middle Eastern Scripts A large number of alphabetic systems other than those of Greece and Rome evolved and flourished in Europe and the Middle East. In this Web site, we briefly present some of the ones that are of historical significance or interest. Runic writing Germanic tribes occupying the north of Italy developed an early offshoot of the Greek/Etruscan tradition of writing into a script known as Runic writing. This system emerged shortly after the beginning of the Christian era, and its developments were eventually found as far north as Scandinavia. Runic writing persisted until the sixteenth century in some areas before giving way to the Roman alphabet. Figure 1 illustrates some signs from one of the oldest known Runic inscriptions, which dates from about the third century A.D. The angular style of the letters arose because the alphabet was carved in wood or stone, the former especially not readily lending itself to curved lines. The script is read from right to left. Cyrillic script Another offshoot of the Greek script was created for the Slavic peoples in the ninth century A.D. The Greek missionary brothers Constantine (Cyril) and Methodius introduced a writing system for the translation of the Bible that is now known as Glagolitic script. A later development, which combined adaptations of Glagolitic letters with Greek and Hebrew characters, has come to be known as the Cyrillic alphabet. The current Russian, Byelorussian, Ukrainian, Serbian, Macedonian, and Bulgarian alphabets, as well as those used to represent many non-Slavic languages spoken in the former Soviet Union, have evolved from this early Cyrillic script.
    [Show full text]
  • Urdu, Arabic, Hindi and English
    Wilfrid Laurier University Scholars Commons @ Laurier Theses and Dissertations (Comprehensive) 2018 Reading across Different Orthographies: Urdu, Arabic, Hindi and English Amna Mirza [email protected] Follow this and additional works at: https://scholars.wlu.ca/etd Part of the Bilingual, Multilingual, and Multicultural Education Commons, Educational Methods Commons, and the Language and Literacy Education Commons Recommended Citation Mirza, Amna, "Reading across Different Orthographies: Urdu, Arabic, Hindi and English" (2018). Theses and Dissertations (Comprehensive). 2059. https://scholars.wlu.ca/etd/2059 This Dissertation is brought to you for free and open access by Scholars Commons @ Laurier. It has been accepted for inclusion in Theses and Dissertations (Comprehensive) by an authorized administrator of Scholars Commons @ Laurier. For more information, please contact [email protected]. Running Head: READING ACROSS DIFFERENT ORTHOGRAPHIES Reading across Different Orthographies: Urdu, Arabic, Hindi and English By Amna Mirza, M.A. Dissertation Submitted to the Department of Psychology In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in Psychology Wilfrid Laurier University June 18th, 2018 © Amna Mirza READING ACROSS DIFFERENT ORTHOGRAPHIES ii Abstract Research on relationships across literacy skills for multiple languages suggests the need for a complex framework that includes linguistic typology as well as cognitive and cultural variables (Schwartz, Geva, Share, & Leikin, 2007). Literature shows that bilinguals activate both languages they know for all linguistic tasks regardless of which language is being used at the time (Kroll & Bialystok, 2013). In that case, learning a third or any additional language is qualitatively different than second language (L2) acquisition. Findings for readers of Roman scripts demonstrate that L1 reading and L2 proficiency influences L2 reading (Cummins, 1979).
    [Show full text]
  • The Secret of Letters: Chronograms in Urdu Literary Culture1
    Edebiyˆat, 2003, Vol. 13, No. 2, pp. 147–158 The Secret of Letters: Chronograms in Urdu Literary Culture1 Mehr Afshan Farooqi University of Virginia Letters of the alphabet are more than symbols on a page. They provide an opening into new creative possibilities, new levels of understanding, and new worlds of experience. In mature literary traditions, the “literal meaning” of literal meaning can encompass a variety of arcane uses of letters, both in their mode as a graphemic entity and as a phonemic activity. Letters carry hidden meanings in literary languages at once assigned and intrinsic: the numeric and prophetic, the cryptic and esoteric, and the historic and commemoratory. In most literary traditions there appears to be at least a threefold value system assigned to letters: letters can be seen as phonetic signs, they have a semantic value, and they also have a numerical value. Each of the 28 letters of the Arabic alphabet can be used as a numeral. When used numerically, the letters of the alphabet have a special order, which is called the abjad or abujad. Abjad is an acronym referring to alif, be, j¯ım, d¯al, the first four letters in the numerical order which, in the system most widely used, runs from alif to ghain. The abjad order organizes the 28 characters of the Arabic alphabet into eight groups in a linear series: abjad, havvaz, hutt¯ı, kalaman, sa`fas, qarashat, sakhkha˙˙ z, zazzagh.2 In nearly every area where˙ ¨the¨ Arabic script ˙ was adopted, the abjad¨ ˙ ˙system gained popularity. Within the vast area in which the Arabic script was used, two abjad systems developed.
    [Show full text]
  • Report for the Berkeley Script Encoding Initiative
    Indonesian and Philippine Scripts and extensions not yet encoded or proposed for encoding in Unicode as of version 6.0 A report for the Script Encoding Initiative Christopher Miller 2011-03-11 Christopher Miller Report on Indonesian and the Philippine scripts and extensions Page 2 of 60 Table of Contents Introduction 4 The Philippines 5 Encoded script blocks 5 Tagalog 6 The modern Súlat Kapampángan script 9 The characters of the Calatagan pot inscription 12 The (non-Indic) Eskayan syllabary 14 Summary 15 Sumatra 16 The South Sumatran script group 16 The Rejang Unicode block 17 Central Malay extensions (Lembak, Pasemah, Serawai) 18 Tanjung Tanah manuscript extensions 19 Lampung 22 Kerinci script 26 Alleged indigenous Minangkabau scripts 29 The Angka bejagung numeral system 31 Summary 33 Sumatran post-Pallava or “Malayu” varieties 34 Sulawesi, Sumbawa and Flores islands 35 Buginese extensions 35 Christopher Miller Report on Indonesian and the Philippine scripts and extensions Page 3 of 60 The Buginese Unicode block 35 Obsolete palm leaf script letter variants 36 Luwu’ variants of Buginese script 38 Ende script extensions 39 Bimanese variants 42 “An alphabet formerly adopted in Bima but not now used” 42 Makassarese jangang-jangang (bird) script 43 The Lontara’ bilang-bilang cipher script 46 Old Minahasa script 48 Summary 51 Cipher scripts 52 Related Indian scripts 52 An extended Arabic-Indic numeral shape used in the Malay archipelago 53 Final summary 54 References 55 1. Introduction1 A large number of lesser-known scripts of Indonesia and the Philippines are not as yet represented in Unicode. Many of these scripts are attested in older sources, but have not yet been properly documented in the available scholarly literature.
    [Show full text]
  • Rare Collection About Kashmir: Survey and Documentation
    RARE COLLECTION ABOUT KASHMIR: SURVEY AND DOCUMENTATION Rosy Jan Shahina Islam Uzma Qadri Department of Library and Information Science University of Kashmir India ABSTRACT Kashmir has been a fascinating subject for authors and analysts. Volumes have been documented and published about its multi-faceted aspects in varied forms like manuscripts, rare books and images available in a number of institutions, libraries and museums worldwide. The study explores the institutions and libraries worldwide possessing rare books (published before 1920) about Kashmir using online survey method and documents their bibliographical details. The study aims to analyze subject, chronology and country wise collection strength. The study shows that the maximum collection of the rare books is on travelogue 32.48% followed by Shaivism1 8.7%. While as the collection on other subjects lies in the range of 2.54%-5.53% with least of 2.54% on Grammar. Literature of 20th century is preserved by maximum of libraries (53.89%) followed by 19th century (44.93%), 18th century (1.08%) and 17th century (0.09%) and none of the library except Cambridge University library possesses a publication of 17th century. The treasure of rare books lies maximum in United States of America (56.7%) followed by Great Britain (35%), Canada(6%), Australia (1.8%) with least in Thailand (0.45%). Keywords: Rare Books; Rare Books Collections; Kashmir; Manuscripts; Paintings. 1 INTROUCTION Kashmir has been a center of attraction for philosophers and litterateurs, beauty seekers and people of different interests from centuries. It is not only because of the scenic beauty of its snow caped mountain ranges, myriad lakes, changing hues of majestic china’s and crystal clear springs but also because of its rich cultural heritage and contribution to Philosophy, Religion, Literature, Art and Craft (KAW, https://doi.org/10.36311/1981-1640.2012.v6n1.05.p50 50 BJIS, Marília (SP), v.6, n.1, p.50-61, Jan./Jun.
    [Show full text]