B.A. 5 Semester (Honours) Examination, 2021

Copy Link

th B.A. 5 Semester (Honours) Examination, 2021 SANTALI Course ID: ID: 51011 Paper: AH/SNT-501C Course Title: Modern One Act Play Full Marks: 40 Time: 2 Hours The figures in the margin indicate full marks. Candidates are required to give their answers in their own words as far as practicable. [Candidates should be write answer Script in Santali (ol-chiki)] 1. jaNha ge moMe gotaf kukli renag Tela ol me| 5 X 2=10 o) sirjon gayan lekaTe – nowa sirjon re, nowa jegeND re manowa kowag asol TeTeD Do ceD kana? T) lukHi Do oka gayan ren curiT kanay? ona gayan ren onoliya. Do okoy?| g) po-pofza gayan re hara juwa.n koza ko okoyag nahag sereq aNjom Te daNda hilaw leDa? f) regeNj jala gayan re menay pa.hil kisa.z monOTOs mardi ren era ar hopon ag quTum Do ceD Tahe kana? l) miT kiTa. Gayan Do oka ko meTag kana ol me | a) miD bigHa. hasa Do okoykin Tala re ha:tiqog kan Tahe kana ol me| k) po pofza gayan Do okoy olag ar oka ha:tiq gayan kana? j) miD kiTa. Gayan ar maraf gayan Talare barya pHarak ol me | Page 1 of 2 2. jaNha ge punya kukli renag Tela ol me| 4 X 5=20 o) sirpa. Gayan re onoliya. Do oka lekan sirpa. Renag kaTHay men akaDa kHatoTe ol me| T) miD bigHa. hasa gayan TalaTe onoliya ceD saNDes e cal akaDa kHato Te ol me| g) po pofza gayan renag sar kaTHa ol me| f) reNgej jala gayan re , reNgej okTo re cHa.Tik ja.Tiko sanTaz ko oka leka ko rodojeD ko Taheg ol me| l) sirjon gayan re giTil oka leka Te porman leDay –manwa jiyon renag asol TeTED Do Dag kana menTe ol me| a) miD kiTa. gayan renag curiT ko ol soDor me| 3. jaNha ge miDtaf kukli renag Tela ol me| 1 X10=10 o) po_pofza gayan Tala Te onoliya. Do nahag po-pofza ko oka lekan saNDes e cal akaD kowa ol me | T) reNgej jala gayan re onoliya. Do reNgej jala renag oka lekan Dosa samaf akaDay ol me| ---0--- Page 2 of 2 .

Recommended publications

Malayalam Range: 0D00–0D7F

Malayalam Range: 0D00–0D7F This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
The Unicode Standard, Version 4.0--Online Edition

This PDF file is an excerpt from The Unicode Standard, Version 4.0, issued by the Unicode Consor- tium and published by Addison-Wesley. The material has been modified slightly for this online edi- tion, however the PDF files have not been modified to reflect the corrections found on the Updates and Errata page (http://www.unicode.org/errata/). For information on more recent versions of the standard, see http://www.unicode.org/standard/versions/enumeratedversions.html. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and Addison-Wesley was aware of a trademark claim, the designations have been printed in initial capital letters. However, not all words in initial capital letters are trademark designations. The Unicode® Consortium is a registered trademark, and Unicode™ is a trademark of Unicode, Inc. The Unicode logo is a trademark of Unicode, Inc., and may be registered in some jurisdictions. The authors and publisher have taken care in preparation of this book, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode®, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. Dai Kan-Wa Jiten used as the source of reference Kanji codes was written by Tetsuji Morohashi and published by Taishukan Shoten.
Punjabi Machine Transliteration Muhammad Ghulam Abbas Malik

Punjabi Machine Transliteration Muhammad Ghulam Abbas Malik To cite this version: Muhammad Ghulam Abbas Malik. Punjabi Machine Transliteration. 21st international Conference on Computational Linguistics (COLING) and the 44th Annual Meeting of the ACL, Jul 2006, Sydney, France. pp.1137-1144. hal-01002160 HAL Id: hal-01002160 https://hal.archives-ouvertes.fr/hal-01002160 Submitted on 15 Jan 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Punjabi Machine Transliteration M. G. Abbas Malik Department of Linguistics Denis Diderot, University of Paris 7 Paris, France [email protected] Transliteration refers to phonetic translation Abstract across two languages with different writing sys- tems (Knight & Graehl, 1998), such as Arabic to Machine Transliteration is to transcribe a English (Nasreen & Leah, 2003). Most prior word written in a script with approximate work has been done for Machine Translation phonetic equivalence in another lan- (MT) (Knight & Leah, 97; Paola & Sanjeev, guage. It is useful for machine transla- 2003; Knight & Stall, 1998) from English to tion, cross-lingual information retrieval, other major languages of the world like Arabic, multilingual text and speech processing. Chinese, etc. for cross-lingual information re- Punjabi Machine Transliteration (PMT) trieval (Pirkola et al, 2003), for the development is a special case of machine translitera- of multilingual resources (Yan et al, 2003; Kang tion and is a process of converting a word & Kim, 2000) and for the development of cross- from Shahmukhi (based on Arabic script) lingual applications.
An Introduction to Indic Scripts

An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set.
Data Issues in English-To-Hindi Machine Translation

Data Issues in English-to-Hindi Machine Translation Ondřej Bojar, Pavel Straňák, Daniel Zeman Univerzita Karlova v Praze, Ústav formální a aplikované lingvistiky Malostranské náměstí 25, CZ-11800 Praha {bojar|stranak|zeman}@ufal.mff.cuni.cz http://ufal.mff.cuni.cz/umc/ Abstract Statistical machine translation to morphologically richer languages is a challenging task and more so if the source A dataset originally collected for the DARPA-TIDES surprise- and target languages differ in word order. Current state-of-the art MT systems thus deliver mediocre results. language contest in 2002, later refined at IIIT Hyderabad and Adding more parallel data often helps improve the results; if it does not, it may be caused by various problems such provided for the NLP Tools Contest at ICON 2008. Corpus Sentences En Tokens Hi Tokens as different domains, bad alignment or noise in the new data. We evaluate several available parallel data sources Tides.train 50,000 1,226,144 1,312,435 A journalist Daniel Pipes' website (http://www.danielpipes.org/) and provide cross-evaluation results on their combinations using two freely available statistical MT systems. We Tides.dev 1,000 22,485 24,363 demonstrate various problems encountered in the data and describe automatic methods of data cleaning and limited-domain articles about the Middle East. Written in English, Tides.test 1,000 27,169 28,574 normalization. We also show that the contents of two independently distributed data sets can unexpectedly overlap, many of them translated to up to 25 other languages. which negatively affects translation quality. Together with the error analysis, we also present a new tool for viewing Daniel Pipes 6,761 176,392 122,108 Monolingual, parallel and annotated corpora for fourteen South Emille 3,501 55,660 71,010 aligned corpora, which makes it easier to detect difficult parts in the data even for a developer not speaking the Asian languages (including Hindi) and English.
The Ramayana by R.K. Narayan

Table of Contents About the Author Title Page Copyright Page Introduction Dedication Chapter 1 - RAMA’S INITIATION Chapter 2 - THE WEDDING Chapter 3 - TWO PROMISES REVIVED Chapter 4 - ENCOUNTERS IN EXILE Chapter 5 - THE GRAND TORMENTOR Chapter 6 - VALI Chapter 7 - WHEN THE RAINS CEASE Chapter 8 - MEMENTO FROM RAMA Chapter 9 - RAVANA IN COUNCIL Chapter 10 - ACROSS THE OCEAN Chapter 11 - THE SIEGE OF LANKA Chapter 12 - RAMA AND RAVANA IN BATTLE Chapter 13 - INTERLUDE Chapter 14 - THE CORONATION Epilogue Glossary THE RAMAYANA R. K. NARAYAN was born on October 10, 1906, in Madras, South India, and educated there and at Maharaja’s College in Mysore. His first novel, Swami and Friends (1935), and its successor, The Bachelor of Arts (1937), are both set in the fictional territory of Malgudi, of which John Updike wrote, “Few writers since Dickens can match the effect of colorful teeming that Narayan’s fictional city of Malgudi conveys; its population is as sharply chiseled as a temple frieze, and as endless, with always, one feels, more characters round the corner.” Narayan wrote many more novels set in Malgudi, including The English Teacher (1945), The Financial Expert (1952), and The Guide (1958), which won him the Sahitya Akademi (India’s National Academy of Letters) Award, his country’s highest honor. His collections of short fiction include A Horse and Two Goats, Malgudi Days, and Under the Banyan Tree. Graham Greene, Narayan’s friend and literary champion, said, “He has offered me a second home. Without him I could never have known what it is like to be Indian.” Narayan’s fiction earned him comparisons to the work of writers including Anton Chekhov, William Faulkner, O.
Designing Devanagari Type

Designing Devanagari type The effect of technological restrictions on current practice Kinnat Sóley Lydon BA degree final project Iceland Academy of the Arts Department of Design and Architecture Designing Devanagari type: The effect of technological restrictions on current practice Kinnat Sóley Lydon Final project for a BA degree in graphic design Advisor: Gunnar Vilhjálmsson Graphic design Department of Design and Architecture December 2015 This thesis is a 6 ECTS final project for a Bachelor of Arts degree in graphic design. No part of this thesis may be reproduced in any form without the express consent of the author. Abstract This thesis explores the current process of designing typefaces for Devanagari, a script used to write several languages in India and Nepal. The typographical needs of the script have been insufficiently met through history and many Devanagari typefaces are poorly designed. As the various printing technologies available through the centuries have had drastic effects on the design of Devanagari, the thesis begins with an exploration of the printing history of the script. Through this exploration it is possible to understand which design elements constitute the script, and which ones are simply legacies of older technologies. Following the historic overview, the character set and unique behavior of the script is introduced. The typographical anatomy is analyzed, while pointing out specific design elements of the script. Although recent years has seen a rise of interest on the subject of Devanagari type design, literature on the topic remains sparse. This thesis references books and articles from a wide scope, relying heavily on the works of Fiona Ross and her extensive research on non-Latin typography.
Extending Gurmukhi Script Into the Twenty-First Century and Beyond

Extending Gurmukhi Script into the Twenty-first Century and Beyond Arvinder Singh Kang1,3,4, * and Amanpreet Singh Brar2,3,4 1University of Mississippi 2Red Hat Inc. 3Punjabi Open Source Team 4Project Satluj February 12, 2010 To appear in SIKHOLARS: Sikh Graduate Student Conference, Stanford University Campus, CA. February 20, 2010. *To whom correspondence should be addressed. Email: [email protected] List of Figures 1 Gurmukhi and Shahmukhi scripts.. 2 2 Unicode range for Gurmukhi.......................... 3 3 Photographs of Goindval Pothi. 4 4 Fedora Linux running in Punjabi interface. 6 5 Firefox browser in Punjabi. .......................... 7 1 Abstract As we move forward into the digital age, the availability of digitized and standardized Gurmukhi is even more important to preserve our libraries and texts and to record our lives in the language of our thoughts. However, without a standard for how an alphabet is encoded in a Punjabi font, different machines and browsers across the world interpret the alphabet differently. One of the solutions to this problem is embracing Unicode[14] standards. With Uni- code, a specific font has one and only one code point or digital signature across all machines around the world. Our online group of volunteers at Punjabi Open Source Team[1][33][2][3] have been working since 2004 to create and absorb Unicode stan- dards in Gurmukhi and translating open source softwares into Punjabi. Through this paper we hope to highlight the development and achievements of a group of globally separated volunteers who have been able to come together to become one of the most successful open source Indic language communities.
General Historical and Analytical / Writing Systems: Recent Script

9 Writing systems Edited by Elena Bashir 9,1. Introduction By Elena Bashir The relations between spoken language and the visual symbols (graphemes) used to represent it are complex. Orthographies can be thought of as situated on a con- tinuum from “deep” — systems in which there is not a one-to-one correspondence between the sounds of the language and its graphemes — to “shallow” — systems in which the relationship between sounds and graphemes is regular and trans- parent (see Roberts & Joyce 2012 for a recent discussion). In orthographies for Indo-Aryan and Iranian languages based on the Arabic script and writing system, the retention of historical spellings for words of Arabic or Persian origin increases the orthographic depth of these systems. Decisions on how to write a language always carry historical, cultural, and political meaning. Debates about orthography usually focus on such issues rather than on linguistic analysis; this can be seen in Pakistan, for example, in discussions regarding orthography for Kalasha, Wakhi, or Balti, and in Afghanistan regarding Wakhi or Pashai. Questions of orthography are intertwined with language ideology, language planning activities, and goals like literacy or standardization. Woolard 1998, Brandt 2014, and Sebba 2007 are valuable treatments of such issues. In Section 9.2, Stefan Baums discusses the historical development and general characteristics of the (non Perso-Arabic) writing systems used for South Asian languages, and his Section 9.3 deals with recent research on alphasyllabic writing systems, script-related literacy and language-learning studies, representation of South Asian languages in Unicode, and recent debates about the Indus Valley inscriptions.
Chapter 6, Writing Systems and Punctuation

The Unicode® Standard Version 13.0 – Core Specification To learn about the latest version of the Unicode Standard, see http://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trade- mark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2020 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at http://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see http://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 13.0. Includes index. ISBN 978-1-936213-26-9 (http://www.unicode.org/versions/Unicode13.0.0/) 1.
A Comparative Study of the Development of the Gurmukhi Script

A comparative study of the development of the Gurmukhi script A comparative study of the development of the Gurmukhi script: from the handwritten manuscript to the digital typeface. Emma Williams, September 2008 Part two: Development of the printed character Conclusion Bibliography Appendix Submitted in partial fulfilment for the requirements for the Master of Arts in Typeface Design, Department of Typography and Graphic Communication, at the University of Reading, United Kingdom Typeset in Minion Pro and Linotype Gurmukhi A comparative study of the development of the Gurmukhi script Development of the printed character This entire chapter is loosely organised upon the major technology developments over the past three hundred years: metal type, both hand and machine composed and the early and current digital format. Each will be elaborated upon when and where is necessary.49 The first three sub-chapters refer to text typefaces which were designed in metal for hand composition, and are arranged by location: India, England and Europe. In Typography for Devanagari, Naik explains and illustrates the various typesetting systems: degree and akhand50 [figure 52]. The degree system ‘is assembled in three steps.’51 The character is divided into a total of four ems:52 one em for the superior and subscript diacritics and two ems for the base character, or in instances where a Degree system subscript character is not necessary, a three em base character can be used to create the required total of four ems [figure 52]. Figure 53: Nomencalature for a metal The akhand system uses as few components as possible to create the sort (character).
PITS@Dravidian-Codemix-FIRE2020: Traditional Approach to Noisy Code-Mixed Sentiment Analysis

PITS@Dravidian-CodeMix-FIRE2020: Traditional Approach to Noisy Code-Mixed Sentiment Analysis Nikita Kanwara, Megha Agarwala and Rajesh Kumar Mundotiyab aPratap Institute of Technology & Science, India bIndian Institute of Technology (BHU), India Abstract Sentiment Analysis (SA) is a process for characterizing the response or opinion by which sentiment polarity of the text is decided. Nowadays, social media is a common platform to convey opinions, sug- gestions and much more in a user’s native language or multilingual in Roman script (for ease). In this task, Malayalam-English and Tamil-English code mixed dataset in the Roman script has provided for SA. To solve this task, we have generated syntax-based features and used trained logistic regression with as an under-sampling technique. We have obtained best F1-score of 0:71 and 0:62 on the blind test set of Malayalam-English and Tamil-English code mixed datasets, respectively. The code is available at Github1. Keywords Malayalam-English code-mixed, Tamil-English code-mixed, Sentiment Analysis, Logistic Regression, 1. Introduction Sentiment Analysis (SA) is a process for characterizing the response or opinion, which de- termines sentiment polarity of the text. In the last few years, social media platforms such as Facebook, Twitter and Youtube have become increasingly large, which in turn produced textual data as people express their feelings and opinions by writing reviews, comments on social media. A large number of texts on such platforms are available in either user’s native language, English or a mixture of both. According to Myers-Scotton (1993), code-mixing is defined as – “The interchangeable use of linguistic units, like morphs, words, and phrases from one language to another language while conversation (both speaking and writing)” [1].