Proposal to Encode the Grantha Script in Unicode §1. Introduction
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
A Method to Accommodate Backward Compatibility on the Learning Application-Based Transliteration to the Balinese Script
(IJACSA) International Journal of Advanced Computer Science and Applications, Vol. 12, No. 6, 2021 A Method to Accommodate Backward Compatibility on the Learning Application-based Transliteration to the Balinese Script 1 3 4 Gede Indrawan , I Gede Nurhayata , Sariyasa I Ketut Paramarta2 Department of Computer Science Department of Balinese Language Education Universitas Pendidikan Ganesha (Undiksha) Universitas Pendidikan Ganesha (Undiksha) Singaraja, Indonesia Singaraja, Indonesia Abstract—This research proposed a method to accommodate transliteration rules (for short, the older rules) from The backward compatibility on the learning application-based Balinese Alphabet document 1 . It exposes the backward transliteration to the Balinese Script. The objective is to compatibility method to accommodate the standard accommodate the standard transliteration rules from the transliteration rules (for short, the standard rules) from the Balinese Language, Script, and Literature Advisory Agency. It is Balinese Language, Script, and Literature Advisory Agency considered as the main contribution since there has not been a [7]. This Bali Province government agency [4] carries out workaround in this research area. This multi-discipline guidance and formulates programs for the maintenance, study, collaboration work is one of the efforts to preserve digitally the development, and preservation of the Balinese Language, endangered Balinese local language knowledge in Indonesia. The Script, and Literature. proposed method covered two aspects, i.e. (1) Its backward compatibility allows for interoperability at a certain level with This study was conducted on the developed web-based the older transliteration rules; and (2) Breaking backward transliteration learning application, BaliScript, for further compatibility at a certain level is unavoidable since, for the same ubiquitous Balinese Language learning since the proposed aspect, there is a contradictory treatment between the standard method reusable for the mobile application [8], [9]. -
Bibliography
BIBLIOGRAPHY 略 号 IIJ Indo-Iranian Journal. IJDL International Journal of Dravidian Linguistics, Dravidian Linguistics Association, Trivan- drum, 1972– (biannual). JAS Journal of the Institute of Asian Studies, Institute of Asian Studies, Madras, 1984– (biannu- al). JTS Journal of Tamil Studies, International Institute of Tamil Studies, Madras, Vol. 1 (1969), Vol. 2 (1970); No. 1– (1972–, biannual). S.I.S.S.W.P.S. The South India Saiva Siddhanta Works Publishing Society. TA The Tamilian Antiquary, Vol. I (No. 1–10), Vol. II (No. 1, 2), ed. by Pandit D.Savariroyan, T.A.Society, Trichinopoly, 1907–14: (reprint) Asian Educational Services, New Delhi, 1986. TC Tamil Culture, 12 vols., Tuticorin/Madras, 1952–66. (a) General 1. Aiyangar, M. Srinivasa, Tamil Studies, or Essays of the History of the Tamil People, Lan- guage, Religion and Literature, Guardian Press, Madras, 1914: reprint, Asian Educational Services, New Delhi, 1982. 2. Arunachalam, M., History of Tamil Literature Through the Centuries (in Tamil; original title, Tamil Ilakkiya Varalar¯ u), Gandhi Vidyalayam, Tiruchitrambalam, 1969– (8 vols. have been published¯ out of 25 vols.).¯ 3. Arunachalam, M., An Introduction to the History of Tamil Literature, Gandhi Vidyalayam, Tiruchitrambalam, 1974. 4. Burrow, T. and Emeneau, M.B., A Dravidian Etymological Dictionary (2nd ed.), Clarendon Press, Oxford, 1984. 5. Caldwell, Robert, A Comparative Grammar of the Dravidian or South-Indian Family of Languages, 1st ed., 1856: reprint, Oriental Books Reprint Corporation, New Delhi, 1974; Asian Educational Services, New Delhi, 1987. 6. Chitty, Simon Casie, The Tamil Plutarch: A Summary Account of the Lives of the Poets and Poetesses of Southern India and Ceylon, Asian Educational Services, New Delhi, 1982 (2nd revised ed.; 1st ed., 1859). -
S. of Shri Mali Chikkapapanna; B. June 5, 1937; M. Shrimati Kenchamma, 1 D.; Member, Rajya Sabha, 3-4-1980 to 2-4-1986
M MADDANNA, SHRI M. : Studied upto B.A.; Congress (I) (Karnataka); s. of Shri Mali Chikkapapanna; b. June 5, 1937; m. Shrimati Kenchamma, 1 d.; Member, Rajya Sabha, 3-4-1980 to 2-4-1986. Per. Add. : 5, III Cross, Annayappa Block, Kumara Park West, Bangalore (Karnataka). MADHAVAN, SHRI K. K. : B.A., LL.B.; Congress (U) (Kerala); s. of Shri Kunhan; b. July 23, 1917; m. Shrimati Devi, 1 s. and 1 d.; Member, (i) Kerala Legislative Assembly, 1965 and (ii) Rajya Sabha, 3-4-1976 to 2-4-1982; Died. Obit. on 21-10-1999. MADHAVAN, SHRI S. : B.Com., B.L.; A.I.A.D.M.K. (Tamil Nadu); s .of Shri Selliah Pillai; b . October 3, 1933; m. Shrimati Dhanalakshmi, 1 s. and 2 d.; Member, Tamil Nadu Legislative Assembly, 1962-76 and 1984-87; Minister, Government of Tamil Nadu, 1967-76; Member, Rajya Sabha, 3-4-1990 to 2-4- 1996. Per. Add. : 17, Sixth Main Road, Raja Annamalai Puram, Madras (Tamil Nadu). MADNI, SHRI MAULANA ASAD : Fazil (equivalent to M.A. in Islamic Theology); Congress (I) (Uttar Pradesh); s. of Maulana Hussain Ahmad Madni; b. 1928; m. Shrimati Barirah Bano, 4 s. and 2 d.; Vice-President, U.P.C.C.; Member, Rajya Sabha, 3-4-1968 to 2-4-1974, 5-7-1980 to 4-7-1986 and 3-4-1988 to 2-4-1994. Per. Add . : Madani Manzil , Deoband , District Saharanpur (Uttar Pradesh). MAHABIR PRASAD, DR. : M.A., Ph.D.; Janata Party (Bihar); s. of Shri Sahdev Yadav; b. 1939; m. Shrimati Chandra Kala Devi, 2 s. -
Effects and Mitigation of Out-Of-Vocabulary in Universal Language Models
Journal of Information Processing Vol.29 490–503 (July 2021) [DOI: 10.2197/ipsjjip.29.490] Regular Paper Effects and Mitigation of Out-of-vocabulary in Universal Language Models Sangwhan Moon1,†1,a) Naoaki Okazaki1,b) Received: December 8, 2020, Accepted: April 2, 2021 Abstract: One of the most important recent natural language processing (NLP) trends is transfer learning – using representations from language models implemented through a neural network to perform other tasks. While transfer learning is a promising and robust method, downstream task performance in transfer learning depends on the robust- ness of the backbone model’s vocabulary, which in turn represents both the positive and negative characteristics of the corpus used to train it. With subword tokenization, out-of-vocabulary (OOV) is generally assumed to be a solved problem. Still, in languages with a large alphabet such as Chinese, Japanese, and Korean (CJK), this assumption does not hold. In our work, we demonstrate the adverse effects of OOV in the context of transfer learning in CJK languages, then propose a novel approach to maximize the utility of a pre-trained model suffering from OOV. Additionally, we further investigate the correlation of OOV to task performance and explore if and how mitigation can salvage a model with high OOV. Keywords: Natural language processing, Machine learning, Transfer learning, Language models to reduce the size of the vocabulary while increasing the robust- 1. Introduction ness against out-of-vocabulary (OOV) in downstream tasks. This Using a large-scale neural language model as a pre-trained is especially powerful when combined with transfer learning. -
Child Readersв€™ Eye Movements in Reading Thai
Vision Research 123 (2016) 8–19 Contents lists available at ScienceDirect Vision Research journal homepage: www.elsevier.com/locate/visres Child readers’ eye movements in reading Thai ⇑ Benjawan Kasisopa a, , Ronan G. Reilly a,b, Sudaporn Luksaneeyanawin c, Denis Burnham a a MARCS Institute for Brain, Behaviour, and Development, Western Sydney University, Australia b Department of Computer Science, Maynooth University, Ireland c Centre for Research in Speech and Language Processing, Chulalongkorn University, Thailand article info abstract Article history: It has recently been found that adult native readers of Thai, an alphabetic scriptio continua language, Received 30 July 2014 engage similar oculomotor patterns as readers of languages written with spaces between words; despite Received in revised form 1 July 2015 the lack of inter-word spaces, first and last characters of a word appear to guide optimal placement of Accepted 23 July 2015 Thai readers’ eye movements, just to the left of word-centre. The issue addressed by the research Available online 12 May 2016 described here is whether eye movements of Thai children also show these oculomotor patterns. Here the effect of first and last character frequency and word frequency on the eye movements of 18 Thai chil- Keywords: dren when silently reading normal unspaced and spaced text was investigated. Linear mixed-effects Eye movements model analyses of viewing time measures (first fixation duration, single fixation duration, and gaze dura- Children’s reading Landing site distribution tion) and of landing site location revealed that Thai children’s eye movement patterns were similar to Thai text their adult counterparts. Both first character frequency and word frequency played important roles in Thai children’s landing sites; children tended to land their eyes further into words, close to the word cen- tre, if the word began with higher frequency first characters, and this effect was facilitated in higher fre- quency words. -
Follow-Up to Extended Tamil Proposal L2/10-256R §1
Follow-up to Extended Tamil proposal L2/10-256R Shriramana Sharma, jamadagni-at-gmail-dot-com, India 2010-Sep-30 This is a follow-up to my Extended Tamil proposal L2/10-256R. It reflects some further thought I have been giving to the matter of how Extended Tamil and related script-forms should be represented at the encoding level. It also describes use of Extended Tamil for contexts I had not considered earlier. No part of this document alters any of the Extended Tamil characters or their names or properties as proposed by L2/10-256R, however. The only intention is to clarify the details of the implementation and usage of Extended Tamil. §1. Encoding model of Extended Tamil and related script-forms §1.1. Tamil script for Tamil language Just to cover the entire spectrum, I first note that characters from the Tamil block are used to denote the Tamil language (obviously). The point is that Extended Tamil characters which are intended for the proposed Tamil Extended block are not used for this: (The above verse is the first verse from the Tirumantiram, a text on the Shaiva Vedanta religion attributed to one sage Tirumūlar.) §1.2. Grantha script for Sanskrit language At the other end of the spectrum, the Grantha script – to be precise, characters from the Grantha block – are used to denote Sanskrit as in this verse from the Bhagavad Gītā (18-66): । ॥ ї 1 §1.3. Extended Tamil script (Liberal variant) for Sanskrit language The same verse in Extended Tamil, using characters from the Tamil and Tamil Extended blocks and a font that displays those characters in the orthographic style we have called in our proposal L2/10-256R as ET-L or Extended Tamil Liberal: । ॥ The language is Sanskrit. -
Das Deutsche Schriftsystem (Compu-Skript)
PD Dr. Wolfgang Schindler. LMU München. Compu-Skript „Das deutsche Schriftsystem“. Version 27.09.2021. Seite 1 E-Mail: Wolfgang.Schindler[ätt]germanistik.uni-muenchen.de Web: http://wolfgang-schindler.userweb.mwn.de/index.html Das deutsche Schriftsystem 1 (Literaturverzeichnis am Ende!) 1 Zur Evolution von Schreibgrammatik Wenn wir in die römische Antike zurückblicken, fällt uns auf, dass es viele Schriftdokumente, bei- spielsweise Inschriften, gibt, die in CAPITALIS und in der sog. SCRIPTIO CONTINUA verfasst sind: Inschrift2 Trajan-Säule in Rom 112/113 n. Chr. Wenn wir das in einer großbuchstabigen durchgängigen Schreibung im Deutschen nachahmen, dann sieht das ungefähr so aus: 3 DIROMERHABENTATSEHLIHOFTERSONEWORTZWISENRAUMEGESRIBEN Die Römer haben tatsächlich öfters ohne Wortzwischenräume geschrieben 1 Dieses Typoskript ist aus einer Vorlesungsgrundlage entstanden. Es ist inspiriert von Beatrice Primus’ Arbeiten, ins- besondere ihrer Vorlesung "Das Schriftsystem des Deutschen“ (Köln, 2009)! Der Interpunktionsteil ist inspiriert von Ursula Bredels Arbeiten. Zu nennen sind auch die Einflüsse des Wortschreibungsteils in der Grammatik von Peter Eisenberg (Grundriss, Bd. 1., Kap. 8). Den Keim legte vor langer Zeit eine Vorlesung von Hartmut Günther, die er während meines Studiums in München hielt. Und natürlich sind meine Gedanken von den vielen Arbeiten inspiriert, die im Literaturverzeichnis am Ende aufgeführt sind. 2 Für die Interessierten hier eine Version mit einer Gliederung und mit einer ungefähren Übersetzung: SENATVS·POPVLVSQVE·ROMANVS / IMP·CAESARI·DIVI·NERVAE·F·NERVAE / TRAIANO·AVG·GERM·DACICO·PONTIF / MAXIMO·TRIB·POT·XVII·IMP·VI·COS·VI·P·P / AD·DECLARANDVM·QVANTAE·ALTITVDINIS / MONS·ET·LOCVS·TANT<IS·OPER>IBVS·SIT·EGESTVS Die Inschrift enthält zwar keine Spatien, aber MITTELPUNKTE, die meist worttrennend eingesetzt werden. -
Colonial Rule in Kerala and the Development of Malayalam Novels: Special Reference to the Early Malayalam Novels
RESEARCH REVIEW International Journal of Multidisciplinary 2021; 6(2):01-03 Research Paper ISSN: 2455-3085 (Online) https://doi.org/10.31305/rrijm.2021.v06.i02.001 Double Blind Peer Reviewed/Refereed Journal https://www.rrjournals.com/ Colonial Rule in Kerala and the Development of Malayalam Novels: Special Reference to the Early Malayalam Novels *Ramdas V H Research Scholar, Comparative Literature and Linguistics, Sree Sankaracharya University of Sanskrit, Kalady ABSTRACT Article Publication Malayalam literature has a predominant position among other literatures. From the Published Online: 14-Feb-2021 ‘Pattu’ moments it had undergone so many movements and theories to transform to the present style. The history of Malayalam literature witnessed some important Author's Correspondence changes and movements in the history. That means from the beginning to the present Ramdas V H situation, the literary works and literary movements and its changes helped the development of Malayalam literature. It is difficult to estimate the development Research Scholar, Comparative Literature periods of Malayalam literature. Because South Indian languages like Tamil, and Linguistics, Sree Sankaracharya Kannada, etc. gave their own contributions to the development of Malayalam University of Sanskrit, Kalady literature. Especially Tamil literature gave more important contributions to the Asst.Professor, Dept. of English development of Malayalam literature. The English education was another milestone Ilahia College of Arts and Science of the development of Malayalam literature. The establishment of printing presses, ramuvh[at]gmail.com the education minutes of Lord Macaulay, etc. helped the development process. People with the influence of English language and literature, began to produce a new style of writing in Malayalam literature. -
Schriftzeichen
Annette Hornbacher (Ethnologie), Sabine Neumann (Kunstgeschichte Ostasiens), Laura Willer (Papyrologie) Schriftzeichen Schriftzeichen lassen sich als die einzelnen Symbole definieren, aus denen ein Schriftsystem besteht. Den verschiedenen Typen von Schriftsystemen entsprechend gibt es verschiedene Arten von Schriftzeichen. Das System, das uns in Europa am vertrautesten ist, ist das alphabetische, bei dem sich mit Hilfe einer eng begrenzten Anzahl von Symbolen, den Buchstaben, jegliches Wort darstellen lässt, abhängig von der Kombination der Buchstaben. Zu dieser Art Schriftzeichen zählen das lateinische, griechische und kyrillische Alphabet. Sie funktionieren zwar alle auf dieselbe Weise, bestehen jedoch aus unterschiedlichen, voneinander abgeleiteten Schriftzeichen; die lateinischen und kyrillischen sind jeweils eine Weiterentwicklung der griechischen, die selbst eine Weiterentwicklung der phönizischen sind. Eine ältere Form des alphabetischen Systems stellen Schriften dar, die nur oder hauptsächlich aus Konsonantenzeichen bestehen. Zu noch heute gebräuch- lichen Schriftsystemen dieser Art gehören das Hebräische (Abb. 1) und Arabische. Im Arabischen werden kurze Vokale, falls überhaupt, mit Hilfe diakritischer Zeichen dargestellt.1 Im Hebräischen können lange Vokale durch Konsonantenzeichen, die sogenannten matres lectionis, oder sämtliche Vokale durch diakritische Zeichen dar- gestellt werden.2 Silbenschriften, bei denen ein Schriftzeichen, das Syllabogramm, den Lautwert einer ganzen Silbe wiedergibt, benötigen eine wesentlich höhere -
History Review Indian Economic & Social
Indian Economic & Social History Review http://ier.sagepub.com/ A Review Symposium : Literary Cultures in History Indian Economic Social History Review 2005 42: 377 DOI: 10.1177/001946460504200304 The online version of this article can be found at: http://ier.sagepub.com/content/42/3/377.citation Published by: http://www.sagepublications.com Additional services and information for Indian Economic & Social History Review can be found at: Email Alerts: http://ier.sagepub.com/cgi/alerts Subscriptions: http://ier.sagepub.com/subscriptions Reprints: http://www.sagepub.com/journalsReprints.nav Permissions: http://www.sagepub.com/journalsPermissions.nav >> Version of Record - Oct 18, 2005 What is This? Downloaded from ier.sagepub.com at SOAS London on June 28, 2012 A Review Symposium / 377 A Review Symposium: Literary Cultures in History Sheldon Pollock, ed., Literary Cultures in History: Reconstructions from South Asia, Berkeley/Delhi: University of California Press/Oxford University Press, 2003, xxix, 1,066 pp., price $80.00. The review symposium is not a form that readers of the IESHR are likely to have seen with great frequency in this journal, with the possible exception of the reviews some two decades ago of the two volumes of the Cambridge Economic History of India. Occasionally, however, the appearance of a work of great ambition and scope calls for such a response, and this is the case of the work under review here, itself the product of a long gestation process, initiated by Sheldon Pollock and Velcheru Narayana Rao, and brought to fruition, after a number of conferences and meetings, under the sole editorship of Pollock at the University of Chicago. -
Recognition of Punctuation in Voiced and Unvoiced Speech for Ib-CET*
Recognition of Punctuation in Voiced And Unvoiced Speech For Ib-CET Jiangang Liu School of Foreign Languages, Southeast University, Nanjing, Jiangsu Province 210096, China Institute of Modern Educational Technology Southeast University, Nanjing, Jiangsu Province 210096, China (Contact E-mail: [email protected]) Chen Li School of Foreign Languages, Southeast University, Nanjing, Jiangsu Province 210096, China Institute of Modern Educational Technology Southeast University, Nanjing, Jiangsu Province 210096, China Li Zhao Institute of Modern Educational Technology Southeast University, Nanjing, Jiangsu Province 210096, China School of Information Science and Engineering Southeast University, Nanjing, Jiangsu Province 210096, China Abstract The paper discusses the feasibility of punctuation recognition in oral delivery through voiced and unvoiced speech detection. The paper is firstly focused on introduction of iB-CET, in an attempt to clear away possible illiteracy of linguistic perception and speech recognition technology. Secondly, the paper providess some touches on how linguists interpret punctuation, thus inducing the third part in interpreting engineer’s concept on punctuation relying on Bark wavelet transform parameters and subspace analysis in speech punctuation recognition. Finally, through experiments, the paper convinces readers that punctuation in speech can be accomplished, hence leaves much thought for the notion of punctuation recognition in speech for oral delivery. Keywords: punctuation; voiced and unvoiced; iB-CET I. Introduction The present Internet-base College English Test (iB-CET for short) is getting popular, though some scholars are still arguing against its credibility and feasibility. Designed and planned in May, 2007(Jin Yan, & Wu Jiang, 2009), the iB-CET got started among 53 colleges and universities in 20th of December, 2008 so that China made a milestone in computer-based test rather than paper-based test for high education in history(Liu Jian-gang, & Dong Jing, 2011). -
04 Bedamatta Final
CLELEjournal, Volume 1, Issue 1, 2013 58 ________________________________________________________________________________________ Playing with Nonsense: Toward Language Bridging in a Multilingual Classroom Urmishree Bedamatta Abstract To meet the academic and educational needs of first generation school-goers, the Government of India has launched mother tongue based multilingual education for tribal education under the national flagship program of Sarva Shiksha Abhiyan (Universal Elementary Education). In the current multilingual education programme, education starts in the home language. But as the grade advances, curricular subjects begin to be divided between the home language and the school language in a realization of parallel monolingualism in which languages remain closed from each other. This paper proposes the introduction of nonsense texts, including children’s rhymes and folk rhymes and riddles, into the curricular content of language as a bridging subject. For this, I draw upon theoretical perspectives of language awareness, language play and the theories of nonsense. My focus is on the kinds of play that could be attempted with nonsense texts. School education envisages a mere cultural role for nonsense, as a homely, familiar game, and hence, teachers rarely make use of nonsense to initiate experiments with language or to open conceptual doors. I employ the help of some Indian multilingual nonsense texts to illustrate language play – from mimicking sounds and sound patterns to making linguistic connections and discoveries. Keywords – multilingual education, language bridging, language awareness, language play, nonsense texts Urmishree Bedamatta (PhD) is an Assistant Professor in the Department of English, at Ravenshaw University, India. She teaches linguistics and her research areas are language education in multilingual classrooms and multilingual folklore.