Orthographic Syllable As Basic Unit for SMT Between Related Languages

Total Page:16

File Type:pdf, Size:1020Kb

Load more

Orthographic Syllable as basic unit for SMT between Related Languages Anoop Kunchukuttan, Pushpak Bhattacharyya Center For Indian Language Technology, Department of Computer Science & Engineering Indian Institute of Technology Bombay {anoopk,pb}@cse.iitb.ac.in Abstract achieved by sub-word level transformations. For in- stance, lexical similarity can be modelled in the stan- We explore the use of the orthographic syl- dard SMT pipeline by transliteration of words while lable, a variable-length consonant-vowel se- decoding (Durrani et al., 2010) or post-processing quence, as a basic unit of translation between (Nakov and Tiedemann, 2012; Kunchukuttan et al., related languages which use abugida or alpha- 2014). betic scripts. We show that orthographic sylla- ble level translation significantly outperforms A different paradigm is to drop the notion of models trained over other basic units (word, word boundary and consider the character n-gram morpheme and character) when training over as the basic unit of translation (Vilar et al., 2007; small parallel corpora. Tiedemann, 2009a). Such character-level SMT bas been explored for closely related languages like Bulgarian-Macedonian, Indonesian-Malay with 1 Introduction modest success, with the short context of unigrams being a limiting factor (Tiedemann, 2012). The Related languages exhibit lexical and structural sim- use of character n-gram units to address this limi- ilarities on account of sharing a common ances- tation leads to data sparsity for higher order n-grams try (Indo-Aryan, Slavic languages) or being in pro- and provides little benefit (Tiedemann and Nakov, longed contact for a long period of time (Indian sub- 2013). continent, Standard Average European linguistic ar- eas) (Bhattacharyya et al., 2016). Translation be- In this work, we present a linguistically moti- tween related languages is an important requirement vated, variable length unit of translation — ortho- due to substantial government, business and social graphic syllable (OS) — which provides more con- communication among people speaking these lan- text for translation while limiting the number of ba- guages. However, most of these languages have few sic units. The OS consists of one or more conso- parallel corpora resources, an important requirement nants followed by a vowel and is inspired from the for building good quality SMT systems. akshara, a consonant-vowel unit, which is the funda- Modelling the lexical similarity among related mental organizing principle of Indic scripts (Sproat, languages is the key to building good-quality SMT 2003; Singh, 2006). It can be thought of as an ap- systems with limited parallel corpora. Lexical sim- proximate syllable with the onset and nucleus, but ilarity implies that the languages share many words no coda. While true syllabification is hard, ortho- with the similar form (spelling/pronunciation) and graphic syllabification can be easily done. Atreya et meaning e.g. blindness is andhapana in Hindi, al. (2016) and Ekbal et al. (2006) have shown that aandhaLepaNaa in Marathi. These words could the OS is a useful unit for transliteration involving be cognates, lateral borrowings or loan words from Indian languages. other languages. Translation for such words can be We show that orthographic syllable-level trans- lation significantly outperforms character-level and Basic Unit Example Transliteration strong word-level and morpheme-level baselines Word घरासमोरचा gharAsamoracA over multiple related language pairs (Indian as well Morph Segment घरा समोर चा gharA samora cA Orthographic Syllable घ रा स मो र चा gha rA sa mo racA as others). Character-level approaches have been Character unigram घ र ◌ा स म ◌ो र च ◌ा gha r A sa m o ra c A previously shown to work well for language pairs Character 3-gram घरा समो रचा gharA samo rcA with high lexical similarity. Our major finding is that something that is in front of home: ghara=home, samora=front, cA=of OS-level translation outperforms other approaches Table 1: Various translation units for a Marathi word even when the language pairs have relatively less lexical similarity or belong to different language families (but have sufficient contact relation). ing a word is considered to be an OS. 2 Orthographic Syllabification 3 Translation Models The orthographic syllable is a sequence of one or We compared the orthographic syllable level model more consonants followed by a vowel, i.e a C+V (O) with models based on other translation units that unit. We describe briefly procedures for ortho- have been reported in previous work: word (W), graphic syllabification of Indian scripts and non- morpheme (M), unigram (C) and trigram characters. Indic alphabetic scripts. Orthographic syllabifica- Table 1 shows examples of these representations. tion cannot be done for languages using logographic The first step to build these translation systems is and abjad scripts as these scripts do not have vowels. to transform sentences to the correct representation. Each word is segmented as the per the unit of rep- Indic Scripts: Indic scripts are abugida scripts, resentation, punctuations are retained and a special consisting of consonant-vowel sequences, with a word boundary marker character (_) is introduced consonant core (C+) and a dependent vowel (ma- to indicate word boundaries as shown here: tra). If no vowel follows a consonant, an implicit W: schwa vowel [IPA: ə] is assumed. Suppression of राजू , घराबाहेर जाऊ नको . schwa is indicated by the halanta character follow- O: रा जू _ , _ घ रा बा हे र _ जा ऊ _ न को _ . ing a consonant. This script design makes for a straightforward syllabification process( as shown) in For all units of representation, we trained phrase- lakShamI based SMT (PBSMT) systems. Since related lan- the following example.( e.g. लक्षमी) CVCCVCV is la kSha mI guages have similar word order, we used distance segmented as ल क्ष मी CVCCVCV . There are two exceptions to this scheme: (i) Indic scripts distin- based distortion model and monotonic decoding. For guish between dependent vowels (vowel diacritics) character and orthographic syllable level models, we and independent vowels, and the latter will consti- use higher order (10-gram) languages models since data sparsity is a lesser concern due to small vocabu- tute an OS on its own. e.g. मु륍बई (mumbaI) ! lary size (Vilar et al., 2007). As suggested by Nakov मु 륍ब ई (mu mba I) (ii) The characters anusvaara and chandrabindu are part of the OS to the left if and Tiedemann (2012), we used word-level tuning they represents nasalization of the vowel/consonant for character and orthographic syllable level models or start a new OS if they represent a nasal consonant. by post-processing n-best lists in each tuning step to Their exact role is determined by the character fol- calculate the usual word-based BLEU score. lowing the anusvaara. See Appendix A for details. While decoding, the word and morpheme level systems will not be able to translate OOV words. Non-Indic Alphabetic Scripts: We use a simpler Since the languages involved share vocabulary, we method for the alphabetic scripts used in our experi- transliterate the untranslated words resulting in the ments (Latin and Cyrillic). The OS is identified by a post-edited systems WX and MX corresponding to C+V+ sequence. e.g. lakshami!la ksha mi, mum- the systems W and M respectively. Following de- bai!mu mbai. The OS could contains multiple ter- coding, we used a simple method to regenerate minal vowel characters representing long vowels (oo words from sub-word level units: Since we represent in cool) or diphthongs (ai in mumbai). A vowel start- word boundaries using a word boundary marker, we IA!IA DR!DR IA!DR parallel corpora for character, morpheme and OS ben-hin 52.30 mal-tam 39.04 hin-mal 33.24 level LMs. pan-hin 67.99 tel-mal 39.18 DR!IA kok-mar 54.51 mal-hin 33.24 System details: PBSMT systems were trained us- IA: Indo-Aryan, DR: Dravidian ing the Moses system (Koehn et al., 2007), with the grow-diag-final-and heuristic for extracting phrases, Table 2: Language pairs used in experiments along and Batch MIRA (Cherry and Foster, 2012) for tun- with Lexical Similarity between them, in terms of ing (default parameters). We trained 5-gram LMs LCSR between training corpus sentences with Kneser-Ney smoothing for word and morpheme level models and 10-gram LMs for character and simply concat the output units between consecutive OS level models. We used the BrahmiNet translit- occurrences of the marker character. eration system (Kunchukuttan et al., 2015) for post- editing, which is based on the transliteration Mod- 4 Experimental Setup ule in Moses (Durrani et al., 2014). We used un- supervised morphological segmenters trained with Languages: Our experiments primarily concen- Morfessor (Virpioja et al., 2013) for obtaining mor- trated on multiple language pairs from the two ma- pheme representations. The unsupervised morpho- jor language families of the Indian sub-continent logical segmenters were trained on the ILCI corpus (Indo-Aryan branch of Indo-European and Dravid- and the Leipzig corpus (Quasthoff et al., 2006).The ian). These languages have been in contact for a morph-segmenters and our implementation of ortho- long time, hence there are many lexical and gram- graphic syllabification are made available as part of matical similarities among them, leading to the sub- the Indic NLP Library1. continent being considered a linguistic area (Eme- neau, 1956). Specifically, there is overlap between Evaluation: We use BLEU (Papineni et al., 2002) the vocabulary of these languages to varying de- and Le-BLEU (Virpioja and Grönroos, 2015) for grees due to cognates, language contact and loan- evaluation. Le-BLEU does fuzzy matches of words words from Sanskrit (throughout history) and En- and hence is suitable for evaluating SMT systems glish (in recent times).
Recommended publications
  • Poetry and History: Bengali Maṅgal-Kābya and Social Change in Precolonial Bengal David L

    Poetry and History: Bengali Maṅgal-Kābya and Social Change in Precolonial Bengal David L

    Western Washington University Western CEDAR A Collection of Open Access Books and Books and Monographs Monographs 2008 Poetry and History: Bengali Maṅgal-kābya and Social Change in Precolonial Bengal David L. Curley Western Washington University, [email protected] Follow this and additional works at: https://cedar.wwu.edu/cedarbooks Part of the Near Eastern Languages and Societies Commons Recommended Citation Curley, David L., "Poetry and History: Bengali Maṅgal-kābya and Social Change in Precolonial Bengal" (2008). A Collection of Open Access Books and Monographs. 5. https://cedar.wwu.edu/cedarbooks/5 This Book is brought to you for free and open access by the Books and Monographs at Western CEDAR. It has been accepted for inclusion in A Collection of Open Access Books and Monographs by an authorized administrator of Western CEDAR. For more information, please contact [email protected]. Table of Contents Acknowledgements. 1. A Historian’s Introduction to Reading Mangal-Kabya. 2. Kings and Commerce on an Agrarian Frontier: Kalketu’s Story in Mukunda’s Candimangal. 3. Marriage, Honor, Agency, and Trials by Ordeal: Women’s Gender Roles in Candimangal. 4. ‘Tribute Exchange’ and the Liminality of Foreign Merchants in Mukunda’s Candimangal. 5. ‘Voluntary’ Relationships and Royal Gifts of Pan in Mughal Bengal. 6. Maharaja Krsnacandra, Hinduism and Kingship in the Contact Zone of Bengal. 7. Lost Meanings and New Stories: Candimangal after British Dominance. Index. Acknowledgements This collection of essays was made possible by the wonderful, multidisciplinary education in history and literature which I received at the University of Chicago. It is a pleasure to thank my living teachers, Herman Sinaiko, Ronald B.
  • LCSH Section K

    LCSH Section K

    K., Rupert (Fictitious character) Motion of K stars in line of sight Ka-đai language USE Rupert (Fictitious character : Laporte) Radial velocity of K stars USE Kadai languages K-4 PRR 1361 (Steam locomotive) — Orbits Ka’do Herdé language USE 1361 K4 (Steam locomotive) UF Galactic orbits of K stars USE Herdé language K-9 (Fictitious character) (Not Subd Geog) K stars—Galactic orbits Ka’do Pévé language UF K-Nine (Fictitious character) BT Orbits USE Pévé language K9 (Fictitious character) — Radial velocity Ka Dwo (Asian people) K 37 (Military aircraft) USE K stars—Motion in line of sight USE Kadu (Asian people) USE Junkers K 37 (Military aircraft) — Spectra Ka-Ga-Nga script (May Subd Geog) K 98 k (Rifle) K Street (Sacramento, Calif.) UF Script, Ka-Ga-Nga USE Mauser K98k rifle This heading is not valid for use as a geographic BT Inscriptions, Malayan K.A.L. Flight 007 Incident, 1983 subdivision. Ka-houk (Wash.) USE Korean Air Lines Incident, 1983 BT Streets—California USE Ozette Lake (Wash.) K.A. Lind Honorary Award K-T boundary Ka Iwi National Scenic Shoreline (Hawaii) USE Moderna museets vänners skulpturpris USE Cretaceous-Paleogene boundary UF Ka Iwi Scenic Shoreline Park (Hawaii) K.A. Linds hederspris K-T Extinction Ka Iwi Shoreline (Hawaii) USE Moderna museets vänners skulpturpris USE Cretaceous-Paleogene Extinction BT National parks and reserves—Hawaii K-ABC (Intelligence test) K-T Mass Extinction Ka Iwi Scenic Shoreline Park (Hawaii) USE Kaufman Assessment Battery for Children USE Cretaceous-Paleogene Extinction USE Ka Iwi National Scenic Shoreline (Hawaii) K-B Bridge (Palau) K-TEA (Achievement test) Ka Iwi Shoreline (Hawaii) USE Koro-Babeldaod Bridge (Palau) USE Kaufman Test of Educational Achievement USE Ka Iwi National Scenic Shoreline (Hawaii) K-BIT (Intelligence test) K-theory Ka-ju-ken-bo USE Kaufman Brief Intelligence Test [QA612.33] USE Kajukenbo K.
  • OSU WPL # 27 (1983) 140- 164. the Elimination of Ergative Patterns Of

    OSU WPL # 27 (1983) 140- 164. the Elimination of Ergative Patterns Of

    OSU WPL # 27 (1983) 140- 164. The Elimination of Ergative Patterns of Case-Marking and Verbal Agreement in Modern Indic Languages Gregory T. Stump Introduction. As is well known, many of the modern Indic languages are partially ergative, showing accusative patterns of case- marking and verbal agreement in nonpast tenses, but ergative patterns in some or all past tenses. This partial ergativity is not at all stable in these languages, however; what I wish to show in the present paper, in fact, is that a large array of factors is contributing to the elimination of partial ergativity in the modern Indic languages. The forces leading to the decay of ergativity are diverse in nature; and any one of these may exert a profound influence on the syntactic development of one language but remain ineffectual in another. Before discussing this erosion of partial ergativity in Modern lndic, 1 would like to review the history of what the I ndian grammar- ians call the prayogas ('constructions') of a past tense verb with its subject and direct object arguments; the decay of Indic ergativity is, I believe, best envisioned as the effect of analogical develop- ments on or within the system of prayogas. There are three prayogas in early Modern lndic. The first of these is the kartariprayoga, or ' active construction' of intransitive verbs. In the kartariprayoga, the verb agrees (in number and p,ender) with its subject, which is in the nominative case--thus, in Vernacular HindOstani: (1) kartariprayoga: 'aurat chali. mard chala. woman (nom.) went (fern. sg.) man (nom.) went (masc.
  • Recognition of Online Handwritten Gurmukhi Strokes Using Support Vector Machine a Thesis

    Recognition of Online Handwritten Gurmukhi Strokes Using Support Vector Machine a Thesis

    Recognition of Online Handwritten Gurmukhi Strokes using Support Vector Machine A Thesis Submitted in partial fulfillment of the requirements for the award of the degree of Master of Technology Submitted by Rahul Agrawal (Roll No. 601003022) Under the supervision of Dr. R. K. Sharma Professor School of Mathematics and Computer Applications Thapar University Patiala School of Mathematics and Computer Applications Thapar University Patiala – 147004 (Punjab), INDIA June 2012 (i) ABSTRACT Pen-based interfaces are becoming more and more popular and play an important role in human-computer interaction. This popularity of such interfaces has created interest of lot of researchers in online handwriting recognition. Online handwriting recognition contains both temporal stroke information and spatial shape information. Online handwriting recognition systems are expected to exhibit better performance than offline handwriting recognition systems. Our research work presented in this thesis is to recognize strokes written in Gurmukhi script using Support Vector Machine (SVM). The system developed here is a writer independent system. First chapter of this thesis report consist of a brief introduction to handwriting recognition system and some basic differences between offline and online handwriting systems. It also includes various issues that one can face during development during online handwriting recognition systems. A brief introduction about Gurmukhi script has also been given in this chapter In the last section detailed literature survey starting from the 1979 has also been given. Second chapter gives detailed information about stroke capturing, preprocessing of stroke and feature extraction. These phases are considered to be backbone of any online handwriting recognition system. Recognition techniques that have been used in this study are discussed in chapter three.
  • Sanskrit Alphabet

    Sanskrit Alphabet

    Sounds Sanskrit Alphabet with sounds with other letters: eg's: Vowels: a* aa kaa short and long ◌ к I ii ◌ ◌ к kii u uu ◌ ◌ к kuu r also shows as a small backwards hook ri* rri* on top when it preceeds a letter (rpa) and a ◌ ◌ down/left bar when comes after (kra) lri lree ◌ ◌ к klri e ai ◌ ◌ к ke o au* ◌ ◌ к kau am: ah ◌ं ◌ः कः kah Consonants: к ka х kha ga gha na Ê ca cha ja jha* na ta tha Ú da dha na* ta tha Ú da dha na pa pha º ba bha ma Semivowels: ya ra la* va Sibilants: sa ш sa sa ha ksa** (**Compound Consonant. See next page) *Modern/ Hindi Versions a Other ऋ r ॠ rr La, Laa (retro) औ au aum (stylized) ◌ silences the vowel, eg: к kam झ jha Numero: ण na (retro) १ ५ ॰ la 1 2 3 4 5 6 7 8 9 0 @ Davidya.ca Page 1 Sounds Numero: 0 1 2 3 4 5 6 7 8 910 १॰ ॰ १ २ ३ ४ ६ ७ varient: ५ ८ (shoonya eka- dva- tri- catúr- pancha- sás- saptán- astá- návan- dásan- = empty) works like our Arabic numbers @ Davidya.ca Compound Consanants: When 2 or more consonants are together, they blend into a compound letter. The 12 most common: jna/ tra ttagya dya ddhya ksa kta kra hma hna hva examples: for a whole chart, see: http://www.omniglot.com/writing/devanagari_conjuncts.php that page includes a download link but note the site uses the modern form Page 2 Alphabet Devanagari Alphabet : к х Ê Ú Ú º ш @ Davidya.ca Page 3 Pronounce Vowels T pronounce Consonants pronounce Semivowels pronounce 1 a g Another 17 к ka v Kit 42 ya p Yoga 2 aa g fAther 18 х kha v blocKHead
  • Tugboat, Volume 15 (1994), No. 4 447 Indica, an Indic Preprocessor

    Tugboat, Volume 15 (1994), No. 4 447 Indica, an Indic Preprocessor

    TUGboat, Volume 15 (1994), No. 4 447 HPXC , an Indic preprocessor for T X script, ...inMalayalam, Indica E 1 A Sinhalese TEXSystem hpx...inSinhalese,etc. This justifies the choice of a common translit- Yannis Haralambous eration scheme for all Indic languages. But why is Abstract a preprocessor necessary, after all? A common characteristic of Indic languages is In this paper a two-fold project is described: the first the fact that the short vowel ‘a’ is inherent to con- part is a generalized preprocessor for Indic scripts (scripts of languages currently spoken in India—except Urdu—, sonants. Vowels are written by adding diacritical Sanskrit and Tibetan), with several kinds of input (LATEX marks (or smaller characters) to consonants. The commands, 7-bit ascii, CSX, ISO/IEC 10646/unicode) beauty (and complexity) of these scripts comes from and TEX output. This utility is written in standard Flex the fact that one needs a special way to denote the (the gnu version of Lex), and hence can be painlessly absence of vowel. There is a notorious diacritic, compiled on any platform. The same input methods are called “vir¯ama”, present in all Indic languages, which used for all Indic languages, so that the user does not is used for this reason. But it seems illogical to add a need to memorize different conventions and commands sign, to specify the absence of a sound. On the con- for each one of them. Moreover, the switch from one lan- trary, it seems much more logical to remove some- guage to another can be done by use of user-defineable thing, and what is done usually is that letters are preprocessor directives.
  • 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721

    5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721

    Internet Engineering Task Force (IETF) P. Faltstrom, Ed. Request for Comments: 5892 Cisco Category: Standards Track August 2010 ISSN: 2070-1721 The Unicode Code Points and Internationalized Domain Names for Applications (IDNA) Abstract This document specifies rules for deciding whether a code point, considered in isolation or in context, is a candidate for inclusion in an Internationalized Domain Name (IDN). It is part of the specification of Internationalizing Domain Names in Applications 2008 (IDNA2008). Status of This Memo This is an Internet Standards Track document. This document is a product of the Internet Engineering Task Force (IETF). It represents the consensus of the IETF community. It has received public review and has been approved for publication by the Internet Engineering Steering Group (IESG). Further information on Internet Standards is available in Section 2 of RFC 5741. Information about the current status of this document, any errata, and how to provide feedback on it may be obtained at http://www.rfc-editor.org/info/rfc5892. Copyright Notice Copyright (c) 2010 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (http://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Simplified BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Simplified BSD License.
  • Inflectional Morphology Analyzer for Sanskrit

    Inflectional Morphology Analyzer for Sanskrit

    Inflectional Morphology Analyzer for Sanskrit Girish Nath Jha, Muktanand Agrawal, Subash, Sudhir K. Mishra, Diwakar Mani, Diwakar Mishra, Manji Bhadra, Surjit K. Singh [email protected] Special Centre for Sanskrit Studies Jawaharlal Nehru University New Delhi-110067 The paper describes a Sanskrit morphological analyzer that identifies and analyzes inflected noun- forms and verb-forms in any given sandhi-free text. The system which has been developed as java servlet RDBMS can be tested at http://sanskrit.jnu.ac.in (Language Processing Tools > Sanskrit Tinanta Analyzer/Subanta Analyzer) with Sanskrit data as unicode text. Subsequently, the separate systems of subanta and ti anta will be combined into a single system of sentence analysis with karaka interpretation. Currently, the system checks and labels each word as three basic POS categories - subanta, ti anta, and avyaya. Thereafter, each subanta is sent for subanta processing based on an example database and a rule database. The verbs are examined based on a database of verb roots and forms as well by reverse morphology based on Pāini an techniques. Future enhancements include plugging in the amarakosha ( http://sanskrit.jnu.ac.in/amara ) and other noun lexicons with the subanta system. The ti anta will be enhanced by the k danta analysis module being developed separately. 1. Introduction The authors in the present paper are describing the subanta and ti anta analysis systems for Sanskrit which are currently running at http://sanskrit.jnu.ac.in . Sanskrit is a heavily inflected language, and depends on nominal and verbal inflections for communication of meaning. A fully inflected unit is called pada .
  • Devan¯Agar¯I for TEX Version 2.17.1

    Devan¯Agar¯I for TEX Version 2.17.1

    Devanagar¯ ¯ı for TEX Version 2.17.1 Anshuman Pandey 6 March 2019 Contents 1 Introduction 2 2 Project Information 3 3 Producing Devan¯agar¯ıText with TEX 3 3.1 Macros and Font Definition Files . 3 3.2 Text Delimiters . 4 3.3 Example Input Files . 4 4 Input Encoding 4 4.1 Supplemental Notes . 4 5 The Preprocessor 5 5.1 Preprocessor Directives . 7 5.2 Protecting Text from Conversion . 9 5.3 Embedding Roman Text within Devan¯agar¯ıText . 9 5.4 Breaking Pre-Defined Conjuncts . 9 5.5 Supported LATEX Commands . 9 5.6 Using Custom LATEX Commands . 10 6 Devan¯agar¯ıFonts 10 6.1 Bombay-Style Fonts . 11 6.2 Calcutta-Style Fonts . 11 6.3 Nepali-Style Fonts . 11 6.4 Devan¯agar¯ıPen Fonts . 11 6.5 Default Devan¯agar¯ıFont (LATEX Only) . 12 6.6 PostScript Type 1 . 12 7 Special Topics 12 7.1 Delimiter Scope . 12 7.2 Line Spacing . 13 7.3 Hyphenation . 13 7.4 Captions and Date Formats (LATEX only) . 13 7.5 Customizing the date and captions (LATEX only) . 14 7.6 Using dvnAgrF in Sections and References (LATEX only) . 15 7.7 Devan¯agar¯ıand Arabic Numerals . 15 7.8 Devan¯agar¯ıPage Numbers and Other Counters (LATEX only) . 15 1 7.9 Category Codes . 16 8 Using Devan¯agar¯ıin X E LATEXand luaLATEX 16 8.1 Using Hindi with Polyglossia . 17 9 Using Hindi with babel 18 9.1 Installation . 18 9.2 Usage . 18 9.3 Language attributes . 19 9.3.1 Attribute modernhindi .
  • An Introduction to Indic Scripts

    An Introduction to Indic Scripts

    An Introduction to Indic Scripts Richard Ishida W3C [email protected] HTML version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.html PDF version: http://www.w3.org/2002/Talks/09-ri-indic/indic-paper.pdf Introduction This paper provides an introduction to the major Indic scripts used on the Indian mainland. Those addressed in this paper include specifically Bengali, Devanagari, Gujarati, Gurmukhi, Kannada, Malayalam, Oriya, Tamil, and Telugu. I have used XHTML encoded in UTF-8 for the base version of this paper. Most of the XHTML file can be viewed if you are running Windows XP with all associated Indic font and rendering support, and the Arial Unicode MS font. For examples that require complex rendering in scripts not yet supported by this configuration, such as Bengali, Oriya, and Malayalam, I have used non- Unicode fonts supplied with Gamma's Unitype. To view all fonts as intended without the above you can view the PDF file whose URL is given above. Although the Indic scripts are often described as similar, there is a large amount of variation at the detailed implementation level. To provide a detailed account of how each Indic script implements particular features on a letter by letter basis would require too much time and space for the task at hand. Nevertheless, despite the detail variations, the basic mechanisms are to a large extent the same, and at the general level there is a great deal of similarity between these scripts. It is certainly possible to structure a discussion of the relevant features along the same lines for each of the scripts in the set.
  • (IJTSRD) Volume 4 Issue 1, December 2019 Available Online: E-ISSN: 2456 – 6470

    (IJTSRD) Volume 4 Issue 1, December 2019 Available Online: E-ISSN: 2456 – 6470

    International Journal of Trend in Scientific Research and Development (IJTSRD) Volume 4 Issue 1, December 2019 Available Online: www.ijtsrd.com e-ISSN: 2456 – 6470 A Descriptive Study of Standard Dialect and Western Dialect of Odia Language in Terms of Linguistic Items Debiprasad Pany Assistant Professor, Department of English, IGIT (Indira Gandhi Institute of Technology), Sarang, Odisha, India ABSTRACT How to cite this paper : Debiprasad Pany Language is a unique blessing to human beings. Human beings are bestowed "A Descriptive Study of Standard Dialect with the faculty of language from very primitive age. Language makes human and Western Dialect of Odia Language in beings social and in a society human beings communicate with the help of Terms of Linguistic Items" Published in language. Odia is one among the constitutionally approved language of India. International Journal Odisha is situated in the eastern part of India. Presently, this state has thirty of Trend in Scientific districts. Odisha is bound to the north by the state Jharkhand, to the northeast Research and by the state West Bengal, to the east by the Bay- of- Bengal, to the south by the Development (ijtsrd), state Andhra Pradesh, and to the west by the state Chhattisgarh. The ISSN: 2456-6470, languages used by the neighboring states have a lot of influence on Odia Volume-4 | Issue-1, language. In this present study a modest attempt has been made to high light December 2019, IJTSRD29632 the differences between Standard Odia and Western Odia dialects. Various pp.626-630, URL: linguistic items used by the western Odia dialect users have marked www.ijtsrd.com/papers/ijtsrd29632.pdf differences compared to the standard Odia.
  • The Gurmukhi Astrolabe of the Maharaja of Patiala

    The Gurmukhi Astrolabe of the Maharaja of Patiala

    Indian Journal of History of Science, 47.1 (2012) 63-92 THE GURMUKHI ASTROLABE OF THE MAHARAJA OF PATIALA SREERAMULA RAJESWARA SARMA* (Received 14 August 2011; revised 23 January 2012) Even though the telescope was widely in use in India, the production of naked eye observational instruments continued throughout the nineteenth century. In Punjab, in particular, several instrument makers produced astrolabes, celestial globes and other instruments of observation, some inscribed in Sanskrit, some in Persian and yet some others in English. A new feature was the production of instruments with legends in Punjabi language and Gurmukhi script. Three such instruments are known so far. One of these is a splendid astrolabe produced for the Maharaja of Patiala in 1850. This paper offers a full technical description of this Gurmukhi astrolabe and discusses the geographical and astrological data engraved on it. Attention is drawn to the similarities and differences between this astrolabe and the other astrolabes produced in India. Key words: Astrolabe, Astrological tables, Bha– lu– mal, – Geographical gazetteer, Gurmukhi script, Kursi , Maharaja of Patiala, – – Malayendu Su– ri, Rahim Bakhsh, Rete, Rishikes, Shadow squares, Star pointers, Zodiac signs INTRODUCTION For the historian of science and technology in India, the nineteenth century is an interesting period when traditional sciences and technologies continued to exist side by side with modern western science and technology before ultimately giving way to the latter. In the field of astronomical