ISO/IEC JTC1/SC2/WG2 N3xxx L2/08-Xxx
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Managing Order Relations in Mlbibtex∗
Managing order relations in MlBibTEX∗ Jean-Michel Hufflen LIFC (EA CNRS 4157) University of Franche-Comté 16, route de Gray 25030 BESANÇON CEDEX France hufflen (at) lifc dot univ-fcomte dot fr http://lifc.univ-fcomte.fr/~hufflen Abstract Lexicographical order relations used within dictionaries are language-dependent. First, we describe the problems inherent in automatic generation of multilin- gual bibliographies. Second, we explain how these problems are handled within MlBibTEX. To add or update an order relation for a particular natural language, we have to program in Scheme, but we show that MlBibTEX’s environment eases this task as far as possible. Keywords Lexicographical order relations, dictionaries, bibliographies, colla- tion algorithm, Unicode, MlBibTEX, Scheme. Streszczenie Porządek leksykograficzny w słownikach jest zależny od języka. Najpierw omó- wimy problemy powstające przy automatycznym generowaniu bibliografii wielo- języcznych. Następnie wyjaśnimy, jak są one traktowane w MlBibTEX-u. Do- danie lub zaktualizowanie zasad sortowania dla konkretnego języka naturalnego umożliwia program napisany w języku Scheme. Pokażemy, jak bardzo otoczenie MlBibTEX-owe ułatwia to zadanie. Słowa kluczowe Zasady sortowania leksykograficznego, słowniki, bibliografie, algorytmy sortowania leksykograficznego, Unikod, MlBibTEX, Scheme. 0 Introduction these items w.r.t. the alphabetical order of authors’ Looking for a word in a dictionary or for a name names. But the bst language of bibliography styles in a phone book is a common task. We get used [14] only provides a SORT function [13, Table 13.7] to the lexicographic order over a long time. More suitable for the English language, the commands for precisely, we get used to our own lexicographic or- accents and other diacritical signs being ignored by der, because it belongs to our cultural background. -
Investigation of English Language Contact-Induced Features in Hungarian Cardiology Discharge Reports and Language Attitudes of Physicians and Patients
University of Szeged, Faculty of Arts PhD School in Linguistics PhD Program in English Applied Linguistics Investigation of English language contact-induced features in Hungarian cardiology discharge reports and language attitudes of physicians and patients Summary of PhD Dissertation Csilla Keresztes Supervisor: Anna Fenyvesi, PhD associate professor Szeged 2010 1. Introduction Since the 1950s English has become not just an important language in the field of medicine, but the predominant language of health sciences. The aim of this study is to describe a field, namely, a subregister of the Hungarian language of medicine, to reveal the English contact-induced features in this specific purpose language, and to investigate the attitude of various discourse communities affected by it towards the English language. The impact of some major European languages, among them the English language, on Hungarian and its lexicon has already been investigated, however, it has been looked at mainly from a puristic aspect so far and little sociolinguistic or contact linguistic research has been done in the field yet. This research is focused on only one field of medicine, cardiology, which was selected for a closer investigation, on the one hand, as it is a technologically sophisticated, professionalized, institutionalized, and highly invasive medical discipline. On the other hand, heart diseases are the leading causes of death in several countries of the world including Hungary. Numerous studies have been published on medical English, but studies on medical Hungarian are limited in number, and very little has been published on the language of cardiology. Hospital discharge reports (or summaries) are written documents prepared when the patient is discharged from a health institution after receiving management. -
The Festvox Indic Frontend for Grapheme-To-Phoneme Conversion
The Festvox Indic Frontend for Grapheme-to-Phoneme Conversion Alok Parlikar, Sunayana Sitaram, Andrew Wilkinson and Alan W Black Carnegie Mellon University Pittsburgh, USA aup, ssitaram, aewilkin, [email protected] Abstract Text-to-Speech (TTS) systems convert text into phonetic pronunciations which are then processed by Acoustic Models. TTS frontends typically include text processing, lexical lookup and Grapheme-to-Phoneme (g2p) conversion stages. This paper describes the design and implementation of the Indic frontend, which provides explicit support for many major Indian languages, along with a unified framework with easy extensibility for other Indian languages. The Indic frontend handles many phenomena common to Indian languages such as schwa deletion, contextual nasalization, and voicing. It also handles multi-script synthesis between various Indian-language scripts and English. We describe experiments comparing the quality of TTS systems built using the Indic frontend to grapheme-based systems. While this frontend was designed keeping TTS in mind, it can also be used as a general g2p system for Automatic Speech Recognition. Keywords: speech synthesis, Indian language resources, pronunciation 1. Introduction in models of the spectrum and the prosody. Another prob- lem with this approach is that since each grapheme maps Intelligible and natural-sounding Text-to-Speech to a single “phoneme” in all contexts, this technique does (TTS) systems exist for a number of languages of the world not work well in the case of languages that have pronun- today. However, for low-resource, high-population lan- ciation ambiguities. We refer to this technique as “Raw guages, such as languages of the Indian subcontinent, there Graphemes.” are very few high-quality TTS systems available. -
ISO Basic Latin Alphabet
ISO basic Latin alphabet The ISO basic Latin alphabet is a Latin-script alphabet and consists of two sets of 26 letters, codified in[1] various national and international standards and used widely in international communication. The two sets contain the following 26 letters each:[1][2] ISO basic Latin alphabet Uppercase Latin A B C D E F G H I J K L M N O P Q R S T U V W X Y Z alphabet Lowercase Latin a b c d e f g h i j k l m n o p q r s t u v w x y z alphabet Contents History Terminology Name for Unicode block that contains all letters Names for the two subsets Names for the letters Timeline for encoding standards Timeline for widely used computer codes supporting the alphabet Representation Usage Alphabets containing the same set of letters Column numbering See also References History By the 1960s it became apparent to thecomputer and telecommunications industries in the First World that a non-proprietary method of encoding characters was needed. The International Organization for Standardization (ISO) encapsulated the Latin script in their (ISO/IEC 646) 7-bit character-encoding standard. To achieve widespread acceptance, this encapsulation was based on popular usage. The standard was based on the already published American Standard Code for Information Interchange, better known as ASCII, which included in the character set the 26 × 2 letters of the English alphabet. Later standards issued by the ISO, for example ISO/IEC 8859 (8-bit character encoding) and ISO/IEC 10646 (Unicode Latin), have continued to define the 26 × 2 letters of the English alphabet as the basic Latin script with extensions to handle other letters in other languages.[1] Terminology Name for Unicode block that contains all letters The Unicode block that contains the alphabet is called "C0 Controls and Basic Latin". -
Iso/Iec Jtc1/Sc2/Wg2 N4120 2011-07-05
ISO/IEC JTC1/SC2/WG2 N4120 2011-07-05 Universal Multiple-Octet Coded Character Set International Organization for Standardization Organisation Internationale de Normalisation Международная организация по стандартизации Doc Type: Working Group Document Title: Response to the Ad-hoc Report N4110 about the Rovas scripts Source: Gábor Hosszú (Hungarian National Body) Status: National Body Contribution Action: For consideration by JTC1/SC2/WG2 Date: 2011-07-05 This document gives the position of the Hungarian Standards Institution (Hungarian National Body) evaluation of the report of the ad-hoc committee on Hungarian met in Helsinki on 2011-06-08. Please send any response regarding to this document to Gábor Hosszú (email: [email protected]). Contents 1. Agreement..........................................................................................................................................................1 2. Disagreement .....................................................................................................................................................2 2.1. Naming of the script: barrier to the encoding.................................................................................................................. 2 2.2. Refused, but necessary Szekely-Hungarian Rovas characters......................................................................................... 3 2.3. Names of the characters ................................................................................................................................................. -
Early-Alphabets-3.Pdf
Early Alphabets Alphabetic characteristics 1 Cretan Pictographs 11 Hieroglyphics 16 The Phoenician Alphabet 24 The Greek Alphabet 31 The Latin Alphabet 39 Summary 53 GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS 1 / 53 Alphabetic characteristics 3,000 BCE Basic building blocks of written language GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 2 / 53 Early visual language systems were disparate and decentralized 3,000 BCE Protowriting, Cuneiform, Heiroglyphs and far Eastern writing all functioned differently Rebuses, ideographs, logograms, and syllabaries · GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 3 / 53 HIEROGLYPHICS REPRESENTING THE REBUS PRINCIPAL · BEE & LEAF · SEA & SUN · BELIEF AND SEASON GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 4 / 53 PETROGLYPHIC PICTOGRAMS AND IDEOGRAPHS · CIRCA 200 BCE · UTAH, UNITED STATES GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 5 / 53 LUWIAN LOGOGRAMS · CIRCA 1400 AND 1200 BCE · TURKEY GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 6 / 53 OLD PERSIAN SYLLABARY · 600 BCE GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 7 / 53 Alphabetic structure marked an enormous societal leap 3,000 BCE Power was reserved for those who could read and write · GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 8 / 53 What is an alphabet? Definition An alphabet is a set of visual symbols or characters used to represent the elementary sounds of a spoken language. –PM · GDT-101 / HISTORY OF GRAPHIC DESIGN / EARLY ALPHABETS / Alphabetic Characteristics 9 / 53 What is an alphabet? Definition They can be connected and combined to make visual configurations signifying sounds, syllables, and words uttered by the human mouth. -
The Origins of the Alphabet but Their Numbers Tally Into of Greeks and Later Through Their Manuscripts
1 ORIGINS OF THE ALPHABET gchgchgchgchgchgchgchgcggchgchgchgchgchgchggchch disadvantages to this system: not the development of the alphabet, letters developed as a natural result of only were the symbols complex, for it was through the auspices the use of the fl exible reed pen for writing The origins of the alphabet but their numbers tally into of Greeks and later through their manuscripts. Lowercase letters for the begin within the shadowy realms of the tens of thousands, making cultural admirers, the Romans, most part require fewer strokes for their prehistory. At some time within that learning diffi cult and writing slow. that the alphabet was to fi nally formation, allowing the scribes to fi t more shadowed prehistory, humankind An early culture that take on a distinct resemblance letters in a line of type. The combination began to communicate visually. inspired to innovate a simpler, to the modern Western alphabet. of speed and space conservation was Motivated by the need to communicate more effi cient writing system was important to monks writing lengthy facts about the environment around the Phoenicians. The Phoenicians manuscripts on expensive parchment, them, humans made simple drawings v were a wide-fl ung trading people When the Romans adopted and the use of lowercase letters was widely of everyday objects such as people, of great vitality, with complex the Greek alphabet, along with adopted in animals, weapons and so forth. These business transactions that other aspects of Greek culture, a relatively object drawings are called pictographs. required accurate record keeping. they continued the development short time It was through their need for and usage of the alphabet. -
Old Cyrillic in Unicode*
Old Cyrillic in Unicode* Ivan A Derzhanski Institute for Mathematics and Computer Science, Bulgarian Academy of Sciences [email protected] The current version of the Unicode Standard acknowledges the existence of a pre- modern version of the Cyrillic script, but its support thereof is limited to assigning code points to several obsolete letters. Meanwhile mediæval Cyrillic manuscripts and some early printed books feature a plethora of letter shapes, ligatures, diacritic and punctuation marks that want proper representation. (In addition, contemporary editions of mediæval texts employ a variety of annotation signs.) As generally with scripts that predate printing, an obvious problem is the abundance of functional, chronological, regional and decorative variant shapes, the precise details of whose distribution are often unknown. The present contents of the block will need to be interpreted with Old Cyrillic in mind, and decisions to be made as to which remaining characters should be implemented via Unicode’s mechanism of variation selection, as ligatures in the typeface, or as code points in the Private space or the standard Cyrillic block. I discuss the initial stage of this work. The Unicode Standard (Unicode 4.0.1) makes a controversial statement: The historical form of the Cyrillic alphabet is treated as a font style variation of modern Cyrillic because the historical forms are relatively close to the modern appearance, and because some of them are still in modern use in languages other than Russian (for example, U+0406 “I” CYRILLIC CAPITAL LETTER I is used in modern Ukrainian and Byelorussian). Some of the letters in this range were used in modern typefaces in Russian and Bulgarian. -
Myanmar Script Learning Guide Burmese 1,2,3 Tone System Is Developed by Naing Tin-Nyunt-Pu Copyright © 2013-2017, Naing Tinnyuntpu & Asia Pearl Travels
Myanmar Script Learning guide (Revision F) by Naing Tinnyuntpu WITH FREE ONLINE AUDIO SUPPORT https://www.asiapearltravels.com/language/myanmar-script-learning-guide.php 1 of 104 Menu ≡ Introduction ........................................................................................4 Vowel: A .............................................................................................6 Vowel: E or I ....................................................................................13 Vowel: U ..........................................................................................20 Vowel: O ..........................................................................................24 Vowel: Ay .........................................................................................28 Vowel: Au ........................................................................................33 Vowel: Un or An ..............................................................................37 Vowel: In ..........................................................................................42 Vowel: Eare ......................................................................................46 Vowel: Ain .......................................................................................51 Vowel: Ome .....................................................................................53 Vowel: Ine ........................................................................................56 Vowel: Oon ......................................................................................59 -
Hungarian Place Name Pronunciation for English-Speakers: the Hungarian Language Is Part of the Uralic Language Family, Which Also Includes Finnish and Estonian
Hungarian Place Name Pronunciation for English-speakers: The Hungarian language is part of the Uralic language family, which also includes Finnish and Estonian. The name Uralic derives from the hypothesized homeland of the languages in the vicinity of the Ural Mountains. More specifically, Hungarian is an Ugric language, with its closest relatives believed to be Khanty and Mansi – two obscure languages each spoken by around 10,000 people in western Siberia. The Hungarian language contains several sounds that don’t exist in English, so this pronunciation guide for map locations is approximate. Organized by Chip Saltsman, Hungarian pronunciation kindly provided by Scott Moore. A QUICK GUIDE TO PRONOUNCING HUNGARIAN There are a few things you need to know before even trying to pronounce any Hungarian words: 1) All Hungarian words are stressed on the first syllable – in the list below, the stressed syllable will be shown in bold. This can be tricky to do if you aren’t used to it. Try it with the words ‘American Express’ in English. Normally these words are both stressed on the second syllable i.e. uh-mair-uh-kuhn ek-spres (US pronunciation) or uh-me-ri-kuhn ik- spres (UK pronunciation). Try to say it like this: Uh-mair-uh-kuhn Ik-spres. Difficult isn’t it? 2) Hungarian is highly phonetic. That means that any given written letter is always pronounced the same way. English is not very phonetic, especially with names. For example, Worcestershire is pronounced: wuh-stuh-shu. 3) The Hungarian alphabet has 40 letters (or 44 in the extended version)! As there aren’t enough symbols in the Latin alphabet, they’ve had to be creative in the use of: a. -
An Introduction to Old Persian Prods Oktor Skjærvø
An Introduction to Old Persian Prods Oktor Skjærvø Copyright © 2016 by Prods Oktor Skjærvø Please do not cite in print without the author’s permission. This Introduction may be distributed freely as a service to teachers and students of Old Iranian. In my experience, it can be taught as a one-term full course at 4 hrs/w. My thanks to all of my students and colleagues, who have actively noted typos, inconsistencies of presentation, etc. TABLE OF CONTENTS Select bibliography ................................................................................................................................... 9 Sigla and Abbreviations ........................................................................................................................... 12 Lesson 1 ..................................................................................................................................................... 13 Old Persian and old Iranian. .................................................................................................................... 13 Script. Origin. .......................................................................................................................................... 14 Script. Writing system. ........................................................................................................................... 14 The syllabary. .......................................................................................................................................... 15 Logograms. ............................................................................................................................................ -
Alphabets, Letters and Diacritics in European Languages (As They Appear in Geography)
1 Vigleik Leira (Norway): [email protected] Alphabets, Letters and Diacritics in European Languages (as they appear in Geography) To the best of my knowledge English seems to be the only language which makes use of a "clean" Latin alphabet, i.d. there is no use of diacritics or special letters of any kind. All the other languages based on Latin letters employ, to a larger or lesser degree, some diacritics and/or some special letters. The survey below is purely literal. It has nothing to say on the pronunciation of the different letters. Information on the phonetic/phonemic values of the graphic entities must be sought elsewhere, in language specific descriptions. The 26 letters a, b, c, d, e, f, g, h, i, j, k, l, m, n, o, p, q, r, s, t, u, v, w, x, y, z may be considered the standard European alphabet. In this article the word diacritic is used with this meaning: any sign placed above, through or below a standard letter (among the 26 given above); disregarding the cases where the resulting letter (e.g. å in Norwegian) is considered an ordinary letter in the alphabet of the language where it is used. Albanian The alphabet (36 letters): a, b, c, ç, d, dh, e, ë, f, g, gj, h, i, j, k, l, ll, m, n, nj, o, p, q, r, rr, s, sh, t, th, u, v, x, xh, y, z, zh. Missing standard letter: w. Letters with diacritics: ç, ë. Sequences treated as one letter: dh, gj, ll, rr, sh, th, xh, zh.