Spacing Modifier Letters, Range: 02B0–02FF, The Unicode Standard
Recommended publications
Unicode Request for Cyrillic Modifier Letters Superscript Modifiers
Unicode request for Cyrillic modifier letters. L2/21-107. Kirk Miller, 2021 June 07.

This is a request for spacing superscript and subscript Cyrillic characters. It has been favorably reviewed by Sebastian Kempgen (University of Bamberg) and others at the Commission for Computer Supported Processing of Medieval Slavonic Manuscripts and Early Printed Books. Cyrillic-based phonetic transcription uses superscript modifier letters in a manner analogous to the IPA. This convention is widespread, found in both academic publications and standard dictionaries. Transcription of pronunciations into Cyrillic is the norm for monolingual dictionaries, and Cyrillic rather than IPA is often found in linguistic descriptions as well, as seen in the illustrations below for Slavic dialectology, Yugur (Yellow Uyghur) and Evenki. The Great Russian Encyclopedia states that Cyrillic notation is more common in Russian studies than IPA (‘Transkripcija’, Bol’šaja rossijskaja ènciklopedija, Russian Ministry of Culture, 2005–2019).

Unicode currently encodes only three Cyrillic modifier letters: U+A69C ⟨ꚜ⟩ and U+A69D ⟨ꚝ⟩, intended for descriptions of Baltic languages in Latin script but ubiquitous for Slavic languages in Cyrillic script, and U+1D78 ⟨ᵸ⟩, used for nasalized vowels, for example in descriptions of Chechen. The requested spacing modifier letters cannot be replaced by the already encoded combining diacritics because (a) some authors contrast the two, and (b) the modifier letters themselves need to be able to take combining diacritics, including diacritics that go under the modifier letter, as in ⟨ᶟ̭̈⟩ (see the next section and, e.g., Figure 18). In addition, some linguists distinguish spacing superscript letters, used for phonetic detail as in the IPA tradition, from spacing subscript letters, used to denote phonological concepts such as archiphonemes.
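Because modifier letters such as these carry a compatibility decomposition to their base letter, a transcription that relies on them can lose the distinction if NFKC normalization is applied. A minimal sketch using Python's standard `unicodedata` module, assuming a Python build whose Unicode database includes the three characters named above (any recent version does); the helper name `describe` is illustrative only:

```python
import unicodedata

# The three Cyrillic modifier letters named in the request.
CYRILLIC_MODIFIERS = ["\uA69C", "\uA69D", "\u1D78"]

def describe(ch: str) -> str:
    """Return 'U+XXXX NAME (category) -> NFKC ...' for a single character."""
    folded = unicodedata.normalize("NFKC", ch)
    return (f"U+{ord(ch):04X} {unicodedata.name(ch)} "
            f"({unicodedata.category(ch)}) -> NFKC {folded!r}")

for ch in CYRILLIC_MODIFIERS:
    print(describe(ch))
```

NFKC folds each superscript form back to a plain Cyrillic letter, so NFC, which leaves compatibility characters alone, is usually the safer normalization for such transcriptions.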
The Unicode Cookbook for Linguists: Managing Writing Systems Using Orthography Profiles
Zurich Open Repository and Archive, University of Zurich Main Library, Strickhofstrasse 39, CH-8057 Zurich, www.zora.uzh.ch. Year: 2017. The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles. Moran, Steven; Cysouw, Michael. DOI: https://doi.org/10.5281/zenodo.290662. Posted at the Zurich Open Repository and Archive, University of Zurich. ZORA URL: https://doi.org/10.5167/uzh-135400. Monograph. The following work is licensed under a Creative Commons: Attribution 4.0 International (CC BY 4.0) License. Originally published at: Moran, Steven; Cysouw, Michael (2017). The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles. CERN Data Centre: Zenodo. DOI: https://doi.org/10.5281/zenodo.290662

Preface
This text is meant as a practical guide for linguists and programmers who work with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together. The intersection of the Unicode Standard and the International Phonetic Alphabet is often a source of frustration for users. Nevertheless, the two standards have provided language researchers with the consistent computational architecture needed to process, publish and analyze data from many different languages. We bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Our research uses quantitative methods to compare languages and to uncover and clarify their phylogenetic relations. However, the majority of lexical data available from the world’s languages is in author- or document-specific orthographies.
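The orthography-profile idea can be illustrated with a greedy longest-match tokenizer: at each position the input is matched against the longest grapheme listed in a profile table, and the table then maps each grapheme to IPA. This is a minimal sketch, not the Cookbook's own implementation; the profile contents and the names `PROFILE`, `tokenize` and `transliterate` are hypothetical.

```python
# Hypothetical orthography profile: grapheme -> IPA (invented for illustration).
PROFILE: dict[str, str] = {
    "a": "a",
    "c": "k",
    "ch": "tʃ",
    "h": "h",
    "t": "t",
    "th": "θ",
    "tth": "tʰ",
}

def tokenize(text: str, profile: dict[str, str]) -> list[str]:
    """Split text into graphemes, always taking the longest match in the profile."""
    max_len = max(len(g) for g in profile)
    tokens: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in profile:
                tokens.append(candidate)
                i += size
                break
        else:
            # No profile entry matches: keep the character on its own.
            tokens.append(text[i])
            i += 1
    return tokens

def transliterate(text: str, profile: dict[str, str]) -> str:
    """Map tokens to IPA; unknown graphemes are shown in angle brackets."""
    return " ".join(profile.get(t, f"<{t}>") for t in tokenize(text, profile))

print(tokenize("chatth", PROFILE))
print(transliterate("chatth", PROFILE))
```

Marking unmatched characters rather than silently dropping them makes gaps in a profile easy to spot: `transliterate("cax", PROFILE)` yields `"k a <x>"`.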
Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
Assessment of Options for Handling Full Unicode Character Encodings in MARC21: A Study for the Library of Congress. Part 1: New Scripts. Jack Cain, Senior Consultant, Trylus Computing, Toronto.

1 Purpose
This assessment studies the issues and makes recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format.

1.1 “Encoding Scheme” vs. “Repertoire”
An encoding scheme consists of the codes by which characters are represented in computer memory, organized according to a particular methodology. The list of all characters so encoded is referred to as the “repertoire” of the given encoding scheme. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B” and “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned the codes 41, 42 and 43 in ASCII (expressed here in hexadecimal).

1.2 MARC8
"MARC8" is the term commonly used to refer both to the encoding scheme and to its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode, which is a multi-byte-per-character code set, the MARC8 encoding scheme is principally made up of multiple one-byte tables in which each character is encoded using a single 8-bit byte. (It also includes the EACC set, which actually uses a fixed length of 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited essentially to Latin script.
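The ASCII code assignments above, and the contrast between one-byte tables and a multi-byte encoding, can be checked directly in Python. MARC8 has no codec in Python's standard library, so this sketch uses only ASCII and UTF-8; `utf8_lengths` is an illustrative helper name.

```python
# Hexadecimal ASCII codes of "A", "B" and "C", as in the example above.
for ch in "ABC":
    print(ch, format(ord(ch), "02X"), ch.encode("ascii"))

def utf8_lengths(text: str) -> dict[str, int]:
    """Bytes used per character by UTF-8, a variable-width Unicode encoding form."""
    return {ch: len(ch.encode("utf-8")) for ch in text}

# A, e with acute, modifier letter small h, a CJK ideograph, a character outside the BMP
print(utf8_lengths("A\u00e9\u02b0\u4e2d\U0001D504"))
```

UTF-8 needs 1 to 4 bytes per character, whereas a single one-byte table can address at most 256 positions, which is why a scheme built from one-byte tables needs several of them.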
MUFI Character Recommendation V. 3.0: Code Chart Order
MUFI character recommendation: Characters in the official Unicode Standard and in the Private Use Area for Medieval texts written in the Latin alphabet. ⁋ ※ ð ƿ ᵹ ᴆ ※ ¶ ※ Part 2: Code chart order. Version 3.0 (5 July 2009). Compliant with the Unicode Standard version 5.1. Medieval Unicode Font Initiative (MUFI), www.mufi.info. ISBN 978-82-8088-403-9.

Editor: Odd Einar Haugen, University of Bergen, Norway.

Background
Version 1.0 of the MUFI recommendation was published electronically and in hard copy on 8 December 2003. It was the result of an almost two-year-long electronic discussion within the Medieval Unicode Font Initiative (http://www.mufi.info), which was established in July 2001 at the International Medieval Congress in Leeds. Version 1.0 contained a total of 828 characters, of which 473 characters were selected from various charts in the official part of the Unicode Standard and 355 were located in the Private Use Area. Version 1.0 of the recommendation is compliant with the Unicode Standard version 4.0. Version 2.0 is a major update, published electronically on 22 December 2006. It contains a few corrections of misprints in version 1.0 and 516 additional characters (of which 123 are from charts in the official part of the Unicode Standard and 393 are additions to the Private Use Area). There are also 18 characters which have been decommissioned from the Private Use Area because they have been included in later versions of the Unicode Standard (and, in one case, because a character has been withdrawn).
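Since Private Use Area characters carry no standardized meaning, text prepared with PUA code points may need auditing when those characters are later encoded in the Unicode Standard, as happened with the decommissioned MUFI characters. A sketch that lists the PUA code points in a string, relying on the general category `Co` that the Unicode Character Database assigns to private-use characters; U+E000 in the example is an arbitrary PUA code point, not a specific MUFI assignment, and `pua_report` is an illustrative name.

```python
import unicodedata

def pua_report(text: str) -> list[str]:
    """Return the code points of private-use characters in text.

    Private-use characters have general category 'Co'; they occupy
    U+E000..U+F8FF, U+F0000..U+FFFFD and U+100000..U+10FFFD.
    """
    return [f"U+{ord(ch):04X}" for ch in text if unicodedata.category(ch) == "Co"]

sample = "\u00f0\ue000\u01bf"  # eth, an arbitrary PUA code point, wynn
print(pua_report(sample))
```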
The Brill Typeface User Guide & Complete List of Characters
The Brill Typeface User Guide & Complete List of Characters. Version 2.06, October 31, 2014. Pim Rietbroek.

Preamble
Few typefaces, if any, allow the user to access every Latin character, every IPA character and every diacritic, and to have these combine in a typographically satisfactory manner, in a range of styles (roman, italic, and more); even fewer add full support for Greek, both modern and ancient, with the specialised characters that papyrologists and epigraphers need, not to mention coverage of the Slavic languages in the Cyrillic range. The Brill typeface aims to do just that, and to be a tool for all scholars in the humanities; for Brill’s authors and editors; for Brill’s staff and service providers; and finally, for anyone in need of this tool, as long as it is not used for any commercial gain.* There are several fonts in different styles, each of which has the same set of characters as all the others. The Unicode Standard is rigorously adhered to: there is no dependence on the Private Use Area (PUA), as happens frequently in other fonts with regard to characters carrying rare diacritics or combinations of diacritics. Instead, all alphabetic characters can carry any diacritic or combination of diacritics, even stacked, with automatic correct positioning. This is made possible by the inclusion of all of Unicode’s combining characters and by the application of extensive OpenType Glyph Positioning programming.

Credits
The Brill fonts are an original design by John Hudson of Tiro Typeworks. Alice Savoie contributed to Brill bold and bold italic. The black-letter (‘Fraktur’) range of characters was made by Karsten Lücke.
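Stacked diacritics are stored as a base character followed by combining marks, and canonical normalization puts marks of different combining classes into a fixed order, so sequences typed in different orders compare equal after NFD. A minimal sketch with Python's `unicodedata`; `clusters` is a hypothetical helper that groups a base with the marks following it, a simplification of the grapheme clusters of UAX #29 rather than a replacement for them.

```python
import unicodedata

def clusters(text: str) -> list[str]:
    """Group each base character with the combining marks (category M*) after it."""
    groups: list[str] = []
    for ch in text:
        if groups and unicodedata.category(ch).startswith("M"):
            groups[-1] += ch
        else:
            groups.append(ch)
    return groups

typed_one_way = "a\u0301\u0323"    # a + combining acute + combining dot below
typed_other_way = "a\u0323\u0301"  # a + combining dot below + combining acute
for s in (typed_one_way, typed_other_way):
    nfd = unicodedata.normalize("NFD", s)
    # Dot below (combining class 220) is ordered before acute (class 230).
    print([f"U+{ord(c):04X}" for c in nfd])

print(clusters("e\u0301\u0308x"))
```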
L2/06-244R
ISO/IEC JTC 1/SC 2/WG 2
PROPOSAL SUMMARY FORM TO ACCOMPANY SUBMISSIONS FOR ADDITIONS TO THE REPERTOIRE OF ISO/IEC 10646
Please fill in all the sections A, B and C below. Please read the Principles and Procedures Document (P & P) from http://www.dkuug.dk/JTC1/SC2/WG2/docs/principles.html for guidelines and details before filling in this form. Please ensure you are using the latest Form from http://www.dkuug.dk/JTC1/SC2/WG2/docs/summaryform.html. See also http://www.dkuug.dk/JTC1/SC2/WG2/docs/roadmaps.html for the latest Roadmaps.

A. Administrative
1. Title: Proposal to Encode Modifier Letter Low Circumflex Accent
2. Requester's name: Lorna A. Priest
3. Requester type (Member body/Liaison/Individual contribution): Individual contribution
4. Submission date: 20 July 2006 (updated 5 Sept 2006)
5. Requester's reference (if applicable): L2/06-244R
6. Choose one of the following: This is a complete proposal: Yes; or, More information will be provided later: No

B. Technical – General
1. Choose one of the following:
   a. This proposal is for a new script (set of characters): No. Proposed name of script:
   b. The proposal is for addition of character(s) to an existing block: Yes. Name of the existing block: Spacing Modifier Letters
2. Number of characters in proposal: 1
3. Proposed category (select one from below; see section 2.2 of P&P document): A-Contemporary (selected). Other options: B.1-Specialized (small collection), B.2-Specialized (large collection), C-Major extinct, D-Attested extinct, E-Minor extinct, F-Archaic Hieroglyphic or Ideographic, G-Obscure or questionable usage symbols
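The form records the block in which the requester proposed to place the character; where a character actually sits can be checked by name against the Unicode Character Database bundled with Python. A sketch under that assumption; `locate` is a hypothetical helper, and the result depends on the Unicode version of the running Python (`unicodedata.unidata_version`).

```python
import unicodedata

SPACING_MODIFIER_LETTERS = range(0x02B0, 0x0300)  # 02B0–02FF inclusive

def locate(name: str) -> str:
    """Look up a character by Unicode name and report whether it lies in 02B0–02FF."""
    ch = unicodedata.lookup(name)  # raises KeyError for unknown names
    where = "inside" if ord(ch) in SPACING_MODIFIER_LETTERS else "outside"
    return f"U+{ord(ch):04X} {name}: {where} Spacing Modifier Letters"

print(unicodedata.unidata_version)
print(locate("MODIFIER LETTER CIRCUMFLEX ACCENT"))
print(locate("MODIFIER LETTER LOW CIRCUMFLEX ACCENT"))
```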
Junicode, V. 0.6.5
Junicode, v. 0.6.5
Table of Contents
What is Junicode?
GNU General Public License
How the GNU General Public License Applies to Junicode
How to install Junicode
1. Microsoft Windows
2. Macintosh OS X
3. Linux
Reading the Code Charts
Basic Latin (0000-007F)
Latin 1 Supplement (0080-00FF)
Latin Extended A (0100-017F)
Latin Extended B (0180-024F)
IPA Extensions (0250-02AF)
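Python's `unicodedata` module has no function that returns block names, so one way to map a character to a code-chart range like those listed above is a small sorted table searched with `bisect`. This sketch covers only the blocks in this table of contents plus Spacing Modifier Letters, using the official UCD spellings of the block names; the complete, authoritative list is the UCD file Blocks.txt, and `BLOCKS` and `block_of` are illustrative names.

```python
import bisect

# (first, last, name): an excerpt of Blocks.txt, not the full list.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0100, 0x017F, "Latin Extended-A"),
    (0x0180, 0x024F, "Latin Extended-B"),
    (0x0250, 0x02AF, "IPA Extensions"),
    (0x02B0, 0x02FF, "Spacing Modifier Letters"),
]
_STARTS = [first for first, _, _ in BLOCKS]

def block_of(ch: str) -> str:
    """Return the block name for ch, or a marker if it is outside this excerpt."""
    cp = ord(ch)
    i = bisect.bisect_right(_STARTS, cp) - 1  # last block starting at or before cp
    if i >= 0 and cp <= BLOCKS[i][1]:
        return BLOCKS[i][2]
    return "(not in this excerpt)"

for ch in "A\u00e9\u0283\u02b0\u0300":
    print(f"U+{ord(ch):04X}", block_of(ch))
```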
Doulos SIL Font 2006-01-31 Page 1 of 27
Doulos SIL Font. NRSI staff, SIL Non-Roman Script Initiative (NRSI). 2006-01-31.
Note: Updates to this font and the documentation are available online at: http://scripts.sil.org/DoulosSILfont.
Table of Contents
Introduction
Introduction to the Doulos SIL Font Package
Overview of the Doulos SIL Font
Documentation
System requirements
Features of the font
Samples
Supported character ranges
Private-use (PUA) characters
Advanced typographic
The Unicode Cookbook for Linguists
The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles. Steven Moran, Michael Cysouw. Language Science Press. Translation and Multilingual Natural Language Processing 10.

Translation and Multilingual Natural Language Processing
Editors: Oliver Czulo (Universität Leipzig), Silvia Hansen-Schirra (Johannes Gutenberg-Universität Mainz), Reinhard Rapp (Johannes Gutenberg-Universität Mainz)

In this series:
1. Fantinuoli, Claudio & Federico Zanettin (eds.). New directions in corpus-based translation studies.
2. Hansen-Schirra, Silvia & Sambor Grucza (eds.). Eyetracking and Applied Linguistics.
3. Neumann, Stella, Oliver Čulo & Silvia Hansen-Schirra (eds.). Annotation, exploitation and evaluation of parallel corpora: TC3 I.
4. Czulo, Oliver & Silvia Hansen-Schirra (eds.). Crossroads between Contrastive Linguistics, Translation Studies and Machine Translation: TC3 II.
5. Rehm, Georg, Felix Sasaki, Daniel Stein & Andreas Witt (eds.). Language technologies for a multilingual Europe: TC3 III.
6. Menzel, Katrin, Ekaterina Lapshinova-Koltunski & Kerstin Anna Kunz (eds.). New perspectives on cohesion and coherence: Implications for translation.
7. Hansen-Schirra, Silvia, Oliver Czulo & Sascha Hofmann (eds.). Empirical modelling of translation and interpreting.
8. Svoboda, Tomáš, Łucja Biel & Krzysztof Łoboda (eds.). Quality aspects in institutional translation.
9. Fox, Wendy. Can integrated titles improve the viewing experience? Investigating the impact of subtitling on the reception and enjoyment of film using eye tracking and questionnaire data.
10. Moran, Steven & Michael Cysouw. The Unicode cookbook for linguists: Managing writing systems using orthography profiles.

ISSN: 2364-8899

Steven Moran & Michael Cysouw. 2018. The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles (Translation and Multilingual Natural Language Processing 10).
Spacing Modifier Letters Range: 02B0–02FF
This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.
This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata.
See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts.
See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0.
See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0.

Disclaimer
These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0, but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/.
A thorough understanding of the information contained in these additional sources is required for a successful implementation.
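The contents of this chart can be listed from the Unicode Character Database that ships with Python. The installed database may be a different version from the Version 14.0 charts referenced above, so this sketch prints `unicodedata.unidata_version` alongside the results; `block_chars` is an illustrative helper name.

```python
import unicodedata
from collections import Counter

def block_chars(start: int = 0x02B0, end: int = 0x02FF):
    """Yield (code point, name, general category) for assigned characters in a range."""
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), None)  # None for unassigned code points
        if name is not None:
            yield cp, name, unicodedata.category(chr(cp))

print("UCD version:", unicodedata.unidata_version)
rows = list(block_chars())
for cp, name, cat in rows[:4]:
    print(f"U+{cp:04X} {chr(cp)} {cat} {name}")
# Tally of modifier letters (Lm) versus modifier symbols (Sk) in the block.
print(Counter(cat for _, _, cat in rows))
```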
Phonetic Symbols in Word Processing and on the Web
Phonetic symbols in word processing and on the web
J.C. Wells, University College London

ABSTRACT
Unicode provides a single coding system for all scripts used in printing the languages of the world, and includes the entire International Phonetic Alphabet. A standard Unicode-based phonetic font is now routinely bundled with the software supplied for new personal computers. Unlike the situation four years ago, most current browsers, word-processing packages, fonts and printers support Unicode. These welcome developments render obsolete the unstandardized and proprietary phonetic fonts hitherto in use. They are, however, poorly documented and have not been widely publicized. Several Unicode-based phonetic fonts are now available, and are listed and compared. The main problem outstanding is keyboarding: how does the

ii. Custom one-byte fonts. These became available during the 1990s. For non-ASCII characters a special font (in our case, a phonetic font) is used. It must be selected when a special symbol is required, and deselected when the standard Latin alphabet is required. As far as phonetic fonts are concerned, various proprietary and free fonts are available, including those provided by the Summer Institute of Linguistics (www.sil.org). This is a reasonably satisfactory solution, and is what many phoneticians still use. Its main disadvantage is that the special font has to be installed in every computer involved (unless for some specific document it is embedded into the file, as with .pdf files). Furthermore, the lack of standardization means that different fonts use different coding and different keyboard layouts, so that conversion from one to another is difficult or impractical.
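The conversion problem described above arises because each custom one-byte font assigns its own glyphs to byte values, so moving a document to Unicode needs a separate mapping table for every font. A minimal sketch of such a conversion; the font, its byte assignments and the names `LEGACY_TO_UNICODE` and `legacy_to_unicode` are hypothetical and do not describe any real font.

```python
# Hypothetical one-byte phonetic font: which glyph each byte displays.
# These assignments are invented for illustration only.
LEGACY_TO_UNICODE: dict[int, str] = {
    0x53: "\u0283",  # byte for "S" displays esh
    0x4E: "\u014B",  # byte for "N" displays eng
    0x3A: "\u02D0",  # byte for ":" displays the length mark
}

def legacy_to_unicode(data: bytes, table: dict[int, str]) -> str:
    """Convert text typed in a one-byte font to Unicode using a per-font table."""
    out = []
    for b in data:
        if b in table:
            out.append(table[b])
        elif b < 0x80:
            out.append(chr(b))  # unmapped ASCII byte: assume plain Latin text
        else:
            raise ValueError(f"no mapping for byte 0x{b:02X}")
    return "".join(out)

print(legacy_to_unicode(b"Si:N", LEGACY_TO_UNICODE))
```

A second font would need its own table, and nothing in the byte stream says which table applies; a Unicode font avoids this because the code points themselves identify the characters.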
Unicode Request for IPA Modifier-Letters (a), Pulmonic
Unicode request for IPA modifier-letters (a), pulmonic
Kirk Miller; Michael Ashby (President, IPA)
2020 September 25

This proposal, officially supported by the International Phonetic Association (see letter on page 8), requests complete modifier support for current IPA letters, as well as for a few retired IPA letters, within the scope detailed below. Requested ⟨ ⟩ are also supported by the extIPA proposal recently accepted by the UTC in July 2020. The modifier lateral fricatives ⟨ ⟩ were accepted by the UTC at that time and are not repeated here, but note that they have the support of the IPA in addition to the International Clinical Phonetics and Linguistics Association. Due to the differing nature of the argument for acceptance by Unicode, letters for non-pulmonic consonants are requested in ‘Unicode request for IPA modifier-letter support (b), non-pulmonic’, for separate consideration by the UTC. In addition, the question of whether the IPA forms of modifier beta and chi should be Greek or Latin was only put to the IPA in September, after SAH rejected Latin forms as not being sufficiently distinct for separate encoding. An adequate formal response from the IPA council may take some time, so evidence for those letters will be presented in the separate ‘Unicode request for IPA modifier-letter support (c), Greek letters’. Thanks to Deborah Anderson of the Universal Scripts Project for her assistance.

Background
This request expands on Peter Constable’s 2003 ‘Proposal to Encode Additional Phonetic Modifier Letters in the UCS’ (https://www.unicode.org/L2/L2003/03180-add-mod-ltr.pdf), and illustrates several characters that were requested in that proposal but were not illustrated, and therefore not accepted, at the time.
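Whether a given IPA letter already has a superscript modifier counterpart can be checked mechanically, because encoded superscript modifier letters typically carry a `<super>` compatibility decomposition to their base letter in the UCD. A sketch that builds that reverse mapping with Python's `unicodedata`; the results depend on the Unicode version bundled with the running Python, letters without a `<super>` decomposition are not found, and `superscript_forms` is an illustrative name. Scanning every code point takes about a second but needs no external data.

```python
import sys
import unicodedata

def superscript_forms() -> dict[str, list[str]]:
    """Map each base character to the characters that decompose as '<super>' + it."""
    forms: dict[str, list[str]] = {}
    for cp in range(sys.maxunicode + 1):
        fields = unicodedata.decomposition(chr(cp)).split()
        # For example, U+02B0 has the decomposition '<super> 0068'.
        if len(fields) == 2 and fields[0] == "<super>":
            base = chr(int(fields[1], 16))
            forms.setdefault(base, []).append(chr(cp))
    return forms

FORMS = superscript_forms()
print("UCD version:", unicodedata.unidata_version)
for letter in "hwj\u0266\u026c":  # h, w, j, h with hook, l with belt
    found = FORMS.get(letter)
    print(letter, [unicodedata.name(c) for c in found] if found else "no <super> form")
```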