Unicode Notes

Total Page:16

File Type:pdf, Size:1020Kb

Unicode Notes Unicode Notes NOTE: This is a “living document” and is regularly updated. I recommend book- marking it at one or more of the URLs on page 5 and updating frequently. Table of Contents Document Conventions and Styles.......................................................................................................4 Language..........................................................................................................................................4 Emacs and Emacs Lisp Terminology...............................................................................................4 File Format.......................................................................................................................................4 Document Availability.....................................................................................................................5 Organised Adversary Regional Download..................................................................................5 Dropbox......................................................................................................................................5 Unicode Version...................................................................................................................................6 Unicode Quick Stats........................................................................................................................6 Frequently checked references.........................................................................................................6 Code Point Insertion Methods..............................................................................................................7 Hotkeys............................................................................................................................................7 Unicode and File or Data Formats........................................................................................................9 Unicode and XML...........................................................................................................................9 Unicode and the Darwin Information Typing Architecture (DITA)............................................9 Unicode and DITA for Publishers (D4P)....................................................................................9 Unicode, HTML, XHTML and XML.........................................................................................9 Unicode and HTML 3.2 and earlier..........................................................................................10 Unicode and HTML 4.x............................................................................................................10 Unicode and XHTML...............................................................................................................10 Unicode and HTML 5...............................................................................................................10 Unicode and HTML 5 with XML.............................................................................................10 Unicode, HTML, XML and Ebooks.........................................................................................10 Unicode, HTML 3.2 and Open E-Book Publication Structure (OEBPS).................................11 OEBPS and Mobipocket...........................................................................................................11 Unicode, XHTML and EPUB 2.0.1..........................................................................................12 Unicode, HTML 5, XML and EPUB 3.0.1...............................................................................12 Unicode, HTML 5 and EPUB 3.1.............................................................................................12 Unicode, HTML 5 and above EPUB 3.1..................................................................................12 Embedding Fonts in EPUB Files...................................................................................................14 Font Obfuscation.......................................................................................................................14 Embedding Fonts for Apple iBooks..........................................................................................14 Recommended Fonts for Embedding........................................................................................15 Unicode and LaTeX.......................................................................................................................16 PDF and PostScript........................................................................................................................19 PostScript..................................................................................................................................19 PDF/A-1....................................................................................................................................19 PDF/X-3:2002...........................................................................................................................19 Last edit: 04/06/2020 Copyright © Benjamin D. McGinnes, 2010-2020 Page 1 of 78 File Names and Locations..............................................................................................................20 The Domain Name System and Unicode..................................................................................20 Unicode and Programming Languages..............................................................................................22 Unicode and Python.......................................................................................................................22 Python and Punycode................................................................................................................24 Python Handling of URIs, URLs and URNs............................................................................24 Lisp and Emacs Lisp......................................................................................................................25 Scheme and Guile..........................................................................................................................26 Unicode and Operating Systems........................................................................................................27 Unicode and Mac OS X.................................................................................................................27 Unicode and Linux.........................................................................................................................27 Unicode and BSD..........................................................................................................................27 Unicode and Windows...................................................................................................................27 Unicode and Software........................................................................................................................29 Unicode and Emacs.......................................................................................................................29 Unicode and Atom.........................................................................................................................30 Unicode and Vi and/or Vim...........................................................................................................30 Unicode and LibreOffice...............................................................................................................30 Unicode and oXygenXML Editor..................................................................................................31 Unicode and IRC...........................................................................................................................31 Unicode and HexChat or XChat...............................................................................................32 Unicode Utilities............................................................................................................................32 Unum.........................................................................................................................................32 Unicode and Related Subject Matter Experts................................................................................33 Joel Spolsky on Developers and Unicode.................................................................................33 Dan Sugalski on C, Unicode, strings and character encoding..................................................33 The USA's National Security Agency (NSA) on PDFs and Metadata......................................34 Conversions and Transformations......................................................................................................36 Emacs Modes.................................................................................................................................36 Htmlize......................................................................................................................................36 Org Mode..................................................................................................................................36 Pandoc............................................................................................................................................37 Docutils and reStructuredText.......................................................................................................38
Recommended publications
  • Circular 1 Copyright Basics
    CIRCULAR 1 Copyright Basics Copyright is a form of protection Copyright is a form of protection provided by the laws of the provided by U.S. law to authors of United States to the authors of “original works of authorship” that are fixed in a tangible form of expression. An original “original works of authorship” from work of authorship is a work that is independently created by the time the works are created in a a human author and possesses at least some minimal degree of creativity. A work is “fixed” when it is captured (either fixed form. This circular provides an by or under the authority of an author) in a sufficiently overview of basic facts about copyright permanent medium such that the work can be perceived, and copyright registration with the reproduced, or communicated for more than a short time. Copyright protection in the United States exists automatically U.S. Copyright Office. It covers from the moment the original work of authorship is fixed.1 • Works eligible for protection • Rights of copyright owners What Works Are Protected? • Who can claim copyright • Duration of copyright Examples of copyrightable works include • Literary works • Musical works, including any accompanying words • Dramatic works, including any accompanying music • Pantomimes and choreographic works • Pictorial, graphic, and sculptural works • Motion pictures and other audiovisual works • Sound recordings, which are works that result from the fixation of a series of musical, spoken, or other sounds • Architectural works These categories should be viewed broadly for the purpose of registering your work. For example, computer programs and certain “compilations” can be registered as “literary works”; maps and technical drawings can be registered as “pictorial, graphic, and sculptural works.” w copyright.gov note: Before 1978, federal copyright was generally secured by publishing a work with an appro- priate copyright notice.
    [Show full text]
  • Dynamic and Interactive R Graphics for the Web: the Gridsvg Package
    JSS Journal of Statistical Software MMMMMM YYYY, Volume VV, Issue II. http://www.jstatsoft.org/ Dynamic and Interactive R Graphics for the Web: The gridSVG Package Paul Murrell Simon Potter The Unversity of Auckland The Unversity of Auckland Abstract This article describes the gridSVG package, which provides functions to convert grid- based R graphics to an SVG format. The package also provides a function to associate hyperlinks with components of a plot, a function to animate components of a plot, a function to associate any SVG attribute with a component of a plot, and a function to add JavaScript code to a plot. The last two of these provides a basis for adding interactivity to the SVG version of the plot. Together these tools provide a way to generate dynamic and interactive R graphics for use in web pages. Keywords: world-wide web, graphics, R, SVG. 1. Introduction Interactive and dynamic plots within web pages are becomingly increasingly popular, as part of a general trend towards making data sets more open and accessible on the web, for example, GapMinder (Rosling 2008) and ManyEyes (Viegas, Wattenberg, van Ham, Kriss, and McKeon 2007). The R language and environment for statistical computing and graphics (R Development Core Team 2011) has many facilities for producing plots, and it can produce graphics formats that are suitable for including in web pages, but the core graphics facilities in R are largely focused on static plots. This article describes an R extension package, gridSVG, that is designed to embellish and transform a standard, static R plot and turn it into a dynamic and interactive plot that can be embedded in a web page.
    [Show full text]
  • The Unicode Cookbook for Linguists: Managing Writing Systems Using Orthography Profiles
    Zurich Open Repository and Archive University of Zurich Main Library Strickhofstrasse 39 CH-8057 Zurich www.zora.uzh.ch Year: 2017 The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles Moran, Steven ; Cysouw, Michael DOI: https://doi.org/10.5281/zenodo.290662 Posted at the Zurich Open Repository and Archive, University of Zurich ZORA URL: https://doi.org/10.5167/uzh-135400 Monograph The following work is licensed under a Creative Commons: Attribution 4.0 International (CC BY 4.0) License. Originally published at: Moran, Steven; Cysouw, Michael (2017). The Unicode Cookbook for Linguists: Managing writing systems using orthography profiles. CERN Data Centre: Zenodo. DOI: https://doi.org/10.5281/zenodo.290662 The Unicode Cookbook for Linguists Managing writing systems using orthography profiles Steven Moran & Michael Cysouw Change dedication in localmetadata.tex Preface This text is meant as a practical guide for linguists, and programmers, whowork with data in multilingual computational environments. We introduce the basic concepts needed to understand how writing systems and character encodings function, and how they work together. The intersection of the Unicode Standard and the International Phonetic Al- phabet is often not met without frustration by users. Nevertheless, thetwo standards have provided language researchers with a consistent computational architecture needed to process, publish and analyze data from many different languages. We bring to light common, but not always transparent, pitfalls that researchers face when working with Unicode and IPA. Our research uses quantitative methods to compare languages and uncover and clarify their phylogenetic relations. However, the majority of lexical data available from the world’s languages is in author- or document-specific orthogra- phies.
    [Show full text]
  • Fira Code: Monospaced Font with Programming Ligatures
    Personal Open source Business Explore Pricing Blog Support This repository Sign in Sign up tonsky / FiraCode Watch 282 Star 9,014 Fork 255 Code Issues 74 Pull requests 1 Projects 0 Wiki Pulse Graphs Monospaced font with programming ligatures 145 commits 1 branch 15 releases 32 contributors OFL-1.1 master New pull request Find file Clone or download lf- committed with tonsky Add mintty to the ligatures-unsupported list (#284) Latest commit d7dbc2d 16 days ago distr Version 1.203 (added `__`, closes #120) a month ago showcases Version 1.203 (added `__`, closes #120) a month ago .gitignore - Removed `!!!` `???` `;;;` `&&&` `|||` `=~` (closes #167) `~~~` `%%%` 3 months ago FiraCode.glyphs Version 1.203 (added `__`, closes #120) a month ago LICENSE version 0.6 a year ago README.md Add mintty to the ligatures-unsupported list (#284) 16 days ago gen_calt.clj Removed `/**` `**/` and disabled ligatures for `/*/` `*/*` sequences … 2 months ago release.sh removed Retina weight from webfonts 3 months ago README.md Fira Code: monospaced font with programming ligatures Problem Programmers use a lot of symbols, often encoded with several characters. For the human brain, sequences like -> , <= or := are single logical tokens, even if they take two or three characters on the screen. Your eye spends a non-zero amount of energy to scan, parse and join multiple characters into a single logical one. Ideally, all programming languages should be designed with full-fledged Unicode symbols for operators, but that’s not the case yet. Solution Download v1.203 · How to install · News & updates Fira Code is an extension of the Fira Mono font containing a set of ligatures for common programming multi-character combinations.
    [Show full text]
  • Discourse, Materiality and Power: Dietary Supplements and Their Users Susan Roberton Bidwell a Thesis Submitted for the Degree
    Discourse, Materiality and Power: Dietary Supplements and their Users Susan Roberton Bidwell A thesis submitted for the degree of Doctor of Philosophy of the University of Otago, Dunedin March 2020 Abstract Throughout human existence people have used herbs and other medicinal substances to protect themselves against illness and treat their ailments. Gathering wild herbs has, however, been replaced today by the many products on the shelves of health stores and pharmacies in developed countries with health systems similar to New Zealand. Previous studies of supplements and their use have largely focused on how many people use them, or subsumed supplement use within wider studies of complementary and alternative medicine (CAM). Supplement use, however, has characteristics that make it different from CAM therapies more broadly. Yet there have been only a few investigations where supplement users have been asked directly about their practices, and those that have been done have tended to be under- theorised and so lack depth. This study used a constructionist approach, within which supplements and their users were examined from both humanist and post-humanist perspectives. I used semi-structured interviews to generate data with 36 participants who were regular users of supplements. The interviews were supplemented by observations of the displays of products that participants brought to my attention in the home and retail settings where the interviews took place. A critical theoretical analysis was undertaken, framed by Deleuze and Guattari’s concept of the rhizomatic assemblage of multiple, interconnected actants which are always in a process of change. Within this wider framework, aspects of the data were examined using other theoretical concepts including deconstruction, the agency of matter, and Foucault’s ideas of power and caring for the self.
    [Show full text]
  • Suitcase Fusion 8 Getting Started
    Copyright © 2014–2018 Celartem, Inc., doing business as Extensis. This document and the software described in it are copyrighted with all rights reserved. This document or the software described may not be copied, in whole or part, without the written consent of Extensis, except in the normal use of the software, or to make a backup copy of the software. This exception does not allow copies to be made for others. Licensed under U.S. patents issued and pending. Celartem, Extensis, LizardTech, MrSID, NetPublish, Portfolio, Portfolio Flow, Portfolio NetPublish, Portfolio Server, Suitcase Fusion, Type Server, TurboSync, TeamSync, and Universal Type Server are registered trademarks of Celartem, Inc. The Celartem logo, Extensis logos, LizardTech logos, Extensis Portfolio, Font Sense, Font Vault, FontLink, QuickComp, QuickFind, QuickMatch, QuickType, Suitcase, Suitcase Attaché, Universal Type, Universal Type Client, and Universal Type Core are trademarks of Celartem, Inc. Adobe, Acrobat, After Effects, Creative Cloud, Creative Suite, Illustrator, InCopy, InDesign, Photoshop, PostScript, Typekit and XMP are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States and/or other countries. Apache Tika, Apache Tomcat and Tomcat are trademarks of the Apache Software Foundation. Apple, Bonjour, the Bonjour logo, Finder, iBooks, iPhone, Mac, the Mac logo, Mac OS, OS X, Safari, and TrueType are trademarks of Apple Inc., registered in the U.S. and other countries. macOS is a trademark of Apple Inc. App Store is a service mark of Apple Inc. IOS is a trademark or registered trademark of Cisco in the U.S. and other countries and is used under license. Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S.
    [Show full text]
  • Old Cyrillic in Unicode*
    Old Cyrillic in Unicode* Ivan A Derzhanski Institute for Mathematics and Computer Science, Bulgarian Academy of Sciences [email protected] The current version of the Unicode Standard acknowledges the existence of a pre- modern version of the Cyrillic script, but its support thereof is limited to assigning code points to several obsolete letters. Meanwhile mediæval Cyrillic manuscripts and some early printed books feature a plethora of letter shapes, ligatures, diacritic and punctuation marks that want proper representation. (In addition, contemporary editions of mediæval texts employ a variety of annotation signs.) As generally with scripts that predate printing, an obvious problem is the abundance of functional, chronological, regional and decorative variant shapes, the precise details of whose distribution are often unknown. The present contents of the block will need to be interpreted with Old Cyrillic in mind, and decisions to be made as to which remaining characters should be implemented via Unicode’s mechanism of variation selection, as ligatures in the typeface, or as code points in the Private space or the standard Cyrillic block. I discuss the initial stage of this work. The Unicode Standard (Unicode 4.0.1) makes a controversial statement: The historical form of the Cyrillic alphabet is treated as a font style variation of modern Cyrillic because the historical forms are relatively close to the modern appearance, and because some of them are still in modern use in languages other than Russian (for example, U+0406 “I” CYRILLIC CAPITAL LETTER I is used in modern Ukrainian and Byelorussian). Some of the letters in this range were used in modern typefaces in Russian and Bulgarian.
    [Show full text]
  • CSS 2.1) Specification
    Cascading Style Sheets Level 2 Revision 1 (CSS 2.1) Specification W3C Editors Draft 26 February 2014 This version: http://www.w3.org/TR/2014/ED-CSS2-20140226 Latest version: http://www.w3.org/TR/CSS2 Previous versions: http://www.w3.org/TR/2011/REC-CSS2-20110607 http://www.w3.org/TR/2008/REC-CSS2-20080411/ Editors: Bert Bos <bert @w3.org> Tantek Çelik <tantek @cs.stanford.edu> Ian Hickson <ian @hixie.ch> Håkon Wium Lie <howcome @opera.com> Please refer to the errata for this document. This document is also available in these non-normative formats: plain text, gzip'ed tar file, zip file, gzip'ed PostScript, PDF. See also translations. Copyright © 2011 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply. Abstract This specification defines Cascading Style Sheets, level 2 revision 1 (CSS 2.1). CSS 2.1 is a style sheet language that allows authors and users to attach style (e.g., fonts and spacing) to structured documents (e.g., HTML documents and XML applications). By separating the presentation style of documents from the content of documents, CSS 2.1 simplifies Web authoring and site maintenance. CSS 2.1 builds on CSS2 [CSS2] which builds on CSS1 [CSS1]. It supports media- specific style sheets so that authors may tailor the presentation of their documents to visual browsers, aural devices, printers, braille devices, handheld devices, etc. It also supports content positioning, table layout, features for internationalization and some properties related to user interface. CSS 2.1 corrects a few errors in CSS2 (the most important being a new definition of the height/width of absolutely positioned elements, more influence for HTML's "style" attribute and a new calculation of the 'clip' property), and adds a few highly requested features which have already been widely implemented.
    [Show full text]
  • FONT GUIDE Workbook to Help You Find the Right One for Your Brand
    FONT GUIDE Workbook to help you find the right one for your brand. www.ottocreative.com.au Choosing the right font for your brand YOUR BRAND VALUES: How different font styles can be used to make up your brand: Logo Typeface: Is usually a bit more special and packed with your brands personality. This font should be used sparingly and kept for special occasions. Headings font: Logo Font This font will reflect the same brand values as your logo font - eg in this example both fonts are feminine and elegant. Headings Unlike your logo typeface, this font should be easier to read and look good a number of different sizes and thicknesses. Body copy Body font: The main rule here is that this font MUST be easy to read, both digitally and for print. If there is already alot going on in your logo and heading font, keep this style simple. Typefaces, common associations & popular font styles San Serif: Clean, Modern, Neutral Try these: Roboto, Open Sans, Lato, Montserrat, Raleway Serif: Classic, Traditional, reliable Try these: Playfair Display, Lora, Source Serif Pro, Prata, Gentium Basic Slab Serif: Youthful, modern, approachable Try these: Roboto Slab, Merriweather, Slabo 27px, Bitter, Arvo Script: Feminine, Romantic, Elegant Try these: Dancing Script, Pacifico, Satisfy, Courgette, Great Vibes Monotype:Simple, Technical, Futuristic Try these: Source Code Pro, Nanum Gothic Coding, Fira Mono, Cutive Mono Handwritten: Authentic, casual, creative Try these: Indie Flower, Shadows into light, Amatic SC, Caveat, Kalam Display: Playful, fun, personality galore Try these: Lobster, Abril Fatface, Luckiest Guy, Bangers, Monoton NOTE: Be careful when using handwritten and display fonts, as they can be hard to read.
    [Show full text]
  • 264 Tugboat, Volume 37 (2016), No. 3 Typographers' Inn Peter Flynn
    264 TUGboat, Volume 37 (2016), No. 3 A Typographers’ Inn X LE TEX Peter Flynn Back at the ranch, we have been experimenting with X LE ATEX in our workflow, spurred on by two recent Dashing it off requests to use a specific set of OpenType fonts for A I recently put up a new version of Formatting Infor- some GNU/Linux documentation. X LE TEX offers A mation (http://latex.silmaril.ie), and in the two major improvements on pdfLTEX: the use of section on punctuation I described the difference be- OpenType and TrueType fonts, and the handling of tween hyphens, en rules, em rules, and minus signs. UTF-8 multibyte characters. In particular I explained how to type a spaced Font packages. You can’t easily use the font pack- dash — like that, using ‘dash~---Ђlike’ to put a A ages you use with pdfLTEX because the default font tie before the dash and a normal space afterwards, encoding is EU1 in the fontspec package which is key so that if the dash occurred near a line-break, it to using OTF/TTF fonts, rather than the T1 or OT1 would never end up at the start of a line, only at A conventionally used in pdfLTEX. But late last year the end. I somehow managed to imply that a spaced Herbert Voß kindly posted a list of the OTF/TTF dash was preferable to an unspaced one (probably fonts distributed with TEX Live which have packages because it’s my personal preference, but certainly A of their own for use with X LE TEX [6].
    [Show full text]
  • ART DIRECTION (Design Elements) Art Direction — Tallit Lines 01
    01 ART DIRECTION (design elements) Art Direction — Tallit lines 01 TALLIT (prayer shawl) What / Design / Jewish prayer shawl worn during prayers/ a special The lines of the tallit is something that Jews can easily occassion. It has special twined and knotted fringes identify with and relate to. They can be graphically known as tzitzit attached to it’s four corners. This intepreted into line dividers/ borders/ decorative literally means cloak or sheet. It has remained elements etc and would reiforce the prayer aspect an inseparable part of Jewish worship. of the event, conveying our support for Israel. Art Direction — Tallit lines 02 TALLIT (prayer shawl) Art Direction — Tzitzit 03 TZITZIT (tassels) What / Design / These are knotted fringes tied at the four corners of The intricacy of how the tzitzit threads intertwines is a the tallit. They must be made with intent, with specific beautiful and meaningful symbolism of the rich jewish number of threads that winds around and hangs losely culture and it’s significance today. Many jews still at the end. The number of winds adds up to represent attach tzitzits on their clothings as a reminder of the direct speilling of God’s name. The total number their Jewish laws and traditions. This would be an of threads adds up to the Jewish laws. ideal visual to celebrate their traditions and culture. Art Direction — Tzitzit 04 TZITZIT (tassels) Possible Design Elements a. b. c. (zoom out) (zoom out) . minimal . intricate . intricate . uniform & consistent clean lines . uniform & consistent clean lines . sketchy & fluid strokes . not accurate (no. of strokes) . accurate (no.
    [Show full text]
  • The Fontspec Package Font Selection for XƎLATEX and Lualatex
    The fontspec package Font selection for XƎLATEX and LuaLATEX Will Robertson and Khaled Hosny [email protected] 2013/05/12 v2.3b Contents 7.5 Different features for dif- ferent font sizes . 14 1 History 3 8 Font independent options 15 2 Introduction 3 8.1 Colour . 15 2.1 About this manual . 3 8.2 Scale . 16 2.2 Acknowledgements . 3 8.3 Interword space . 17 8.4 Post-punctuation space . 17 3 Package loading and options 4 8.5 The hyphenation character 18 3.1 Maths fonts adjustments . 4 8.6 Optical font sizes . 18 3.2 Configuration . 5 3.3 Warnings .......... 5 II OpenType 19 I General font selection 5 9 Introduction 19 9.1 How to select font features 19 4 Font selection 5 4.1 By font name . 5 10 Complete listing of OpenType 4.2 By file name . 6 font features 20 10.1 Ligatures . 20 5 Default font families 7 10.2 Letters . 20 6 New commands to select font 10.3 Numbers . 21 families 7 10.4 Contextuals . 22 6.1 More control over font 10.5 Vertical Position . 22 shape selection . 8 10.6 Fractions . 24 6.2 Math(s) fonts . 10 10.7 Stylistic Set variations . 25 6.3 Miscellaneous font select- 10.8 Character Variants . 25 ing details . 11 10.9 Alternates . 25 10.10 Style . 27 7 Selecting font features 11 10.11 Diacritics . 29 7.1 Default settings . 11 10.12 Kerning . 29 7.2 Changing the currently se- 10.13 Font transformations . 30 lected features .
    [Show full text]