Basic Latin Range: 0000–007F

C0 Controls and Basic Latin
Range: 0000–007F

This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0.

This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata.

See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0.

Disclaimer

These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online.

See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/

A thorough understanding of the information contained in these additional sources is required for a successful implementation.

Copying characters from the character code tables or list of character names is not recommended, because for production reasons the PDF files for the code charts cannot guarantee that the correct character codes will always be copied.

Fonts

The shapes of the reference glyphs used in these code charts are not prescriptive. Considerable variation is to be expected in actual fonts.
The particular fonts used in these charts were provided to the Unicode Consortium by a number of different font designers, who own the rights to the fonts. See https://www.unicode.org/charts/fonts.html for a list.

Terms of Use

You may freely use these code charts for personal or internal business uses only. You may not incorporate them either wholly or in part into any product or publication, or otherwise distribute them without express written permission from the Unicode Consortium. However, you may provide links to these charts.

The fonts and font data used in production of these code charts may NOT be extracted, or used in any other way in any product or publication, without permission or license granted by the typeface owner(s).

The Unicode Consortium is not liable for errors or omissions in this file or the standard itself. Information on characters added to the Unicode Standard since the publication of the most recent version of the Unicode Standard, as well as on characters currently being considered for addition to the Unicode Standard can be found on the Unicode web site. See https://www.unicode.org/pending/pending.html and https://www.unicode.org/alloc/Pipeline.html.

Copyright © 1991-2021 Unicode, Inc. All rights reserved.

0000 C0 Controls and Basic Latin 007F

The code point of a cell is its column heading followed by its row digit (column 004, row 1 is 0041). Cells marked ctrl are control codes with no printable glyph; SP is 0020 SPACE.

      000   001   002   003   004   005   006   007
0     ctrl  ctrl  SP    0     @     P     `     p
1     ctrl  ctrl  !     1     A     Q     a     q
2     ctrl  ctrl  "     2     B     R     b     r
3     ctrl  ctrl  #     3     C     S     c     s
4     ctrl  ctrl  $     4     D     T     d     t
5     ctrl  ctrl  %     5     E     U     e     u
6     ctrl  ctrl  &     6     F     V     f     v
7     ctrl  ctrl  '     7     G     W     g     w
8     ctrl  ctrl  (     8     H     X     h     x
9     ctrl  ctrl  )     9     I     Y     i     y
A     ctrl  ctrl  *     :     J     Z     j     z
B     ctrl  ctrl  +     ;     K     [     k     {
C     ctrl  ctrl  ,     <     L     \     l     |
D     ctrl  ctrl  -     =     M     ]     m     }
E     ctrl  ctrl  .     >     N     ^     n     ~
F     ctrl  ctrl  /     ?     O     _     o     ctrl

The Unicode Standard 14.0, Copyright © 1991-2021 Unicode, Inc. All rights reserved.

C0 controls

Alias names are those for ISO/IEC 6429:1992. Commonly used alternative aliases are also shown.

0000 <control>
     = NULL
0001 <control>
     = START OF HEADING
0002 <control>
     = START OF TEXT
0003 <control>
     = END OF TEXT
0004 <control>
     = END OF TRANSMISSION
0005 <control>
     = ENQUIRY
0006 <control>
     = ACKNOWLEDGE
0007 <control>
     = BELL
0008 <control>
     = BACKSPACE
0009 <control>
     = CHARACTER TABULATION
     = horizontal tabulation (HT)
     = tab
000A <control>
     = LINE FEED (LF)
     = new line (NL)
     = end of line (EOL)
000B <control>
     = LINE TABULATION
     = vertical tabulation (VT)
000C <control>
     = FORM FEED (FF)
000D <control>
     = CARRIAGE RETURN (CR)
000E <control>
     = SHIFT OUT
     • known as LOCKING-SHIFT ONE in 8-bit environments
000F <control>
     = SHIFT IN
     • known as LOCKING-SHIFT ZERO in 8-bit environments
0010 <control>
     = DATA LINK ESCAPE
0011 <control>
     = DEVICE CONTROL ONE
0012 <control>
     = DEVICE CONTROL TWO
0013 <control>
     = DEVICE CONTROL THREE
0014 <control>
     = DEVICE CONTROL FOUR
0015 <control>
     = NEGATIVE ACKNOWLEDGE
0016 <control>
     = SYNCHRONOUS IDLE
0017 <control>
     = END OF TRANSMISSION BLOCK
0018 <control>
     = CANCEL
0019 <control>
     = END OF MEDIUM
001A <control>
     = SUBSTITUTE
     → FFFD replacement character
001B <control>
     = ESCAPE
001C <control>
     = INFORMATION SEPARATOR FOUR
     = file separator (FS)
001D <control>
     = INFORMATION SEPARATOR THREE
     = group separator (GS)
001E <control>
     = INFORMATION SEPARATOR TWO
     = record separator (RS)
001F <control>
     = INFORMATION SEPARATOR ONE
     = unit separator (US)

ASCII punctuation and symbols

Based on ISO/IEC 646.

0020 SPACE
     • sometimes considered a control code
     • other space characters: 2000–200A
     → 00A0 no-break space
     → 200B zero width space
     → 202F narrow no-break space
     → 2060 word joiner
     → 2420 ␠ symbol for space
     → 2422 ␢ blank symbol
     → 2423 ␣ open box
     → 3000 ideographic space
     → FEFF zero width no-break space
0021 ! EXCLAMATION MARK
     = factorial
     = bang
     → 00A1 ¡ inverted exclamation mark
     → 01C3 ǃ latin letter retroflex click
     → 203C ‼ double exclamation mark
     → 203D ‽ interrobang
     → 26A0 ⚠ warning sign
     → 2757 ❗ heavy exclamation mark symbol
     → 2762 ❢ heavy exclamation mark ornament
     → 2E53 ⹓ medieval exclamation mark
     → A71D ꜝ modifier letter raised exclamation mark
0022 " QUOTATION MARK
     = double quote
     • neutral (vertical), used as opening or closing quotation mark
     • preferred characters in English for paired quotation marks are 201C “ & 201D ”
     • 05F4 ״ is preferred for gershayim when writing Hebrew
     → 02BA ʺ modifier letter double prime
     → 02DD ˝ double acute accent
     → 02EE ˮ modifier letter double apostrophe
     → 030B ◌̋ combining double acute accent
     → 030E ◌̎ combining double vertical line above
     → 05F4 ״ hebrew punctuation gershayim
     → 201C “ left double quotation mark
     → 201D ” right double quotation mark
     → 2033 ″ double prime
     → 3003 〃 ditto mark
0023 # NUMBER SIGN
     = pound sign (weight)
     = hashtag, hash
     = crosshatch, octothorpe
     • for denoting musical sharp 266F ♯ is preferred
     → 2114 ℔ l b bar symbol
     → 2116 № numero sign
     → 2317 ⌗ viewdata square
     → 266F ♯ music sharp sign
     → 29E3 ⧣ equals sign and slanted parallel
0024 $ DOLLAR SIGN
     = milréis, escudo
     • used for many peso currencies in Latin America and elsewhere
     • glyph may have one or two vertical bars
     • other currency symbol characters start at 20A0 ₠
     → 00A2 ¢ cent sign
     → 00A4 ¤ currency sign
     → 20B1 ₱ peso sign
     → 1F4B2 heavy dollar sign
0025 % PERCENT SIGN
     → 066A ٪ arabic percent sign
     → 2030 ‰ per mille sign
     → 2031 ‱ per ten thousand sign
     → 2052 ⁒ commercial minus sign
0026 & AMPERSAND
     = and
     • originally derived from a ligature of ‘e’ and ‘t’
     → 204A ⁊ tironian sign et
     → 214B ⅋ turned ampersand
     → 1F674 heavy ampersand ornament
0027 ' APOSTROPHE
     = apostrophe-quote (1.0)
     = single quote
     = APL quote
     • neutral (vertical) glyph with mixed usage
     • 2019 ’ is preferred for apostrophe
     • preferred characters in English for paired quotation marks are 2018 ‘ & 2019 ’
     • 05F3 ׳ is preferred for geresh when writing Hebrew
     → 02B9 ʹ modifier letter prime
     → 02BC ʼ modifier letter apostrophe
     → 02C8 ˈ modifier letter vertical line
     → 0301
002A * ASTERISK
     = star
     • can have five or six spokes
     → 066D ٭ arabic five pointed star
     → 2042 ⁂ asterism
     → 204E ⁎ low asterisk
     → 2051 ⁑ two asterisks aligned vertically
     → 20F0 ◌⃰ combining asterisk above
     → 2217 ∗ asterisk operator
     → 26B9 ⚹ sextile
     → 2731 ✱ heavy asterisk
     → A673 ꙳ slavonic asterisk
     → 1F7B6 medium six spoked asterisk

ASCII math operator

002B + PLUS SIGN
     → 02D6 ˖ modifier letter plus sign
     → 2212 − minus sign
     → 2795 ➕ heavy plus sign
     → FB29 ﬩ hebrew letter alternative plus sign
     → 1F7A2 light greek cross

ASCII punctuation

002C , COMMA
     = decimal separator
     → 060C ، arabic comma
     → 066B ٫ arabic decimal separator
     → 201A ‚ single low-9 quotation mark
     → 2E41 ⹁ reversed comma
     → 2E4C ⹌ medieval comma
     → 3001 、 ideographic comma
002D - HYPHEN-MINUS
     = hyphen, dash
     = minus sign
     • used generically for hyphen, minus sign or en dash, all of which have dedicated alternatives
     → 00AD soft hyphen
     → 02D7 ˗ modifier letter minus sign
     → 2010 ‐ hyphen
     → 2011 non-breaking hyphen
     → 2012 ‒ figure dash
     → 2013 – en dash
     → 2027 ‧ hyphenation point
     → 2043 ⁃ hyphen bullet
     → 2212 − minus sign
     → 10191 roman uncia sign
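
Since the disclaimer warns that characters copied from the PDF may not carry the correct character codes, it can help to check a character programmatically instead of trusting the pasted glyph. The following is a minimal sketch using Python's standard `unicodedata` module; the helper names `chart_cell` and `describe` are hypothetical and introduced only for illustration. It assumes the chart layout in which a cell's code point is the column value times 16 plus the row digit, and it prints `<control>` for code points that have no character name, mirroring the names list.

```python
import unicodedata


def chart_cell(column: int, row: int) -> str:
    """Return the character in chart column 000-007, row 0-F.

    Hypothetical helper: the code point is column * 16 + row.
    """
    if not (0 <= column <= 0x7 and 0 <= row <= 0xF):
        raise ValueError("cell is outside the 0000-007F chart")
    return chr(column * 16 + row)


def describe(ch: str) -> str:
    """Format one character as 'XXXX NAME'.

    Control codes have no name in the Unicode Character Database,
    so the default '<control>' is used for them.
    """
    name = unicodedata.name(ch, "<control>")
    return f"{ord(ch):04X} {name}"


if __name__ == "__main__":
    print(describe(chart_cell(0x2, 0x1)))  # column 002, row 1
    print(describe(chart_cell(0x0, 0xA)))  # column 000, row A
    # A pasted "minus" may be 002D or 2212; describe() tells them apart.
    for ch in "-\u2212":
        print(describe(ch))
```

Running `describe` on text pasted from a chart is one way to confirm whether a dash is HYPHEN-MINUS (002D) or one of its dedicated alternatives such as MINUS SIGN (2212).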