Unicode Support for Mathematics


L2/11-434
Technical Reports
Proposed Update Unicode Technical Report #25
UNICODE SUPPORT FOR MATHEMATICS

Authors: Barbara Beeton, Asmus Freytag, Murray Sargent III
Date: 2011-11-02
This Version: http://www.unicode.org/reports/tr25/tr25-13.pdf
Previous Version: http://www.unicode.org/reports/tr25/tr25-12.pdf
Latest Version: http://www.unicode.org/reports/tr25
Source Document: http://www.unicode.org/reports/tr25/tr25-13.docx
Revision: 13

Summary
The Unicode Standard includes virtually all standard characters used in mathematics. This set supports a wide variety of math usage on computers, including in document presentation languages like TeX, in math markup languages like MathML and OpenMath, in internal representations of mathematics for applications like Mathematica, Maple, and MathCAD, in computer programs, and in plain text. This technical report describes the Unicode support for mathematics and gives some of the imputed default math properties for Unicode characters.

Status
A Unicode Technical Report (UTR) contains informative material. Conformance to the Standard does not imply conformance to any UTR. Other specifications, however, are free to make normative references to a UTR.

Please submit corrigenda and other comments with the online reporting form [Feedback]. Related information that is useful in understanding this document is found in the References. For the latest version of the Unicode Standard see [Unicode]. For a list of current Unicode Technical Reports see [Reports]. For more information about versions of the Unicode Standard, see [Versions].

Contents
UNICODE SUPPORT FOR MATHEMATICS
1. OVERVIEW
2. MATHEMATICAL CHARACTER REPERTOIRE
   2.1 Mathematical Alphanumeric Symbols Block
   2.2 Mathematical Alphabets
   2.3 Fonts Used for Mathematical Alphabets
      2.3.1 Representative Glyphs for Greek Phi
      2.3.2 Representative Glyphs for U+2278 and U+2279
   2.4 Locating Mathematical Characters
   2.5 Duplicated Characters
   2.6 Accented Characters
   2.7 Operators
   2.8 Superscripts and Subscripts
   2.9 Arrows
   2.10 Delimiters
   2.11 Geometrical Shapes
   2.12 Other Symbols
   2.13 Symbol Pieces
   2.14 Invisible Operators
   2.15 Fraction Slash and Other Diagonals
   2.16 Other Characters
   2.17 Negations
   2.18 Variation Selector
   2.19 Novel Symbols not yet in Unicode
3. MATHEMATICAL CHARACTER PROPERTIES
   3.1 Classification by Degree of Mathematical Usage
      3.1.1 Strongly Mathematical Characters
      3.1.2 Weakly Mathematical Characters
      3.1.3 Other
   3.2 Classification by Typographical Behavior
      3.2.1 Alphabetic
      3.2.2 Operators
      3.2.3 Large Operators
      3.2.4 Digits
      3.2.5 Delimiters
      3.2.6 Fences
      3.2.7 Combining Marks
4. IMPLEMENTATION GUIDELINES
   4.1 Use of Normalization with Mathematical Text
   4.2 Bidirectional Layout of Mathematical Text
   4.3 Input of Mathematical and Other Unicode Characters
   4.4 Use of Math Characters in Computer Programs
   4.5 Recognizing Mathematical Expressions
   4.6 Some Examples of Mathematical Notation
5. DATA FILES
   5.1 Mathematical Classification
   5.2 Mapping to other Standards
6. SECURITY CONSIDERATIONS
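The contents list sections on the Mathematical Alphanumeric Symbols block and on mathematical alphabets. Those styled alphabets are encoded as distinct characters rather than as font variants: MATHEMATICAL BOLD CAPITAL A, for instance, is U+1D400. As a minimal sketch covering only the bold and italic Latin letters (the helper names are hypothetical, not part of UTR #25 or any library), the mapping is an offset from the start of each alphabet, with one exception: the italic small h slot, U+1D455, is unassigned because that letter was already encoded as U+210E PLANCK CONSTANT in the Letterlike Symbols block. Other alphabets (script, fraktur, double-struck) have further holes of this kind, which this sketch does not handle.

```cpp
#include <cstdint>

// Hypothetical helpers: map an ASCII letter to the corresponding
// MATHEMATICAL BOLD or MATHEMATICAL ITALIC code point.
// Both return 0 for characters outside A-Z and a-z.

constexpr std::uint32_t kBoldCapitalA   = 0x1D400;  // MATHEMATICAL BOLD CAPITAL A
constexpr std::uint32_t kBoldSmallA     = 0x1D41A;  // MATHEMATICAL BOLD SMALL A
constexpr std::uint32_t kItalicCapitalA = 0x1D434;  // MATHEMATICAL ITALIC CAPITAL A
constexpr std::uint32_t kItalicSmallA   = 0x1D44E;  // MATHEMATICAL ITALIC SMALL A
constexpr std::uint32_t kPlanckConstant = 0x210E;   // PLANCK CONSTANT, serves as italic small h

constexpr std::uint32_t to_math_bold(char c) {
    if (c >= 'A' && c <= 'Z') return kBoldCapitalA + static_cast<std::uint32_t>(c - 'A');
    if (c >= 'a' && c <= 'z') return kBoldSmallA + static_cast<std::uint32_t>(c - 'a');
    return 0;
}

constexpr std::uint32_t to_math_italic(char c) {
    // U+1D455 is a reserved hole; the Letterlike Symbols character is used instead.
    if (c == 'h') return kPlanckConstant;
    if (c >= 'A' && c <= 'Z') return kItalicCapitalA + static_cast<std::uint32_t>(c - 'A');
    if (c >= 'a' && c <= 'z') return kItalicSmallA + static_cast<std::uint32_t>(c - 'a');
    return 0;
}

// Compile-time spot checks of the offsets.
static_assert(to_math_bold('A') == 0x1D400, "bold capital A");
static_assert(to_math_bold('z') == 0x1D433, "bold small z");
static_assert(to_math_italic('h') == 0x210E, "italic small h uses PLANCK CONSTANT");
```

A fuller version would cover every alphabet, the mathematical digits, and the Greek letters in the block, each with its own table of Letterlike Symbols substitutions.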
Recommended publications
  • Proposal to Add U+2B95 Rightwards Black Arrow to Unicode Emoji
    Proposal to add U+2B95 Rightwards Black Arrow to Unicode Emoji. J. S. Choi, 2015-12-12.
    Abstract: In the Unicode Standard 7.0 from 2014, ⮕ U+2B95 was added with the intent to complete the family of black arrows encoded by ⬅⬆⬇ U+2B05–U+2B07. However, due to historical timing, ⮕ U+2B95 was not yet encoded when the Unicode Emoji were first encoded in 2009–2010, and thus the family of four emoji black arrows was mapped not only to ⬅⬆⬇ U+2B05–U+2B07 but also to ➡ U+27A1, a compatibility character for ITC Zapf Dingbats, instead of ⮕ U+2B95. It is thus proposed that ⮕ U+2B95 be added to the set of Unicode emoji characters and be given emoji-style and text-style standardized variants, in order to match the properties of its siblings ⬅⬆⬇ U+2B05–U+2B07, with which it is explicitly unified.
    1 Introduction: This document primarily discusses five encoded characters, already in Unicode as of 2015: ⮕ U+2B95 Rightwards Black Arrow: the main encoded character being discussed, located in the Miscellaneous Symbols and Arrows block. ⬅⬆⬇ U+2B05–U+2B07 Leftwards, Upwards, and Downwards Black Arrow: the three black arrows that ⮕ U+2B95 completes, also located in the Miscellaneous Symbols and Arrows block. ➡ U+27A1 Black Rightwards Arrow: a compatibility character for ITC Zapf Dingbats, located in the Dingbats block. This document proposes the addition of ⮕ U+2B95 to the set of emoji characters as defined by Unicode Technical Report (UTR) #51: “Unicode Emoji”. In other words, it proposes: 1. A property change: ⮕ U+2B95 should be given the Emoji property defined in UTR #51.
  • Assessment of Options for Handling Full Unicode Character Encodings in MARC21 a Study for the Library of Congress
    Assessment of Options for Handling Full Unicode Character Encodings in MARC21: A Study for the Library of Congress. Part 1: New Scripts. Jack Cain, Senior Consultant, Trylus Computing, Toronto.
    1 Purpose: This assessment intends to study the issues and make recommendations on the possible expansion of the character set repertoire for bibliographic records in MARC21 format.
    1.1 “Encoding Scheme” vs. “Repertoire”: An encoding scheme contains codes by which characters are represented in computer memory. These codes are organized according to a certain methodology called an encoding scheme. The list of all characters so encoded is referred to as the “repertoire” of characters in the given encoding scheme. For example, ASCII is one encoding scheme, perhaps the one best known to the average non-technical person in North America. “A”, “B”, and “C” are three characters in the repertoire of this encoding scheme. These three characters are assigned encodings 41, 42, and 43 in ASCII (expressed here in hexadecimal).
    1.2 MARC8: "MARC8" is the term commonly used to refer both to the encoding scheme and its repertoire as used in MARC records up to 1998. The ‘8’ refers to the fact that, unlike Unicode, which is a multi-byte-per-character code set, the MARC8 encoding scheme is principally made up of multiple one-byte tables in which each character is encoded using a single 8-bit byte. (It also includes the EACC set, which actually uses a fixed length of 3 bytes per character.) (For details on MARC8 and its specifications see: http://www.loc.gov/marc/.) MARC8 was introduced around 1968 and was initially limited to essentially Latin script only.
  • Geometry and Art, LACMA | Evenings for Educators | April 5, 2011
    Geometry and Art. LACMA | Evenings for Educators | April 5, 2011. ALEXANDER CALDER (United States, 1898–1976), Hello Girls, 1964. Painted metal, mobile, overall: 275 x 288 in., Art Museum Council Fund (M.65.10) © Alexander Calder Estate/Artists Rights Society (ARS), New York/ADAGP, Paris.
    Geometry is everywhere. We can train ourselves to find the geometry in everyday objects and in works of art. Look carefully at the image above and identify the different lines, shapes, and forms of both Alexander Calder’s sculpture and the architecture of LACMA’s built environment. What is the proportion of the artwork to the buildings? What types of balance do you see? Following are images of artworks from LACMA’s collection. As you explore these works, look for the lines, seek the shapes, find the patterns, and exercise your problem-solving skills. Use or adapt the discussion questions to your students’ learning styles and abilities.
    1 Language of the Visual Arts and Geometry: Line, shape, form, pattern, symmetry, scale, and proportion are the building blocks of both art and math. Geometry offers the most obvious connection between the two disciplines. Both art and math involve drawing and the use of shapes and forms, as well as an understanding of spatial concepts, two and three dimensions, measurement, estimation, and pattern. Many of these concepts are evident in an artwork’s composition: how the artist uses the elements of art and applies the principles of design. Problem-solving skills such as visualization and spatial reasoning are also important for artists and professionals in math, science, and technology.
  • L2/14-274 Title: Proposed Math-Class Assignments for UTR #25
    L2/14-274 Title: Proposed Math-Class Assignments for UTR #25 Revision 14 Source: Laurențiu Iancu and Murray Sargent III – Microsoft Corporation Status: Individual contribution Action: For consideration by the Unicode Technical Committee Date: 2014-10-24 1. Introduction Revision 13 of UTR #25 [UTR25], published in April 2012, corresponds to Unicode Version 6.1 [TUS61]. As of October 2014, Revision 14 is in preparation, to update UTR #25 and its data files to the character repertoire of Unicode Version 7.0 [TUS70]. This document compiles a list of characters proposed to be assigned mathematical classes in Revision 14 of UTR #25. In this document, the term math-class is being used to refer to the character classification in UTR #25. While functionally similar to a UCD character property, math-class is applicable only within the scope of UTR #25. Math-class is different from the UCD binary character property Math [UAX44]. The relation between Math and math-class is that the set of characters with the property Math=Yes is a proper subset of the set of characters assigned any math-class value in UTR #25. As of Revision 13, the set relation between Math and math-class is invalidated by the collection of Arabic mathematical alphabetic symbols in the range U+1EE00 – U+1EEFF. This is a known issue [14-052], already discussed by the UTC [138-C12 in 14-026]. Once those symbols are added to the UTR #25 data files, the set relation will be restored. This document proposes only UTR #25 math-class values, and not any UCD Math property values.
  • The Case of Basic Geometric Shapes
    International Journal of Progressive Education, Volume 15 Number 3, 2019 © 2019 INASED. Children’s Geometric Understanding through Digital Activities: The Case of Basic Geometric Shapes. Bilal Özçakır (i), Kırşehir Ahi Evran University; Ahmet Sami Konca (ii), Kırşehir Ahi Evran University; Nihat Arıkan (iii), Kırşehir Ahi Evran University.
    Abstract: Early mathematics education lays a foundation for academic success in mathematics in higher grades. Studies show that introducing mathematical content at the preschool level is a strong predictor of children’s success in mathematics as they progress through later school levels. Digital technologies can support children’s learning of mathematical concepts through the exploration and manipulation of concrete representations. Therefore, digital activities provide opportunities for children to engage with experimental mathematics. In this study, the effects of digital learning tools on learning about geometric shapes in early childhood education were investigated. Accordingly, this study aimed to investigate children’s progress in digital learning activities in terms of recognizing and discriminating basic geometric shapes. Participants were six children from a kindergarten in Kırşehir, Turkey. The children engaged in six digital learning activities on tablets over about four weeks in learning settings. Task-based interview sessions were conducted. The results show that this series of activities helped children achieve higher cognitive levels; they improved their understanding through digital activities. Keywords: Digital Learning Activities, Early Childhood Education, Basic Geometric Shape, Geometry Education. DOI: 10.29329/ijpe.2019.193.8
    (i) Bilal Özçakır, Res. Assist. Dr., Kırşehir Ahi Evran University, Mathematics Education.
    (ii) Ahmet Sami Konca, Res.
  • Letterlike Symbols Range: 2100–214F
    Letterlike Symbols Range: 2100–214F This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 14.0 This file may be changed at any time without notice to reflect errata or other updates to the Unicode Standard. See https://www.unicode.org/errata/ for an up-to-date list of errata. See https://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See https://www.unicode.org/charts/PDF/Unicode-14.0/ for charts showing only the characters added in Unicode 14.0. See https://www.unicode.org/Public/14.0.0/charts/ for a complete archived file of character code charts for Unicode 14.0. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 14.0 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 14.0, online at https://www.unicode.org/versions/Unicode14.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, #44, #45, and #50, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online. See https://www.unicode.org/ucd/ and https://www.unicode.org/reports/ A thorough understanding of the information contained in these additional sources is required for a successful implementation.
  • The Unicode Standard 5.1 Code Charts
    Letterlike Symbols Range: 2100–214F The Unicode Standard, Version 5.1 This file contains an excerpt from the character code tables and list of character names for The Unicode Standard, Version 5.1. Characters in this chart that are new for The Unicode Standard, Version 5.1 are shown in conjunction with any existing characters. For ease of reference, the new characters have been highlighted in the chart grid and in the names list. This file will not be updated with errata, or when additional characters are assigned to the Unicode Standard. See http://www.unicode.org/errata/ for an up-to-date list of errata. See http://www.unicode.org/charts/ for access to a complete list of the latest character code charts. See http://www.unicode.org/charts/PDF/Unicode-5.1/ for charts showing only the characters added in Unicode 5.1. See http://www.unicode.org/Public/5.1.0/charts/ for a complete archived file of character code charts for Unicode 5.1. Disclaimer These charts are provided as the online reference to the character contents of the Unicode Standard, Version 5.1 but do not provide all the information needed to fully support individual scripts using the Unicode Standard. For a complete understanding of the use of the characters contained in this file, please consult the appropriate sections of The Unicode Standard, Version 5.0 (ISBN 0-321-48091-0), online at http://www.unicode.org/versions/Unicode5.0.0/, as well as Unicode Standard Annexes #9, #11, #14, #15, #24, #29, #31, #34, #38, #41, #42, and #44, the other Unicode Technical Reports and Standards, and the Unicode Character Database, which are available online.
  • Character Properties 4
    The Unicode® Standard Version 14.0 – Core Specification To learn about the latest version of the Unicode Standard, see https://www.unicode.org/versions/latest/. Many of the designations used by manufacturers and sellers to distinguish their products are claimed as trademarks. Where those designations appear in this book, and the publisher was aware of a trademark claim, the designations have been printed with initial capital letters or in all capitals. Unicode and the Unicode Logo are registered trademarks of Unicode, Inc., in the United States and other countries. The authors and publisher have taken care in the preparation of this specification, but make no expressed or implied warranty of any kind and assume no responsibility for errors or omissions. No liability is assumed for incidental or consequential damages in connection with or arising out of the use of the information or programs contained herein. The Unicode Character Database and other files are provided as-is by Unicode, Inc. No claims are made as to fitness for any particular purpose. No warranties of any kind are expressed or implied. The recipient agrees to determine applicability of information provided. © 2021 Unicode, Inc. All rights reserved. This publication is protected by copyright, and permission must be obtained from the publisher prior to any prohibited reproduction. For information regarding permissions, inquire at https://www.unicode.org/reporting.html. For information about the Unicode terms of use, please see https://www.unicode.org/copyright.html. The Unicode Standard / the Unicode Consortium; edited by the Unicode Consortium. — Version 14.0. Includes index. ISBN 978-1-936213-29-0 (https://www.unicode.org/versions/Unicode14.0.0/) 1.
  • The Comprehensive LATEX Symbol List
    The Comprehensive LaTeX Symbol List. Scott Pakin, 22 September 2005. Abstract: This document lists 3300 symbols and the corresponding LaTeX commands that produce them. Some of these symbols are guaranteed to be available in every LaTeX 2ε system; others require fonts and packages that may not accompany a given distribution and that therefore need to be installed. All of the fonts and packages used to prepare this document, as well as this document itself, are freely available from the Comprehensive TeX Archive Network (http://www.ctan.org/). Contents: 1 Introduction; 1.1 Document Usage; 1.2 Frequently Requested Symbols; 2 Body-text symbols; Table 1: LaTeX 2ε Escapable “Special” Characters; Table 2: Predefined LaTeX 2ε Text-mode Commands; Table 3: LaTeX 2ε Commands Defined to Work in Both Math and Text Mode; Table 4: AMS Commands Defined to Work in Both Math and Text Mode; Table 5: Non-ASCII Letters (Excluding Accented Letters); Table 6: Letters Used to Typeset African Languages; Table 7: Letters Used to Typeset Vietnamese; Table 8: Punctuation Marks Not Found in OT1; Table 9: pifont Decorative Punctuation Marks; Table 10: tipa Phonetic Symbols; Table 11: tipx Phonetic Symbols; Table 13: wsuipa Phonetic Symbols; Table 14: wasysym Phonetic Symbols; Table 15: phonetic Phonetic Symbols; Table 16: t4phonet Phonetic Symbols; Table 17: semtrans Transliteration Symbols; Table 18: Text-mode Accents; Table 19: tipa Text-mode Accents; Table 20: extraipa Text-mode Accents
  • Letterlike Symbols Number Forms
    ISO/IEC JTC1/SC2/WG2 N2392. Title: A Report of Korean Script ad hoc group meeting on Oct. 15, 2001. Participants: Kim Kyongsok (ROK); Mun Hwang Ryong, Park Dong Ki, Yang Song Jin, Yun Chang Hwa (four from D P R of Korea); Kobayashi (not present when the report was written). Source: Korean script ad hoc group. Date: 2001-10-16. References: WG2 N2374, WG2 N2376, WG2 N2390, WG2 N2243.
    1. D P R of Korea nominated Mr. YANG, Song Jin as a co-chair from D P R of Korea.
    2. Adding a 6th column to CJK and CJK Ext. A tables of ISO/IEC 10646-1:2000 [WG2 N2376]
    - D P R of Korea proposed that they would prepare a sample output of one page so that IRG and WG2 can review it, on the condition that the IRG Rapporteur, IRG Technical Editor and Contributing Editor provide D P R of Korea with the current CJK fonts and related software used to produce the current CJK tables. When the sample output proves acceptable, DPRK would prepare CJK and CJK Ext. A tables.
    - Detailed milestones can be discussed at WG2.
    3. Adding 70 symbols [WG2 N2374, WG2 N2390]
    3.1 For the following 47 characters, no issues were raised, and the group proposes that they be added to the BMP.
    Letterlike Symbols (Character Name | UCS):
    LIMITED LIABILITY SIGN | U+214C # 004C 0054 0044
    PARTNERSHIP SIGN | U+214D # 0050 0054 0045
    FACSIMILE SIGN | U+214E # 0046 0041 0058
    Number Forms (Character Name | UCS):
    VULGAR FRACTION ONE HALF WITH HORIZONTAL BAR | U+2151 # 00BD
    VULGAR FRACTION ONE THIRD WITH HORIZONTAL BAR | U+2184 # 2153
    VULGAR FRACTION TWO THIRDS WITH HORIZONTAL BAR | U+2185 # 2154
    VULGAR FRACTION
  • Unicode Character Properties
    Unicode character properties. Document #: P1628R0. Date: 2019-06-17. Project: Programming Language C++. Audience: SG-16, LEWG. Reply-to: Corentin Jabot.
    1 Abstract: We propose an API to query the properties of Unicode characters as specified by the Unicode Standard and several Unicode Technical Reports.
    2 Motivation: This API can be used as a foundation for various Unicode algorithms and Unicode facilities such as Unicode-aware regular expressions. Being able to query the properties of Unicode characters is important for any application hoping to correctly handle any textual content, including compilers and parsers, text editors, graphical applications, databases, messaging applications, etc.
        static_assert(uni::cp_script('C') == uni::script::latin);
        static_assert(uni::cp_block(U'[ ') == uni::block::misc_pictographs);
        static_assert(!uni::cp_is<uni::property::xid_start>('1'));
        static_assert(uni::cp_is<uni::property::xid_continue>('1'));
        static_assert(uni::cp_age(U'[ ') == uni::version::v10_0);
        static_assert(uni::cp_is<uni::property::alphabetic>(U'ß'));
        static_assert(uni::cp_category(U'∩') == uni::category::sm);
        static_assert(uni::cp_is<uni::category::lowercase_letter>('a'));
        static_assert(uni::cp_is<uni::category::letter>('a'));
    3 Design Consideration
    3.1 constexpr: An important design decision of this proposal is that it is fully constexpr. Notably, the presented design allows an implementation to link only the Unicode tables that are actually used by a program. This can considerably reduce the size requirements of a Unicode-aware executable, as most applications depend on only a small subset of the Unicode properties. While the complete Unicode database has a substantial memory footprint, developers should not pay for the tables they don’t use. It also ensures that developers can enforce a specific version of the Unicode Database at compile time and get consistent and predictable run-time behavior.
  • Unicode Characters in Proofpower Through Lualatex
    Unicode Characters in ProofPower through Lualatex. Roger Bishop Jones. Abstract: This document serves to establish how characters render in UTF-8 ProofPower documents prepared using lualatex. Created 2019. http://www.rbjones.com/rbjpub/pp/doc/t055.pdf © Roger Bishop Jones; Licenced under GNU LGPL. Contents: 1 Prelude; 2 Changes; 2.1 Recent Changes; 2.2 Changes Under Consideration; 2.3 Issues; 3 Introduction; 4 Mathematical operators and symbols in Unicode; 5 Dedicated blocks; 5.1 Mathematical Operators block; 5.2 Supplemental Mathematical Operators block; 5.3 Mathematical Alphanumeric Symbols block; 5.4 Letterlike Symbols block; 5.5 Miscellaneous Mathematical Symbols-A block; 5.6 Miscellaneous Mathematical Symbols-B block; 5.7 Miscellaneous Technical block; 5.8 Geometric Shapes block; 5.9 Miscellaneous Symbols and Arrows block; 5.10 Arrows block; 5.11 Supplemental Arrows-A block; 5.12 Supplemental Arrows-B block; 5.13 Combining Diacritical Marks for Symbols block; 5.14
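The U+2B95 proposal asks for emoji-style and text-style standardized variants. Such variants are conventionally written as the base character followed by VARIATION SELECTOR-16 (U+FE0F, emoji presentation) or VARIATION SELECTOR-15 (U+FE0E, text presentation). The sketch below, whose function and constant names are hypothetical, builds both sequences for U+2B95 and encodes them as UTF-8 so the bytes can be inspected; it assumes valid Unicode scalar values as input and does not consult any emoji data file.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Append the UTF-8 encoding of one Unicode scalar value to `out`.
void append_utf8(std::string& out, std::uint32_t cp) {
    // Only scalar values are encodable: no surrogates, nothing above U+10FFFF.
    assert(cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF));
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

constexpr std::uint32_t kRightwardsBlackArrow = 0x2B95;
constexpr std::uint32_t kTextStyleSelector    = 0xFE0E;  // VARIATION SELECTOR-15
constexpr std::uint32_t kEmojiStyleSelector   = 0xFE0F;  // VARIATION SELECTOR-16

// Build the two-code-point sequence <base, selector> as a UTF-8 string.
std::string variation_sequence(std::uint32_t base, std::uint32_t selector) {
    std::string s;
    append_utf8(s, base);
    append_utf8(s, selector);
    return s;
}
```

For U+2B95 with VARIATION SELECTOR-16 this yields the six bytes E2 AE 95 EF B8 8F. Whether a renderer actually honors the selector depends on the platform's emoji support; the sketch only constructs the sequence.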
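The MARC21 study's example, in which “A”, “B”, and “C” have the ASCII encodings 41, 42, and 43 in hexadecimal, can be reproduced with a short helper. The function name is hypothetical, and the sketch assumes a compiler whose execution character set is ASCII-compatible, which holds for mainstream toolchains.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper: render each character of a 7-bit ASCII string as two
// uppercase hex digits, separated by spaces.
std::string ascii_hex(const std::string& text) {
    std::string out;
    char buf[3];  // two hex digits plus the terminating NUL
    for (unsigned char ch : text) {
        assert(ch < 0x80 && "input must be 7-bit ASCII");
        if (!out.empty()) out += ' ';
        std::snprintf(buf, sizeof buf, "%02X", static_cast<unsigned>(ch));
        out += buf;
    }
    return out;
}
```

With this helper, `ascii_hex("ABC")` returns `"41 42 43"`, matching the study's table. A MARC8 or EACC equivalent would need the corresponding code tables rather than the identity mapping used here.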
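The L2/14-274 excerpt describes a set relation: the characters with Math=Yes should form a proper subset of the characters that carry a UTR #25 math-class, a relation that the Arabic mathematical alphabetic symbols (U+1EE00–U+1EEFF) broke in Revision 13. A consistency check for that relation could look like the sketch below. The function names are hypothetical, and a real check would fill both sets from the UCD and the UTR #25 data files rather than from hand-written samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <iterator>
#include <set>
#include <vector>

using CodePointSet = std::set<std::uint32_t>;

// Members of `math_yes` that have no math-class assignment, i.e. the
// counterexamples to "Math=Yes is a subset of the math-class set".
// std::set keeps elements sorted, as std::set_difference requires.
std::vector<std::uint32_t> missing_math_class(const CodePointSet& math_yes,
                                              const CodePointSet& has_math_class) {
    std::vector<std::uint32_t> missing;
    std::set_difference(math_yes.begin(), math_yes.end(),
                        has_math_class.begin(), has_math_class.end(),
                        std::back_inserter(missing));
    return missing;
}

// True when every element of `a` is in `b` and `b` has at least one more element.
bool is_proper_subset(const CodePointSet& a, const CodePointSet& b) {
    return a.size() < b.size() &&
           std::includes(b.begin(), b.end(), a.begin(), a.end());
}
```

For example, checking a Math=Yes sample {U+002B, U+1EE00} against a math-class sample {U+002B, U+2200} reports the single counterexample U+1EE00. Returning the counterexamples, rather than only a yes/no answer, points directly at the characters that still need a math-class.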
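P1628R0 argues for a fully constexpr property API so that only the tables a program actually uses are linked. The `uni::` names in that excerpt are the paper's proposed API and are not used here. As a rough, independent sketch of the underlying idea, assuming a hand-written subset of the Unicode block ranges rather than the full Blocks.txt data, a sorted constexpr range table can be searched at compile time:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <string_view>

// One entry per block: inclusive code point range and block name.
struct BlockRange {
    std::uint32_t first;
    std::uint32_t last;
    std::string_view name;
};

// Hand-picked subset of the block table, sorted by `first`.
// A real implementation would generate the full table from Blocks.txt.
inline constexpr std::array<BlockRange, 6> kBlocks{{
    {0x2100, 0x214F, "Letterlike Symbols"},
    {0x2200, 0x22FF, "Mathematical Operators"},
    {0x2700, 0x27BF, "Dingbats"},
    {0x2B00, 0x2BFF, "Miscellaneous Symbols and Arrows"},
    {0x1D400, 0x1D7FF, "Mathematical Alphanumeric Symbols"},
    {0x1EE00, 0x1EEFF, "Arabic Mathematical Alphabetic Symbols"},
}};

// Binary search over the sorted ranges; returns an empty view when the
// code point lies outside every listed block.
constexpr std::string_view block_name(std::uint32_t cp) {
    std::size_t lo = 0;
    std::size_t hi = kBlocks.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (cp < kBlocks[mid].first) {
            hi = mid;
        } else if (cp > kBlocks[mid].last) {
            lo = mid + 1;
        } else {
            return kBlocks[mid].name;
        }
    }
    return {};
}

// Evaluated entirely at compile time.
static_assert(block_name(0x2B95) == "Miscellaneous Symbols and Arrows", "U+2B95");
static_assert(block_name(0x27A1) == "Dingbats", "U+27A1");
static_assert(block_name(0x0041).empty(), "U+0041 is not in the sample table");
```

Because every lookup here is a constant expression, the table need not appear in the final binary at all; that is one way to get the pay-only-for-what-you-use property the paper describes, though the paper's own design may differ.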