MySQL Globalization Abstract


Abstract

This is the MySQL Globalization extract from the MySQL 8.0 Reference Manual.

For legal information, see the Legal Notices.

For help with using MySQL, please visit the MySQL Forums, where you can discuss your issues with other MySQL users.

Document generated on: 2021-09-27 (revision: 70895)

Table of Contents

Preface and Legal Notices
1 Character Sets, Collations, Unicode
    1.1 Character Sets and Collations in General
    1.2 Character Sets and Collations in MySQL
        1.2.1 Character Set Repertoire
        1.2.2 UTF-8 for Metadata
    1.3 Specifying Character Sets and Collations
        1.3.1 Collation Naming Conventions
        1.3.2 Server Character Set and Collation
        1.3.3 Database Character Set and Collation
        1.3.4 Table Character Set and Collation
        1.3.5 Column Character Set and Collation
        1.3.6 Character String Literal Character Set and Collation
        1.3.7 The National Character Set
        1.3.8 Character Set Introducers
        1.3.9 Examples of Character Set and Collation Assignment
        1.3.10 Compatibility with Other DBMSs
    1.4 Connection Character Sets and Collations
    1.5 Configuring Application Character Set and Collation
    1.6 Error Message Character Set
    1.7 Column Character Set Conversion
    1.8 Collation Issues
        1.8.1 Using COLLATE in SQL Statements
        1.8.2 COLLATE Clause Precedence
        1.8.3 Character Set and Collation Compatibility
        1.8.4 Collation Coercibility in Expressions
        1.8.5 The binary Collation Compared to _bin Collations
        1.8.6 Examples of the Effect of Collation
        1.8.7 Using Collation in INFORMATION_SCHEMA Searches
    1.9 Unicode Support
        1.9.1 The utf8mb4 Character Set (4-Byte UTF-8 Unicode Encoding)
        1.9.2 The utf8mb3 Character Set (3-Byte UTF-8 Unicode Encoding)
        1.9.3 The utf8 Character Set (Alias for utf8mb3)
        1.9.4 The ucs2 Character Set (UCS-2 Unicode Encoding)
        1.9.5 The utf16 Character Set (UTF-16 Unicode Encoding)
        1.9.6 The utf16le Character Set (UTF-16LE Unicode Encoding)
        1.9.7 The utf32 Character Set (UTF-32 Unicode Encoding)
        1.9.8 Converting Between 3-Byte and 4-Byte Unicode Character Sets
    1.10 Supported Character Sets and Collations
        1.10.1 Unicode Character Sets
        1.10.2 West European Character Sets
        1.10.3 Central European Character Sets
        1.10.4 South European and Middle East Character Sets
        1.10.5 Baltic Character Sets
        1.10.6 Cyrillic Character Sets
        1.10.7 Asian Character Sets
        1.10.8 The Binary Character Set
    1.11 Restrictions on Character Sets
    1.12 Setting the Error Message Language
    1.13 Adding a Character Set
        1.13.1 Character Definition Arrays
        1.13.2 String Collating Support for Complex Character Sets
        1.13.3 Multi-Byte Character Support for Complex Character Sets
    1.14 Adding a Collation to a Character Set
        1.14.1 Collation Implementation Types
        1.14.2 Choosing a Collation ID
        1.14.3 Adding a Simple Collation to an 8-Bit Character Set
        1.14.4 Adding a UCA Collation to a Unicode Character Set
    1.15 Character Set Configuration
    1.16 MySQL Server Locale Support
2 MySQL Server Time Zone Support

Preface and Legal Notices

This is the MySQL Globalization extract from the MySQL 8.0 Reference Manual.

Licensing information: MySQL 8.0. This product may include third-party software, used under license. If you are using a Commercial release of MySQL 8.0, see the MySQL 8.0 Commercial Release License Information User Manual for licensing information, including licensing information relating to third-party software that may be included in this Commercial release. If you are using a Community release of MySQL 8.0, see the MySQL 8.0 Community Release License Information User Manual for licensing information, including licensing information relating to third-party software that may be included in this Community release.

Legal Notices

Copyright © 1997, 2021, Oracle and/or its affiliates.

This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited.

The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing.

If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable:

U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs)
Recommended publications
  • PROC SORT (Then And) NOW Derek Morgan, PAREXEL International
    Paper 143-2019 PROC SORT (then and) NOW Derek Morgan, PAREXEL International ABSTRACT The SORT procedure has been an integral part of SAS® since its creation. The sort-in-place paradigm made the most of the limited resources at the time, and almost every SAS program had at least one PROC SORT in it. The biggest options at the time were to use something other than the IBM procedure SYNCSORT as the sorting algorithm, or whether you were sorting ASCII data versus EBCDIC data. These days, PROC SORT has fallen out of favor; after all, PROC SQL enables merging without using PROC SORT first, while the performance advantages of HASH sorting cannot be overstated. This leads to the question: Is the SORT procedure still relevant to any other than the SAS novice or the terminally stubborn who refuse to HASH? The answer is a surprisingly clear “yes”. PROC SORT has been enhanced to accommodate twenty-first century needs, and this paper discusses those enhancements. INTRODUCTION The largest enhancement to the SORT procedure is the addition of collating sequence options. This is first and foremost recognition that SAS is an international software package, and SAS users no longer work exclusively with English-language data. This capability is part of National Language Support (NLS) and doesn’t require any additional modules. You may use standard collations, SAS-provided translation tables, custom translation tables, standard encodings, or rules to produce your sorted dataset. However, you may only use one collation method at a time. USING STANDARD COLLATIONS, TRANSLATION TABLES AND ENCODINGS A long time ago, SAS would allow you to sort data using ASCII rules on an EBCDIC system, and vice versa.
  • Hieroglyphs for the Information Age: Images As a Replacement for Characters for Languages Not Written in the Latin-1 Alphabet Akira Hasegawa
    Rochester Institute of Technology RIT Scholar Works Theses Thesis/Dissertation Collections 5-1-1999 Hieroglyphs for the information age: Images as a replacement for characters for languages not written in the Latin-1 alphabet Akira Hasegawa Follow this and additional works at: http://scholarworks.rit.edu/theses Recommended Citation Hasegawa, Akira, "Hieroglyphs for the information age: Images as a replacement for characters for languages not written in the Latin-1 alphabet" (1999). Thesis. Rochester Institute of Technology. Accessed from This Thesis is brought to you for free and open access by the Thesis/Dissertation Collections at RIT Scholar Works. It has been accepted for inclusion in Theses by an authorized administrator of RIT Scholar Works. For more information, please contact [email protected]. Hieroglyphs for the Information Age: Images as a Replacement for Characters for Languages not Written in the Latin-1 Alphabet by Akira Hasegawa A thesis project submitted in partial fulfillment of the requirements for the degree of Master of Science in the School of Printing Management and Sciences in the College of Imaging Arts and Sciences of the Rochester Institute of Technology May, 1999 Thesis Advisor: Professor Frank Romano School of Printing Management and Sciences Rochester Institute of Technology Rochester, New York Certificate of Approval Master's Thesis This is to certify that the Master's Thesis of Akira Hasegawa With a major in Graphic Arts Publishing has been approved by the Thesis Committee as satisfactory for the thesis requirement for the Master of Science degree at the convocation of May 1999 Thesis Committee: Frank Romano Thesis Advisor Marie Freckleton Graduate Program Coordinator C.
  • Allgemeines Abkürzungsverzeichnis
    Allgemeines Abkürzungsverzeichnis L.
  • IBM Unica Campaign: Administrator's Guide to Remove a Dimension Hierarchy
    IBM Unica Campaign Version 8 Release 6 February, 2013 Administrator's Guide Note Before using this information and the product it supports, read the information in “Notices” on page 385. This edition applies to version 8, release 6, modification 0 of IBM Unica Campaign and to all subsequent releases and modifications until otherwise indicated in new editions. © Copyright IBM Corporation 1998, 2013. US Government Users Restricted Rights – Use, duplication or disclosure restricted by GSA ADP Schedule Contract with IBM Corp. Contents Chapter 1. Administration in IBM Unica To view system table contents .......27 Campaign ..............1 Working with user tables ..........28 Campaign-related administrative tasks in IBM Unica About working with user tables ......28 Marketing ...............1 Guidelines for mapping user tables .....28 To access data sources from within a flowchart 29 Chapter 2. Managing security in IBM Working with user tables while editing a flowchart ..............29 Unica Campaign ...........3 Working with user tables from the Campaign About security policies ...........3 Settings page .............30 The global security policy .........3 Working with data dictionaries ........39 How Campaign evaluates permissions.....4 To open a data dictionary.........39 Using the Owner and Folder Owner roles . 4 To apply changes to a data dictionary ....40 Guidelines for designing security policies....5 When to use a data dictionary .......40 Security scenarios.............5 Data dictionary syntax..........40 Scenario 1: Company with a single division . 5 To manually create a new data dictionary . 40 Scenario 2: Company with multiple separate Working with table catalogs .........41 divisions...............7 To access table catalogs .........41 Scenario 3: Restricted access within a division .
  • SUPPORTING the CHINESE, JAPANESE, and KOREAN LANGUAGES in the OPENVMS OPERATING SYSTEM by Michael M. T. Yau ABSTRACT the Asian L
    SUPPORTING THE CHINESE, JAPANESE, AND KOREAN LANGUAGES IN THE OPENVMS OPERATING SYSTEM By Michael M. T. Yau ABSTRACT The Asian language versions of the OpenVMS operating system allow Asian-speaking users to interact with the OpenVMS system in their native languages and provide a platform for developing Asian applications. Since the OpenVMS variants must be able to handle multibyte character sets, the requirements for the internal representation, input, and output differ considerably from those for the standard English version. A review of the Japanese, Chinese, and Korean writing systems and character set standards provides the context for a discussion of the features of the Asian OpenVMS variants. The localization approach adopted in developing these Asian variants was shaped by business and engineering constraints; issues related to this approach are presented. INTRODUCTION The OpenVMS operating system was designed in an era when English was the only language supported in computer systems. The Digital Command Language (DCL) commands and utilities, system help and message texts, run-time libraries and system services, and names of system objects such as file names and user names all assume English text encoded in the 7-bit American Standard Code for Information Interchange (ASCII) character set. As Digital's business began to expand into markets where common end users are non-English speaking, the requirement for the OpenVMS system to support languages other than English became inevitable. In contrast to the migration to support single-byte, 8-bit European characters, OpenVMS localization efforts to support the Asian languages, namely Japanese, Chinese, and Korean, must deal with a more complex issue, i.e., the handling of multibyte character sets.
  • Computer Science II
    Computer Science II Dr. Chris Bourke Department of Computer Science & Engineering University of Nebraska–Lincoln Lincoln, NE 68588, USA http://chrisbourke.unl.edu [email protected] 2019/08/15 13:02:17 Version 0.2.0 This book is a draft covering Computer Science II topics as presented in CSCE 156 (Computer Science II) at the University of Nebraska–Lincoln. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. Contents: 1 Introduction; 2 Object Oriented Programming; 2.1 Introduction; 2.2 Objects; 2.3 The Four Pillars; 2.3.1 Abstraction; 2.3.2 Encapsulation; 2.3.3 Inheritance; 2.3.4 Polymorphism; 2.4 SOLID Principles; 2.4.1 Inversion of Control; 3 Relational Databases; 3.1 Introduction; 3.2 Tables; 3.2.1 Creating Tables; 3.2.2 Primary Keys; 3.2.3 Foreign Keys & Relating Tables; 3.2.4 Many-To-Many Relations; 3.2.5 Other Keys; 3.3 Structured Query Language; 3.3.1 Creating Data; 3.3.2 Retrieving Data
  • Unicode Collators
    Title stata.com unicode collator - Language-specific Unicode collators Description Syntax Remarks and examples Also see Description unicode collator list lists the subset of locales that have language-specific collators for the Unicode string comparison functions: ustrcompare(), ustrcompareex(), ustrsortkey(), and ustrsortkeyex(). Syntax unicode collator list pattern pattern is one of all, *, *name*, *name, or name*. If you specify nothing, all, or *, then all results will be listed. *name* lists all results containing name; *name lists all results ending with name; and name* lists all results starting with name. Remarks and examples stata.com Remarks are presented under the following headings: Overview of collation The role of locales in collation Further controlling collation Overview of collation Collation is the process of comparing and sorting Unicode character strings as a human might logically order them. We call this ordering strings in a language-sensitive manner. To do this, Stata uses a Unicode tool known as the Unicode collation algorithm, or UCA. To perform language-sensitive string sorts, you must combine ustrsortkey() or ustrsortkeyex() with sort. It is a complicated process and there are several issues about which you need to be aware. For details, see [U] 12.4.2.5 Sorting strings containing Unicode characters. To perform language-sensitive string comparisons, you can use ustrcompare() or ustrcompareex(). For details about the UCA, see http://www.unicode.org/reports/tr10/. The role of locales in collation During collation, Stata can use the default collator or it can perform language-sensitive string comparisons or sorts that require knowledge of a locale. A locale identifies a community with a certain set of preferences for how their language should be written; see [U] 12.4.2.4 Locales in Unicode.
  • Braille Decoding Device Employing Microcontroller
    International Journal of Recent Technology and Engineering (IJRTE) ISSN: 2277-3878, Volume-8, Issue-2S11, September 2019 Braille Decoding Device Employing Microcontroller Kanika Jindal, Adittee Mattoo, Bhupendra Kumar Abstract: The Braille decoding method has been conventionally used by visually challenged persons to read books etc. The system designed in the current paper implements a method to interface Braille characters and English text characters. The system will help to communicate a Braille message from one visually challenged person to another, as well as help us to transform the Braille language to English text through a microcontroller and a PC in order to communicate with visually challenged persons. Keyword: Universal synchronous/Asynchronous receiver/transmitter, braille, PWM, Vibration Motor, Transistor, Diode. I. INTRODUCTION Braille mechanism was founded by Louis Braille in 1821. [...] The proposed circuit has the primary objective of recognizing Braille character inputs from a visually challenged user and transmitting them to another similar Braille device. This system has been embedded onto a glove that can be worn by the blind person. The first four fingers of the glove, starting from the thumb, will be fitted with tactile micro switches and a vibration motor. This circuit is then connected with a microcontroller and PC via RS232C cable to interface the Braille words with the PC based text language. The switch pressed by any person will create a Braille code that should be converted to ASCII form by an ASCII conversion program for the microcontroller, and these letters will be seen on the computer screen. This is used in order to make the characters USART compatible. Similarly, the computer text word will be reverse programmed into
  • A Chinese Mobile Phone Input Method Based on the Dynamic and Self-Study Language Model
    A Chinese Mobile Phone Input Method Based on the Dynamic and Self-study Language Model Qiaoming Zhu, Peifeng Li, Gu Ping, and Qian Peide School of Computer Science & Technology of Soochow University, Suzhou, 215006 {qmzhu, pfli, pgu, pdqian}@suda.edu.cn Abstract. This paper briefly introduces a Chinese digital input method named CKCDIM (CKC Digital Input Method) and then applies it to the Symbian OS as an example. It also proposes an input method framework that adopts a Client/Server architecture for handheld computers. To improve the performance of CKCDIM, this paper puts forward a dynamic and self-study language model, which is based on a general language model and a user language model, and proposes two indexes, the average number of pressed-keys (ANPK) and the hit rate of first characters (HRFC), to measure the performance of the input method. Meanwhile, this paper brings forward a modified Church-Gale smoothing method to reduce the size of the general language model to meet the needs of mobile phones. Finally, the experiments show that the dynamic and self-study language model is a steady model and can improve the performance of CKCDIM. Keywords: Chinese Digital Input Method, Architecture of Input Method, Dynamic and Self-study Language Model, HRFC, ANPK. 1 Introduction With the development of communication technology and the popularization of the mobile phone in China, the use of text messages on mobile phones is growing rapidly. According to a CCTV financial news report, the total number of Short Message Service uses will grow from 300 billion in 2005 to 450 billion in 2006 in China.
  • Database Globalization Support Guide
    Oracle® Database Database Globalization Support Guide 19c E96349-05 May 2021 Oracle Database Database Globalization Support Guide, 19c E96349-05 Copyright © 2007, 2021, Oracle and/or its affiliates. Primary Author: Rajesh Bhatiya Contributors: Dan Chiba, Winson Chu, Claire Ho, Gary Hua, Simon Law, Geoff Lee, Peter Linsley, Qianrong Ma, Keni Matsuda, Meghna Mehta, Valarie Moore, Cathy Shea, Shige Takeda, Linus Tanaka, Makoto Tozawa, Barry Trute, Ying Wu, Peter Wallack, Chao Wang, Huaqing Wang, Sergiusz Wolicki, Simon Wong, Michael Yau, Jianping Yang, Qin Yu, Tim Yu, Weiran Zhang, Yan Zhu This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S.
  • Awk, Perl, Etc
    CNAM, Computer Science specialty NSY116 - Multimedia & human-machine interaction (2008-9) Text P. Cubaud <[email protected]> 1. Codes, structures (2. Transport, compression, encryption) 3. Analysis (4. Synthesis) 5. Presentation Bibliography B. Habert, C. Fabre, F. Isaac De l’écrit au numérique InterEditions, 1998 I.H. Witten, A. Moffat, T.C. Bell Managing gigabytes. Compressing and indexing documents and images Van Nostrand, 1994 Techniques de l’ingénieur series H, section Document numérique (online at CNAM) R. Laufer, D. Scavetta Texte, hypertexte, hypermédia Que-sais-je n°2629 (v2 1995) 1. Codes - Structures Character encoding Three problems: • character ≠ glyph • encoding ≠ sorting • norm ≠ standard As old as the telegraph… and still unresolved. 7 bits: American Standard Code for Information Interchange (ASCII, 1967 - then ISO646 in 1983) 8 bits: ISO-Latin-XXX (ISO 8859-n) Beyond: Unicode (1990, v3 in 2000) and ISO 10646 = Unicode website: www.unicode.org A table to keep in the meantime… and a tool: iconv MACCROATIAN [numer:~] pcubaud% iconv -l MACROMANIA ANSI_X3.4-1968 ANSI_X3.4-1986 ASCII CP367 IBM367 ISO-IR-6 ISO646-US ISO_646.IRV:1991 US US-ASCII CSASCII MACCYRILLIC UTF-8 MACUKRAINE ISO-10646-UCS-2 UCS-2 CSUNICODE MACGREEK UCS-2BE UNICODE-1-1 UNICODEBIG CSUNICODE11 MACTURKISH UCS-2LE UNICODELITTLE MACHEBREW ISO-10646-UCS-4 UCS-4 CSUCS4 MACARABIC UCS-4BE MACTHAI UCS-4LE HP-ROMAN8 R8 ROMAN8 CSHPROMAN8 UTF-16 NEXTSTEP UTF-16BE ARMSCII-8 UTF-16LE GEORGIAN-ACADEMY UTF-32 GEORGIAN-PS UTF-32BE KOI8-T UTF-32LE MULELAO-1
  • Java Bytecode Manipulation Framework
    Notice About this document The following copyright statements and licenses apply to software components that are distributed with various versions of the OnCommand Performance Manager products. Your product does not necessarily use all the software components referred to below. Where required, source code is published at the following location: ftp://ftp.netapp.com/frm-ntap/opensource/ 215-09632 _A0_ur001 -Copyright 2014 NetApp, Inc. All rights reserved. 1 Notice Copyrights and licenses The following component is subject to the ANTLR License • ANTLR, ANother Tool for Language Recognition - 2.7.6 © Copyright ANTLR / Terence Parr 2009 ANTLR License SOFTWARE RIGHTS ANTLR 1989-2004 Developed by Terence Parr Partially supported by University of San Francisco & jGuru.com We reserve no legal rights to the ANTLR--it is fully in the public domain. An individual or company may do whatever they wish with source code distributed with ANTLR or the code generated by ANTLR, including the incorporation of ANTLR, or its output, into commerical software. We encourage users to develop software with ANTLR. However, we do ask that credit is given to us for developing ANTLR. By "credit", we mean that if you use ANTLR or incorporate any source code into one of your programs (commercial product, research project, or otherwise) that you acknowledge this fact somewhere in the documentation, research report, etc... If you like ANTLR and have developed a nice tool with the output, please mention that you developed it using ANTLR. In addition, we ask that the headers remain intact in our source code. As long as these guidelines are kept, we expect to continue enhancing this system and expect to make other tools available as they are completed.