Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference

Total Page:16

File Type:pdf, Size:1020Kb

Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference USENIX Association Proceedings of the FREENIX Track: 2001 USENIX Annual Technical Conference Boston, Massachusetts, USA June 25–30, 2001 THE ADVANCED COMPUTING SYSTEMS ASSOCIATION © 2001 by The USENIX Association All Rights Reserved For more information about the USENIX Association: Phone: 1 510 528 8649 FAX: 1 510 548 5738 Email: [email protected] WWW: http://www.usenix.org Rights to individual papers remain with the author or the author's employer. Permission is granted for noncommercial reproduction of the work for educational or research purposes. This copyright notice must be included in the reproduced paper. USENIX acknowledges all trademarks herein. Citrus project: true multilingual support for BSD operating systems Jun-ichiro itojun Hagino <[email protected]> Research Laboratory, Internet Initiative Japan Inc. http://citrus.bsdclub.org/index.html Abstract The ISO C/SUS V2 standard uses two dif- Citrus project aims to implement a com- ferent encodings for multilingual support: an plete multilingual programming environment for internal and an external encoding. We use the BSD-based operating systems. The goals include: term "internal encoding" to mean a multilingual encoding system used inside C programs (like • ISO C/SUS V2-compatible multilingual pro- inside variables for manipulation), normally gramming environment (locale support), declared as an array of wchar_t. On other papers • multi-script framework, which decouples C the term "process code" is also used. "External API and actual external/internal encoding, encoding" means a multilingual encoding system • gettext and POSIX NLS catalog, used outside of programs (like files or screen out- put stream). It is also called as "file code". The All of our source code is, and will be dis- ISO C/SUS V2 libraries will supply conversion tributed under a BSD-like license. functions between external and internal encoding. The paper concentrates onto the multi- Here we call the functions "encoding engine". script framework, which is unique and central to Many of existing ISO C/SUS V2 libraries, our approach. Most other free software imple- especially freely available ones like GNU libc, mentations support only Unicode in their multi- assume that the internal encoding is Unicode- lingual library, or converts external representation based. More specifically, they assume UCS4 as into Unicode internal representation and loses sig- the internal encoding. They also advocate UTF-8 nificant information on the external representa- [Yergeau, 1998] as the external encoding. The tion. We believe a Unicode-only approach is not encoding engine is hardcoded for internal UCS4 future-proven, and is not the right way to handle encoding. External encodings other than UTF-8 multilingual text. We support multiple different are supported by conversion functions in encod- encodings in ISO C/SUS V2 compatible library, ing engine, which convers external encodings to and made our library code (as well as user pro- UCS4, and vice versa. To implement a correct grams) future-proven. multilingual support, we believe it is very impor- tant to not hardcode any encoding, including Uni- 1. Motivation, and multi-script frame- code. The Citrus project library does not hard- work code any existing encoding. In the next section Compared to vendor UN*X operating sys- we discuss more fully why Unicode is not tems and GNU libc, BSD-based operating sys- enough. tems has been a bit behind in multilingual pro- In order to be able to make less assump- gramming support. This may be because of lack tions about multilingual encodings, we have of manpower, difference in interest, or whatever. designed our libraries to support multiple external Anyway, because of the lack of multilingual sup- encodings, and multiple encoding engines and port, we are seeing increasing pain in porting internal encodings. In other words, each internal applications from/to BSD-based operating sys- encoding has its own encoding engine. We call tems. For example, GNOME, GTK+ and other this concept a "multi-script framework." To sup- window manager software relies heavily upon the port multiple encodings, we simply need to sup- presence of multilingual libraries. We definitely ply the appropriate encoding engines. We also need a multilingual support library for BSD-based use dynamic loading to load encoding engines, so operating systems that is based on ISO C/SUS V2 that we can add more engines on the fly. standards. inside programs User programs ISO C/SUS V2 functions Files/char devices wchar_t stream encoding engine (external encoding) (internal encoding) 2. Why Unicode is not enough plaintext stream, and is a good foundation for tru- You may be still wondering why we need to ely internationalized plaintext handling. For support a multi-script framework, instead of hard- example, by using X11 ctext encoding (which is a coded Unicode support. Below we discuss why subset of ISO-2022 encoding), we can mix Unicode is not sufficient, and why hardcoding of Korean, Chinese, Taiwanese, and Japanese text encodings is not preferable. into a single plaintext stream. We specifically have chosen NOT to hard- Separately from the above direction, there code Unicode in our library suite. Since we were a couple of regional encodings (encodings started using computers for text processing, we that support single language only within a single have experienced many transitions in character set plaintext stream), including EUC (like euc-kr), encodings. Here we list our history in chronolog- MS-Kanji (Shift JIS), or Unicode. Here we list ical order: Unicode under "regional encodings", as we can- not mix Chinese/Taiwanese/Japanese text in a • First, we had a total chaos of vendor-defined plaintext - we need to annotate plaintext with font character set encodings, with 6bit, 7bit or 8bit designation to do it. encodings. Unicode cannot handle multiple Asian • ASCII-only 7bit encodings (and EBCDIC in characters in a plaintext at the same time, due to some places). "han unification". Unicode maps multiple charac- • Hardcoded 8bit encodings. For example, in ters from different Asian regions, into the same many of the european countries, ASCII + Unicode codepoint - this is called "han unifica- ISO-8859-1 (so-called Latin-1) has been used. tion". It was introduced to reduce the amount of In Japan, JIS X0201, which gives us ASCII codepoint used in Unicode (to fit characters into variant and Katakana, was widely used. UCS2 16bit region). While some of the unified • Stateful ISO2022 character encodings, with chararcters are indeed same across Asian regions, explicit character set designations. others have totally different glyphs and meanings in different Asian regions [KUBOTA,]. Suppose ISO-2022 character encodings allow us to that we have assigned the same codepoint for O, encode multiple language text into a single and O with umlaut. Some people cannot notice the difference, however, this will become a signif- 3. Gory details icant problem for people who uses O with umlaut Under our implementation based on multi- in their language (like for German language). script framework, we can switch external encod- Also note that, if we convert Asian multilingual ing, internal encoding and encoding engine as we text into Unicode and then convert it back to other wish, and we support multiple encodings/engines. multilingual encoding, the conversion will not be able to preserve the information contained in the External encoding is normally a stateful, or original multilingual text, due to the han unifica- stateless multibyte encoding, like ISO-2022-JP tion. When we convert the multilingual text into [Murai, 1993] or UTF-8. An octet stream will be Unicode, we will lose the information encoded in used for multilingual representation inside files. the source due to the han unification. Many of the existing external encodings use vari- able-length encodings; one letter will be pre- NOTE: there are proposals to perform lan- sented as a octet, two octets, or more octets. guage tagging [Whistler, 1999] in Unicode, how- When we read in external encoding representation ev er, the language tagging jeopardizes one of the into C program, we normally use array of char to very important aspects of Unicode, uniform 32bit hold it. Internal encoding is a stateless, fixed- wide-char representation, and we do not consider bitwidth encoding. It is defined internally by the it to be useful. library, depending on the current encoding engine Every time we change from one encoding we are using. We use a type called "wchar_t" to system to another, the transition is very painful. hold it. At this moment wchar_t is a 32bit integer In fact, there still are applications that are not (int32_t). 8bit-clean. Even for Unicode, there are imple- When the external encoding is 8bit (like mentations assume 16bit UCS2, and they need to Latin-1), we can just typecast external encoding migrate to 32bit UCS4. From our experience, it (char) into internal encoding (wchar_t). When makes no sense any more to pick a single encod- the external encoding is UTF-8, the natural choice ing to rely on. We do not advocate the use of for internal encoding is UCS4. RFC2279 ISO-2022, either. We believe that no encodings [Yergeau, 1998] defines starndard conversion should be hardcoded, since to do so means that between them. When the external encoding is we will have to bear painful transitions over and ISO-2022, we use a compressed representation of over again. ISO-2022 stream as the internal encoding. Here is another reason for us to avoid hard- With setlocale(3), we pick a pair of internal coded encoding, and avoid hardcoded Unicode and external encoding, and encoding engine. A support. There has been widely deployed user- programmer can convert internal encoding and base which uses non-Unicode multilingual text, external encoding, using ISO C/SUS V2-compati- including big5, euc-kr, KOI8-R, MS-Kanji and ble library calls, like mbstowcs(3).
Recommended publications
  • Licensing Information User Manual Oracle® ZFS
    Licensing Information User Manual ® Oracle ZFS Storage ZS5-2 Release OS8.6.x Part No: E71911-01 September 2016 Licensing Information User Manual Oracle ZFS Storage ZS5-2 Part No: E71911-01 Copyright © 2016, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS. Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs.
    [Show full text]
  • Third Party License Information February 18, 2005
    Third Party License Information February 18, 2005 Licenses Used Introduction This document contains third party licensing information for Proventia® appliances. Where applicable, the text has not been edited from its original content or spelling. Red Hat Linux 8.0 This software program is licensed subject to the GNU General Public License (GPL). License Agreement Version2, June 1991, available at http://www.fsf.org/copyleft/gpl.html. It carries NO WARRANTY whatsoever. See the license for complete details. Copyright (c) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307, USA. Wchar.h/Wctype.h Copyright (c)1999 Citrus Project, All rights reserved. License Agreement Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
    [Show full text]
  • Licensing Information User Manual Oracle® ZFS
    Licensing Information User Manual ® Oracle ZFS Storage ZS7-2, Release OS8. 8.x Last Updated: August 2020 Part No: F13760-04 August 2020 Licensing Information User Manual Oracle ZFS Storage ZS7-2, Release OS8.8.x Part No: F13760-04 Copyright © 2018, 2020, Oracle and/or its affiliates. License Restrictions Warranty/Consequential Damages Disclaimer This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. Warranty Disclaimer The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. Restricted Rights Notice If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs (including any operating system, integrated software, any programs embedded, installed or activated on delivered hardware, and modifications of such programs) and Oracle computer documentation or other Oracle data delivered to or accessed by U.S. Government end users are "commercial computer software" or "commercial computer software documentation" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations.
    [Show full text]
  • Cisco Telepresence Server 4.4(1.16) MR1 Open Source Documentation
    Open Source Used In Cisco TelePresence Server 4.4(1.16) MR1 Cisco Systems, Inc. www.cisco.com Cisco has more than 200 offices worldwide. Addresses, phone numbers, and fax numbers are listed on the Cisco website at www.cisco.com/go/offices. Text Part Number: 78EE117C99-139358573 Open Source Used In Cisco TelePresence Server 4.4(1.16) MR1 1 This document contains licenses and notices for open source software used in this product. With respect to the free/open source software listed in this document, if you have any questions or wish to receive a copy of any source code to which you may be entitled under the applicable free/open source license(s) (such as the GNU Lesser/General Public License), please contact us at [email protected]. In your requests please include the following reference number 78EE117C99-139358573 Contents 1.1 Brian Gladman's AES Implementation 11-01-11 1.1.1 Available under license 1.2 busybox 1.15.1 :15.el6 1.2.1 Available under license 1.3 Coreboot d9b5d897d7f05d0ee8f9411628b757beea990b4b 1.3.1 Available under license 1.4 curl and libcurl 7.44.0 :7.44.0 1.4.1 Available under license 1.5 dhcp 4.1.1-P1 1.5.1 Available under license 1.6 expat 2.2.0 1.6.1 Available under license 1.7 FatFS R0.05 1.7.1 Available under license 1.8 freetype 2.5.3 1.8.1 Available under license 1.9 fribidi 0.19.6 :1 1.9.1 Available under license 1.10 G.722 2.00 1.10.1 Available under license 1.11 HMAC n/a 1.11.1 Available under license 1.12 icelib f50dffe9820bb7e32ac7b9b1b1d19aa3431227a2 1.12.1 Available under license 1.13
    [Show full text]
  • Temial TEA 1.0 with Firmware Version 4.0.3 Third-Party Software
    Temial TEA 1.0 With Firmware Version 4.0.3 Third-Party Software Content Overview ...................................................................................................................... 4 Warranty Disclaimer Regarding Open Source Software ............................................. 6 How to Obtain Source Code ........................................................................................ 6 Contact Information ...................................................................................................... 7 Copyright ...................................................................................................................... 7 Verbatim Generic License Texts ................................................................................ 10 Apache-2.0 – Apache License 2.0 ............................................................................. 10 BSD-2-Clause – BSD 2-Clause “Simplified” License ................................................ 15 BSD-3-Clause – BSD 3-Clause “New” or “Revised” License .................................... 15 BSD-3-Clause-Variant-1 or Modified BSD ................................................................. 16 BSL-1.0 – Boost Software License 1.0 ...................................................................... 17 GPL-3.0-with-GCC-exception - GNU General Public License v3.0 w/GCC Runtime Library exception ......................................................................................... 17 HPND – Historical Permission Notice and Disclaimer (HPND
    [Show full text]
  • Open Source Software License
    Open Source Software License Contents Open Source Software License.................................................................................................................. 5 Copyright Attribution ................................................................................................................................ 59 3 Open Source Software License GNU GPL This projector product includes the open source software programs which apply the GNU General Public License Version 2 or later version ("GPL Programs"). We provide the source code of the GPL Programs until five (5) years after the discontinuation of same model of this projector product. If you desire to receive the source code of the GPL Programs, contact Epson. These GPL Programs are WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The list of GPL Programs is as follows and the names of author are described in the source code of the GPL Programs The list of GPL Programs • busybox-1.21.0 • iptables-1.4.20 • linux-3.4.49 • patches • udhcp 0.9.8 • wireless_tools 29 • dbus-1.6.18 • EPSON original drivers • wifi driver • Stonestreet One Drivers • dibbler-1.0.1 • linux-2.6.32 • u-boot-2001.06 • busybox-1.19.4 • backports-3.10.4-1 5 The GNU General Public License Version 2 is as follows. You also can see the GNU General Public License Version 2 at http://www.gnu.org/licenses/. GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
    [Show full text]
  • Used Libraries
    NETATMO Smart Thermostat Libraries we use This document lists the libraries and services Netatmo uses in its products and apps. We thank our partners and the open source community for their help and contributions. Embedded Software .......................................................................................................................... 3 FreeRTOS ................................................................................................................................ 4 LwIP ......................................................................................................................................... 5 LwBT ........................................................................................................................................ 6 Newlib ...................................................................................................................................... 7 Apple mDNSResponder ....................................................................................................... 24 Libsodium .............................................................................................................................. 25 Hmacsha512 ......................................................................................................................... 26 OpenSSL ................................................................................................................................ 27 AES Gladman .......................................................................................................................
    [Show full text]
  • Licensing Information User Manual Oracle Server X5-2
    Licensing Information User Manual Oracle Server X5-2 Last Updated: April 2015 Part No: E48322-03 May 2015 Copyright © 2014, 2015, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS. Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S.
    [Show full text]
  • Oracle® Private Cloud Appliance Licensing Information User Manual for Release 2.2
    Oracle® Private Cloud Appliance Licensing Information User Manual for Release 2.2 E71899-01 September 2016 Oracle Legal Notices Copyright © 2013, 2016, Oracle and/or its affiliates. All rights reserved. This software and related documentation are provided under a license agreement containing restrictions on use and disclosure and are protected by intellectual property laws. Except as expressly permitted in your license agreement or allowed by law, you may not use, copy, reproduce, translate, broadcast, modify, license, transmit, distribute, exhibit, perform, publish, or display any part, in any form, or by any means. Reverse engineering, disassembly, or decompilation of this software, unless required by law for interoperability, is prohibited. The information contained herein is subject to change without notice and is not warranted to be error-free. If you find any errors, please report them to us in writing. If this is software or related documentation that is delivered to the U.S. Government or anyone licensing it on behalf of the U.S. Government, then the following notice is applicable: U.S. GOVERNMENT END USERS: Oracle programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, delivered to U.S. Government end users are "commercial computer software" pursuant to the applicable Federal Acquisition Regulation and agency-specific supplemental regulations. As such, use, duplication, disclosure, modification, and adaptation of the programs, including any operating system, integrated software, any programs installed on the hardware, and/or documentation, shall be subject to license terms and license restrictions applicable to the programs. No other rights are granted to the U.S.
    [Show full text]
  • NETATMO Smart Radiator Valves Libraries We Use
    NETATMO Smart Radiator valves Libraries we use This document lists the libraries and services Netatmo uses in its products and apps. We thank our partners and the open source community for their help and contributions. Embedded Software ..................................................................................................... 3 FreeRTOS .......................................................................................................... 4 LwIP .................................................................................................................. 5 LwBT ................................................................................................................. 6 Newlib ............................................................................................................... 7 Apple mDNSResponder ..................................................................................... 24 Libsodium ........................................................................................................ 25 Hmacsha512 .................................................................................................... 26 OpenSSL .......................................................................................................... 27 AES Gladman ................................................................................................... 28 Apple WAC POSIX Server .................................................................................. 29 CSRP ..............................................................................................................
    [Show full text]
  • Open Source Software License
    Open Source Software License Contents Open Source Software License.................................................................................................................. 5 Copyright Attribution .............................................................................................................................. 218 3 Open Source Software License GNU GPL This projector product includes the open source software programs which apply the GNU GENERAL PUBLIC LICENSE Version 2, June 1991 ("GPL Programs"). We provide the source code of the GPL Programs until five (5) years after the discontinuation of same model of this projector product. If you desire to receive the source code of the GPL Programs, contact Epson. These GPL Programs are WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. The list of GPL Programs is as follows and the names of author are described in the source code of the GPL Programs The list of GPL Programs • busybox-1.21.0 • iptables-1.4.20 • linux-3.4.49 • patches • udhcp 0.9.8 • wireless_tools 29 • dbus-1.6.18 • EPSON original drivers • Stonestreet One Drivers • mtd-utils-1.5.0 • nfs-utils-1.3.0 • coreboot • acl • attr • base-files 5 • base-passwd • bash • bc • busybox • coreutils • cpio • dbus-1 • dosfstools • e2fsprogs • ed • ethtool • fbset • findutils • flashrom • fuse-utils • fuser • gawk • gawk-common • grep • init-ifupdown • initscripts • iproute2 • iptables • iputils • kbd • kernel-3.10.62-ltsi-wr6.0.0.32-standard
    [Show full text]
  • Temial TEA 1.0 with Firmware Version 4.3.7 Third-Party
    Temial TEA 1.0 With Firmware Version 4.3.7 Third-Party Software Content Overview ....................................................................................................... 4 Warranty Disclaimer Regarding Open Source Software .............................. 6 How to Obtain Source Code ......................................................................... 7 Contact Information ....................................................................................... 8 Copyright ....................................................................................................... 8 Verbatim Generic License Texts ................................................................. 11 Apache-2.0 – Apache License 2.0 .............................................................. 11 BSD-2-Clause – BSD 2-Clause “Simplified” License ................................. 16 BSD-3-Clause – BSD 3-Clause “New” or “Revised” License ..................... 16 BSD-3-Clause-Variant-1 or Modified BSD .................................................. 17 BSL-1.0 – Boost Software License 1.0 ....................................................... 18 GPL-3.0-with-GCC-exception - GNU General Public License v3.0 w/GCC Runtime Library exception .......................................................................... 18 HPND – Historical Permission Notice and Disclaimer (HPND with advertising clause) ...................................................................................... 33 MIT – MIT License .....................................................................................
    [Show full text]