T-Kernel 2.0 Extension Specification (TEF020-S009)

T-Kernel 2.0 Extension Specification
December 2012
T-Engine Forum
http://www.t-engine.org/
TEF020-S009-02.00.00/en
Copyright (c) 2012 by T-Engine Forum

T-Kernel 2.0 Extension Specification (Ver.2.00.00)
-------------------------------------------

Do not transcribe or duplicate any part of this specification without the consent of T-Engine Forum. Information in this specification is subject to change without notice for improvement or other reasons.

For information about this specification, please contact:
T-Engine Forum Secretariat
In YRP Ubiquitous Networking Laboratory
28th Kowa Building, 2-20-1 Nishi-gotanda
Shinagawa, Tokyo Japan 141-0031
+81-(0)-3-5437-0572
+81-(0)-3-5437-2399
[email protected]

Note

In this specification, POSIX means the Portable Operating System Interface, specifically the so-called UNIX system operating system interface defined in the following standard.

ISO/IEC/IEEE 9945 Information technology - Portable Operating System Interface (POSIX) Base Specifications, Issue 7

In the chapter on the Standard C Compatible Library, "standard C library" means the library functions defined in the above POSIX standard as well as those defined in the following standard.

JIS X 3010:2003 (ISO/IEC 9899:1999) Programming Language C

For ease of programming and portability, and to keep some degree of affinity with POSIX, this specification follows the standard C library specifications almost as is, so that programs using the standard C library can be ported easily. This specification quotes some descriptions from the above standards with permission from IEC.

This specification is an extension of the underlying T-Kernel 2.0, an operating system of a totally different nature from POSIX, and does not guarantee compatibility with POSIX.
In addition, C language programs written according to this specification are not guaranteed to comply with the JIS C standard.

IEC: International Electrotechnical Commission
ISO: International Organization for Standardization
JIS: Japanese Industrial Standards

The function declarations, structure definitions, and numerical values in this specification are written in C language syntax.

Table of Contents

1. T-Kernel 2.0 Extension Overview
  1.1 Overview
  1.2 T-Kernel 2.0 Extension Features
  1.3 Relationship with T-Kernel
  1.4 Relationship with POSIX
  1.5 Dependencies between Function Modules
2. Common Rule
  2.1 Data Types
  2.2 String
  2.3 Valid Context
  2.4 Usage of T-Kernel 2.0 API
  2.5 Error Codes
  2.6 Thread-Safe
3. Memory Protection Function
  3.1 Overview
  3.2 Memory Protection Model
4. File Management Function
  4.1 Overview
  4.2 Definition
  4.3 Unsupported Functions
  4.4 API
  4.5 File System Implementation Part
5. Network Communication Function
  5.1 Overview
  5.2 Terms Used in This Section
  5.3 Unsupported Functions
  5.4 Data Type Definitions
  5.5 API
  5.6 Operation for Routing Socket
6. Calendar Function
  6.1 Overview
  6.2 Definition
  6.3 API
7. Program Load Function
  7.1 Overview
  7.2 Regular Program Module
  7.3 System Program Module
  7.4 Data Type Definition
  7.5 API
8. Standard C Compatible Library
  8.1 Overview
  8.2 Compatibility
  8.3 arpa/inet.h - BSD Socket
  8.4 assert.h - Testing Function
  8.5 complex.h - Complex Calculation
  8.6 ctype.h - Character Type Classification
  8.7 dirent.h - Directory Reading
  8.8 errno.h - Error Number Definition
  8.9 float.h - Floating Point Limit Value
  8.10 inttypes.h - Integer Type Format Conversion
  8.11 iso646.h - Alternate Spellings
  8.12 limits.h - Various Limit Values
  8.13 math.h - Numeric Operation
  8.14 netinet/in.h - BSD Socket
  8.15 search.h - Search
  8.16 stdarg.h - Variable Number Actual Argument
  8.17 stdbool.h - Boolean Type and Boolean Value
  8.18 stddef.h - Standard Definition
  8.19 stdint.h - Integer Type
  8.20 stdio.h - Standard Input/Output
  8.21 stdlib.h - General Utility
  8.22 string.h - String Operation
  8.23 strings.h - Byte Sequence Operation
  8.24 time.h - Date and Time
  8.25 wchar.h - Multibyte and Wide Character Extension
Appendix
  A.1 System Setting
  A.2 Usage Examples of Break Function
  A.3 Usage Example of Regular Program Module Function

Chapter 1 T-Kernel 2.0 Extension Overview

1.1. Overview

T-Kernel 2.0 Extension ("T2EX" below) is a program that extends the features of T-Kernel 2.0. T2EX is designed as a light-weight extension for building advanced embedded systems while making the most of the light-weight, high-speed, and real-time properties of the real-time operating system T-Kernel.

The functions that T2EX provides to applications consist of extended SVCs (extended system calls), library functions, and macros. Collectively, these functions and their application interfaces are called the API (Application Programming Interface). The T2EX specification is defined by the T2EX API.
Each individual system call, library function, and macro in the API is called an "API call". For example, the API calls of the file management function are collectively called the "file management function API".

Figure 1.1 shows the software architecture including T2EX. T2EX is positioned as an add-on function for T-Kernel 2.0, so application programs can be built using both the T-Kernel 2.0 and T2EX APIs.

[Figure 1.1: T-Kernel 2.0 Extension configuration and positioning]

To support advanced embedded system development, T2EX provides the following functions.

- Memory Protection Function
- File Management Function
- Network Communication Function
- Calendar Function
- Program Load Function
- Standard C Compatible Library

Each function is provided as a separate module, so you can use only the modules you need and remove any unnecessary ones.

T-Engine Forum implements standard T2EX code as an extension to T-Kernel 2.0 on the T-Engine reference board and publishes the source code together with the specifications. This implementation is called the "T2EX reference implementation". This specification also describes how implementation-dependent items are handled in the T2EX reference implementation.

1.2. T-Kernel 2.0 Extension Features

Building on the existing performance of T-Kernel 2.0 and making the most of its light-weight, high-speed, and real-time properties, T2EX is designed to meet the demand for additional functions in embedded systems, which have grown larger and more sophisticated. The main features of T2EX are described below.

Processless, light-weight extension

To keep the entire system light-weight, T2EX does not provide a process function. The process function is effective when developing programs for relatively large systems on a module-by-module basis.
However, it conflicts with the light-weight and high-speed properties of the system as a whole, because splitting resources among processes incurs large overhead in inter-process communication and resource switching. T2EX therefore assumes that the entire system is built without processes: extended functions such as file management and network communication are called directly from tasks rather than from processes. Because these functions are directly available to T-Kernel 2.0 tasks, you can realize advanced functions while making the most of the real-time property of T-Kernel 2.0.

Effective memory protection function independent of virtual memory

General process-based memory protection relies on multiple logical address spaces, which incur large execution overhead whenever the spaces are switched. T-Kernel 2.0 task-based programs often exchange information among multiple tasks via variables (memory), so simply dividing them into separate logical spaces would increase the overhead of inter-process communication. To stay light-weight, T2EX instead provides a practical two-level ring protection with a system level and a user level. This gives necessary and sufficient reliability for relatively complex special-purpose embedded systems, which are the main target of this extension.

Modular

T2EX provides many functions, including file management and network communication. Each is packaged as a separate module, so you can use only the modules you need and remove the rest, which avoids spending RAM and ROM on unnecessary functions.

More affinity with standard C and POSIX specifications

Many of the functions targeted by this extension, such as C language standard input/output and network communication, have de facto standards.
The T2EX API is designed to be the best fit for T-Kernel task-based programming, while keeping affinity with the standard C library and the POSIX specification to improve code reusability and reduce learning cost. Specific elaborations include:

- Integration of the standard C library and POSIX error numbers (errno) into the error codes (ER type)
  The T-Kernel 2.0 error codes are extended so that error numbers (errno) can be handled as is within the T-Kernel 2.0 error code system. This eliminates confusion when the standard C and POSIX error numbers (errno_t type) are used together with the error codes (ER type).