Surface Or Essence: Beyond the Coded Character Set Model

Surface or Essence: Beyond the Coded Character Set Model. Shigeki Moro1) Abstract For almost all users, the coded character set model is the only way to use characters with their computers. Although there have been frequent arguments about the many problems of coded character sets, until now, there was almost nothing on the philosophical consideration on a character in the field of Computer science. In this paper, the similarity between the coded character set model and Aristotle’s Essentialism and the consequent problems derived from it, is discussed. Then the importance of the surface of the character is pointed out using the écrituretheory of Jacques Derrida. Lastly, the Chaon model of the CHISE project is introduced as one of the solutions to this problem. Keywords: Unicode, Aristotle’s Essentialism, Derrida’s Theory of écriture,Chaon model “Depth must be hidden. Where? On the surface.” other local and super character code sets are still —Hugo von Hofmannsthal (1874-1929) being developed, and the repertoires of the existing character sets are increasing even now. What users 1 Introduction. can only do is to choose and follow these character sets. Writing, is not only considered as one of the most The main reason for this is that there are both fundamental mediums of intellectual activities, but sides: Writing is not only dependent on a context, also a frequently used one, which is not restricted but that it is transmitted exceeding the context (it to the use of computers alone. Needless to say that is contrastive with oral language being indivisible the coded character set model (abbreviation being from a context). For instance, an alphabet “a” is CCSM) is the most popular one to encode characters pronounced differently based on the contexts, such in computers and we have many coded character sets as “cat,” “cake,” and so on. It’s, on the other hand, based on this model. used in the various Latin scripts. TRON code, in In recent years, the number of characters, which which a Chinese character 一 from JIS X 0208 and 一 we can use in our computers, has increased dramat- from GB2312 are encoded separately in order to dis- 2 ically and our former state of starvation caused by tinguish Japanese and Chinese , is a poor implemen- the lack of characters has been partly filled. How- tation which purports that the context-dependency ever, when compared to the number of the increased is important. In other words, it aims to express characters, I suppose the quotient of our happiness contexts by code points. On the other hand, the has not increased proportionately. For example, the unification policy of the Unicode Standard regards available standards do not fully provide the knowl- the context-independency as important. Such var- edge required to use a character in computers, such ious situations hint that the duality of a character as, how to search a character, how to control layout, cannot be well expressed by CCSM. context-dependent variations and so on. This paper deals with two main considerations: Moreover, local character sets have been devel- First, we will consider the fundamental misunder- oped for domestic purposes and super sets for them, standing of a character, which CCSM holds. For such as Unicode ([24]), TRON code ([23]) and so this investigation, I would like to focus on Unicode on, have been developed based on today’s global since it has the well-defined design principle which computing circumstances1. Although Unicode is a is hardly discussed in other standards and to use well-developed implementation based on this model, the method of comparative philosophy in order to clarify the problems, especially with Aristotle’s es- 1)Hanazono University. ([email protected]) 1GB18030 of the People’s Republic of China is an inter- 2In addition, 一 s from KS X 1001, GT Mincho, CNS esting example since it has a hybrid nature of both local and 11643, Morohashi Daikanwa and Unicode (!) are not unified super character sets. See [18]. in TRON code. sentialism and Jacques Derrida’s theory of écriture. political and religious reasons and also know that In other words, the view of Unicode about a charac- these definitions are not completely put into prac- ter will be located in European intellectual history. tice. Although it is also interesting to explore the Second, the Chaon model proposed by Dr. To- complicated historical situations of the development mohiko Morioka, who is the leader of the CHISE of Unicode, it’s beyond the scope of this paper. The project, will be introduced as one of the solutions philosophical background of these principles will be to this problem. The CHISE project, of which I am examined. a developer, aims to create a new character / text Especially for our examination, the third, fourth, processing environment, independent of any char- sixth and seventh principles are important. In the acter code sets or CCSM. The Chaon model is a third principle, Unicode defines the term character fundamental model for our project. as follows: Although these subjects may not be directly re- lated to the problems of typesetting and fonts, it is The Unicode Standard draws a distinction reasonable to study them since they are the subjects between characters and glyphs. Charac- relevant to KAGE and Ω/CHISE system3, which are ters are the abstract representations of the a part of the CHISE project to develop a new font smallest components of written language and typesetting environment. that have semantic value. (...) Characters represented by code points. (...) The Uni- code Standard deals only with character 2 Essentialism of Unicode. codes. ([24], p. 15) 2.1 Definition of characters. This principle may be the most important one to Let us begin our analysis by looking at “10 Uni- investigate the philosophy of Unicode. According to code Design Principles” of the Unicode Standard this principle, there is an another existence before quoted below ([24], p. 14). an actual character, i.e., a written glyph, which ex- ists physically and a single code point can express 1. Universality4 a character. By this it’s easy to be reminded of Plato’s idealism, ¯atmanin Indian philosophy and 2. Efficiency so forth. Therefore, a character is not a character 3. Characters, Not Glyphs but what should be called a “pre-character” or an “archi-character”. 4. Semantics In addition, a sequence made of characters should 5. Plain Text be regarded as something like a “pre-text” and so on. The sixth principle is concerned with this issue: 6. Logical Order Unicode text is stored in logical order in 7. Unification the memory representation, roughly cor- responding to the order in which text is 8. Dynamic Composition typed in via the keyboard. In some cir- 9. Equivalent Sequences cumstances, the order of characters differs from this logical order when the text is dis- 10. Convertibility played or printed. (...) For the most part, logical order corresponds to phonetic order. Of course, we maniacs of character codes and ([24], pp. 18-29) typesetting, know that not all these principles are in principle or deductive, but distorted by historical, Put simply, a sequence of characters in logical 3Papers on these projects may be included in this proceed- order is a pre-text, as mentioned above. What we ing. need to remember here is that Unicode supposes the 4Before version 4.0, the first principle was “Sixteen-bit character code.” As known well, this principle did not make precedence of the spoken language over the written any sense from an earlier version. language. Although, such assumptions are common 27 to linguists and people, it’s important as what spec- that are equivalent are given a single code. ifies the limit of CCSM5. (...) Avoidance of duplicate encoding of It should be noticed that what should be regarded characters is important to avoid visual am- as logical in the standard is not defined. How- biguity. ([24], pp. 19-20) ever, only the rough relation with phonetic order is shown. Although, of course, it’s very difficult to Although stated “visual ambiguity” here, the def- define what is logical, it would be defined if this inition of character seems more ambiguous to me. principle were really put into practice. The term script in this principle is a concept ab- Next for the fourth principle, the “semantic value” stracted from a language: “Script. A collection of of a character mentioned above is defined in the symbols used to represent textual information in one quotation below: or more writing systems” ([24], p. 1377). This is a variation of the concepts already seen. Characters have well defined the seman- To sum up these principles: tics. Characters property tables are pro- 1. Unicode is for describing logical characters and vided for use in parsing, sorting, and other texts. algorithms requiring semantic knowledge about the code points. The properties 2. Unicode regards the oral language as logical identified by the Unicode Standard include rather than visually rendered glyphs. numeric, spacing, combination, and direc- tionarity properties (...). Additional prop- 2.2 Aristotle’s Essentialism and Unicode. erties may be defined as needed from time to time. ([24], pp. 17-18) For our investigation, it would be helpful to com- pare these definitions of Unicode with the essential- Here, it’s explained what constitutes an abstract ism of Aristotle and his followers. Aristotle says: character. It’s of value to point out that semantics which attribute to a character must be “well The formula (logos) of the essence of x is defined.” Thus the distinction between characters a formula in which x itself does not ap- and glyphs means that Unicode regards glyphs as pear but which expresses (legonti) x.

Surface Or Essence: Beyond the Coded Character Set Model

Doctor's Thesis Studies on Multilingual Information Processing

The Otaku Phenomenon : Pop Culture, Fandom, and Religiosity in Contemporary Japan

The Significance of Anime As a Novel Animation Form, Referencing Selected Works by Hayao Miyazaki, Satoshi Kon and Mamoru Oshii

Alphabetization† †† Wendy Korwin*, Haakon Lund** *119 W

Electronic Document Preparation Pocket Primer

In Memory of Bernard

Automated Malware Analysis Report for Set-Up.Exe

I18n, M17n, Unicode, and All That

Animate Biology: Data, Visualization, and Life's Moving Image Adam J. Nocek a Dissertation Submitted in Partial Fulfillment Of

Anime's Atomic Legacy: Takashi Murakami, Miyazaki, Anno, and The

How Unicode Came to "Dominate the World" Lee Collins 18 September 2014 Overview

Japanese Media Cultures in Japan and Abroad Transnational Consumption of Manga, Anime, and Media-Mixes