
UvA-DARE (Digital Academic Repository)

Japanese Kanji Characters are Small-World Connected Through Shared Components
Jeronimus, M.; Westerveld, S.; van Leeuwen, C.; Bhulai, S.; van den Berg, D.
Publication date: 2017. Document version: Final published version. Published in: Data Analytics 2017.

Citation for published version (APA): Jeronimus, M., Westerveld, S., van Leeuwen, C., Bhulai, S., & van den Berg, D. (2017). Japanese Kanji Characters are Small-World Connected Through Shared Components. In S. Bhulai, & D. Kardaras (Eds.), Data Analytics 2017: The Sixth International Conference on Data Analytics: November 12-16, 2017, Barcelona, Spain (pp. 53-58). (International Conference on Data Analytics; No. 6). IARIA. http://www.thinkmind.org/index.php?view=article&articleid=data_analytics_2017_4_20_68009
DATA ANALYTICS 2017: The Sixth International Conference on Data Analytics
ISBN: 978-1-61208-603-3
November 12 - 16, 2017, Barcelona, Spain
Editors: Sandjai Bhulai, Vrije Universiteit Amsterdam, the Netherlands; Dimitris Kardaras, Athens University of Economics and Business, Greece

DATA ANALYTICS 2017 : The Sixth International Conference on Data Analytics

Japanese Kanji Characters are Small-World Connected Through Shared Components

Mark Jeronimus∗, Sil Westerveld†, Cees van Leeuwen‡, Sandjai Bhulai§ and Daan van den Berg¶
∗ Airsupplies Nederland BV, The Netherlands
† Nishino, Amsterdam, The Netherlands
‡ Laboratory for Perceptual Dynamics, KU Leuven, Leuven, Belgium
‡ Center for Cognitive Science, TU Kaiserslautern, Kaiserslautern, Germany
§ Faculty of Science, Vrije Universiteit Amsterdam, Amsterdam, The Netherlands
¶ Docentengroep IvI, Faculty of Science, University of Amsterdam, Amsterdam, The Netherlands

Abstract—We investigate the connectivity within different incremental sets of Japanese Kanji characters. Individual characters constitute the vertices of the network, and the components shared between them provide its edges. We find that the resulting networks have a high clustering coefficient and a low average path length, characterizing them as small worlds. We examine the statistical significance of these findings and the role of the degree distributions. We review the evidence that the small-world topologies of these networks are due to the successive elimination of components in the writing system and discuss the implications of the results for language evolution.

Keywords–Japanese characters; Kanji; components; radicals; small-world networks; phase transition; Zipf’s law; Gelb’s hypothesis

I. INTRODUCTION

Small-world networks are sparsely connected networks that have a high clustering coefficient (CC) in combination with a low average path length (APL) [1]. The CC of a vertex $A$ that is adjacent to vertices $B_1, \ldots, B_n$ is the number of edges among $B_1, \ldots, B_n$ divided by the maximum possible number of such edges, $n(n-1)/2$. As such, the CC of $A$ expresses $A$’s local interconnectivity; the CC of a network is the average CC over all its vertices. The APL is defined as the average number of edges on the shortest path between all pairs of vertices in the network, and as such expresses the network’s global connectivity.

In real life, small-world networks have been found in a broad variety of fields: power grids [1], neuronal networks in nematode worms [2], the primate brain [3], [4], the World Wide Web [5], and networks of social relationships [6]. Some evidence suggests that small-world topologies are an emergent property resulting from self-organization in a population of communicating agents [7]–[11]. Small worlds have also been found in language networks of co-occurring words [12] and, more specifically, in Far Eastern writing systems. An investigation of Chinese characters sharing ‘radicals’ [13] is the closest to ours. These authors investigated the network topology of modern-day Chinese characters and found small-world properties, as well as a non-Poisson degree distribution. Even though Chinese and Japanese characters differ considerably nowadays, the computational results of these authors are comparable to ours and to others in the field. On a slightly higher level, various research teams constructed networks of co-occurring characters [14], words [14]–[16] and phrases [17] in Chinese. Like [13], these authors find small-world properties, possibly indicating that the same self-organizing forces that shape logographic languages at the character level also shape writing systems on a larger scale. Interestingly, a similar word-level investigation was conducted on Japanese two-Kanji words [18], [19]. Despite the differences in characters and methods, these authors also find small-world networks, affirming the consistent sharing of characters between words in logographic languages.

However, an investigation of network topologies in Japanese at the character level is still missing. Our investigation aims to fill this gap, conjoining all aforementioned investigations and thereby interconnecting research on network structures in Japanese and Chinese writing systems at both the word and the character level.

The structure of the paper is as follows. We discuss the Japanese writing system in Section II. We then proceed to show that Kanji is a small-world network in Section III. In Section IV, we state our conclusions, discuss the results, and outline possible extensions of our work.

II. THE JAPANESE WRITING SYSTEM

A writing system reflects the history of the civilization in which it emerged, and some writing systems have developed a striking level of complexity. The Japanese language, notably, employs four character sets: Hiragana, a 46-piece syllabic script; Katakana, also 46 characters, similar to Hiragana but mainly used for foreign words, expressions and emphasis; Kanji, a logographic script related to the Chinese characters; and finally Romaji, the Roman alphabet, used mostly for numbers, in advertisements and in pop culture. All four character sets are represented in the following sentence:

マークは明日、月曜10時にあの寺で待っています。
Tomorrow, Monday, at 10 o’clock, Mark will be waiting near that temple

Copyright (c) IARIA, 2017. ISBN: 978-1-61208-603-3

Table I. THREE TIMES THE CHARACTER FOR ‘FUN’ OR ‘ENTERTAINING’.
NOTICE THE DIFFERENCE IN COMPOSITIONAL STRUCTURE, ESPECIALLY REGARDING THE ‘THREAD’-COMPONENT (THE ‘LITTLE SIDEBURNS’ IN THE TRADITIONAL CHARACTER).

simplified (modern day) Chinese: 乐
traditional Chinese, Cantonese, Taiwanese: 樂
Japanese: 楽

The first three characters, マーク (‘Mark’), are Katakana; the number 10 is written in Romaji. The characters は, に, あ, の, で, っ, て, い, ま and す are Hiragana. The remaining characters, 明, 日, 月, 曜, 時, 寺 and 待, are Kanji. Japanese words usually consist of Katakana only (マーク), Hiragana only (あの), Kanji only (月曜, 時, 寺) or a combination of Kanji and Hiragana (待って). In Kanji-only words, the combinatorial deployment of characters closely corresponds to word compounding in other languages. For instance, the single-character word for ‘gold’ (金) and the single-character word for ‘fish’ (魚) are commonly combined into the two-Kanji word 金魚, meaning ‘goldfish’. Similarly, ‘fishing’ (釣) and ‘stick’ (竿) make ‘fishing rod’ (釣竿).

Figure 1. Graph of the Kanji from the example sentence in Section II. Vertices represent individual Kanji characters, connected if they share at least one component as identified by the label.

Estimates of the total number of existing Kanji characters range from 40,000 to 100,000, and new characters could theoretically still be added today [20], but the vast majority of these characters are rarely used. Although all Far Eastern logographic languages are thought to stem from the same source, there are considerable differences nowadays between Japanese Kanji, Chinese characters, and the writing systems of Taiwan and Hong Kong. Japan has some unique Kanji, and a post-war simplification effort in China resulted in a substantial difference between the sets (see Table I). Japanese, Cantonese (from Hong Kong) and Taiwanese characters …

… ministry of education, since 1981. It is a superset of KyouIku, extending it by 939 characters learned in secondary school; it covers 98.66% of the Kanji used in the Japanese corpus and contains all Kanji allowable in governmental documents. Finally, JIS X 0208 is a Japanese Industrial Standard defining a 6,355-piece Kanji set, which extends the JouYou …
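The network definition used for Figure 1 (Kanji as vertices, an edge whenever two characters share at least one component) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the decompositions in `EXAMPLE_COMPONENTS` are hypothetical, one plausible split of the seven Kanji from the Section II example sentence, and are not the component labels used for the actual networks; the function name `shared_component_graph` is likewise hypothetical.

```python
from itertools import combinations

# Hypothetical, illustrative component decompositions for the seven Kanji
# in the example sentence; NOT the component labels used in the paper.
EXAMPLE_COMPONENTS: dict[str, set[str]] = {
    "明": {"日", "月"},
    "日": {"日"},
    "月": {"月"},
    "曜": {"日", "羽", "隹"},
    "時": {"日", "土", "寸"},
    "寺": {"土", "寸"},
    "待": {"彳", "土", "寸"},
}


def shared_component_graph(components: dict[str, set[str]]) -> dict[str, set[str]]:
    """Return an undirected adjacency map in which two characters are
    connected if their component sets intersect (share >= 1 component)."""
    graph: dict[str, set[str]] = {char: set() for char in components}
    for a, b in combinations(components, 2):
        if components[a] & components[b]:  # non-empty intersection
            graph[a].add(b)
            graph[b].add(a)
    return graph


if __name__ == "__main__":
    g = shared_component_graph(EXAMPLE_COMPONENTS)
    n_edges = sum(len(nbrs) for nbrs in g.values()) // 2  # each edge counted twice
    print(f"{len(g)} vertices, {n_edges} edges")
    for char, nbrs in g.items():
        print(char, sorted(nbrs))
```

The all-pairs loop performs $N(N-1)/2$ set intersections, which is fine for a toy example; for a character set the size of JIS X 0208, an inverted index from each component to the characters containing it would avoid scanning every pair.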
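The CC and APL definitions in the introduction translate directly into code. The sketch below is a minimal pure-Python version under two assumptions: vertices with fewer than two neighbours are given a CC of 0 (the same convention NetworkX’s `average_clustering` uses), and `average_path_length` requires a connected graph. The edge list `EXAMPLE_EDGES` and the helper names are hypothetical illustrations, not the authors’ data or code.

```python
from collections import deque
from itertools import combinations

# Hypothetical example: edge list of a small shared-component Kanji graph
# (illustrative decompositions, not the paper's data).
EXAMPLE_EDGES = [
    ("明", "日"), ("明", "月"), ("明", "曜"), ("明", "時"), ("日", "曜"),
    ("日", "時"), ("曜", "時"), ("時", "寺"), ("時", "待"), ("寺", "待"),
]


def from_edges(edges):
    """Build an undirected adjacency map {vertex: set of neighbours}."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def clustering_coefficient(graph, v):
    """Edges among v's neighbours B_1..B_n divided by the maximum n(n-1)/2.
    Assumed convention: vertices with fewer than two neighbours score 0.0."""
    nbrs = graph[v]
    n = len(nbrs)
    if n < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in graph[a])
    return links / (n * (n - 1) / 2)


def average_clustering(graph):
    """Network CC: the mean of the per-vertex clustering coefficients."""
    return sum(clustering_coefficient(graph, v) for v in graph) / len(graph)


def bfs_distances(graph, source):
    """Unweighted shortest-path lengths (in edges) from source, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in graph[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def average_path_length(graph):
    """APL: mean shortest-path length over all ordered vertex pairs.
    Raises ValueError for disconnected graphs, where the APL is undefined."""
    n = len(graph)
    total = 0
    for s in graph:
        dist = bfs_distances(graph, s)
        if len(dist) != n:
            raise ValueError("graph is disconnected; APL is undefined")
        total += sum(dist.values())  # dist[s] == 0 adds nothing
    return total / (n * (n - 1))


if __name__ == "__main__":
    g = from_edges(EXAMPLE_EDGES)
    print(f"CC  = {average_clustering(g):.3f}")
    print(f"APL = {average_path_length(g):.3f}")
```

For this toy edge list, the CC works out to 0.7 and the APL to 34/21 (about 1.62). Classifying a network as a small world additionally requires comparing these values against random graphs with the same numbers of vertices and edges [1]; that comparison is not shown here.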