Cluster Analysis and Visualization for the Legend of the Condor Heroes Based on Social Network
Total Page:16
File Type:pdf, Size:1020Kb
Hindawi Scientific Programming Volume 2021, Article ID 9439583, 10 pages https://doi.org/10.1155/2021/9439583 Research Article Cluster Analysis and Visualization for the Legend of the Condor Heroes Based on Social Network Chao Fan ,1,2 Zhihui Yang,1,2 and Yuyi Yuan1,2 1 e School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi 214122, China 2Jiangsu Key Laboratory of Media Design and Software Technology, Jiangnan University, Wuxi 214122, China Correspondence should be addressed to Chao Fan; [email protected] Received 24 May 2021; Revised 14 June 2021; Accepted 27 June 2021; Published 7 July 2021 Academic Editor: Chenxi Huang Copyright © 2021 Chao Fan et al. )is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. )e Legend of the Condor Heroes (LCH) is one of the fifteen well-known Wuxia novels penned by Jin Yong. It portrays a number of characters in the background of the Southern Song Dynasty. In this research, we attempt to analyze the relationship of characters in LCH based on social network, including network feature analysis, cluster analysis, and data visualization. Moreover, the approach can be extended to other literary works because our research provides a general framework for analyzing character relationships. We first perform lexical analysis on the corpus to extract character names and then utilize co-word analysis to build a social network of character relationships. We reckon characters as nodes and count the cooccurrences of characters as weights of links. By applying the social network analysis of created network, we can obtain network features of LCH. Furthermore, a hierarchical clustering algorithm is implemented to study the structure of LCH network. Both the dendrogram and Venn diagram are used for data visualization. An improved approach of visualizing the clustering results is advanced in order to display the group and hierarchical structure better. )e final experimental results demonstrate that the proposed method shows a good effect. 1. Introduction of-speech tagging, etc. Utilizing the character names and their cooccurrences, we are able to build a weighted )e Legend of the Condor Heroes, written by JIN Yong, is undirected network. Many network features are calcu- one of the most reputable novels painting the fantasy world lated to describe the characteristics of LCH network of Wuxia. Based on real historical events, this novel tells the structure. Furthermore, the agglomerative hierarchical story of the Southern Song Dynasty. It describes the legend clustering method is implemented to analyze the rela- of Guo Jing, who is destined to shoulder the responsibility of tionships of characters. Some interesting conclusions of the country and grow into a hero with the help of Huang the novel can be drawn. Finally, an improved approach by Rong. )e novel portrays about a hundred dramatic employing the Venn diagram is proposed to visualize characters. clustering results. By conducting cluster analysis and data Relationships of characters can be explored from the visualization, not only does this work facilitate the ex- perspective of social network. )is paper tries to build a hibition of character relationships in LCH, but also it character relationship network based on the co-word provides a framework for analyzing characters in the analysis. It is well known that the cooccurrence of two literary work. characters’ names in a paragraph or a sentence indicates a )e content of this paper can be presented in the fol- connection between them in some ways. Hence, the co- lowing sections. Related studies are reviewed in Section 2. word analysis technology can be applied to the con- Social network construction is elaborated in Section 3. struction of character relationship network. At the be- Section 4 states experimental results, discussions of cluster ginning, textual analysis of LCH corpus is carried out by analysis, and data visualization. Section 5 gives the con- natural language processing such as segmentation, part- clusion and future work. 2 Scientific Programming 2. Related Work and preprocessed it by NLP technologies, whereas social network reckoned the work as historical documents and Quantitative research is an effective strategy to analyze lit- focused on the character relationships. erature, among which co-word analysis attracts the attention of many scholars. Co-word analysis is an important tech- 3. Creation of Social Network nology to build a social network. )e co-word analysis approach was developed by French bibliographers [1] and 3.1. Preprocessing of LCH Corpus. We downloaded a full text soon introduced to other fields [2–4]. Some researchers [5] of the Legend of the Condor Heroes (LCH) in Chinese utilized the co-word analysis to study the theme of publi- dataset (https://www.uidzhx.com/Shtml604.html). Data cation in the field of scientometrics. )eir research results cleansing was done automatically and some punctuation suggest that topics of publications often follow hot spots. errors were corrected manually. Also, mistakes of character Zhu and his team [1] concentrated on the scientific literature names were revised according to the name list of LCH ac- concerning social computing and found some hot research quired from the Internet. After original dataset preprocessing, topics by using the co-word analysis. Gan et al. [6] leveraged the created corpus can be used for lexical analysis. co-word analysis tools to analyze the medical subject )e Chinese lexical analyzer ICTCLAS (http://ictclas. headings (MeSH) terms of epilepsy genetics. Nguyen [7] nlpir.org/) was employed to process the LCH corpus. paid attention to the interdisciplinary literature on non- Chinese segmentation was performed on each sentence and biomedical topics from 1987 to 2017. A large-scale co-word parts-of-speech were tagged for segmented words. analysis was exploited for mapping knowledge. Another Exploiting the results of segmentation and part-of-speech research [8] focused on the evolution of medical tourism. It tagging, we succeeded in extracting the character names analyzed the data from the Web of Science and Scopus based which were used for creating LCH network. on co-word analysis and gave a visualization of all subfields. According to recent research [9], open data were employed to investigate knowledge areas, themes, and future research 3.2. Construction of LCH Network. When generating nodes based on co-word analysis. Khaldi and Prado-Gasco´ [10] of the network, the full name, given name, and nickname of a discussed international cooperation issues on migration by character are regarded as one node. For instance, “Hong using co-word analysis of articles in Web of Science. )ey Qigong,” “Qigong,” and “Bei Gai” represent the same gave some frequent keywords and disclosed that most active character. )e cooccurrence of two names in a specified area authors in this field came from a few countries, such as the (a paragraph or a sentence) denotes a link between two United States, the United Kingdom, Germany, etc. characters. )e corresponding frequency indicates the Several authors explored the relationships of characters weight of a link. In this paper, the contextual area is set as the of literature by utilizing social network analysis, cluster paragraph. analysis, and data visualization. Zhao et al. [11] extracted We built a weighted undirected network to describe the character relationships from Chinese literary based on social relationships of characters in LCH based on co-word network. )ey discovered that the character network in analysis. )e established network incorporates 90 nodes and Chinese literature had apparent small-world property and 1,013 links. A visualized LCH network is depicted in Figure 1 limited power-law distribution. Wang et al. [12] proposed a where degrees of nodes are ranked in the top 50. )e size of a research method to analyze the historical characters in the node represents the degree of the node. Romance of the )ree Kingdoms (RTK). )ey created a network according to the cooccurrence of character names 3.3. Network Features and found some interesting phenomena in the RTK. Fan [13] established a character relationship network of the Dream of 3.3.1. Degree Distribution. )e degree of a node is defined as the Red Chamber. )ey calculated the network features and the count of its neighbors. )e degree of a character means performed a cluster analysis. Some interesting conclusions how many other characters are linked to him or her, thus can be drawn through the visualization of experimental implying the significance of a character. In the novel, a results. Hu [14] made an effort to construct the character character may be directly connected to other characters or network in Records of the )ree Kingdoms. Natural lan- talked about in others’ conversations. guage processing (NLP) tools were taken into account to As can be seen from Figure 1, Guo Jing has the biggest handle the text at first. )en segmentation and part-of- value of the degree. )e top ten people in degree ranking are speech tagging were carried out. A custom dictionary was as follows: Guo Jing, Huang Rong, Yang Kang, Qiu Chuji, Ke adopted to process the ancient Chinese text. Anaphora Zhene, Wanyan Honglie, Zhu Cong, Huang Yaoshi, Ouyang resolution was also used to determine which character a Feng, and Tuolei. We calculated the average degree of LCH pronoun is referring to. Besides, the author applied a network and obtained a value of 22.51. )is number shows k-kernel decomposition approach to analyze the significant that a character in LCH is linked to an average of other 22.5 characters in the work. Xu [15] studied the classical literature characters. )at is to say, characters in LCH network are Zuozhuan in the Pre-Qin period from the perspective of highly connected to each other.