Visualization of Multivariate Data Using Expanded Constellation and Expanded Kanji Graphs and Their Application to Clustering

Journal of Environmental Science for Sustainable Society, Vol. 10, No.1, 1-8, April 2021 VISUALIZATION OF MULTIVARIATE DATA USING EXPANDED CONSTELLATION AND EXPANDED KANJI GRAPHS AND THEIR APPLICATION TO CLUSTERING Mika FUJIWARA1*, Shoji KAJINISHI2 and Koji KURIHARA3* 1 Graduate School of Environmental and Life Science, Okayama University (3-1-1,Tsushima-Naka, Kita-ku, Okayama 700-8530, Japan) * Corresponding author, E-mail: [email protected] 2 Graduate School of Environmental and Life Science, Okayama University (3-1-1,Tsushima-Naka, Kita-ku, Okayama 700-8530, Japan) E-mail: [email protected] 3 Professor, Graduate School of Environmental and Life Science, Okayama University (3-1-1,Tsushima-Naka, Kita-ku, Okayama 700-8530, Japan) * Corresponding author, E-mail: [email protected] In this study, expanded constellation and expanded kanji graphs are proposed and the effectiveness of these graphs in the multivariate analysis is examined. To draw the expanded constellation graph, the variables are first placed on the circumference of the semicircle using factor loadings. Then, a line is drawn inside the semicircle by connecting the vectors of each variable. The constellation graph can simultaneously grasp both the tendencies of the objects and 푝-dimensional variables. In the expanded kanji graph, the height and width of the kanji are determined using the mean values of the variables for each cluster. The kanjis are placed around the radar chart drawn using the standard deviation. The kanji graph can intuitively grasp the characteristics of each group in the cluster analysis. Herein, the effectiveness of the proposed methods is verified by evaluating the flavor of whiskey. The proposed methods show that whiskey flavors can be classified into five clusters: (1) “full-body and winey type,” (2) “sweet and balanced type,” (3) “smoky and balanced type,” (4) “full-body and smoky type,” and (5) “light-body and sweet floral type.” Key Words : graph representation, factor loading, cluster analysis, expanded constellation graph, expanded kanji graph 1. INTRODUCTION constellation graph4), and kanji graph5) have been proposed. These graph representations are easy to In statistical analysis, graph representations are used derive and calculate. Moreover, they can easily and to capture the characteristics of the data to facilitate intuitively interpret data features. the interpretation of the graph’s results. The graph Herein, an expanded constellation graph was representation is a visualization method for proposed, which is a new graph that uses the concept intuitively grasping the summary of information and of constellation graphs. To complement the expanded detailed features of the data, and it is effective for constellation graph, an expanded kanji graph was achieving the exploratory interpretation of the also proposed, which adopts the concept of kanji results. Typical graph representations for visually graphs. The usefulness of these methods was verified drawing the information and features of data include by employing them to evaluate the flavor data of the circle graph, bar graph, scatter diagram, and radar whiskey. chart. Besides these graphs, various graph methods such as face graph1), Andrews graph2), tree graph3), 1 Mika FUJIWARA et al. 2. PROPOSED METHOD is performed using the data 푥푖푗 (푖=1,2,⋯,푛∶ 푗= 1, 2, ⋯ , 푝), with 푝 variables. The factor loading 푎 of (1) Expansion of the constellation graph the first factor obtained using this analysis is sorted For each object, a constellation graph4) represents in descending order. At this moment, the maximum each variable as a vector and connects these vectors and minimum values of 푎 is 푎푢 and 푎푙 , respectively. and draws a star at the final point. The concatenated Next, each variable is represented as a vector using vectors are called “path.” The constellations of all the the following transformation. data are always drawn inside the semicircle. The 푎푢 − 푎푗 휑푗 = 휋 ; 푎푢 = max 푎푗 , 푎푙 = min 푎푗 constellation graph can grasp the mean values of the 푎푢 − 푎푙 1≤푗≤푝 1≤푗≤푝 variables and the variation between the variables. When the constellation graph of the i-th data is Moreover, it can intuitively grasp the characteristics drawn using this 휑푗 ,the coordinates (훼, 훽) of the of the variables of the object from the shape of the final point are determined as follows. 1 푝 푥푖푗 푝 푥푖푗 path. (훼, 훽) = (∑ cos휑 , ∑ sin휑 ) ; (푥 > 0) 푗=1 푗 푗=1 푗 푖푗 , For this graph, let the data be represented as 푝 푈푗 푈푗 푥 ≤ 0 푥 > 0 푥푖푗 (푖=1,2,⋯,푛,푗=1,2,⋯,푝) with 푝 variables. where 푖푗 transformed such that 푖푗 . By Each variable is transformed and expressed as a connecting the vectors in a similar manner for each vector as follows. object and drawing a star at the final point, the path and star of each object are drawn in the semicircle. 휃 = 푓 (푥 ) 푖푗 푗 푖푗 After this procedure, a graph is drawn for each 푖 = 1, 2, ⋯ , 푛, 푗 = 1, 2, ⋯ , 푝, 0 ≤ 휃푖푗 ≤ 휋 factor. As mentioned above, in existing constellation The following equation can be considered as an graphs, the length of the vector is common to all example of conversion. variables and the direction 휃푖푗 depends on the value 푥푖푗 − 퐿푗 of the data. However, in the expanded constellation 휃푖푗 = 휋, 푈푗 − 퐿푗 graph, the direction is common to all variables, and where 푈푗 = max 푥푖푗 and 퐿푗 = min 푥푖푗. the length of the vector depends on the raw score. 1≤푖≤푛 1≤푖≤푛 Therefore, in the expanded constellation graph, the Using this 휃푖푗, the coordinates(훼, 훽)of the final point position of the final point is affected by the order of of the constellation graph are determined as follows. 푝 푝 the data. Owing to this difference, in the existing constellation graph, the result derived from the data (훼, 훽) = (∑ 푤 cos휃 , ∑ 푤 sin휃 ) 푗 푖푗 푗 푖푗 is fixed. However, the expanded constellation graph 푗=1 푗=1 can be expressed in different dimensions such as the However, the weight 푤푗 that determines the length first and second factors and interpreted from a wide of the vector should satisfy the following conditions range of perspectives. to ensure that the final point falls within the semicircle. 푝 (2) Expansion of the kanji graph The kanji graph visualizes each variable by ∑ 푤 = 1; (푤 ≥ 0, 푗 = 1,2,⋯,푝) 푗 푗 assigning the raw score of the variable to the size of 푗=1 a single kanji. In the kanji graph, one variable is Here, the larger the variation between the variables, represented by single kanji. This graph can express the closer is the final point of the star to the the properties of the variables using kanjis, and the circumference of the semicircle. Where, 1/푝 is characteristics of the variables in each object can be generally used for 푤푗 . Moreover, as shown in intuitively grasped by comparing kanji sizes. reference 4, the smaller the mean value, the more the The procedure for drawing a kanji graph is shown final point will be placed to the right. below. An expanded constellation graph is proposed 1) The data 푥푖푗 (푖 =1,2,⋯,푛, 푗 =1,2,⋯,푝) herein. This graph reflects the value of the factor consists of 푛 observations and 푝 variables. The loading obtained using the factor analysis and places following transformations is performed using the variables along the circumference of the 푥푖푗 (푖 = 1,2, ⋯ , 푛, 푗 = 1,2, ⋯ , 푝): semicircle. Furthermore, in the graph, the factor 푥푖푗 − 푥̅푗 푥′ = + 푏, loading is assigned to the direction of the vector and 푖푗 푆 the data value is assigned to the length of the vector. 푗 where 푏 is a positive constant that satisfies 푥′ ≧ 0 This graph is expected to facilitate the interpretation 푖푗 of the results by visualizing both the tendency of at all times. 푥̅푗 is the mean of variable 푋푗, and 푆푗 is factors and the values of variables based on factor object standard deviation of variable 푋푗. analysis. 2) Single kanji that represents the characteristics of In the expanded constellation graph, factor analysis the variable is selected arbitrarily. Because a variable 2 VISUALIZATION OF MULTIVARIATE DATA USING EXPANDED CONSTELLATION AND EXPANDED KANJI GRAPHS AND THEIR APPLICATION TO CLUSTERING can be represented using a kanji, 푝 types of kanji 3. APPLICATION EXAMPLE OF THE must be prepared for 푝 variables. METHOD 3) The height and width of the kanji are determined according to the values of 푥푖푗. (1) Data used In object 푖, the height and width of the first kanji are In this section, to confirm the usefulness of the transformed as proposed methods, whiskey data provided by the 푥′ University of Strathclyde in Scotland were evaluated. ℓ ( 푖1), 푏 The evaluation of 12 flavors (medicinal, smoky, body, spicy, winey, nutty, malty, honey, fruity, where ℓ is a size of the kanji. sweetness, floral, and tobacco) of 86 brands of Similarly, the 푝-th kanji is drawn using the values whiskey was performed. The 12 flavors were rated on ′ a 5-point scale (0–4 points). of 푥 . 푖푝 First, the data set was subjected to factor analysis 4) Kanjis are drawn and arranged from left to right. and the whiskey brands data were classified using the 5) All objects are drawn using this procedure. k-means method. Next, to examine the effectiveness As mentioned above, the existing kanji graph of each method, the expanded constellation and represents a kanji string for each object. However, expanded kanji graphs were employed to evaluate the when drawing a kanji graph for a cluster consisting five clusters obtained using the k-means method. of multiple objects, it is necessary to use the mean or For statistical analysis, R version 3.5.1 and IBM median value of each variable in the cluster.

Visualization of Multivariate Data Using Expanded Constellation and Expanded Kanji Graphs and Their Application to Clustering

Master Thesis

Multivariate Process Control with Radar Charts

Statistical Analysis Plan Revisions Version 1.0 – 16 October 2017

Jeong HJ, Kim M, an EA, Et Al. a Strategy to Develop Tailored Patient Safety Culture Improvement Programs with Latent Class Analysis Method

Creating Charts and Graphs Presenting Information Visually Copyright

USING RADAR CHART to DISPLAY CLINICAL DATA Shi-Tao Yeh, Glaxosmithkline, King of Prussia, PA

Placeprofile: Employing Visual and Cluster Analysis to Profile Regions

Statistical Inference with Paired Observations and Independent Observations in Two Samples

Comparative Visual Analytics in a Cohort of Breast Cancer Patients

Learn to Create a Radar Chart in R with Data from Our World in Data (2018)

Music-Circles: Can Music Be Represented with Numbers?

Ranking Visualizations of Correlation Using Weber's