An Optimized VTCR Feature Dimensionality Reduction Algorithm Based on Information Entropy
Shaohao Mou, Weikuan Jia*, Yuyu Tian, Yuanjie Zheng, Yanna Zhao

Abstract—As basic research applied in pattern recognition, machine learning, data mining and other fields, the main purpose of feature extraction is to achieve dimensionality reduction with low information loss. Among dimensionality reduction algorithms, those based on classical statistical theory are the most widely used, and the feature variance total contribution ratio (VTCR) is the usual criterion for evaluating the effect of feature extraction. Traditional VTCR focuses only on the eigenvalues of the samples' correlation matrix and not on information measurement, resulting in a large loss of information during feature extraction. In this paper, Shannon information entropy is introduced into the feature extraction algorithm: the generalized class probability and the class information function are defined, and the contribution ratio for VTCR is improved. Finally, the number of extracted feature dimensions is determined by calculating the accumulate information ratio (AIR), which provides a sound evaluation from the standpoint of information theory. By combining the new method with principal component analysis (PCA) and factor analysis (FA) respectively, an optimized VTCR feature dimensionality reduction algorithm based on information entropy is established, in which the number of extracted feature dimensions is calculated by AIR. Experimental results show that the low-dimensional data have more interpretability and the new algorithm has a higher compression ratio.

Index Terms—Feature Extraction; Variance Total Contribution Ratio; Shannon Entropy; Accumulate Information Ratio

Manuscript received October 13, 2018. This work is supported by the Focus on Research and Development Plan in Shandong Province (2019GNC106115), the China Postdoctoral Science Foundation (No. 2018M630797), the Shandong Province Higher Educational Science and Technology Program (J18KA308), the National Nature Science Foundation of China (No. 31571571, 61572300), and the Taishan Scholar Program of Shandong Province of China (No. TSHW201502038).

S.H. Mou is with the School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
W.K. Jia is with the School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China (Corresponding author; Tel: +86-531-86190755; Email: [email protected]).
Y.Y. Tian is with the School of Information Science and Engineering, Shandong Normal University, Jinan 250358, China.
Y.J. Zheng is with the Institute of Life Sciences, Shandong Provincial Key Laboratory for Distributed Computer Software Novel Technology & Key Lab of Intelligent Information Processing, Shandong Normal University, Jinan 250014, China.
Y.N. Zhao is with the Key Lab of Intelligent Computing & Information Security in Universities of Shandong, Shandong Normal University, Jinan 250358, China.

I. INTRODUCTION

With the advance of information technology, data sets update and grow ever faster, and the number of data dimensions becomes higher and less structured. Although the data obtained through these information systems contain a wealth of information, the sheer volume of available information also increases the difficulty of effective data analysis. For instance, data redundancy buries useful information, and correlation between features of the data causes repeated information. The challenge is therefore how to make use of these huge volumes of data: to analyze them, extract useful information, and exclude the influence of unrelated or repeated factors, so as to save storage space and computing time. Feature extraction theory aims to extract effective and reasonably simple data from huge volumes of data while keeping the data information complete, that is, to reduce the feature dimensions as far as possible without affecting the problem's solution, in order to satisfy the needs of storage, computation and recognition [1-4].

Classical statistical analysis methods are common feature extraction algorithms, and they are usually judged by the value of the variance total contribution ratio (VTCR) [5]. Many mature methods exist across application fields, such as principal component analysis (PCA) [6], independent component analysis (ICA) [7], factor analysis (FA) [8], correlation analysis [9], cluster analysis [10], and linear discriminant analysis (LDA) [11]. These methods have achieved good results on linear problems. However, nonlinear problems are ubiquitous, and how to improve these classical algorithms to handle them is a current research hotspot. The most studied are kernel-based improvements, such as K-PCA [12, 13], K-ICA [14], and two-dimensional principal component analysis [15], which have obtained good effects. For all of these algorithms, VTCR is calculated from the eigenvalues of the sample correlation matrix to measure the quality of feature extraction and to determine the number of extracted features. Although this method is simple to operate, the eigenvalues of the correlation matrix hardly cover any information measure; in other words, VTCR cannot really evaluate the efficacy of a dimension reduction algorithm from the angle of information.
In information theory, information entropy (Shannon entropy) is used to measure the amount of information; it establishes the scientific foundation of modern information theory. Information entropy was proposed by the American electrical engineer C. E. Shannon in 1948 [16]. Entropy is regarded as a measure of the "uncertainty" of a random event, or of the amount of information it carries: the larger a variable's uncertainty, the larger the entropy, and the greater the amount of information contained. Information entropy theory has been applied to feature extraction with many achievements; some algorithms use it directly for feature extraction [17], while others integrate it with other algorithms [18].

In this study, PCA and FA are selected as typical examples of VTCR-based methods. PCA represents the original multiple variables with several comprehensive factors that reflect as much information of the original variables as possible; the comprehensive factors are uncorrelated with each other, which is what reduces the dimensionality. FA is a kind of analysis method that turns many variables into several integrated variables: it concentrates the information of the system's numerous original indexes onto a few factors, and it can adjust the amount of retained information by controlling the number of factors, according to the precision the actual problem requires. FA can be seen as a further generalization of PCA; to a certain extent, the data after dimension reduction by FA include more original information than after PCA.

Aiming at the limitations of VTCR, this paper defines several concepts, namely generalized class probability (GCP), class information function (CIF) and accumulate information ratio (AIR), based on the eigenvalues of the sample correlation matrix combined with Shannon entropy theory. The number of feature dimensions to extract is determined by calculating AIR, which also measures the quality of the feature extraction. On this basis, an optimized VTCR feature dimensionality reduction algorithm based on information entropy is established: AIR is calculated as the share of original information contained in the extracted features and determines the number of principal components or factors, thereby achieving dimension reduction. Because the new algorithm evaluates from the angle of information theory, it retains more original information; experiments show that the low-dimensional data have more interpretability and the new algorithm has a higher compression ratio.

II. INFORMATION ENTROPY AND VARIANCE TOTAL CONTRIBUTION RATIO

A. Shannon Information Entropy

Suppose a discrete information source $X$ takes the values $a_1, a_2, \dots, a_n$ with probabilities $p_1, p_2, \dots, p_n$:

$$\begin{bmatrix} X \\ P \end{bmatrix} = \begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ p_1 & p_2 & \cdots & p_n \end{bmatrix} \qquad (1)$$

where $P(a_i) = p_i$ is the probability that the event $X = a_i$ occurs, and $0 \le p_i \le 1$. Since $[X\ P]$ completely describes the characteristics of the discrete information source, it is called the information source space of the source $X$.

Definition: the information function

$$I(a_i) = -\log p_i, \quad i = 1, 2, \dots, n \qquad (2)$$

represents the uncertainty of the event $X = a_i$ before it occurs, and the information contained in the event after it has occurred; it is also called the self-information of $a_i$. Shannon takes the statistical average of the information function over the information source space,

$$H(X) = H(p_1, p_2, \dots, p_n) = -\sum_{i=1}^{n} p_i \log p_i \qquad (3)$$

as the measure of the uncertainty of the information source $X$; this is what we call the information entropy, or Shannon entropy. The larger $H(X)$ is, the larger the uncertainty of the information source, and the greater the amount of information obtained.
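To make Eqs. (2) and (3) concrete, the following is a minimal NumPy sketch, not taken from the paper, that computes the self-information of a single outcome and the entropy of a source space. The base-2 logarithm (information in bits) is an assumption, since the text does not fix the base of the logarithm.

```python
import numpy as np

def self_information(p_i):
    """Information function I(a_i) = -log2(p_i) (Eq. 2)."""
    return -np.log2(p_i)

def shannon_entropy(p):
    """Shannon entropy H(X) = -sum_i p_i log2(p_i) (Eq. 3):
    the statistical average of self-information over the source space."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]          # events with p_i = 0 contribute nothing
    return float(-(p * np.log2(p)).sum())

# A uniform source has maximal uncertainty; a deterministic one has none.
print(shannon_entropy([0.25, 0.25, 0.25, 0.25]))  # 2.0 bits
print(shannon_entropy([1.0, 0.0, 0.0]))           # 0.0 bits
print(self_information(0.25))                     # 2.0 bits
```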
B. Variance Total Contribution Ratio

First, the original data are normalized so as to eliminate differences in the distributions of the indexes, which not only avoids repetition of information but also overcomes subjective factors in weight determination, ensuring utilization quality from the data source onward.

Suppose there are $n$ variables in the original sample, denoted $X = (x_1, x_2, \dots, x_n)$. Through an orthogonal transformation, they are integrated into $n$ comprehensive variables, namely

$$\begin{aligned} y_1 &= c_{11}x_1 + c_{12}x_2 + \cdots + c_{1n}x_n \\ y_2 &= c_{21}x_1 + c_{22}x_2 + \cdots + c_{2n}x_n \\ &\;\,\vdots \\ y_n &= c_{n1}x_1 + c_{n2}x_2 + \cdots + c_{nn}x_n \end{aligned} \qquad (4)$$

and the coefficients satisfy

$$c_{k1}^2 + c_{k2}^2 + \cdots + c_{kn}^2 = 1, \quad k = 1, 2, \dots, n.$$
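As a numerical check on this construction, here is a minimal NumPy sketch, illustrative rather than the authors' implementation: it standardizes a synthetic sample, eigendecomposes the sample correlation matrix, verifies the unit-norm constraint on the coefficient rows of Eq. (4), and reports the classical cumulative VTCR of the first $k$ components, $\sum_{i \le k} \lambda_i / \sum_i \lambda_i$. The function name `vtcr` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def vtcr(X, k):
    """Cumulative variance total contribution ratio of the first k
    comprehensive variables, computed from the eigenvalues of the
    sample correlation matrix of the standardized data."""
    # Normalize each index to zero mean and unit variance, eliminating
    # differences in the distributions of the indexes.
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.corrcoef(Z, rowvar=False)   # sample correlation matrix
    lam, C = np.linalg.eigh(R)         # eigenvalues in ascending order
    lam = lam[::-1]                    # lambda_1 >= ... >= lambda_n
    # Columns of C are orthonormal eigenvectors, so each coefficient
    # row c_k of Eq. (4) (a column of C, transposed) satisfies
    # c_k1^2 + ... + c_kn^2 = 1.
    assert np.allclose((C ** 2).sum(axis=0), 1.0)
    return lam[:k].sum() / lam.sum()

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 5] = X[:, 0] + 0.1 * rng.normal(size=200)  # a redundant index
print(vtcr(X, 3))   # share of total variance kept by 3 components
```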