Deep Inverse Feature Learning: A Representation Learning of Error
Behzad Ghazanfari, Fatemeh Afghah
School of Informatics, Computing, and Cyber Security, Northern Arizona University, Flagstaff, AZ 86001, USA.
arXiv:2003.04285v1 [cs.LG] 9 Mar 2020

Abstract—This paper introduces a novel perspective about error in machine learning and proposes inverse feature learning (IFL) as a representation learning approach that learns a set of high-level features based on the representation of error for classification or clustering purposes. The proposed perspective about error representation is fundamentally different from current learning methods, where in classification approaches the error is interpreted as a function of the differences between the true labels and the predicted ones, or in clustering approaches, where clustering objective functions such as compactness are used. The inverse feature learning method operates based on a deep clustering approach to obtain a qualitative form of the representation of error as features. The performance of the proposed IFL method is evaluated by applying the learned features along with the original features, or just using the learned features, in different classification and clustering techniques for several data sets. The experimental results show that the proposed method leads to promising results in classification and especially in clustering. In classification, the proposed features along with the primary features improve the results of most of the classification methods on several popular data sets. In clustering, the performance of different clustering methods is considerably improved on different data sets. Interestingly, a few features of the representation of error capture highly informative aspects of the primary features. We hope this paper helps to utilize error representation learning in different feature learning domains.

Index Terms—error representation learning, inverse feature learning, clustering, classification

I. INTRODUCTION

Error, as the main source of knowledge in learning, has a unique and key role in machine learning approaches. In current machine learning techniques, the error is interpreted, without considering its representation, in a variety of forms in objective functions, loss functions, or cost functions. In supervised approaches, different loss functions and cost functions have been defined to measure the error as the differences between the true and the predicted labels during the training process in order to provide decision functions which minimize the risk of wrong decisions [1]. In clustering, high-level objectives in the form of a clustering loss (e.g., compactness or connectivity) are used to group similar instances into clusters. The clustering objectives minimize cost functions that are defined as the distances between the instances and the representative of the clusters (e.g., the centers of the clusters in k-means [2]). In this paper, we define the error in a more informative format which captures different representations of each instance based on its class or cluster, and use that toward learning several high-level features to represent the raw data in a compact way.

The proposed error representation learning offers a new approach in which the error is represented as a dynamic quantity that captures the relationships between the instances and the predicted labels in classification or the predicted clusters in clustering. This method is called "inverse feature learning (IFL)" as it first generates the error and then learns high-level features based on the representation of error. We propose IFL as a framework to transform the error representation into a compact set of impactful high-level features in different machine learning techniques. The IFL method can be particularly effective in data sets where the instances of each class have diverse feature representations, or in data sets with imbalanced classes. It can also be effective in active learning applications with a considerable cost of training the instances. In summary, the error representation learning method offers a different source of knowledge that can be implemented by different learning techniques and be used alongside a variety of feature learning methods in order to enhance their performance. In this paper, we present a basic implementation of this strategy to learn the representation of error, in order to focus on the concept and its potential role in improving the performance of state-of-the-art machine learning techniques when a general set of high-level features is utilized across all these techniques. Indeed, the proposed technique can be improved by using additional feature representations as well as defining other dynamic notions of error. The experimental results confirm that even a few simple features defined based on error representation can improve the performance of learning in classification and clustering.

II. RELATED WORKS

The error in most machine learning approaches, depending on whether the approach is supervised or unsupervised, has a similar definition: the difference between the predicted and true labels in supervised ones, or the clustering losses in unsupervised ones. Also, the term error refers to the same concept in similarity learning [3], semi-supervised learning [4], and self-supervised learning [5]. Generative methods are distinguished from discriminative methods in machine learning in terms of considering the underlying distribution, but the error is defined in the same way whether the approach is supervised or unsupervised.

Representation learning methods are known for their role in improving the performance of clustering and classification approaches by learning new data representations from the raw data which better present the different variations behind the data [6]. Classification with representation learning is typically a closed-loop learning process with many different architectures. Clustering with deep learning, while it does not use the labels, can lead to promising results [7]. Unsupervised representation learning methods can be used for pre-training networks, generally deep networks, or for extracting high-level features, denoising, and dimensionality reduction.

Representation learning approaches in supervised or unsupervised applications are generally based on elements such as restricted Boltzmann machines (RBMs) [8], autoencoders [9, 10], convolutional neural networks (ConvNets) [11], and clustering methods [7]. In the majority of existing classification with representation learning methods, the term error refers to a function of the differences between the true and the predicted labels (i.e., loss functions) [12]. Also, the error in clustering with representation learning methods is defined as the clustering loss of high-level objectives alongside other loss functions [13].

In unsupervised feature learning such as autoencoders, the error is still defined in the form of the reconstruction error between the input data and the result of the encoding and decoding process [9, 10]. Regularization terms [14] or dropout [15] are used alongside the loss functions to enhance the training process and obtain a better generalization of the learned features (e.g., learning the weights of neural nets). Generative adversarial nets (GANs) [16] use two separate neural networks competing against each other with the ultimate goal of generating synthetic data similar to the original inputs through the generator. However, the concept of error in GANs is the same as in the other machine learning approaches. In this paper, we propose a novel notion of error as the resultant representation of assigning the data instances to the different available classes or clusters, as a source of knowledge for learning. We would like to note that this approach is different from generative approaches such as GANs, which are based on data generation, while the proposed method is based on the virtual assignment of the data instances to different classes or clusters in order to consider the resultant representation of such assumptions for each instance in relation to the existing classes or clusters.

This proposed method is also different from similarity learning [3], which learns a similarity function to measure how similar two objects are, in the sense that it extracts the relationships between the objects and the classes. Self-supervised learning methods are based on finding patterns inside the input instances [5]. The semi-supervised learning and active learning methods that utilize a combination of classification and clustering are based on the assumption that the instances which are in the same cluster have the same label, and use this assumption toward predicting the labels for new instances [17, 18, 4, 19]. In these methods, the instances near the centers of the clusters are considered as the most representative objects for the purpose of determining the labels. Other approaches, including [20, 21], utilized clustering for active learning in several different ways.

In summary, the proposed method is different from existing techniques in the literature since they focus on data representation learning and calculate errors in classification by loss functions or cost functions that minimize the risk by taking the difference between the predicted labels and the true labels. In clustering with deep learning, the objectives are defined in the form of clustering losses and auxiliary losses [7], which are still based on data representation. The distinction of inverse feature learning relative to common trends in machine learning is that the inverse feature learning method generates the error by a trial, calculates the resultant representation as the error, and then transforms the error into high-level features to represent the characteristics of each instance related to the available classes or clusters.

III. NOTATIONS

First, we define the notations which are used for both clustering and classification. Let us consider X and Y to refer to the input and the output spaces, respectively, in which each instance x_i ∈ X consists of h features: x_i = ⟨x_{i,1}, ..., x_{i,h}⟩. We use the notation s to denote the number of classes or the number of clusters. We consider C to denote the set of clusters in clustering, C = ⟨c_1, ..., c_s⟩, such that ∀x_j ∈ X, ∃c_i ∈ C | x_j ∈ c_i. Y is the corresponding set of classes of the instances in classification. The set of clusters C can simply correspond to Y in assignment problems using approaches such as the Hungarian algorithm. The notation µ_i, i ∈ {1, ..., s}, refers to the center of cluster i. |b| indicates the number of instances in set b. In the following, we define the specific notations used for the classification and clustering approaches.

In clustering, the input data set is presented with D = ⟨X⟩, in which X = {x_1, ..., x_n} indicates the set of input instances and n shows the number of input instances, |X| = n.

In classification, the input training data set is presented with D = ⟨X^Train, Y^Train⟩, in which X^Train = {x_1, ..., x_{n_train}} indicates the set of input training instances and n_train shows the number of input instances in the training partition. The label set is denoted by Y^Train, where Y^Train = {y_1, ..., y_{n_train}} is a vector corresponding to the data set X^Train; thus, y_i shows the corresponding label for x_i. X^Test = ⟨x'_1, ..., x'_{n_test}⟩ denotes the test set, in which n_test refers to the number of test instances; |X^Train| = n_train and |X^Test| = n_test.

Z refers to the latent feature space, in which z_i = ⟨z_{i,1}, ..., z_{i,e}⟩ is the new representation of x_i, and e denotes the dimension of the latent feature space, where the dimension of the latent space is usually much smaller than the original space (i.e., e << h).

IV. DEEP INVERSE FEATURE LEARNING

In this section, we describe the proposed inverse feature learning, which is defined as a trial process that learns a set of features of the error representation from the latent features of the instances using a deep clustering approach. The learned features of the representation of error can be used for clustering or classification purposes. Here, we describe how the representation of error is generated and then presented in the form of learned features. First, we provide a high-level review of the different steps of this method and then describe them in more detail in the following sub-sections. The pseudo-codes for the IFL in clustering and classification are presented in Algorithm 1, and Algorithms 2, 3, and 4, respectively.

[Fig. 1: The network structure of deep embedded clustering (DEC) [22]. (a) A simple autoencoder which learns a latent feature space, Z, and provides the initial clusters' centroids as the first phase of DEC. The hashed part (the encoder) is used in the second phase of DEC after the training phase. (b) In the second phase of the DEC architecture, a clustering based on the KL divergence metric is repeated until the method converges.]

[Fig. 2: The inner folding process. Step 1: partition the input data into r non-overlapping folds or partitions. A loop with r runs is applied, where in each run one fold is considered as the inner test and the r−1 remaining folds are considered as the inner train. The inner-test fold in each run is colored green.]

[Fig. 3: Steps 2 and 3: Learning the clustered representation of the inner train data as step 2 by using DEC to train the autoencoder in the first phase and to cluster the latent feature space and obtain its centers in the second phase. The encoder trained in the first phase is used in the second phase of DEC and also in extracting the latent features of the inner test as step 3.]

Clustering with deep learning is a practical and efficient solution to handle different data sets, especially high-dimensional ones. Autoencoders can provide a latent feature space which is much smaller than the original feature set [13]. Thus, a clustering method based on an autoencoder provides a proper foundation for the proposed inverse feature learning. There are several types of deep clustering methods, which can be categorized based on different factors such as the architecture, being deterministic or generative, or the loss functions [13, 7]. We use deep embedded clustering (DEC) as a general and basic approach in the literature of clustering with deep learning that simultaneously learns feature representations and cluster assignments [22]. The autoencoder in DEC offers a robust and embedded representation of the data with a nonlinear mapping of the original features to a latent feature space Z. DEC is deterministic, and uses an autoencoder network with learnable parameters based on fully connected networks (FCNs) and the reconstruction loss in its first phase. Then, DEC clusters the latent space into s clusters by using cluster assignment hardening in its second phase to generate the centroids, µ, of the clusters. The network structure of DEC is shown in Figure 1.

The inverse feature learning is based on four main steps:

Step 1- Inner folding: We introduce inner folding as a method to perform a trial process to extract the desired characteristics of error for all the instances. This process is inspired by k-fold cross-validation, where the data set is partitioned into r folds. One partition of the data is considered as the inner test and the other partitions as the inner train in each run. However, the objective of the inner folding process is different from cross-validation in the sense that we cluster the inner train data to obtain the representation of the inner test instances based on that clustered representation.

Step 2- Learning the clustered representation of the inner train: In each round of the inner training process, the inner train data is fed as input to deep embedded clustering (DEC) [22]. DEC clusters the inner train instances. Then, the encoder of DEC, which is trained with the inner train instances, and the centroids of the clusters are used for the next steps.

Step 3- Extracting the latent features of the inner test: The inner test instances are given as input to the trained encoder of the previous step to obtain the latent features of the inner test.

Step 4- Feature learning: In this step, the clusters' centroids of the inner train, which were calculated in step 2, and the latent features of the inner test, which were obtained in step 3, are used to measure several features for each inner test instance. The relations between the latent features of the inner test and the clusters' centroids of the inner train are calculated and considered as new features for that inner test sample. Thus, we extract a new set of features for each inner test instance during each run of the inner folding process.

In the following, we describe each of these steps in detail.
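Before the detailed description, the following sketch shows how the four steps fit together for clustering (Algorithm 1). It is a simplification under two stated assumptions: scikit-learn's k-means stands in for DEC (so the "latent" space is simply the input space), and only the confidence and weight features of step 4 are computed. It is not the authors' implementation.

```python
import numpy as np
from sklearn.cluster import KMeans  # stand-in for DEC, only to keep the sketch runnable

def ifl_features_for_clustering(X, s, r=10, seed=0):
    """Sketch of the four IFL steps for clustering.

    With DEC, X would first be mapped through the trained encoder f_phi;
    here k-means is used directly on the raw features as a simplification.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    fold_id = rng.permutation(n) % r                  # Step 1: r non-overlapping folds
    features = np.zeros((n, 1 + s))                   # confidence + s weight features
    for J in range(r):
        test = fold_id == J                           # inner test = fold J
        train = ~test                                 # inner train = remaining r-1 folds
        km = KMeans(n_clusters=s, n_init=10, random_state=seed).fit(X[train])  # Step 2
        centroids = km.cluster_centers_
        cluster_sizes = np.bincount(km.labels_, minlength=s)
        z_test = X[test]                              # Step 3: latent features (identity here)
        dists = np.linalg.norm(z_test[:, None, :] - centroids[None, :, :], axis=2)
        closest = dists.argmin(axis=1)                # Step 4: error-representation features
        features[test, 0] = cluster_sizes[closest] / cluster_sizes.sum()  # confidence
        features[test, 1:] = dists                    # "weight" = distance to each centroid
    return features
```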
A. Step 1- Inner folding

Inner folding provides a framework to obtain the evaluation of the representation of instances based on each other. Inner folding partitions the data into r non-overlapping folds, in which one fold, as the inner test, is evaluated against the remaining r − 1 folds, called the inner train instances. The inner folding process is similar to k-fold cross-validation in the sense that during each round, one data fold is considered as the test sample and the remaining folds are the training samples, until all the samples have been considered as an inner test sample one time. The goal of defining the inner folding process is different from cross-validation as it intends to learn the characteristics of the inner test samples against the inner training samples. This process is depicted in Figure 2, where the one-fold inner test of each run is colored green. In other words, inner folding partitions the input data into r folds and provides different versions of the input data depending on which fold is the inner test.

In a formal definition, inner folding is an indexing function that allocates each instance x_i ∈ X to a fold_j ∈ Fold by randomization, in which Fold = ⟨fold_1, ..., fold_r⟩:

f_inner_folding : X ↦ Fold,   f_inner_folding : {x_1, ..., x_n} ↦ {fold_1, ..., fold_r},

in which ∀i, j with i ≠ j: fold_i ∩ fold_j = ∅.

The inner folding is performed in r runs as a loop, where in the J-th run, fold J is considered as the inner test and the remaining r − 1 folds make up the inner train.

Algorithm 1: Inverse feature learning for clustering.
1: Input: X.
2: Output: Features of error representation for X.
3: Step 1: Inner folding // Partition X into r folds and perform r runs, where in each round, r − 1 folds are set as the inner training samples and the remaining one fold is set as the inner test instances.
4: for J = 1 : r do
5:   Inner test = J-th fold of X — X^J_Inner test.
6:   Inner train = All folds except the J-th fold of X — X^J_Inner train.
7:   Step 2: Learning the clustered representation of the inner train
8:   Input of step 2: Inner train, X^J_Inner train.
9:   2.1) Applying DEC on the inner train, X^J_Inner train.
10:  Outputs of step 2: Encoder of DEC, f_φ, clusters {c_i}_{i=1}^s, and centroids of the clusters — {µ_i}_{i=1}^s.
11:  Step 3: Extracting the latent features of the inner test
12:  Inputs of step 3: Encoder of DEC, f_φ, obtained in step 2, and the inner test — X^J_Inner test.
13:  3.1) Feeding the encoder with the inner test, X^J_Inner test.
14:  Output of step 3: Latent features of the inner test — Z^J_Inner test.
15:  Step 4: Feature learning
16:  Inputs of step 4: Clusters {c_i}_{i=1}^s, their centroids {µ_i}_{i=1}^s, and the latent features of the inner test, Z^J_Inner test.
17:  4.1) Calculating confidence and weight for the inner test instances.
18:  Output of step 4: Features of error representation for the inner test, i.e., the J-th fold of X — X^J_Inner test.
19: end for
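The indexing function f_inner_folding above amounts to a random partition of the instance indices into r disjoint folds. A minimal sketch (the function name is ours, not from a library):

```python
import numpy as np

def inner_folding(n_instances, r, seed=0):
    """Assign each instance index to exactly one of r disjoint folds by randomization."""
    order = np.random.default_rng(seed).permutation(n_instances)
    folds = [order[j::r] for j in range(r)]   # fold_j, pairwise disjoint, union = all indices
    return folds

# Run J uses folds[J] as the inner test and the concatenation of the others as the inner train.
```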
Similar to the cross-validation procedure, it is assumed that ∀J, P_X ∼ P_{X^J_inner train}, in which P refers to the distribution of the data.

We should note that the inner folding process for the clustering task is applied to the entire data set (i.e., X); however, during the IFL for the classification task, this inner folding process is only applied on the training data X^Train (see Algorithm 3). Inner folding provides features for all training instances through runs on different inner test folds. For the test data, we can simply evaluate each test instance against the whole training data.

Algorithm 2: Inverse feature learning for classification.
1: Input: Train (X^Train, Y^Train), Test (X^Test).
2: Output: Features of error representation for X^Train and X^Test.
3: Inverse feature learning for the training data in classification — see Algorithm 3.
4: Inverse feature learning for the test data in classification — see Algorithm 4.

B. Step 2- Learning the clustered representation of the inner train

To handle data sets with a large number of input features, deep embedded clustering [22] is selected in this paper to perform the clustering of the inner instances, since this method is based on an autoencoder that transforms the raw data to a latent feature space with a much smaller number of features. As shown in Figure 1, DEC is based on two phases. In the first phase, the autoencoder is trained based on the reconstruction loss on the input data. In the second phase, the encoder and the centroids of the latent features are given as input to Kullback-Leibler (KL) divergence minimization to improve the clustering performance. In the following, we first describe how DEC works and then how we use DEC in step 2 and its outcomes in step 3 and step 4 of the IFL.

Algorithm 3: Inverse feature learning for training data in classification.
1: 1. Inverse feature learning on training data:
2: Input: Training data (X^Train, Y^Train).
3: Output: Features of error representation for X^Train.
4: Step 1: Inner folding // Partition X^Train into r folds and perform r runs, where in each round, r − 1 folds are set as the inner training samples and the remaining one fold is set as the inner test instances.
5: for J = 1 : r do
6:   Inner test = J-th fold of X^Train — X^{Train,J}_Inner test.
7:   Inner train = All folds except the J-th fold of X^Train — X^{Train,J}_Inner train.
8:   Step 2: Learning the clustered representation of the inner train
9:   Input of step 2: Inner train, X^{Train,J}_Inner train.
10:  2.1) Applying DEC on the inner train, X^{Train,J}_Inner train.
11:  Outputs of step 2: Encoder of DEC, f_φ, clusters {c_i}_{i=1}^s, and centroids of the clusters — {µ_i}_{i=1}^s.
12:  Step 3: Extracting the latent features of the inner test
13:  Inputs of step 3: Encoder of DEC, f_φ, obtained in step 2, and the inner test — X^{Train,J}_Inner test.
14:  3.1) Feeding the encoder with the inner test, X^{Train,J}_Inner test.
15:  Output of step 3: Latent features of the inner test — Z^J_Inner test.
16:  Step 4: Feature learning
17:  Inputs of step 4: Clusters {c_i}_{i=1}^s, their centroids {µ_i}_{i=1}^s, and the latent features of the inner test, Z^J_Inner test.
18:  4.1) Calculating confidence, weight, and accuracy for the J-th fold.
19:  Output: Features of error representation for the inner test, i.e., the J-th fold of X^Train — X^{Train,J}_Inner test.
20: end for

Algorithm 4: Inverse feature learning for test data in classification.
1: 2. Inverse feature learning on test instances:
2: Input: Training data (X^Train, Y^Train) and test data (X^Test).
3: Output: Features of error representation for X^Test.
4: Step 1: Inner folding is not applied for test data.
5: Step 2: Learning the clustered representation of the training data
6: Input of step 2: Training data, X^Train.
7: 2.1) Applying DEC on the training data, X^Train.
8: Outputs of step 2: Encoder of DEC, f_φ, clusters {c_i}_{i=1}^s, and centroids of the clusters — {µ_i}_{i=1}^s.
9: Step 3: Extracting the latent features of the test data
10: Inputs of step 3: Encoder of DEC, f_φ, obtained in step 2, and the test data, X^Test.
11: 3.1) Feeding the encoder, f_φ, with X^Test.
12: Output of step 3: Latent features of X^Test — Z_Test.
13: Step 4: Feature learning
14: Inputs of step 4: Clusters {c_i}_{i=1}^s, their centroids {µ_i}_{i=1}^s of X^Train, and the latent features of the test data, Z_Test.
15: 4.1) Calculating confidence, weight, and accuracy for the test data — X^Test.
16: Output: Features of error representation for the test data — X^Test.

The autoencoder is an unsupervised data representation method that minimizes the reconstruction loss through the network on the input data. The autoencoder network consists of an encoder and a decoder. The encoder is considered as a function z = f_φ(x) that maps the input x into a latent representation z, and the decoder is a function that reconstructs the latent representation into the input, x' = g_θ(z). Parameters φ and θ denote the weights of the encoder and decoder networks, respectively, and are learned by minimizing the reconstruction loss (Figure 1a). We use the mean square error as the distance measure [22]; thus, the optimization objective is defined as follows:

\min_{\phi,\theta} L_{rec} = \min_{\phi,\theta} \frac{1}{n} \sum_{i=1}^{n} \| x_i - g_\theta(f_\phi(x_i)) \|^2 .

The second phase of DEC is clustering with KL divergence, which is based on two procedures that are repeated alternately until the clustering method converges. The first procedure, called procedure 2.1 here, computes a soft assignment between the embedded points and the centroids of the clusters. The second procedure, procedure 2.2, updates the weights of the encoder, f_φ, and uses an auxiliary target distribution to learn from the current assignment and refine the cluster centers.

Procedure 2.1: Soft Assignment. The probability of assignment q_{ij}, as a soft assignment for a sample i to a cluster j, is calculated by Student's t-distribution [23] as a kernel that measures the similarity between the embedded point z_i and the centroid µ_j:

q_{ij} = \frac{(1 + \|z_i - \mu_j\|^2/\alpha)^{-\frac{\alpha+1}{2}}}{\sum_{j'} (1 + \|z_i - \mu_{j'}\|^2/\alpha)^{-\frac{\alpha+1}{2}}}

where z_i = f_φ(x_i) ∈ Z denotes the latent feature of x_i and α is the degree of freedom of Student's t-distribution. Following [22], we consider α = 1 in all experiments.

Procedure 2.2: KL Divergence Minimization. In this step, a KL divergence loss is defined to train the model by matching the soft assignment q_i of procedure 2.1 to the auxiliary target distribution p_i, as follows:

L = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}

We follow the target distribution defined in [22]:

p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{j'} q_{ij'}^2 / f_{j'}}

where f_j = \sum_i q_{ij} are the soft cluster frequencies. The learning is based on optimizing the cluster centers {µ_j} and the weights of the network, θ.

As mentioned in the previous step, the inner folding is applied on X^Train in classification and on X in clustering, resulting in r − 1 folds as the inner train and one fold as the inner test. The inner train folds in each run J are given as input, X^J_inner train, to DEC, in which the autoencoder in the first phase is trained with the inner train instances (Figure 3).
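The quantities used in the two phases above can be written down directly. The following NumPy sketch (hypothetical helper names, α = 1 by default) computes the reconstruction loss of the first phase and the soft assignment q_ij, target distribution p_ij, and KL loss of the second phase; it is illustrative, not the authors' code.

```python
import numpy as np

def reconstruction_loss(X, X_rec):
    """First-phase objective: (1/n) * sum_i ||x_i - g_theta(f_phi(x_i))||^2."""
    return float(np.mean(np.sum((X - X_rec) ** 2, axis=1)))

def soft_assignment(Z, centroids, alpha=1.0):
    """Procedure 2.1: Student's t soft assignment q_ij between latent points and centroids."""
    d2 = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    q = (1.0 + d2 / alpha) ** (-(alpha + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

def target_distribution(q):
    """Procedure 2.2: p_ij = (q_ij^2 / f_j) / sum_j' (q_ij'^2 / f_j')."""
    weight = q ** 2 / q.sum(axis=0)          # f_j = soft cluster frequencies
    return weight / weight.sum(axis=1, keepdims=True)

def kl_loss(p, q, eps=1e-12):
    """L = KL(P || Q), minimized while updating the encoder and the centroids."""
    return float((p * np.log((p + eps) / (q + eps))).sum())
```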

Then, the latent features of the inner train are learned, Z^J_inner train, and the centroids of the latent features are used as the initial clusters' centroids for the clustering in the second phase. DEC uses the encoder, the initial clusters' centroids, and the latent features to improve the clustering accuracy by using KL divergence, which provides the clusters {c_i}_{i=1}^s and the corresponding centroids {µ_i}_{i=1}^s (Figure 3). The encoder of the first phase of DEC is used in step 3, and the clusters and their centroids are utilized in step 4 of the IFL.

C. Step 3- Extracting the latent features of the inner test

Now in step 3, the encoder part of the autoencoder which was trained in the first phase of DEC, f_φ, is fed with the inner test instances of the run, X^J_inner test, to obtain the corresponding latent features of the inner test instances, Z^J_inner test, as shown in step 3 in Figure 3.

The output of this step is a set of latent features learned for the inner test instances of the training data, X^Train, in classification, and of X in clustering. It should be noted that for the test data in classification, there is no inner train or inner test, since the labels are used in classification and the labels of the test data are not available. The latent representation of the test data in classification, X^Test, is obtained based on the autoencoder that was trained over the training data in step 2 (see Algorithm 4).

D. Step 4- Feature Learning

We consider two sets of non-overlapping instances as the inner train and the inner test, for the training data in classification or for the whole set of instances in clustering. Also, the training and test data in classification are two separate sets. The representation of one set, the inner test instances or the test instances, is evaluated based on the clustered representation of the other set, the inner train or the training data. We assign each inner-test instance to all possible clusters, measure the representation of such an assignment on the clustered representation, and map it in the form of features as the corresponding error representation of that inner test instance for that cluster. Here, we introduce three metrics which are measured per instance for each cluster, {c_i}_{i=1}^s. Two metrics, confidence and weight, are calculated easily without labels and can be used in both classification and clustering. Also, we extract the accuracy of assignment of the clusters by using the training labels for classification purposes.

Confidence is the ratio of the number of elements in the cluster whose center is closest to the inner-test instance to the number of all instances in all clusters. Thus, we need to find the closest cluster by measuring the distance between the latent representation of the instance and the centroids of the clusters obtained from DEC, and then calculate the ratio of the number of instances which belong to that cluster to the number of all instances. If dist(z_j, c_b) = min(dist(z_j, {c_i}_{i=1}^s)), then confidence(x_j) = |c_b| / |C|, in which z_j is the corresponding latent feature of x_j. Confidence is one feature. It can be used for both clustering and classification purposes.

In classification, we replicate each instance as many times as the number of clusters, and we place the learned features of each cluster in one of the replications.

Weight of belonging of an inner-test instance to a cluster of the inner train is the distance of the latent feature of the instance to the center of the cluster, µ. We calculate the weight for each inner test instance with respect to the centers of all clusters, {µ_i}_{i=1}^s, through the different runs of inner folding. In other words, weight is a vector whose length is the number of clusters, in which weight(x_j, {µ_i}_{i=1}^s) = dist(z_j, {µ_i}_{i=1}^s). Thus, the number of weight features is the same as the number of clusters. This measure can be used for both clustering and classification purposes.

Accuracy of assignment is the unsupervised clustering accuracy (ACC) of a cluster. It requires the labels of the instances; thus, it can only be applied for classification purposes. We use the cluster selected for each inner test instance x_j in the confidence metric, i.e., the cluster c_d with the closest center, and measure the ACC for that cluster: Accuracy of Assignment(x_j) = ACC(c_d). The number of ACC features is one.

We track the clusters through the different runs of inner folding in order to place the calculated weight features of the inner test instances of the clusters which are the most representative of the same elements in the same columns across different runs. It can be calculated and verified that the clusters can be tracked correctly if the ACC of the clustering approach used in step 2 on the instances is more than r% + 1/s. In this paper, we set r = 10; thus, the ACC of the clustering approach should be at least 35% for a clustering problem with 4 clusters, and at least 20% for a problem with 10 clusters. 20% and 35% are low bars for clustering approaches, especially deep ones; DEC's ACC on such data sets is several times larger than these thresholds. Therefore, the clusters which are representative of similar elements are tracked accurately. This process is similar to using the labels to track the clusters as the representatives of the classes.
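For one inner-test (or test) instance, the three metrics can be computed from its latent vector, the cluster centroids, the cluster sizes, and, in classification, a per-cluster ACC value. The helper below is an illustrative sketch with hypothetical names, not the authors' implementation.

```python
import numpy as np

def error_features(z, centroids, cluster_sizes, cluster_acc=None):
    """Confidence, weight, and (optionally) accuracy of assignment for one instance.

    z             : latent feature vector of the instance (from the trained encoder).
    centroids     : array of shape (s, e) with the cluster centers mu_1..mu_s.
    cluster_sizes : number of inner-train instances in each cluster.
    cluster_acc   : optional per-cluster ACC values (classification only).
    """
    weight = np.linalg.norm(centroids - z, axis=1)             # s features: distance to each center
    closest = int(weight.argmin())                             # cluster c_b with the closest center
    confidence = cluster_sizes[closest] / cluster_sizes.sum()  # |c_b| / |C|
    feats = {"confidence": confidence, "weight": weight}
    if cluster_acc is not None:                                # accuracy of assignment = ACC(c_d)
        feats["accuracy_of_assignment"] = cluster_acc[closest]
    return feats
```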

V. EXPERIMENTAL RESULTS

In this section, we evaluate the performance of the proposed feature learning model. Since this is the first work to propose the strategy of feature learning based on the representation of error, we consider a basic framework that can be applied across various data sets in order to demonstrate the capability of this strategy in achieving high accuracy in both classification and clustering applications when only using a small set of learned features. Indeed, this strategy can be further extended to develop refined implementations that better match the characteristics of different types of input data sets. To present a fundamental implementation of the IFL, we deployed DEC, which uses an autoencoder based on fully connected networks (FCNs) [22]. Thus, DEC can be applied to diverse types of input data such as real-valued data, text, and some types of images. Although DEC and other deep clustering methods based on FCNs [13, 7] cannot offer a performance comparable to models based on ConvNets for image data sets, they can be applied to a wide range of data sets. For example, deep clustering models based on ConvNets can learn the locality and shift-invariance in images much better than other architectures such as FCNs; however, the structures of ConvNets usually have a limited range of usage in other domains [13, 7]. We should note that the majority of the state-of-the-art deep learning methods are based on using augmentations or generative approaches and regularization inside the process, which considerably improve their performance. As mentioned earlier, in this paper we mainly focus on the representation of error rather than attempting to develop the most optimal network architecture. Developing more advanced clustering techniques can be considered as future work. In this paper, following DEC [22], we used an autoencoder network based on FCNs with 9 layers whose dimensions are d–500–500–2000–10–2000–500–500–d, where d is the number of features of the data set, and the batch size is 256 for all of the experiments.

In this paper, we evaluated the proposed method on five common data sets [22, 27, 28, 13, 7] that are described in Table I. These data sets cover a variety of data types, sizes, and dimensions: two are handwritten digit image data sets (USPS¹, MNIST [11]), two include extracted features of documents (Reuters-10K [30] and Reuters [30]), and the other one is HHAR [31], which is a multi-class benchmark with real values of sensory activities. The details of the data sets and their pre-processing are described in the following.

TABLE I: Descriptions of the data sets.

DATA SET    | #INSTANCES | #FEATURES    | #CLASSES
MNIST       | 70000      | 28*28 (784)  | 10
USPS        | 9298       | 16*16 (256)  | 10
REUTERS     | 685071     | 2000         | 4
REUTERS-10K | 10000      | 2000         | 4
HHAR        | 10299      | 561          | 6

[Fig. 4: The representation of the HHAR data set while the learned features of error are added to the corresponding instances, visualized by t-SNE through different phases of DEC. (a) Representation of the input. (b) Iteration 0 of phase 2 of DEC, ACC ≈ 70%. (c) Iteration 360 of phase 2 of DEC, ACC ≈ 80%. (d) Iteration 1800 of phase 2 of DEC, ACC ≈ 87%.]

MNIST: The MNIST data set [11] includes 70000 handwritten digits of size 28*28 pixels whose values range between [0, 255]. The value of each pixel is divided by the maximum value to transform it into the interval [0, 1]. Since we are using FCNs, we reshape each image into a vector of size 784 [22].

USPS: The USPS data set contains 9298 gray-scale handwritten digits of size 16*16 pixels whose values range between [-1, 1]. The value of each pixel is divided by two to transform it into the interval [-0.5, 0.5].

Reuters: The Reuters data set [30] includes 810000 English news stories that are labeled based on a category tree. We followed [22] in using 4 root categories, corporate/industrial, government/social, markets, and economics, as labels, while documents with multiple labels are not considered. The resulting 685071 documents are represented by tf-idf features computed on the 2000 most frequently occurring word stems. In Reuters-10K, 10000 random documents of Reuters are sampled, the same as [22].

HHAR: The Heterogeneity Human Activity Recognition (HHAR) data set contains sensory information from smartphones and smartwatches. The data set includes 10299 instances with 561 features from 6 categories of human activities. We follow the same setup of this data set as [28].

¹ http://www.cs.nyu.edu/roweis/data.html

Here, we compare the performance of the proposed IFL with some state-of-the-art techniques that are fairly general, such as [22, 27, 28] in clustering, or with several common classification methods which can handle real values, text, or images. In this paper, we set the number of runs for the inner folding process to 10 (i.e., r = 10) in both classification and clustering for all experiments. We repeat the experiments 5 times to calculate the variance of the results in both classification and clustering. The numbers in parentheses show the variance. If there are no parentheses, it means that the variance is smaller than 0.01.

A. Clustering

Two main categories of clustering methods are classical clustering and deep clustering. Classical clustering methods include partition-based methods (e.g., k-means), spectral clustering, density-based clustering methods, and hierarchical clustering (e.g., agglomerative clustering). The spectral clustering methods are not scalable to large data sets considering their computational complexity [22]. The deep learning based clustering methods are developed based on different criteria such as the architecture or the loss, as reviewed in [13, 7]. Clustering methods with deep learning have shown superior performance in several applications (e.g., ConvNets on image data sets, or the generative-based models). Here, we implement the idea of inverse feature learning using DEC and show that our results are comparable to or even better than the state-of-the-art methods, where we select at least one clustering method out of each category.

The results are compared with classical clustering methods such as k-means [32], hierarchical clustering analysis (HCA) based on the bottom-up approach with the Ward and average linkage criteria [33], and Gaussian mixture models (GMMs), as well as several deep learning based clustering methods including LDMGI [24], autoencoder+k-means, autoencoder+LDMGI [24], autoencoder+SEC [25], DEC, improved DEC (IDEC) [27], VaDE [28] (which is generative and based on the variational autoencoder (VAE) [34]), autoencoder+GMM, VAE+GMM, the adversarial autoencoder (AAE) [26], and information maximizing self-augmented training (IMSAT) [35].

TABLE II: Comparison of clustering accuracy for different data sets.

Method                        | MNIST      | USPS    | Reuters-10K | Reuters | HHAR
k-means                       | 53.24      | 67.14   | 54.95       | 53.29   | 60.35
HCA Average                   | 19.17      | 27.95   | 44.72       | N/A     | 36.12
HCA Ward                      | 62.11      | 74.75   | 52.62       | N/A     | 62.02
LDMGI [24]                    | 84.2 ¹     | 58.01   | 65.62 ³     | N/A     | 63.43 ³
autoencoder+k-means           | 81.84 ²    | 69.31 ⁴ | 66.59 ²     | 71.97 ² | N/R
autoencoder+SEC ² [25]        | 81.56      | N/R     | 61.86       | N/A     | N/R
autoencoder+LDMGI ²           | 83.98      | N/R     | 42.92       | N/A     | N/R
GMMs ³                        | 53.73      | N/R     | 54.72       | 55.81   | 60.34
autoencoder+GMM ³             | 82.18      | N/R     | 70.13       | 70.98   | 77.67
VAE+GMM ³                     | 72.94      | N/R     | 69.56       | 60.89   | 68.02
AAE ³ [26]                    | 83.48      | N/R     | 69.82       | 75.12   | 83.77 ³
DEC [22]                      | 84.30      | 74.08 ⁴ | 72.17       | 75.63   | 79.86 ³
IDEC [27]                     | 88.06      | 76.05   | 75.64       | N/R     | N/R
VaDE [28]                     | 94.46      | N/R     | 79.83       | 79.38   | 84.46
IMSAT (RPT)                   | 89.6 (5.4) | N/R     | 71.9 (6.5)  | N/R     | N/R
IMSAT (VAT)                   | 98.4 (0.4) | N/R     | 71.0 (4.9)  | N/R     | N/R
k-means (IFL)                 | 16.40      | 14.17   | 32.97 (0.3) | 75.00   | 23.26
HCA Average (IFL)             | 82.06      | 84.80   | 76.05       | N/A     | 36.50
HCA Ward (IFL)                | 91.36      | 77.23   | 76.77       | N/A     | 51.60
DEC (IFL)                     | 91.75      | 82.68   | 77.15       | 80.11   | 51.59
k-means (Primary + IFL)       | 91.49      | 77.44   | 77.53       | 75.56   | 51.58
HCA Average (Primary + IFL)   | 77.85      | 85.26   | 78.45       | N/A     | 36.50
HCA Ward (Primary + IFL)      | 91.26      | 77.03   | 80.08       | N/A     | 51.60
DEC (Primary + IFL)           | 95.79      | 84.55   | 83.19       | 81.39   | 87.66

¹ Results reported from [29]. ² Results reported from [22]. ³ Results reported from [28]. ⁴ Results reported from [27].

We do not compare the results with the clustering approaches based on ConvNets, as they can only be applied on image data sets. The common unsupervised clustering accuracy, denoted by ACC, is used to evaluate the performance of this model, defined as follows:

ACC = \max_m \frac{\sum_{i=1}^{n} 1\{y_i = m(c_i)\}}{n}    (1)

where c_i is the cluster assignment for instance i determined by the clustering approach, y_i is the ground-truth label of the i-th instance, and m is the mapping function that ranges over all possible one-to-one mappings between c_i and y_i.
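Equation (1) is commonly evaluated by solving the best one-to-one mapping with the Hungarian algorithm over the confusion matrix. The sketch below uses scipy.optimize.linear_sum_assignment and assumes integer labels and cluster ids in 0..s-1; it is a generic helper, not tied to the authors' code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def clustering_accuracy(y_true, y_pred):
    """Unsupervised clustering accuracy (Eq. 1): best one-to-one mapping m between
    cluster assignments and ground-truth labels, found via the Hungarian algorithm."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    s = int(max(y_true.max(), y_pred.max())) + 1
    count = np.zeros((s, s), dtype=np.int64)
    for t, p in zip(y_true, y_pred):
        count[p, t] += 1                                   # co-occurrence counts
    row, col = linear_sum_assignment(count.max() - count)  # maximize matched instances
    mapping = dict(zip(row, col))
    return float(np.mean([mapping[p] == t for p, t in zip(y_pred, y_true)]))
```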
The ACC of the clustering models is reported for the primary features, for the learned features of error alone (IFL), and for the primary features in addition to the learned features (Primary + IFL). It should be noted that the IFL features are highly abstract (i.e., only 1 + s features, in which s is the number of clusters). This number of features is several hundred times smaller than the number of original features of the data sets. For instance, the number of IFL features for the Reuters and Reuters-10K data sets is 5, while the number of primary features is 2000. In MNIST and USPS, the numbers of primary features are 784 and 256, respectively, while the number of IFL features is 11. For the HHAR data set, the number of primary features is 561, while the IFL features are just 7.

We evaluate IFL based on two scenarios to show how the IFL features work separately and along with the primary features in Table II. The HCA clustering approaches and LDMGI cannot be applied to the Reuters data set. We use the open libraries of scikit-learn [36] for HCA, which cannot handle data sets like Reuters. Also, as mentioned in [22], LDMGI and SEC need "months of computation time and terabytes of memory" for Reuters.

As shown in Table II, approaches such as AAE, which is based on a GAN autoencoder, variational deep embedding (VaDE), which is a generative model based on VAE, and IMSAT (VAT), which uses regularization and self-augmentation, provide better performance compared to the other techniques. It can be seen in the table that IMSAT (VAT) provides the best results on the MNIST data set (even better than ConvNets [35]); however, it cannot keep this superiority over the other data sets. In fact, the self-training augmentation of IMSAT helps the networks handle shift-invariance issues in image data sets, but in data sets such as Reuters-10K, where this issue does not exist, its performance in comparison to other deep learning methods degrades considerably. VaDE, which utilizes both variational autoencoders and generative models, shows a robustly superior performance relative to the other approaches except on the MNIST data set. The HHAR data set is not a sparse data set like the other data sets, which makes it a challenging one for classic clustering approaches.

Table II shows that when the IFL features are used in addition to the original features in different clustering methods such as DEC and k-means, they provide robust results with a negligible variance (i.e., smaller than 0.01). In this table, "N/R" refers to the cases for which the results were not reported, and "N/A" refers to the cases where a particular approach cannot be applied to a data set because of the required memory or time, as mentioned in [22].

In the first scenario, we only used the IFL features alone. Since the number of IFL features is considerably less than the number of primary features, the k-means clustering method cannot work properly on most of the data sets except Reuters, which has a considerably larger number of instances. HCA based on average and Ward linkage offers interesting results in comparison to other deep learning methods, except on the HHAR data set. The results of this table show that the IFL features are not descriptive enough to be used alone on the data sets which are not sparse. However, the performance of the DEC approach using only the IFL features is considerably better than when this clustering is applied on the primary features, and better than many other deep clustering methods including LDMGI, autoencoder+(k-means, SEC, LDMGI, GMM), VAE+GMM, AAE, and IDEC.

In the second scenario, the learned features are used along with the primary features for DEC or the other clustering approaches. In this scenario, we achieved the best results compared to several state-of-the-art techniques, except for the MNIST data set, where our best performance is about 2% lower than IMSAT; as we mentioned earlier, IMSAT uses self-training augmentation to handle the shift-invariance of images, and its results are not steady on other types of data sets such as Reuters-10K, where the accuracy of IMSAT is 10% lower than our proposed approach. Moreover, the results in Table II demonstrate that adding the IFL features to the original features considerably improves the performance of classical clustering techniques such as k-means and HCA, to the point that they achieve results comparable with the state-of-the-art deep clustering methods.

We visualize the latent representations of the input data and the latent feature space for the HHAR data set through the different phases of DEC in 2D space by t-SNE [23] in Figure 4. As can be seen in this figure, the clusters become well separated through the steps while the accuracy improves.

In summary, based on the extensive experimental results, we can conclude that the IFL features provide a new perspective that offers considerable accuracy improvement for DEC and even the classic clustering techniques. Even when the IFL features are fed alone to the clustering methods, they provide a proper representation of the data set that improves the performance of the classic clustering techniques to the level of deep-learning-based clustering approaches. Thus, the IFL framework can provide a practical and fast solution to enhance the performance of the classical clustering techniques, in particular for sparse data sets such as image and text data sets.

B. Classification

We evaluated the performance of the IFL features in classification by using several known and state-of-the-art classification methods in the literature which can handle different types of data, including Decision Tree [37], Random Forests [38] as a general ensemble method, XGBoost [39] as a known boosting method in machine learning, K-nearest neighbors (KNNs), and a multi-layer perceptron classifier (MLP) that utilizes Adam for weight optimization. We use the libraries of scikit-learn [36] for the mentioned classifiers. Accuracy is used as the evaluation metric.

In classification, the IFL features can be used based on two techniques. In the first technique, we consider the weight feature of each cluster for each instance separately. In other words, we form a new data set in which the number of instances is the number of instances of the original data set multiplied by the number of clusters, and the number of IFL features equals 3: one feature for confidence, one feature for weight, and one feature for accuracy of assignment. When the IFL features are considered along with the primary features in the first technique, there are as many versions of each instance as the number of clusters; the primary features, the confidence, and the accuracy of assignment are the same among them, while the weight feature is different. The labels of all the versions of each instance are the same for training. The final label considered for each test instance is the label with the maximum value among its versions. We used the first technique for the data sets that are sparse, like images and text. The second technique is like the procedure of the IFL features in clustering: we consider the features of each instance, including the weights corresponding to the different clusters, as a whole. Thus, the number of IFL features equals the number of clusters plus 2: one feature for confidence, one feature for accuracy of assignment, and s features for the weight, in which s is the number of classes. We found this technique proper for the data sets that are not sparse, like HHAR. The first technique is the reliable and robust one, and we consider it as the default technique; the second technique is recommended for data sets that are not sparse. It can be seen in Table III that for the HHAR data set, the second technique provides much better results in comparison to the first technique.

It should be noted that the IFL features are highly abstract (i.e., only 3 in the first technique and only 2 + s, in which s is the number of clusters, in the second technique). This number of features is several hundred times smaller than the number of original features of the data sets. For instance, the number of IFL features in the Reuters and Reuters-10K data sets is 3 in the first technique, while the number of primary features is 2000. In MNIST and USPS, the numbers of primary features are 784 and 256, respectively, while the number of IFL features is 3. For the HHAR data set, the number of primary features is 561, while the IFL features are just 8 in the second technique.
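A sketch of the first technique follows, assuming NumPy arrays for the primary features and the three IFL metrics, and a scikit-learn-style classifier with predict_proba. The replication layout and the aggregation rule ("label with the maximum value among its versions") are our reading of the description above, not the authors' code.

```python
import numpy as np

def replicate_first_technique(X_primary, confidence, weight, acc_assign):
    """Build the first-technique data set: one row per (instance, cluster) pair.

    Each replica keeps the primary features, confidence, and accuracy of assignment,
    and differs only in the weight of its cluster (3 IFL features per replica)."""
    n, s = weight.shape
    rows = []
    for i in range(n):                       # instance-major ordering (i, then cluster c)
        for c in range(s):
            rows.append(np.concatenate([X_primary[i],
                                        [confidence[i], weight[i, c], acc_assign[i]]]))
    return np.asarray(rows)

def predict_first_technique(clf, X_test_replicated, n_test, s):
    """Aggregate over the s replicas of each test instance: keep the label whose
    predicted score is maximal among the instance's versions."""
    proba = clf.predict_proba(X_test_replicated)   # shape (n_test * s, n_labels)
    proba = proba.reshape(n_test, s, -1)           # assumes instance-major replica order
    return proba.max(axis=1).argmax(axis=1)        # max over versions, then best label
```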

TABLE III: Comparison of the accuracy of the classifiers without and with the inverse feature learning based on the first technique. In the proposed method, the learned features are independent of the classifier; hence, a better classifier can achieve higher accuracy over our learned features, as it does over other sets of features.

Data set    | Decision Tree         |                       | Random Forests        |                       | XGBoost |             | KNNs    |             | MLP         |
            | Primary | Primary+IFL | Primary | Primary+IFL | Primary | Primary+IFL | Primary | Primary+IFL | Primary | Primary+IFL
MNIST       | 88.6 (0.01) | 93.8         | 96.9 (0.01) | 97.05        | 93.7 | 95.3    | 96.9 | 97.1 | 97.8        | 98.87
USPS        | 85.8 (0.13) | 90.20 (0.01) | 94.0 (0.03) | 93.57 (0.07) | 92.6 | 92.6    | 95.0 | 95.0 | 94.3 (0.09) | 94.54 (0.01)
Reuters-10K | 84.5 (0.02) | 89.9 (0.02)  | 94.0 (0.06) | 95.1 (0.01)  | 92.3 | 94.0    | 94.1 | 94.5 | 95.3 (0.07) | 95.86 (0.04)
HHAR        | 85.5 (0.38) | 86.6 (0.07)  | 92.7 (0.21) | 93.2 (0.02)  | 93.4 | 94.12   | 89.7 | 90.0 | 94.7 (0.13) | 96.34 (0.02)
HHAR        | 85.5 (0.38) | 97.65 ¹      | 92.7 (0.21) | 98.88 ¹      | 93.4 | 98.07 ¹ | 89.7 | 96.94 ¹ | 94.7 (0.13) | 99.01 ¹

¹ Results reported based on the second technique.

TABLE IV: The accuracy of the classifiers using only the learned features. It should be noted that the IFL features are highly abstract and much fewer than the number of features of the data sets (several hundred times fewer); for example, in the Reuters data set, 2000 vs. 3.

Classifier model | MNIST        | USPS        | Reuters | Reuters-10K  | HHAR
Decision Tree    | 86.63 (0.04) | 82.05       | 82.72   | 80.22 (0.01) | 64.72 ¹
Random Forests   | 86.49        | 79.50 (4.6) | 83.88   | 80.09 (0.01) | 76.12 (0.13) ¹
XGBoost          | 87.14        | 82.05       | 84.08   | 80.44        | 71.29 ¹
KNNs             | 84.8         | 81.50       | 83.59   | 79.04        | 74.2 ¹
MLP              | 86.85        | 82.05       | 84.15   | 80.24 (0.33) | 77.51 (1.22) ¹

¹ Results reported based on the second technique.

As shown in Table III, the IFL features improved the accuracy of the Decision Tree classifier on the different data sets and also brought down the variance of the results compared to the scenario when only the primary features are used. We have not reported the classification results on the Reuters data set, except for the MLP classifier, whose results for the primary features and for the primary features + IFL are 97.41% and 98.12%, respectively, since the required time and memory were beyond the capacity of our systems and the libraries. Since the decision tree works directly with the feature space, the IFL features improve its results on the different data sets by 2% to 12%. For the Random Forests classifier, the IFL features improve the results over all data sets except USPS, where the accuracy is degraded by 0.03%. For the XGBoost and KNNs classifiers, the features improve the results or at least offer a comparable accuracy. For MLP, we can see a slight improvement in the results, and the variance also decreased. In conclusion, the IFL features enhance the classifiers' accuracy, especially for approaches in which the features are used directly. As shown in Table IV, the IFL features, when used alone, provide some interesting results; this shows that the IFL features capture informative aspects of the data.

VI. CONCLUSION

In this paper, we introduce error representation learning as a novel perspective on the error in machine learning, and inverse feature learning as a representation learning strategy that stands on deep clustering to learn the representation of error in the form of high-level features. The strategy of inverse feature learning can be implemented by different learning techniques. We applied the strategy based on DEC, which is a general and basic clustering approach; however, the IFL framework can be applied on other clustering techniques as well. The performance of the IFL framework is evaluated on five popular data sets using different clustering techniques. We consider two evaluation scenarios, where the IFL features are used alone or in addition to the primary features. The experimental results show that even in the cases where the small set of IFL features is used alone, the clustering accuracy of some clustering techniques (e.g., DEC) is comparable to several other approaches. This confirms that the IFL features are highly informative. The results of the second evaluation scenario, in which the IFL features are used along with the primary ones, demonstrate a considerable improvement for DEC and other classical clustering approaches. In this scenario, we achieved the best result compared to several state-of-the-art techniques over the Reuters-10K, Reuters, and HHAR data sets. It should be noted that several of these state-of-the-art approaches are generative, based on VAE, and also use self-augmentation training and customized regularization. Moreover, the experimental results show the performance of the proposed IFL framework on a variety of known classification methods, and in most cases the result is improved. Learning error-representing features based on generative and specialized networks tailored to the data sets (e.g., ConvNets for images) can be considered as future work.

REFERENCES

[1] L. Rosasco, E. D. Vito, A. Caponnetto, M. Piana, and A. Verri, "Are loss functions all the same?" Neural Computation, vol. 16, no. 5, pp. 1063–1076, 2004.
[2] S. Lloyd, "Least squares quantization in PCM," IEEE Transactions on Information Theory, vol. 28, no. 2, pp. 129–137, 1982.
[3] M. Koestinger, M. Hirzer, P. Wohlhart, P. M. Roth, and H. Bischof, "Large scale metric learning from equivalence constraints," in Computer Vision and Pattern Recognition (CVPR), 2012 IEEE Conference on. IEEE, 2012, pp. 2288–2295.
[4] O. Chapelle, B. Schölkopf, and A. Zien, "Semi-supervised learning (Chapelle, O. et al., eds.; 2006) [book reviews]," IEEE Transactions on Neural Networks, vol. 20, no. 3, pp. 542–542, 2009.
[5] V. R. de Sa, "Learning classification with unlabeled data," in Advances in Neural Information Processing Systems, 1994, pp. 112–119.
[6] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[7] E. Min, X. Guo, Q. Liu, G. Zhang, J. Cui, and J. Long, "A survey of clustering with deep learning: From the perspective of network architecture," IEEE Access, vol. 6, pp. 39501–39514, 2018.
[8] P. Smolensky, "Information processing in dynamical systems: Foundations of harmony theory," Colorado Univ at Boulder Dept of Computer Science, 1986.
[9] H. Bourlard and Y. Kamp, "Auto-association by multilayer perceptrons and singular value decomposition," Biological Cybernetics, vol. 59, no. 4-5, pp. 291–294, 1988.
[10] G. E. Hinton and R. S. Zemel, "Autoencoders, minimum description length and Helmholtz free energy," in Advances in Neural Information Processing Systems, 1994, pp. 3–10.
[11] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, vol. 86, no. 11, pp. 2278–2324, 1998.
[12] K. Janocha and W. M. Czarnecki, "On loss functions for deep neural networks in classification," arXiv preprint arXiv:1702.05659, 2017.
[13] E. Aljalbout, V. Golkov, Y. Siddiqui, M. Strobel, and D. Cremers, "Clustering with deep learning: Taxonomy and new methods," arXiv preprint arXiv:1801.07648, 2018.
[14] J. Kukačka, V. Golkov, and D. Cremers, "Regularization for deep learning: A taxonomy," arXiv preprint arXiv:1710.10686, 2017.
[15] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, "Dropout: a simple way to prevent neural networks from overfitting," The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
[16] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Advances in Neural Information Processing Systems, 2014, pp. 2672–2680.
[17] M. Seeger, "Learning with labeled and unlabeled data," Tech. Rep., 2000.
[18] O. Chapelle, J. Weston, and B. Schölkopf, "Cluster kernels for semi-supervised learning," in Advances in Neural Information Processing Systems, 2003, pp. 601–608.
[19] K. Benabdeslem and M. Hindawi, "Efficient semi-supervised feature selection: constraint, relevance, and redundancy," IEEE Transactions on Knowledge and Data Engineering, vol. 26, no. 5, pp. 1131–1143, 2014.
[20] Z. Xu, K. Yu, V. Tresp, X. Xu, and J. Wang, "Representative sampling for text classification using support vector machines," in European Conference on Information Retrieval. Springer, 2003, pp. 393–407.
[21] H. T. Nguyen and A. Smeulders, "Active learning using pre-clustering," in Proceedings of the Twenty-First International Conference on Machine Learning. ACM, 2004, p. 79.
[22] J. Xie, R. Girshick, and A. Farhadi, "Unsupervised deep embedding for clustering analysis," in International Conference on Machine Learning, 2016, pp. 478–487.
[23] L. v. d. Maaten and G. Hinton, "Visualizing data using t-SNE," Journal of Machine Learning Research, vol. 9, no. Nov, pp. 2579–2605, 2008.
[24] Y. Yang, D. Xu, F. Nie, S. Yan, and Y. Zhuang, "Image clustering using local discriminant models and global integration," IEEE Transactions on Image Processing, vol. 19, no. 10, pp. 2761–2773, 2010.
[25] F. Nie, Z. Zeng, I. W. Tsang, D. Xu, and C. Zhang, "Spectral embedded clustering: A framework for in-sample and out-of-sample spectral clustering," IEEE Transactions on Neural Networks, vol. 22, no. 11, pp. 1796–1808, 2011.
[26] A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey, "Adversarial autoencoders," arXiv preprint arXiv:1511.05644, 2015.
[27] X. Guo, L. Gao, X. Liu, and J. Yin, "Improved deep embedded clustering with local structure preservation," in IJCAI, 2017, pp. 1753–1759.
[28] Z. Jiang, Y. Zheng, H. Tan, B. Tang, and H. Zhou, "Variational deep embedding: an unsupervised and generative approach to clustering," in Proceedings of the 26th International Joint Conference on Artificial Intelligence, 2017, pp. 1965–1972.
[29] K. Ghasedi Dizaji, A. Herandi, C. Deng, W. Cai, and H. Huang, "Deep clustering via joint convolutional autoencoder embedding and relative entropy minimization," in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 5736–5745.
[30] D. D. Lewis, Y. Yang, T. G. Rose, and F. Li, "RCV1: A new benchmark collection for text categorization research," Journal of Machine Learning Research, vol. 5, no. Apr, pp. 361–397, 2004.
[31] A. Stisen, H. Blunck, S. Bhattacharya, T. S. Prentow, M. B. Kjærgaard, A. Dey, T. Sonne, and M. M. Jensen, "Smart devices are different: Assessing and mitigating mobile sensing heterogeneities for activity recognition," in Proceedings of the 13th ACM Conference on Embedded Networked Sensor Systems. ACM, 2015, pp. 127–140.
[32] J. MacQueen et al., "Some methods for classification and analysis of multivariate observations," in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, vol. 1, no. 14. Oakland, CA, USA, 1967, pp. 281–297.
[33] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Science & Business Media, 2009.
[34] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
[35] W. Hu, T. Miyato, S. Tokui, E. Matsumoto, and M. Sugiyama, "Learning discrete representations via information maximizing self-augmented training," in Proceedings of the 34th International Conference on Machine Learning - Volume 70. JMLR.org, 2017, pp. 1558–1567.
[36] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, "Scikit-learn: Machine learning in Python," Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
[37] L. Breiman, Classification and Regression Trees. Routledge, 2017.
[38] A. Liaw, M. Wiener et al., "Classification and regression by randomForest," R News, vol. 2, no. 3, pp. 18–22, 2002.
[39] T. Chen and C. Guestrin, "XGBoost: A scalable tree boosting system," in Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2016, pp. 785–794.