Generalized Autoencoder: a Neural Network Framework for Dimensionality Reduction

Total pages: 16

File type: PDF, size: 1020 KB

Generalized Autoencoder: A Neural Network Framework for Dimensionality Reduction

Wei Wang1, Yan Huang1, Yizhou Wang2, Liang Wang1
1 Center for Research on Intelligent Perception and Computing (CRIPAC), Nat'l Lab of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
2 Nat'l Eng. Lab for Video Technology, Key Lab. of Machine Perception (MoE), Sch'l of EECS, Peking University, Beijing, China
{wangwei, yhuang, [email protected], [email protected]

Abstract

The autoencoder algorithm and its deep version, as traditional dimensionality reduction methods, have achieved great success via the powerful representability of neural networks. However, they just use each instance to reconstruct itself and do not explicitly model the data relation so as to discover the underlying effective manifold structure. In this paper, we propose a dimensionality reduction method by manifold learning, which iteratively explores the data relation and uses the relation to pursue the manifold structure. The method is realized by a so-called "generalized autoencoder" (GAE), which extends the traditional autoencoder in two aspects: (1) each instance x_i is used to reconstruct a set of instances {x_j} rather than itself; (2) the reconstruction error of each instance, ||x_j − x'_i||^2, is weighted by a relational function of x_i and x_j defined on the learned manifold. Hence, the GAE captures the structure of the data space through minimizing the weighted distances between reconstructed instances and the original ones. The generalized autoencoder provides a general neural network framework for dimensionality reduction. In addition, we propose a multilayer architecture of the generalized autoencoder, called the deep generalized autoencoder, to handle highly complex datasets. Finally, to evaluate the proposed methods, we perform extensive experiments on three datasets. The experiments demonstrate that the proposed methods achieve promising performance.

Figure 1. Traditional autoencoder and the proposed generalized autoencoder. (a) In the traditional autoencoder, x_i is only used to reconstruct itself, and the reconstruction error ||x_i − x'_i||^2 just measures the distance between x_i and x'_i. (b) In the generalized autoencoder, x_i is involved in the reconstruction of a set of instances {x_j, x_k, ...}; each reconstruction error s_ij ||x_j − x'_i||^2 measures a weighted distance between x_j and x'_i.
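To make the contrast in Figure 1 concrete, the two objectives can be written out in a few lines of NumPy. This is only an illustrative sketch, not the authors' code; the reconstruction x'_i, the reconstruction set, and the weights s_ij below are made-up placeholders.

```python
import numpy as np

def traditional_ae_error(x_i, x_rec_i):
    """(a) Self-reconstruction error ||x_i - x'_i||^2."""
    return np.sum((x_i - x_rec_i) ** 2)

def generalized_ae_error(x_rec_i, reconstruction_set, weights):
    """(b) Weighted set-reconstruction error: sum_j s_ij * ||x_j - x'_i||^2."""
    return sum(s_ij * np.sum((x_j - x_rec_i) ** 2)
               for x_j, s_ij in zip(reconstruction_set, weights))

# Toy example: x'_i is some reconstruction produced from x_i.
x_i = np.array([1.0, 0.0])
x_rec_i = np.array([0.9, 0.1])
reconstruction_set = [np.array([1.1, 0.0]), np.array([0.8, 0.2])]   # {x_j, x_k, ...}
weights = [0.6, 0.4]                                                # s_ij, s_ik, ...

print(traditional_ae_error(x_i, x_rec_i))                           # objective in (a)
print(generalized_ae_error(x_rec_i, reconstruction_set, weights))   # objective in (b)
```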
1. Introduction

Real-world data, such as images of faces and digits, usually have a high dimension, which leads to the well-known curse of dimensionality in statistical pattern recognition. However, high-dimensional data usually lie on a lower-dimensional manifold, the so-called "intrinsic dimensionality space". Various methods of dimensionality reduction have been proposed to discover the underlying manifold structure, which plays an important role in many tasks, such as pattern classification [9] and information visualization [16].

Principal Component Analysis (PCA) is one of the most popular linear dimensionality reduction techniques [10][17]. It projects the original data onto its principal directions with the maximal variance, and does not consider any data relation.

Linear Discriminant Analysis (LDA) is a supervised method to find a linear subspace which is optimal for discriminating data from different classes [2]. Marginal Fisher Analysis (MFA) extends LDA by characterizing the intra-class compactness and inter-class separability [19]. These two methods use class label information as a weak data relation to seek a low-dimensional separating subspace.

However, the low-dimensional manifold structure of real data is usually very complicated. Generally, just using a simple parametric model such as PCA, it is not easy to capture such structures. Exploiting data relations has been proved to be a promising means of discovering the underlying structure. For example, ISOMAP [15] learns a low-dimensional manifold by retaining the geodesic distances between pairwise data in the original space. In Locally Linear Embedding (LLE) [12], each data point is a linear combination of its neighbors, and the linear relation is preserved in the projected low-dimensional space. Laplacian Eigenmaps (LE) [1] pursues a low-dimensional manifold by minimizing the pairwise distance in the projected space weighted by the corresponding distance in the original space. These methods share a common weakness: they suffer from the out-of-sample problem. Neighborhood Preserving Embedding (NPE) [3] and Locality Preserving Projection (LPP) [4] are linear approximations to LLE and LE, respectively, designed to handle the out-of-sample problem.

Although these methods exploit local data relations to learn the manifold, the relation is fixed and defined in the original high-dimensional space. Such a relation may not be valid on the manifold, e.g. the geodesic nearest neighbor on a manifold may not be the Euclidean nearest neighbor in the original space.

The autoencoder algorithm [13] belongs to a special family of dimensionality reduction methods implemented using artificial neural networks. It aims to learn a compressed representation for an input through minimizing its reconstruction error. Fig. 1(a) shows the architecture of an autoencoder. Recently, the autoencoder algorithm and its extensions [8][18][11] have demonstrated a promising ability to learn meaningful features from data, which could induce the "intrinsic data structure". However, these methods just consider self-reconstruction and do not explicitly model the data relation.

In this paper, we propose a method of dimensionality reduction by manifold learning, which extends the traditional autoencoder to iteratively explore the data relation and use the relation to pursue the manifold structure. The proposed method is realized by a so-called "generalized autoencoder" (GAE). As shown in Fig. 1, it differs from the traditional autoencoder in two aspects: (i) the GAE learns a compressed representation y_i for an instance x_i, and builds the relation between x_i and other data {x_j, x_k, ...} by using y_i to reconstruct each element in the set, not just x_i itself; (ii) the GAE imposes a relational weight s_ij on the reconstruction error ||x_j − x'_i||^2. Hence, the GAE captures the structure of the data space through minimizing the weighted reconstruction error. When replacing the sigmoid function in the GAE with the identity function, we show that the linear GAE and the linear graph embedding [19] have similar objective functions. This indicates that the GAE can also be a general framework for dimensionality reduction by defining different reconstruction sets and weights. Inspired by existing dimensionality reduction ideas, we derive six implementations of GAE, including both supervised and unsupervised models. Furthermore, we propose a deep GAE (dGAE), which extends GAE to multiple layers.

The rest of the paper is organized as follows. In Section 2, we first introduce the formulation of the generalized autoencoder and its connection to graph embedding, then propose the generalized autoencoder as a general framework for dimensionality reduction and derive its six implementations. In Section 2.3, we illustrate the deep generalized autoencoder as a multilayer network of GAEs. The experimental results on three datasets are presented in Section 3. Finally, we discuss the proposed method and conclude the paper in Section 4.

2. Dimensionality Reduction by Manifold Learning

2.1. The method

Algorithm 1 shows the proposed method. (1) We initialize the data relation by computing the pairwise data similarity/distance in the original high-dimensional space, then determine a "relation/reconstruction" set Ω_i for each data point x_i; for example, the set can be composed of its k-nearest neighbors. (2) The local manifold structure around each data point is learned from its reconstruction set using the proposed "generalized autoencoder" (GAE) introduced below. (3) On the learned manifold, the data relation is updated according to the pairwise similarity/distance defined on the hidden representation derived from the GAE. Steps (2) and (3) are iterated until convergence or until the maximum number of iterations is reached.
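As a rough illustration of steps (1) and (3) above, the following NumPy sketch builds a k-nearest-neighbor reconstruction set Ω_i for every point and assigns Gaussian-kernel weights; the same routine can be re-run on the hidden representations {y_i} once the GAE has been trained. The kernel choice, bandwidth, and k are assumptions made for this example, not values prescribed by the paper.

```python
import numpy as np

def reconstruction_sets(Z, k=5, sigma=1.0):
    """Return (Omega, S): k-NN index sets and Gaussian-kernel weights for each row of Z.

    Z is an (n, d) array; it can hold the original inputs x_i (step 1)
    or the hidden representations y_i learned by the GAE (step 3).
    """
    n = Z.shape[0]
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T   # pairwise squared distances
    np.fill_diagonal(d2, np.inf)                     # a point is not its own neighbor
    Omega = np.argsort(d2, axis=1)[:, :k]            # indices of the k nearest neighbors
    S = np.exp(-d2[np.arange(n)[:, None], Omega] / (2.0 * sigma ** 2))
    return Omega, S

# Step (1): relations in the original space. Step (3) would call the same
# function on the hidden representations produced by the GAE.
X = np.random.RandomState(0).randn(100, 20)
Omega, S = reconstruction_sets(X, k=5)
```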
2.2. Generalized Autoencoder

The generalized autoencoder consists of two parts, an encoder and a decoder. As shown in Fig. 1(b), the encoder maps an input x_i ∈ R^{d_x} to a reduced hidden representation y_i ∈ R^{d_y} by a function g(),

    y_i = g(W x_i)    (1)

where g() is either the identity function for a linear projection or a sigmoid function 1/(1 + e^{−Wx}) for a nonlinear mapping. The parameter W is a d_y × d_x weight matrix. In this paper, we ignore the bias terms of the neural network for simple notation.

The decoder reconstructs x'_i ∈ R^{d_x} from the hidden representation y_i with a linear transformation,

    x'_i = f(W' y_i)    (2)

The parameter W' is another d_x × d_y weight matrix, which can be W^T. Similar to g(), f() is either the identity function [...]

Algorithm 1: Iterative learning procedure for the Generalized Autoencoder
Input: training set {x_i}_{i=1}^n
Parameters: Θ = (W, W')
Notation: Ω_i, the reconstruction set for x_i; S_i, the set of reconstruction weights for x_i; {y_i}_{i=1}^n, the hidden representations
1. Compute the reconstruction weights S_i from {x_i} and determine the reconstruction set Ω_i, e.g. by k-nearest neighbors.
2. Minimize E in Eqn. (4) using stochastic gradient descent and update Θ for t steps.
3. [...]

[...] is a constant and B is a constraint matrix to avoid a trivial solution.

The linear graph embedding (LGE) assumes that the low-dimensional representation can be obtained by a linear projection y = X^T w, where w is the projection vector and X = [x_1, x_2, ..., x_n]. The objective function of LGE is

    w* = arg min_{w^T X B X^T w = c or w^T w = c} Σ_{i,j} s_ij ||w^T x_i − w^T x_j||^2    (6)

Similarly, the hidden representation of the generalized autoencoder induces a low-dimensional manifold of the data [...]
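Eqn. (4) with the full objective E is not included in the excerpt above, so as a rough stand-in the following NumPy sketch runs stochastic gradient descent on the weighted reconstruction error described in Section 2.1 and Figure 1, Σ_i Σ_{j∈Ω_i} s_ij ||x_j − x'_i||^2, for a linear GAE (identity g and f, untied W and W'). The k-NN sets, uniform weights, sizes, and learning rate are all assumptions made for the example; this is not the authors' implementation.

```python
import numpy as np

# Linear GAE: y_i = W x_i (Eqn. 1 with identity g), x'_i = W' y_i (Eqn. 2 with identity f).
rng = np.random.RandomState(0)
n, d_x, d_y, k, lr = 200, 20, 5, 5, 1e-3
X = rng.randn(n, d_x)

# Reconstruction sets Omega_i (k nearest neighbors) with uniform weights s_ij.
d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
np.fill_diagonal(d2, np.inf)
Omega = np.argsort(d2, axis=1)[:, :k]
S = np.full((n, k), 1.0 / k)

W = 0.01 * rng.randn(d_y, d_x)    # encoder weight matrix (d_y x d_x)
Wp = 0.01 * rng.randn(d_x, d_y)   # decoder weight matrix W' (d_x x d_y)

for epoch in range(50):
    E = 0.0
    for i in range(n):
        y_i = W @ X[i]                                   # Eqn. (1)
        x_rec = Wp @ y_i                                 # Eqn. (2)
        gW, gWp = np.zeros_like(W), np.zeros_like(Wp)
        for j, s in zip(Omega[i], S[i]):
            e = X[j] - x_rec                             # error against one set member
            E += s * (e @ e)
            gWp += -2.0 * s * np.outer(e, y_i)           # d/dW' of s * ||x_j - W' W x_i||^2
            gW += -2.0 * s * np.outer(Wp.T @ e, X[i])    # d/dW  of the same term
        W -= lr * gW                                     # one SGD step (Algorithm 1, step 2)
        Wp -= lr * gWp

print("weighted reconstruction error after training:", E)
```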
Recommended publications
  • Training Autoencoders by Alternating Minimization
    Under review as a conference paper at ICLR 2018. TRAINING AUTOENCODERS BY ALTERNATING MINIMIZATION. Anonymous authors, paper under double-blind review. ABSTRACT: We present DANTE, a novel method for training neural networks, in particular autoencoders, using the alternating minimization principle. DANTE provides a distinct perspective in lieu of traditional gradient-based backpropagation techniques commonly used to train deep networks. It utilizes an adaptation of quasi-convex optimization techniques to cast autoencoder training as a bi-quasi-convex optimization problem. We show that for autoencoder configurations with both differentiable (e.g. sigmoid) and non-differentiable (e.g. ReLU) activation functions, we can perform the alternations very effectively. DANTE effortlessly extends to networks with multiple hidden layers and varying network configurations. In experiments on standard datasets, autoencoders trained using the proposed method were found to be very promising and competitive to traditional backpropagation techniques, both in terms of quality of solution, as well as training speed. 1 INTRODUCTION: For much of the recent march of deep learning, gradient-based backpropagation methods, e.g. Stochastic Gradient Descent (SGD) and its variants, have been the mainstay of practitioners. The use of these methods, especially on vast amounts of data, has led to unprecedented progress in several areas of artificial intelligence. On one hand, the intense focus on these techniques has led to an intimate understanding of hardware requirements and code optimizations needed to execute these routines on large datasets in a scalable manner. Today, myriad off-the-shelf and highly optimized packages exist that can churn reasonably large datasets on GPU architectures with relatively mild human involvement and little bootstrap effort.
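The alternating-minimization idea in the abstract above can be illustrated on the simplest possible case, a linear autoencoder, where fixing one set of weights turns the other into a closed-form least-squares problem. This sketch is only a generic illustration under that assumption; it is not the DANTE algorithm itself, which works with sigmoid/ReLU layers and quasi-convex solvers.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(500, 30)             # data, one sample per row
h = 5                              # hidden width
W_enc = 0.01 * rng.randn(30, h)    # encoder weights: codes = X @ W_enc
W_dec = 0.01 * rng.randn(h, 30)    # decoder weights: recon = codes @ W_dec

for it in range(20):
    # Alternation 1: fix the encoder; the decoder solves an ordinary least-squares problem
    #   min_{W_dec} ||X - (X @ W_enc) @ W_dec||^2.
    H = X @ W_enc
    W_dec, *_ = np.linalg.lstsq(H, X, rcond=None)
    # Alternation 2: fix the decoder; for this linear model the optimal encoder also has a
    # closed form, W_enc = pinv(W_dec), assuming X has full column rank and W_dec full row rank.
    W_enc = np.linalg.pinv(W_dec)

recon_error = np.mean((X - X @ W_enc @ W_dec) ** 2)
print("mean reconstruction error after alternating updates:", recon_error)
```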
  • Nonlinear Dimensionality Reduction by Locally Linear Embedding
    35. R. N. Shepard, Psychon. Bull. Rev. 1, 2 (1994). 36. J. B. Tenenbaum, Adv. Neural Info. Proc. Syst. 10, 682 (1998). 37. T. Martinetz, K. Schulten, Neural Netw. 7, 507 (1994). 38. V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduction to Parallel Computing: Design and Analysis of Algorithms (Benjamin/Cummings, Redwood City, CA, 1994), pp. 257–297. 39. D. Beymer, T. Poggio, Science 272, 1905 (1996). 40. Available at www.research.att.com/~yann/ocr/mnist. 41. P. Y. Simard, Y. LeCun, J. [...] 1 − R^2(D̂_M, D_Y). D_Y is the matrix of Euclidean distances in the low-dimensional embedding recovered by each algorithm. D̂_M is each algorithm's best estimate of the intrinsic manifold distances: for Isomap, this is the graph distance matrix D_G; for PCA and MDS, it is the Euclidean input-space distance matrix D_X (except with the handwritten "2"s, where MDS uses the tangent distance). R is the standard linear correlation coefficient, taken over all entries of D̂_M and D_Y. 43. In each sequence shown, the three intermediate images [...] using the coordinates of corresponding points {x_i, y_i} in both spaces provided by Isomap together with standard supervised learning techniques (39). 44. Supported by the Mitsubishi Electric Research Laboratories, the Schlumberger Foundation, the NSF (DBS-9021648), and the DARPA Human ID program. We thank Y. LeCun for making available the MNIST database and S. Roweis and L. Saul for sharing related unpublished work. For many helpful discussions, we thank G. Carlsson, H. Farid, W. Freeman, T. Griffiths, [...]
  • Comparison of Dimensionality Reduction Techniques on Audio Signals
    Comparison of Dimensionality Reduction Techniques on Audio Signals. Tamás Pál, Dániel T. Várkonyi, Eötvös Loránd University, Faculty of Informatics, Department of Data Science and Engineering, Telekom Innovation Laboratories, Budapest, Hungary, {evwolcheim, varkonyid}@inf.elte.hu, WWW home page: http://t-labs.elte.hu. Abstract: Analysis of audio signals is a widely used and very effective technique in several domains like healthcare, transportation, and agriculture. In a general process, the output of the feature extraction method results in a huge number of relevant features which may be difficult to process. The number of features heavily correlates with the complexity of the following machine learning method. Dimensionality reduction methods have been used successfully in recent times in machine learning to reduce complexity and memory usage and improve the speed of the following ML algorithms. This paper attempts to compare the state of the art dimensionality reduction techniques as a building block of the general process and analyze the usability of these methods in visualizing large audio datasets. [...] this work: car horn, dog bark, engine idling, gun shot, and street music [5]. Related work is presented in Section 2, the basic mathematical notation used is described in Section 3, while the different methods of the pipeline are briefly presented in Section 4. Section 5 contains data about the evaluation methodology, Section 6 presents the results, and conclusions are formulated in Section 7. The goal of this paper is to find a combination of feature extraction and dimensionality reduction methods which can be most efficiently applied to audio data visualization in 2D and preserve inter-class relations the most.
  • Turbo Autoencoder: Deep Learning Based Channel Codes for Point-To-Point Communication Channels
    Turbo Autoencoder: Deep learning based channel codes for point-to-point communication channels. Hyeji Kim (Samsung AI Center Cambridge, Cambridge, United Kingdom, [email protected]); Himanshu Asnani (School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India, [email protected]); Yihan Jiang (ECE Department, University of Washington, Seattle, United States, [email protected]); Sewoong Oh (Allen School of Computer Science & Engineering, University of Washington, Seattle, United States, [email protected]); Pramod Viswanath (ECE Department, University of Illinois at Urbana Champaign, Illinois, United States, [email protected]); Sreeram Kannan (ECE Department, University of Washington, Seattle, United States, [email protected]). Abstract: Designing codes that combat the noise in a communication medium has remained a significant area of research in information theory as well as wireless communications. Asymptotically optimal channel codes have been developed by mathematicians for communicating under canonical models after over 60 years of research. On the other hand, in many non-canonical channel settings, optimal codes do not exist and the codes designed for canonical models are adapted via heuristics to these channels and are thus not guaranteed to be optimal. In this work, we make significant progress on this problem by designing a fully end-to-end jointly trained neural encoder and decoder, namely, Turbo Autoencoder (TurboAE), with the following contributions: (a) under moderate block lengths, TurboAE approaches state-of-the-art performance under canonical channels; (b) moreover, TurboAE outperforms the state-of-the-art codes under non-canonical settings in terms of reliability. TurboAE shows that the development of channel coding design can be automated via deep learning, with near-optimal performance.
  • Double Backpropagation for Training Autoencoders Against Adversarial Attack
    Double Backpropagation for Training Autoencoders against Adversarial Attack. Chengjin Sun, Sizhe Chen, and Xiaolin Huang, Senior Member, IEEE. Abstract: Deep learning, as widely known, is vulnerable to adversarial samples. This paper focuses on the adversarial attack on autoencoders. Safety of the autoencoders (AEs) is important because they are widely used as a compression scheme for data storage and transmission; however, the current autoencoders are easily attacked, i.e., one can slightly modify an input but obtain totally different codes. The vulnerability is rooted in the sensitivity of the autoencoders, and to enhance the robustness, we propose to adopt double backpropagation (DBP) to secure autoencoders such as VAE and DRAW. We restrict the gradient from the reconstruction image to the original one so that the autoencoder is not sensitive to trivial perturbation produced by the adversarial attack. After smoothing the gradient by DBP, we further smooth the label by a Gaussian Mixture Model (GMM), aiming for accurate and robust classification. We demonstrate on MNIST, CelebA, and SVHN that our method leads to a robust autoencoder resistant to attack and a robust classifier able to perform image transition and immune to adversarial attack if combined with GMM. Index Terms: double backpropagation, autoencoder, network robustness, GMM. 1 INTRODUCTION: In the past few years, deep neural networks have been greatly developed and successfully used in a vast range of fields, such as pattern recognition, intelligent robots, automatic control, medicine [1]. Despite the great success, researchers have found the vulnerability of deep neural networks to [...] feature [9], [10], [11], [12], [13], or network structure [3], [14], [15]. Adversarial attack and its defense are revolving around a small ∆x and a big resulting difference between f(x + ∆x) and f(x).
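The gradient-restriction idea described in the abstract above can be sketched with any autodiff framework. Below is a minimal PyTorch-style illustration of the usual double-backpropagation construction, where the squared norm of the input-gradient of the reconstruction loss is added as a penalty. It is not the paper's code; the toy architecture, penalty weight, and random batch are assumptions, and the paper applies the idea to VAE and DRAW rather than a plain autoencoder.

```python
import torch
import torch.nn as nn

ae = nn.Sequential(nn.Linear(784, 64), nn.Sigmoid(), nn.Linear(64, 784))  # toy stand-in AE
opt = torch.optim.Adam(ae.parameters(), lr=1e-3)
lam = 0.1                                  # weight of the gradient penalty (assumed)

x = torch.rand(32, 784)                    # placeholder batch
x.requires_grad_(True)

recon = ae(x)
rec_loss = ((recon - x) ** 2).sum(dim=1).mean()

# Double backpropagation: penalize the gradient of the reconstruction loss w.r.t. the input,
# so that small input perturbations cannot change the reconstruction much.
grad_x, = torch.autograd.grad(rec_loss, x, create_graph=True)
penalty = (grad_x ** 2).sum(dim=1).mean()

loss = rec_loss + lam * penalty
opt.zero_grad()
loss.backward()
opt.step()
```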
  • Litrec Vs. Movielens a Comparative Study
    LitRec vs. Movielens: A Comparative Study. Paula Cristina Vaz (1,4), Ricardo Ribeiro (2,4) and David Martins de Matos (3,4). 1 IST/UAL, Rua Alves Redol, 9, Lisboa, Portugal; 2 ISCTE-IUL, Rua Alves Redol, 9, Lisboa, Portugal; 3 IST, Rua Alves Redol, 9, Lisboa, Portugal; 4 INESC-ID, Rua Alves Redol, 9, Lisboa, Portugal. {paula.vaz, ricardo.ribeiro, [email protected]. Keywords: Recommendation Systems, Data Set, Book Recommendation. Abstract: Recommendation is an important research area that relies on the availability and quality of the data sets in order to make progress. This paper presents a comparative study between Movielens, a movie recommendation data set that has been extensively used by the recommendation system research community, and LitRec, a newly created data set for content literary book recommendation, in a collaborative filtering set-up. Experiments have shown that when the number of ratings of Movielens is reduced to the level of LitRec, collaborative filtering results degrade and the use of content in hybrid approaches becomes important. 1 INTRODUCTION: The number of books, movies, and music items published every year is increasing far more quickly than is our ability to process it. The Internet has shortened the distance between items and the common user, making items available to everyone with an Internet connection. [...] and (b) assess its suitability in recommendation studies. The paper is structured as follows: Section 2 describes related work on recommendation systems and existing data sets. In Section 3 we describe the collaborative filtering (CF) algorithm and prediction equation, as well as different item representations. Section [...]
  • Artificial Intelligence Applied to Electromechanical Monitoring, A Performance Analysis
    ARTIFICIAL INTELLIGENCE APPLIED TO ELECTROMECHANICAL MONITORING, A PERFORMANCE ANALYSIS. Erasmus project. Author: Staš Osterc. Mentors: Dr. Miguel Delgado Prieto, Dr. Francisco Arellano Espitia. 1/5/2020. ANNEX VI – DECLARACIÓ D'HONOR (Declaration of Honour): I declare that the work in this Master Thesis / Degree Thesis is completely my own work, no part of this Master Thesis / Degree Thesis is taken from other people's work without giving them credit, all references have been clearly cited, and I am authorised to make use of the related company / research group information I am providing in this document. I understand that an infringement of this declaration leaves me subject to the foreseen disciplinary actions by the Universitat Politècnica de Catalunya – BarcelonaTECH. (Student name, signature, date; title of the thesis.) Contents: Introduction (p. 5); Abstract (p. 5); Aim (p. 6)
  • Unsupervised Speech Representation Learning Using Wavenet Autoencoders Jan Chorowski, Ron J
    Unsupervised speech representation learning using WaveNet autoencoders. Jan Chorowski, Ron J. Weiss, Samy Bengio, Aäron van den Oord. Abstract: We consider the task of unsupervised extraction of meaningful latent representations of speech by applying autoencoding neural networks to speech waveforms. The goal is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being invariant to confounding low level details in the signal such as the underlying pitch contour or background noise. Since the learned representation is tuned to contain only phonetic content, we resort to using a high capacity WaveNet decoder to infer information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the kind of constraint that is applied to the latent representation. We compare three variants: a simple dimensionality reduction bottleneck, a Gaussian Variational Autoencoder (VAE), and a discrete Vector Quantized VAE (VQ-VAE). We analyze the quality of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re- [...] speaker gender and identity, from phonetic content, properties which are consistent with internal representations learned by speech recognizers [13], [14]. Such representations are desired in several tasks, such as low resource automatic speech recognition (ASR), where only a small amount of labeled training data is available. In such a scenario, limited amounts of data may be sufficient to learn an acoustic model on the representation discovered without supervision, but insufficient to learn the acoustic model and a data representation in a fully supervised manner [15], [16]. We focus on representations learned with autoencoders applied to raw waveforms and spectrogram features and investigate the quality of learned representations on LibriSpeech [17].
  • Unsupervised Speech Representation Learning Using Wavenet Autoencoders
    Unsupervised speech representation learning using WaveNet autoencoders. https://arxiv.org/abs/1901.08810. Jan Chorowski, University of Wrocław, 06.06.2019.
    Deep Model = Hierarchy of Concepts (Cat, Dog, ..., Moon, Banana). M. Zeiler, "Visualizing and Understanding Convolutional Networks".
    Deep Learning history: 2006: Stacked RBMs (Hinton, Salakhutdinov, "Reducing the Dimensionality of Data with Neural Networks"). 2012: AlexNet, SOTA on ImageNet, fully supervised training.
    Deep Learning Recipe: 1. Get a massive, labeled dataset D = {(x, y)}: computer vision (ImageNet, 1M images); machine translation (European Parliament data, CommonCrawl, several million sentence pairs); speech recognition (1000 h LibriSpeech, 12000 h Google Voice Search); question answering (SQuAD, 150k questions with human answers). 2. Train the model to maximize log p(y|x).
    Value of Labeled Data: labeled data is crucial for deep learning, but labels carry little information. Example: an ImageNet model has 30M weights, but ImageNet is about 1M images from 1000 classes. Labels: 1M * 10 bit = 10 Mbits; raw data (128 x 128 images): ca 500 Gbits!
    Value of Unlabeled Data: "The brain has about 10^14 synapses and we only live for about 10^9 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 10^5 dimensions of constraint per second." (Geoff Hinton, https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/)
    Unsupervised learning recipe: 1. Get a massive dataset D = {x} (easy, unlabeled data is nearly free). 2. Train the model to... ??? What is the task? What is the loss function?
    Unsupervised learning by modeling the data distribution: train the model to minimize −log p(x). E.g. [...]
  • Dimensionality Reduction and Principal Component Analysis
    Dimensionality Reduction and Principal Component Analysis.
    Dimensionality Reduction: before running any ML algorithm on our data, we may want to reduce the number of features
    ● To save computer memory/disk space if the data are large.
    ● To reduce execution time.
    ● To visualize our data, e.g., if the dimensions can be reduced to 2 or 3.
    Dimensionality reduction results in an approximation of the original data.
    [Slides "Projecting Data: 2D to 1D" and "Projecting Data: 3D to 2D": projection illustrations with axes x1, x2 and the new feature, comparing the original data with approximations at 50%, 5%, and 1%.]
    Why such projections are effective:
    ● In many datasets, some of the features are correlated.
    ● Correlated features can often be well approximated by a smaller number of new features.
    ● For example, consider the problem of predicting housing prices. Some of the features may be the square footage of the house, number of bedrooms, number of bathrooms, and lot size. These features are likely correlated.
    Principal Component Analysis (PCA): suppose we want to reduce data from d dimensions to k dimensions, where d > k. PCA finds k vectors onto which to project the data so that the projection errors are minimized: determine a new basis and project the data onto the k axes of the new basis with largest variance. In other words, PCA finds the principal components, which offer the best approximation. PCA ≠ Linear Regression.
    PCA Algorithm: Given n data [...]
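The projections the slides describe (2D to 1D, 3D to 2D) amount to a few lines of NumPy. This is a generic PCA-via-SVD sketch, not the slides' own code, and the correlated housing-style data below is made up for the example.

```python
import numpy as np

def pca_project(X, k):
    """Project (n, d) data onto its top-k principal components and reconstruct."""
    mu = X.mean(axis=0)
    Xc = X - mu                                      # center the data
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:k]                              # k directions of largest variance
    Z = Xc @ components.T                            # reduced k-dimensional representation
    X_approx = Z @ components + mu                   # best rank-k approximation of X
    return Z, X_approx

# 2D -> 1D example with two correlated features (square footage and bedrooms).
rng = np.random.RandomState(0)
sqft = rng.uniform(50, 300, size=200)
bedrooms = sqft / 60.0 + rng.normal(scale=0.5, size=200)
X = np.column_stack([sqft, bedrooms])
Z, X_approx = pca_project(X, k=1)
print(Z.shape, X_approx.shape)    # (200, 1) (200, 2)
```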
  • Collaborative Filtering: a Machine Learning Perspective by Benjamin Marlin a Thesis Submitted in Conformity with the Requirement
    Collaborative Filtering: A Machine Learning Perspective, by Benjamin Marlin. A thesis submitted in conformity with the requirements for the degree of Master of Science, Graduate Department of Computer Science, University of Toronto. Copyright © 2004 by Benjamin Marlin. Abstract: Collaborative filtering was initially proposed as a framework for filtering information based on the preferences of users, and has since been refined in many different ways. This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering. We analyze existing methods for the task of rating prediction from a machine learning perspective. We show that many existing methods proposed for this task are simple applications or modifications of one or more standard machine learning methods for classification, regression, clustering, dimensionality reduction, and density estimation. We introduce new prediction methods in all of these classes. We introduce a new experimental procedure for testing stronger forms of generalization than has been used previously. We implement a total of nine prediction methods, and conduct large scale prediction accuracy experiments. We show interesting new results on the relative performance of these methods. Acknowledgements: I would like to begin by thanking my supervisor Richard Zemel for introducing me to the field of collaborative filtering, for numerous helpful discussions about a multitude of models and methods, and for many constructive comments about this thesis itself. I would like to thank my second reader Sam Roweis for his thorough review of this thesis, as well as for many interesting discussions of this and other research.
  • 13 Dimensionality Reduction and Microarray Data
    13 Dimensionality Reduction and Microarray Data. David A. Elizondo, Benjamin N. Passow, Ralph Birkenhead, and Andreas Huemer. Centre for Computational Intelligence, School of Computing, Faculty of Computing Sciences and Engineering, De Montfort University, Leicester, UK, {elizondo,passow,rab,ahuemer}@dmu.ac.uk. Summary: Microarrays are currently being used to measure the expression levels of thousands of genes simultaneously. They present new analytical challenges because they have a very high input dimension and a very low sample size. It is highly complex to analyse multi-dimensional data with complex geometry and to identify low-dimensional "principal objects" that relate to the optimal projection while losing the least amount of information. Several methods have been proposed for dimensionality reduction of microarray data. Some of these methods include principal component analysis and principal manifolds. This article presents a comparison study of the performance of the linear principal component analysis and the nonlinear local tangent space alignment principal manifold methods on such a problem. Two microarray data sets will be used in this study. A classification model will be created using fully dimensional and dimensionality reduced data sets. To measure the amount of information lost with the two dimensionality reduction methods, the level of performance of each of the methods will be measured in terms of the level of generalisation obtained by the classification models on previously unseen data sets. These results will be compared with the ones obtained using the fully dimensional data sets. Key words: Microarray data; Principal Manifolds; Principal Component Analysis; Local Tangent Space Alignment; Linear Separability; Neural Networks.