Dimensionality Reduction
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Nonlinear Dimensionality Reduction by Locally Linear Embedding
R EPORTS 2 ˆ 35. R. N. Shepard, Psychon. Bull. Rev. 1, 2 (1994). 1ÐR (DM, DY). DY is the matrix of Euclidean distanc- ing the coordinates of corresponding points {xi, yi}in 36. J. B. Tenenbaum, Adv. Neural Info. Proc. Syst. 10, 682 es in the low-dimensional embedding recovered by both spaces provided by Isomap together with stan- ˆ (1998). each algorithm. DM is each algorithm’s best estimate dard supervised learning techniques (39). 37. T. Martinetz, K. Schulten, Neural Netw. 7, 507 (1994). of the intrinsic manifold distances: for Isomap, this is 44. Supported by the Mitsubishi Electric Research Labo- 38. V. Kumar, A. Grama, A. Gupta, G. Karypis, Introduc- the graph distance matrix DG; for PCA and MDS, it is ratories, the Schlumberger Foundation, the NSF tion to Parallel Computing: Design and Analysis of the Euclidean input-space distance matrix DX (except (DBS-9021648), and the DARPA Human ID program. Algorithms (Benjamin/Cummings, Redwood City, CA, with the handwritten “2”s, where MDS uses the We thank Y. LeCun for making available the MNIST 1994), pp. 257Ð297. tangent distance). R is the standard linear correlation database and S. Roweis and L. Saul for sharing related ˆ 39. D. Beymer, T. Poggio, Science 272, 1905 (1996). coefficient, taken over all entries of DM and DY. unpublished work. For many helpful discussions, we 40. Available at www.research.att.com/ϳyann/ocr/mnist. 43. In each sequence shown, the three intermediate im- thank G. Carlsson, H. Farid, W. Freeman, T. Griffiths, 41. P. Y. Simard, Y. LeCun, J. -
Comparison of Dimensionality Reduction Techniques on Audio Signals
Comparison of Dimensionality Reduction Techniques on Audio Signals Tamás Pál, Dániel T. Várkonyi Eötvös Loránd University, Faculty of Informatics, Department of Data Science and Engineering, Telekom Innovation Laboratories, Budapest, Hungary {evwolcheim, varkonyid}@inf.elte.hu WWW home page: http://t-labs.elte.hu Abstract: Analysis of audio signals is widely used and this work: car horn, dog bark, engine idling, gun shot, and very effective technique in several domains like health- street music [5]. care, transportation, and agriculture. In a general process Related work is presented in Section 2, basic mathe- the output of the feature extraction method results in huge matical notation used is described in Section 3, while the number of relevant features which may be difficult to pro- different methods of the pipeline are briefly presented in cess. The number of features heavily correlates with the Section 4. Section 5 contains data about the evaluation complexity of the following machine learning method. Di- methodology, Section 6 presents the results and conclu- mensionality reduction methods have been used success- sions are formulated in Section 7. fully in recent times in machine learning to reduce com- The goal of this paper is to find a combination of feature plexity and memory usage and improve speed of following extraction and dimensionality reduction methods which ML algorithms. This paper attempts to compare the state can be most efficiently applied to audio data visualization of the art dimensionality reduction techniques as a build- in 2D and preserve inter-class relations the most. ing block of the general process and analyze the usability of these methods in visualizing large audio datasets. -
Geometric Polarimetry − Part I: Spinors and Wave States
1 Geometric Polarimetry Part I: Spinors and − Wave States David Bebbington, Laura Carrea, and Ernst Krogager, Member, IEEE Abstract A new approach to polarization algebra is introduced. It exploits the geometric properties of spinors in order to represent wave states consistently in arbitrary directions in three dimensional space. In this first expository paper of an intended series the basic derivation of the spinorial wave state is seen to be geometrically related to the electromagnetic field tensor in the spatio-temporal Fourier domain. Extracting the polarization state from the electromagnetic field requires the introduction of a new element, which enters linearly into the defining relation. We call this element the phase flag and it is this that keeps track of the polarization reference when the coordinate system is changed and provides a phase origin for both wave components. In this way we are able to identify the sphere of three dimensional unit wave vectors with the Poincar´esphere. Index Terms state of polarization, geometry, covariant and contravariant spinors and tensors, bivectors, phase flag, Poincar´esphere. I. INTRODUCTION arXiv:0804.0745v1 [physics.optics] 4 Apr 2008 The development of applications in radar polarimetry has been vigorous over the past decade, and is anticipated to continue with ambitious new spaceborne remote sensing missions (for example TerraSAR-X [1] and TanDEM-X [2]). As technical capabilities increase, and new application areas open up, innovations in data analysis often result. Because polarization data are This work was supported by the Marie Curie Research Training Network AMPER (Contract number HPRN-CT-2002-00205). D. Bebbington and L. -
Litrec Vs. Movielens a Comparative Study
LitRec vs. Movielens A Comparative Study Paula Cristina Vaz1;4, Ricardo Ribeiro2;4 and David Martins de Matos3;4 1IST/UAL, Rua Alves Redol, 9, Lisboa, Portugal 2ISCTE-IUL, Rua Alves Redol, 9, Lisboa, Portugal 3IST, Rua Alves Redol, 9, Lisboa, Portugal 4INESC-ID, Rua Alves Redol, 9, Lisboa, Portugal fpaula.vaz, ricardo.ribeiro, [email protected] Keywords: Recommendation Systems, Data Set, Book Recommendation. Abstract: Recommendation is an important research area that relies on the availability and quality of the data sets in order to make progress. This paper presents a comparative study between Movielens, a movie recommendation data set that has been extensively used by the recommendation system research community, and LitRec, a newly created data set for content literary book recommendation, in a collaborative filtering set-up. Experiments have shown that when the number of ratings of Movielens is reduced to the level of LitRec, collaborative filtering results degrade and the use of content in hybrid approaches becomes important. 1 INTRODUCTION and (b) assess its suitability in recommendation stud- ies. The number of books, movies, and music items pub- The paper is structured as follows: Section 2 de- lished every year is increasing far more quickly than scribes related work on recommendation systems and is our ability to process it. The Internet has shortened existing data sets. In Section 3 we describe the collab- the distance between items and the common user, orative filtering (CF) algorithm and prediction equa- making items available to everyone with an Internet tion, as well as different item representation. Section connection. -
A Fast Method for Computing the Inverse of Symmetric Block Arrowhead Matrices
Appl. Math. Inf. Sci. 9, No. 2L, 319-324 (2015) 319 Applied Mathematics & Information Sciences An International Journal http://dx.doi.org/10.12785/amis/092L06 A Fast Method for Computing the Inverse of Symmetric Block Arrowhead Matrices Waldemar Hołubowski1, Dariusz Kurzyk1,∗ and Tomasz Trawi´nski2 1 Institute of Mathematics, Silesian University of Technology, Kaszubska 23, Gliwice 44–100, Poland 2 Mechatronics Division, Silesian University of Technology, Akademicka 10a, Gliwice 44–100, Poland Received: 6 Jul. 2014, Revised: 7 Oct. 2014, Accepted: 8 Oct. 2014 Published online: 1 Apr. 2015 Abstract: We propose an effective method to find the inverse of symmetric block arrowhead matrices which often appear in areas of applied science and engineering such as head-positioning systems of hard disk drives or kinematic chains of industrial robots. Block arrowhead matrices can be considered as generalisation of arrowhead matrices occurring in physical problems and engineering. The proposed method is based on LDLT decomposition and we show that the inversion of the large block arrowhead matrices can be more effective when one uses our method. Numerical results are presented in the examples. Keywords: matrix inversion, block arrowhead matrices, LDLT decomposition, mechatronic systems 1 Introduction thermal and many others. Considered subsystems are highly differentiated, hence formulation of uniform and simple mathematical model describing their static and A square matrix which has entries equal zero except for dynamic states becomes problematic. The process of its main diagonal, a one row and a column, is called the preparing a proper mathematical model is often based on arrowhead matrix. Wide area of applications causes that the formulation of the equations associated with this type of matrices is popular subject of research related Lagrangian formalism [9], which is a convenient way to with mathematics, physics or engineering, such as describe the equations of mechanical, electromechanical computing spectral decomposition [1], solving inverse and other components. -
Unsupervised Speech Representation Learning Using Wavenet Autoencoders Jan Chorowski, Ron J
1 Unsupervised speech representation learning using WaveNet autoencoders Jan Chorowski, Ron J. Weiss, Samy Bengio, Aaron¨ van den Oord Abstract—We consider the task of unsupervised extraction speaker gender and identity, from phonetic content, properties of meaningful latent representations of speech by applying which are consistent with internal representations learned autoencoding neural networks to speech waveforms. The goal by speech recognizers [13], [14]. Such representations are is to learn a representation able to capture high level semantic content from the signal, e.g. phoneme identities, while being desired in several tasks, such as low resource automatic speech invariant to confounding low level details in the signal such as recognition (ASR), where only a small amount of labeled the underlying pitch contour or background noise. Since the training data is available. In such scenario, limited amounts learned representation is tuned to contain only phonetic content, of data may be sufficient to learn an acoustic model on the we resort to using a high capacity WaveNet decoder to infer representation discovered without supervision, but insufficient information discarded by the encoder from previous samples. Moreover, the behavior of autoencoder models depends on the to learn the acoustic model and a data representation in a fully kind of constraint that is applied to the latent representation. supervised manner [15], [16]. We compare three variants: a simple dimensionality reduction We focus on representations learned with autoencoders bottleneck, a Gaussian Variational Autoencoder (VAE), and a applied to raw waveforms and spectrogram features and discrete Vector Quantized VAE (VQ-VAE). We analyze the quality investigate the quality of learned representations on LibriSpeech of learned representations in terms of speaker independence, the ability to predict phonetic content, and the ability to accurately re- [17]. -
Unsupervised Speech Representation Learning Using Wavenet Autoencoders
Unsupervised speech representation learning using WaveNet autoencoders https://arxiv.org/abs/1901.08810 Jan Chorowski University of Wrocław 06.06.2019 Deep Model = Hierarchy of Concepts Cat Dog … Moon Banana M. Zieler, “Visualizing and Understanding Convolutional Networks” Deep Learning history: 2006 2006: Stacked RBMs Hinton, Salakhutdinov, “Reducing the Dimensionality of Data with Neural Networks” Deep Learning history: 2012 2012: Alexnet SOTA on Imagenet Fully supervised training Deep Learning Recipe 1. Get a massive, labeled dataset 퐷 = {(푥, 푦)}: – Comp. vision: Imagenet, 1M images – Machine translation: EuroParlamanet data, CommonCrawl, several million sent. pairs – Speech recognition: 1000h (LibriSpeech), 12000h (Google Voice Search) – Question answering: SQuAD, 150k questions with human answers – … 2. Train model to maximize log 푝(푦|푥) Value of Labeled Data • Labeled data is crucial for deep learning • But labels carry little information: – Example: An ImageNet model has 30M weights, but ImageNet is about 1M images from 1000 classes Labels: 1M * 10bit = 10Mbits Raw data: (128 x 128 images): ca 500 Gbits! Value of Unlabeled Data “The brain has about 1014 synapses and we only live for about 109 seconds. So we have a lot more parameters than data. This motivates the idea that we must do a lot of unsupervised learning since the perceptual input (including proprioception) is the only place we can get 105 dimensions of constraint per second.” Geoff Hinton https://www.reddit.com/r/MachineLearning/comments/2lmo0l/ama_geoffrey_hinton/ Unsupervised learning recipe 1. Get a massive labeled dataset 퐷 = 푥 Easy, unlabeled data is nearly free 2. Train model to…??? What is the task? What is the loss function? Unsupervised learning by modeling data distribution Train the model to minimize − log 푝(푥) E.g. -
Structured Factorizations in Scalar Product Spaces
STRUCTURED FACTORIZATIONS IN SCALAR PRODUCT SPACES D. STEVEN MACKEY¤, NILOUFER MACKEYy , AND FRANC»OISE TISSEURz Abstract. Let A belong to an automorphism group, Lie algebra or Jordan algebra of a scalar product. When A is factored, to what extent do the factors inherit structure from A? We answer this question for the principal matrix square root, the matrix sign decomposition, and the polar decomposition. For general A, we give a simple derivation and characterization of a particular generalized polar decomposition, and we relate it to other such decompositions in the literature. Finally, we study eigendecompositions and structured singular value decompositions, considering in particular the structure in eigenvalues, eigenvectors and singular values that persists across a wide range of scalar products. A key feature of our analysis is the identi¯cation of two particular classes of scalar products, termed unitary and orthosymmetric, which serve to unify assumptions for the existence of structured factorizations. A variety of di®erent characterizations of these scalar product classes is given. Key words. automorphism group, Lie group, Lie algebra, Jordan algebra, bilinear form, sesquilinear form, scalar product, inde¯nite inner product, orthosymmetric, adjoint, factorization, symplectic, Hamiltonian, pseudo-orthogonal, polar decomposition, matrix sign function, matrix square root, generalized polar decomposition, eigenvalues, eigenvectors, singular values, structure preservation. AMS subject classi¯cations. 15A18, 15A21, 15A23, 15A57, -
Dimensionality Reduction and Principal Component Analysis
Dimensionality Reduction Before running any ML algorithm on our data, we may want to reduce the number of features Dimensionality Reduction and Principal ● To save computer memory/disk space if the data are large. Component Analysis ● To reduce execution time. ● To visualize our data, e.g., if the dimensions can be reduced to 2 or 3. Dimensionality Reduction results in an Projecting Data - 2D to 1D approximation of the original data x2 x1 Original 50% 5% 1% Projecting Data - 2D to 1D Projecting Data - 2D to 1D x2 x2 x2 x1 x1 x1 Projecting Data - 2D to 1D Projecting Data - 2D to 1D new feature new feature Projecting Data - 3D to 2D Projecting Data - 3D to 2D Projecting Data - 3D to 2D Why such projections are effective ● In many datasets, some of the features are correlated. ● Correlated features can often be well approximated by a smaller number of new features. ● For example, consider the problem of predicting housing prices. Some of the features may be the square footage of the house, number of bedrooms, number of bathrooms, and lot size. These features are likely correlated. Principal Component Analysis (PCA) Principal Component Analysis (PCA) Suppose we want to reduce data from d dimensions to k dimensions, where d > k. Determine new basis. PCA finds k vectors onto which to project the data so that the Project data onto k axes of new projection errors are minimized. basis with largest variance. In other words, PCA finds the principal components, which offer the best approximation. PCA ≠ Linear Regression PCA Algorithm Given n data -
Linear Algebra - Part II Projection, Eigendecomposition, SVD
Linear Algebra - Part II Projection, Eigendecomposition, SVD (Adapted from Punit Shah's slides) 2019 Linear Algebra, Part II 2019 1 / 22 Brief Review from Part 1 Matrix Multiplication is a linear tranformation. Symmetric Matrix: A = AT Orthogonal Matrix: AT A = AAT = I and A−1 = AT L2 Norm: s X 2 jjxjj2 = xi i Linear Algebra, Part II 2019 2 / 22 Angle Between Vectors Dot product of two vectors can be written in terms of their L2 norms and the angle θ between them. T a b = jjajj2jjbjj2 cos(θ) Linear Algebra, Part II 2019 3 / 22 Cosine Similarity Cosine between two vectors is a measure of their similarity: a · b cos(θ) = jjajj jjbjj Orthogonal Vectors: Two vectors a and b are orthogonal to each other if a · b = 0. Linear Algebra, Part II 2019 4 / 22 Vector Projection ^ b Given two vectors a and b, let b = jjbjj be the unit vector in the direction of b. ^ Then a1 = a1 · b is the orthogonal projection of a onto a straight line parallel to b, where b a = jjajj cos(θ) = a · b^ = a · 1 jjbjj Image taken from wikipedia. Linear Algebra, Part II 2019 5 / 22 Diagonal Matrix Diagonal matrix has mostly zeros with non-zero entries only in the diagonal, e.g. identity matrix. A square diagonal matrix with diagonal elements given by entries of vector v is denoted: diag(v) Multiplying vector x by a diagonal matrix is efficient: diag(v)x = v x is the entrywise product. Inverting a square diagonal matrix is efficient: 1 1 diag(v)−1 = diag [ ;:::; ]T v1 vn Linear Algebra, Part II 2019 6 / 22 Determinant Determinant of a square matrix is a mapping to a scalar. -
Collaborative Filtering: a Machine Learning Perspective by Benjamin Marlin a Thesis Submitted in Conformity with the Requirement
Collaborative Filtering: A Machine Learning Perspective by Benjamin Marlin A thesis submitted in conformity with the requirements for the degree of Master of Science Graduate Department of Computer Science University of Toronto Copyright c 2004 by Benjamin Marlin Abstract Collaborative Filtering: A Machine Learning Perspective Benjamin Marlin Master of Science Graduate Department of Computer Science University of Toronto 2004 Collaborative filtering was initially proposed as a framework for filtering information based on the preferences of users, and has since been refined in many different ways. This thesis is a comprehensive study of rating-based, pure, non-sequential collaborative filtering. We analyze existing methods for the task of rating prediction from a machine learning perspective. We show that many existing methods proposed for this task are simple applications or modifications of one or more standard machine learning methods for classification, regression, clustering, dimensionality reduction, and density estima- tion. We introduce new prediction methods in all of these classes. We introduce a new experimental procedure for testing stronger forms of generalization than has been used previously. We implement a total of nine prediction methods, and conduct large scale prediction accuracy experiments. We show interesting new results on the relative performance of these methods. ii Acknowledgements I would like to begin by thanking my supervisor Richard Zemel for introducing me to the field of collaborative filtering, for numerous helpful discussions about a multitude of models and methods, and for many constructive comments about this thesis itself. I would like to thank my second reader Sam Roweis for his thorough review of this thesis, as well as for many interesting discussions of this and other research. -
Unit 6: Matrix Decomposition
Unit 6: Matrix decomposition Juan Luis Melero and Eduardo Eyras October 2018 1 Contents 1 Matrix decomposition3 1.1 Definitions.............................3 2 Singular Value Decomposition6 3 Exercises 13 4 R practical 15 4.1 SVD................................ 15 2 1 Matrix decomposition 1.1 Definitions Matrix decomposition consists in transforming a matrix into a product of other matrices. Matrix diagonalization is a special case of decomposition and is also called diagonal (eigen) decomposition of a matrix. Definition: there is a diagonal decomposition of a square matrix A if we can write A = UDU −1 Where: • D is a diagonal matrix • The diagonal elements of D are the eigenvalues of A • Vector columns of U are the eigenvectors of A. For symmetric matrices there is a special decomposition: Definition: given a symmetric matrix A (i.e. AT = A), there is a unique decomposition of the form A = UDU T Where • U is an orthonormal matrix • Vector columns of U are the unit-norm orthogonal eigenvectors of A • D is a diagonal matrix formed by the eigenvalues of A This special decomposition is known as spectral decomposition. Definition: An orthonormal matrix is a square matrix whose columns and row vectors are orthogonal unit vectors (orthonormal vectors). Orthonormal matrices have the property that their transposed matrix is the inverse matrix. 3 Proposition: if a matrix Q is orthonormal, then QT = Q−1 Proof: consider the row vectors in a matrix Q: 0 1 u1 B . C T u j ::: ju Q = @ . A and Q = 1 n un 0 1 0 1 0 1 u1 hu1; u1i ::: hu1; uni 1 ::: 0 T B .