Efficient Dimensionality Reduction Using Random Projection
Computer Vision Winter Workshop 2010, Libor Špaček, Vojtěch Franc (eds.), Nové Hrady, Czech Republic, February 3–5, Czech Pattern Recognition Society

Vildana Sulić, Janez Perš, Matej Kristan, and Stanislav Kovačič
Faculty of Electrical Engineering, University of Ljubljana, Slovenia
[email protected]

Abstract. Dimensionality reduction techniques are especially important in the context of embedded vision systems. A promising dimensionality reduction method for use in such systems is random projection. In this paper we explore the performance of the random projection method, which can easily be used in embedded cameras. Random projection is compared to Principal Component Analysis in terms of recognition efficiency on the COIL-20 image data set. The results show surprisingly good performance of random projection in comparison to principal component analysis, even without explicit orthogonalization or normalization of the transformation subspace. These results support the use of random projection in our hierarchical feature-distribution scheme for visual-sensor networks, where random projection elegantly solves the problem of shared subspace distribution.

1 Introduction

In visual systems we usually deal with large amounts of digital image data. The data have to be archived or exchanged between numerous users and systems [3], consuming expensive resources such as storage space or transmission bandwidth. Using dimensionality reduction techniques is especially important when data are distributed across networks with limited bandwidth.

In digital imaging the basic unit of an image is the pixel. An image can therefore be represented as a feature vector, where each pixel corresponds to one feature value. Even standard-resolution images (e.g., VGA) contain a large number of pixels, so the resulting feature-vector representation usually has high dimensionality. In order to handle real-world data adequately, the dimensionality needs to be reduced [27].

An ideal dimensionality reduction technique would reduce the data to a lower-dimensional model while preserving the reconstructive or discriminative properties of the original data. In practice, however, information is lost as the dimensionality is reduced. Therefore, a method is needed which efficiently reduces dimensionality while preserving as much information from the original data as possible. One solution is to reduce the dimensionality of the data by projecting it onto a lower-dimensional subspace [18].

Dimensionality reduction techniques using linear transformations have been very popular in determining the intrinsic dimensionality of the manifold as well as extracting its principal directions (i.e., basis vectors). The most famous method in this category is Principal Component Analysis (PCA) [11]. PCA (also known as the Karhunen-Loève transform) is a vector-space transform that reduces multidimensional data sets to lower dimensions while minimizing the loss of information. A low-dimensional representation of the data is constructed in such a way that it describes as much of the variance in the data as possible. This is achieved by finding a linear basis of reduced dimensionality for the data (a set of eigenvectors) in which the variance in the data is maximal [27].

Besides PCA, many other dimensionality reduction techniques exist. Recently, Random Projection (RP) [28, 16, 5] has emerged as a powerful method for reducing dimensionality. The most important property of the RP method is that it is a general data reduction method. Unlike PCA, it does not depend on a particular training data set. Unlike the Discrete Cosine Transform (DCT) or the Discrete Fourier Transform (DFT), its basis vectors do not exhibit particular frequency or phase properties. In RP, the original high-dimensional data is projected onto a low-dimensional subspace using a random matrix whose columns have unit length. Compared to other methods, for instance PCA, which compute a low-dimensional subspace by optimizing certain criteria (e.g., PCA finds a subspace that maximizes the variance in the data), RP does not use such criteria and is therefore data independent. Furthermore, it is a computationally simple and efficient method that preserves the structure of the data without significant distortion [11]. There exist theoretical results supporting that RP preserves, for example, volumes and affine distances [19], or the structure of the data (e.g., clustering) [6].

1.1 Motivation

Dimensionality reduction techniques are especially important in the context of embedded vision systems such as smart cameras. The reasons are the specific characteristics and limitations of smart cameras (i.e., low processing power, limited storage space and limited network bandwidth) [26]. Processing of dimensionality-reduced data usually requires far fewer resources than processing of unmodified data. In our previous work [24, 25] we proposed a framework of hierarchical feature distribution (HFD) for object recognition in a network of visual sensors, which utilizes the network in a more balanced way than trivial network flooding. HFD for visual-sensor networks (VSNs) is based on hierarchical distribution of the information, where each individual node retains only a small amount of information about the objects seen by the network. Nevertheless, this amount is sufficient to efficiently route queries through the network without any degradation in recognition performance. The amount of data transmitted through the network can be significantly reduced using our hierarchical distribution.

One of the methods for image feature extraction used in our work was PCA. We used a variant of PCA which assumes that the eigenvectors (the subspace) were obtained in advance, and that each sensor has access to the global subspace. However, distribution of the subspace is a problem by itself. In a practical implementation this subspace would have to be transmitted to each and every node in the network, which would cause significant network traffic. Alternatively, cameras could use a fixed subspace, built into the camera at the time of manufacture or installation.

Therefore, the problem that we are aiming to solve is as follows: how can a common subspace be distributed to all cameras in a network without transmitting large amounts of data? Using RP, all cameras in the network can recreate exactly the same subspace with a minimum of transmitted information. This is possible if a pseudorandom generator is used to generate the random matrix. The only thing that has to be known to each camera in the network is the pseudorandom state. In practice, it is sufficient that each camera knows only one parameter, the seed of the pseudorandom generator. Since the same pseudorandom seed results in exactly the same pseudorandom sequence, each camera can recreate the same random matrix. This means that each camera in the network projects the input data (images) through the same random matrix into the same subspace.

Therefore, since the RP method reduces data dimensionality, is computationally simple, preserves the structure of the data, and is increasingly used in distributed visual-sensor networks, we decided to examine the recognition efficiency of RP and compare it to the well-known PCA. Other researchers have explored RP in the context of many applications, but to the best of our knowledge a detailed comparison to PCA in terms of recognition efficiency has not been done. Moreover, RP can be implemented in many ways, and many of those are not appropriate for resource-constrained embedded cameras. Therefore, our aim is to determine the performance of an RP implementation which can be used directly in embedded cameras. Furthermore, we decided to explore several variants of RP and their effect on recognition.

In Section 2 we provide the theoretical background of RP, including selection of the random matrix and its orthogonalization, and we present different applications of RP. Experiments and results are reported and discussed in Section 3. Section 4 provides the discussion and conclusion.

2 Random projection

Random projection is a powerful dimensionality reduction technique that uses random projection matrices to project data from a high-dimensional space to a low-dimensional subspace [11]. The RP technique has been used for designing algorithms for problems from a variety of areas, such as combinatorial optimization, information retrieval and machine learning [28]. Some theoretical background of RP and a brief review of its applications are presented below.

2.1 Background

The main idea of RP is that, using a random matrix whose columns have unit lengths, the original high-dimensional data is projected onto a lower-dimensional subspace [3]. RP has been found to be a computationally efficient and sufficiently accurate method for dimensionality reduction of highly dimensional data sets (e.g., [14, 16, 4, 6, 1], to name just a few).
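As an illustration of this idea, the following is a minimal pure-Python sketch of building a random matrix with unit-length columns and projecting data through it. The function names are ours, and an embedded implementation would use fixed-point or optimized linear algebra rather than nested Python loops:

```python
import math
import random

def random_projection_matrix(m, d, seed=0):
    """Build an m x d random matrix R whose d columns have unit length."""
    rng = random.Random(seed)
    cols = []
    for _ in range(d):
        # Draw one column of Gaussian entries, then normalize it.
        col = [rng.gauss(0.0, 1.0) for _ in range(m)]
        norm = math.sqrt(sum(x * x for x in col))
        cols.append([x / norm for x in col])
    # Store row-major: R[i][j] is row i, column j.
    return [[cols[j][i] for j in range(d)] for i in range(m)]

def project(R, X):
    """Reduce the d-dimensional columns of X to m dimensions: A = R * X."""
    m, d = len(R), len(R[0])
    n_points = len(X[0])
    return [[sum(R[i][k] * X[k][j] for k in range(d)) for j in range(n_points)]
            for i in range(m)]
```

For example, `project(random_projection_matrix(2, 4), X)` maps each 4-dimensional column of `X` to a 2-dimensional one, with no training step involved.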
The concept of RP is as follows: given a data matrix X, the dimensionality of the data can be reduced by projecting it onto a lower-dimensional subspace formed by a set of random vectors [16],

    A(m×N) = R(m×d) · X(d×N),    (1)

where N is the total number of points, d is the original dimension, and m is the desired lower dimension. The central idea of RP is based on the Johnson-Lindenstrauss lemma (JL lemma) [15]:

Johnson-Lindenstrauss lemma. For any 0 < ε < 1 and any integer n, let k be a positive integer such that

    k ≥ 4 (ε²/2 − ε³/3)⁻¹ ln n.    (2)

Then for any set V of n points in ℝ^d, there is a map f : ℝ^d → ℝ^k such that for all u, v ∈ V [7],

    (1 − ε) ‖u − v‖² ≤ ‖f(u) − f(v)‖² ≤ (1 + ε) ‖u − v‖²,    (3)

where f(u) and f(v) are the projections of u and v.
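The bound (2) is easy to evaluate numerically; note that it depends only on the number of points n and the tolerance ε, not on the original dimension d. A small helper (the function name is ours):

```python
import math

def jl_min_dimension(n, eps):
    """Smallest integer k satisfying k >= 4 * (eps^2/2 - eps^3/3)^(-1) * ln(n),
    i.e., Eq. (2) of the JL lemma. Independent of the original dimension d."""
    return math.ceil(4.0 * math.log(n) / (eps ** 2 / 2.0 - eps ** 3 / 3.0))
```

For instance, embedding n = 1000 points with distortion ε = 0.5 requires k ≥ 332 dimensions, and the required k grows quickly as ε shrinks.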
Using the above lemma, [7] shows that if we perform an orthogonal projection of n points in a vector space (ℝ^d) onto a selected lower-dimensional subspace, then the distances between the points are preserved (i.e., not distorted by more than a factor of 1 ± ε), for any 0 < ε < 1.
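Returning to the shared-subspace scheme of Section 1.1, the seed-based recreation of the random matrix can be sketched as follows. This is an illustrative toy (our own function name, column normalization omitted for brevity), showing that transmitting a single integer seed is enough for every node to obtain an identical projection matrix:

```python
import random

def camera_matrix(seed, m, d):
    """Regenerate the shared m x d random matrix from the network-wide seed."""
    rng = random.Random(seed)  # same seed => same pseudorandom sequence
    return [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]

# Two cameras; no matrix is transmitted, only the seed (one integer) is shared.
SEED = 42
R_camera_a = camera_matrix(SEED, 3, 8)
R_camera_b = camera_matrix(SEED, 3, 8)
assert R_camera_a == R_camera_b  # identical subspace on every node
```

Because both nodes draw from the same deterministic pseudorandom sequence, they project images into the same subspace without ever exchanging the matrix itself.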