8 Principal Components Analysis


Further reading. The paper Xoplaki et al. (2000) shows a nice application of the principal components and canonical correlation analysis methods (see Chapter 9).

8.1 Introduction

Example (Weather Report, p. 1-6). The data set consists of sunshine (three variables), air temperature (six variables), heating degree days (HDD) and precipitation data (five variables). Figure 8.1 shows a scatterplot matrix of several variables. With the help of principal component analysis we want to reduce the number of variables without losing much information.

A principal component analysis is concerned with explaining the variance-covariance structure of a set of variables through a few linear combinations of these variables. Its general objectives are (1) data reduction and (2) interpretation. Although p components are required to reproduce the total system variability, often much of this variability can be accounted for by a small number k of the principal components. If so, there is almost as much information in the k components as there is in the original p variables. The k principal components can then replace the initial p variables, and the original data set, consisting of n measurements on p variables, is reduced to a data set consisting of n measurements on k principal components. Analyses of principal components are more of a means to an end than an end in themselves, because they frequently serve as intermediate steps in much larger investigations.

8.2 Population Principal Components

Algebraically, principal components are particular linear combinations of the p random variables $X_1,\dots,X_p$. Geometrically, these linear combinations represent the selection of a new coordinate system obtained by rotating the original system with $X_1,\dots,X_p$ as the coordinate axes. The new axes represent the directions with maximum variability and provide a simpler and more parsimonious description of the covariance structure. As we shall see, principal components depend solely on the covariance matrix $\Sigma$ (or the correlation matrix $\rho$) of $X_1,\dots,X_p$. Their development does not require a multivariate normal assumption. On the other hand, principal components derived for multivariate normal populations have useful interpretations in terms of the constant-density ellipsoids. Further, inferences can be made from the sample components when the population is multivariate normal.

Let the random vector $X' = (X_1,\dots,X_p)$ have the covariance matrix $\Sigma$ with eigenvalues $\lambda_1 \geq \dots \geq \lambda_p \geq 0$. Consider the linear combinations

$$Y_1 = a_1'X = a_{11}X_1 + \dots + a_{1p}X_p,$$
$$\vdots$$
$$Y_p = a_p'X = a_{p1}X_1 + \dots + a_{pp}X_p.$$

Then we find

$$\mathrm{Var}(Y_i) = a_i'\Sigma a_i, \quad i = 1,\dots,p, \tag{8.1}$$
$$\mathrm{Cov}(Y_i, Y_k) = a_i'\Sigma a_k, \quad i,k = 1,\dots,p. \tag{8.2}$$

The principal components are those uncorrelated linear combinations $Y_1,\dots,Y_p$ whose variances in (8.1) are as large as possible.

Definition 8.2.1. We define

First principal component = linear combination $a_1'X$ that maximizes $\mathrm{Var}(a_1'X)$ subject to $a_1'a_1 = 1$;

Second principal component = linear combination $a_2'X$ that maximizes $\mathrm{Var}(a_2'X)$ subject to $a_2'a_2 = 1$ and $\mathrm{Cov}(a_1'X, a_2'X) = 0$;

$\vdots$

$i$th principal component = linear combination $a_i'X$ that maximizes $\mathrm{Var}(a_i'X)$ subject to $a_i'a_i = 1$ and $\mathrm{Cov}(a_i'X, a_k'X) = 0$ for $k < i$.

Proposition 8.2.2. Let $\Sigma$ be the covariance matrix associated with the random vector $X' = (X_1,\dots,X_p)$, and let $\Sigma$ have the eigenvalue-eigenvector pairs $(\lambda_1, e_1),\dots,(\lambda_p, e_p)$, where $\lambda_1 \geq \dots \geq \lambda_p \geq 0$. Then the $i$th principal component is given by

$$Y_i = e_i'X = \sum_{j=1}^{p} e_{ij}X_j, \quad i = 1,\dots,p. \tag{8.3}$$

With these choices,

$$\mathrm{Var}(Y_i) = e_i'\Sigma e_i = \lambda_i, \quad i = 1,\dots,p,$$
$$\mathrm{Cov}(Y_i, Y_k) = e_i'\Sigma e_k = 0, \quad i \neq k.$$

Remark. If some $\lambda_i$ are equal, the choices of the corresponding coefficient vectors $e_i$, and hence $Y_i$, are not unique.
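Proposition 8.2.2 can be checked numerically. The sketch below is a minimal illustration, assuming Python with NumPy and a small hypothetical $3 \times 3$ covariance matrix $\Sigma$ (not taken from the Weather Report data): it computes the eigenvalue-eigenvector pairs, verifies that the principal components have variances $\lambda_1,\dots,\lambda_p$ and are uncorrelated, and confirms the maximizing property of Definition 8.2.1 against random unit-length coefficient vectors.

```python
import numpy as np

# Hypothetical 3x3 population covariance matrix, used only for illustration.
Sigma = np.array([
    [4.0, 2.0, 0.5],
    [2.0, 3.0, 1.0],
    [0.5, 1.0, 2.0],
])

# Eigendecomposition of the symmetric matrix Sigma. eigh returns eigenvalues
# in ascending order, so reverse to obtain lambda_1 >= ... >= lambda_p.
eigvals, eigvecs = np.linalg.eigh(Sigma)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]          # lambda_1 >= ... >= lambda_p
E = eigvecs[:, order]         # column i is the coefficient vector e_i

# Proposition 8.2.2: Var(Y_i) = e_i' Sigma e_i = lambda_i and the principal
# components are uncorrelated, so E' Sigma E is the diagonal matrix of eigenvalues.
print(np.round(E.T @ Sigma @ E, 6))

# Definition 8.2.1: no other unit-length coefficient vector gives a larger
# variance than e_1. Check against a batch of random unit vectors.
rng = np.random.default_rng(0)
a = rng.normal(size=(10_000, 3))
a /= np.linalg.norm(a, axis=1, keepdims=True)
random_vars = np.einsum("ij,jk,ik->i", a, Sigma, a)   # a' Sigma a for each row
print(random_vars.max() <= lam[0] + 1e-12)            # True: lambda_1 is the maximum
```

Here np.linalg.eigh is used rather than np.linalg.eig because $\Sigma$ is symmetric, which guarantees real eigenvalues and an orthonormal set of eigenvectors.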
Proposition 8.2.3. Let $X' = (X_1,\dots,X_p)$ be as in Proposition 8.2.2. Then

$$\mathrm{tr}(\Sigma) = \sum_{i=1}^{p} \sigma_{ii} = \sum_{i=1}^{p} \mathrm{Var}(X_i) = \sum_{i=1}^{p} \lambda_i = \sum_{i=1}^{p} \mathrm{Var}(Y_i), \tag{8.4}$$

and the proportion of total population variance due to the $k$th principal component is

$$\frac{\lambda_k}{\sum_{i=1}^{p} \lambda_i}, \quad k = 1,\dots,p.$$

Remark. If most of the total population variance, for large $p$, can be attributed to the first one, two or three components, then these components can "replace" the original $p$ variables without much loss of information.

Remark. The magnitude of $e_{ik}$ (also called the principal component loading) measures the importance of the $k$th variable to the $i$th principal component and is thus a useful basis for interpretation. A large coefficient (in absolute value) corresponds to a high loading, while a coefficient near zero corresponds to a low loading.

Remark. One important use of principal components is interpreting the original data in terms of the principal components. The images of the original data under the principal components transformation are referred to as principal component scores.

Proposition 8.2.4. If $Y_1 = e_1'X,\dots,Y_p = e_p'X$ are the principal components obtained from the covariance matrix $\Sigma$, then

$$\rho_{Y_i,X_k} = \frac{e_{ik}\sqrt{\lambda_i}}{\sqrt{\sigma_{kk}}}, \quad i,k = 1,\dots,p.$$

Remark. Suppose $X \sim N_p(0, \Sigma)$. Then the equation $c^2 = \sum_{i=1}^{p} y_i^2/\lambda_i$ defines an ellipsoid with axes $y_1,\dots,y_p$ lying in the directions of $e_1,\dots,e_p$, respectively.

8.2.1 Principal Components Obtained from Standardized Values

Principal components may also be obtained for the standardized variables

$$Z_i = \frac{X_i - \mu_i}{\sqrt{\sigma_{ii}}}, \quad i = 1,\dots,p, \tag{8.5}$$

or

$$Z = (V^{1/2})^{-1}(X - \mu), \tag{8.6}$$

where the diagonal standard deviation matrix $V^{1/2}$ is defined as

$$V^{1/2} = \begin{pmatrix} \sqrt{\sigma_{11}} & 0 & \cdots & 0 \\ 0 & \sqrt{\sigma_{22}} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \sqrt{\sigma_{pp}} \end{pmatrix}.$$

We find $E(Z) = 0$ and $\mathrm{Cov}(Z) = (V^{1/2})^{-1}\,\Sigma\,(V^{1/2})^{-1} = \rho$.

Remark. The eigenvalue-eigenvector pairs derived from $\Sigma$ are in general not the same as the ones derived from $\rho$.

Proposition 8.2.5. The $i$th principal component of the standardized variables $Z' = (Z_1,\dots,Z_p)$ with $\mathrm{Cov}(Z) = \rho$ is given by

$$Y_i = e_i'Z = e_i'(V^{1/2})^{-1}(X - \mu), \quad i = 1,\dots,p.$$

Moreover,

$$\sum_{i=1}^{p} \mathrm{Var}(Y_i) = \sum_{i=1}^{p} \mathrm{Var}(Z_i) = p$$

and

$$\rho_{Y_i,Z_k} = e_{ik}\sqrt{\lambda_i}, \quad i,k = 1,\dots,p.$$

In this case $(\lambda_1, e_1),\dots,(\lambda_p, e_p)$ are the eigenvalue-eigenvector pairs of $\rho$ with $\lambda_1 \geq \dots \geq \lambda_p \geq 0$.

Remark. Variables should be standardized if they are measured on scales with widely differing ranges or if the units of measurement are not commensurate.
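To complement Proposition 8.2.5, the following sketch (again an illustration only, assuming NumPy, with a covariance matrix invented so that the variables have widely differing scales) forms the correlation matrix $\rho = (V^{1/2})^{-1}\Sigma(V^{1/2})^{-1}$, extracts the principal components of the standardized variables, and checks numerically that the eigenvalues of $\rho$ sum to $p$ and that $\rho_{Y_i,Z_k} = e_{ik}\sqrt{\lambda_i}$. It also prints the eigenvalues of $\Sigma$ next to those of $\rho$ to illustrate the remark that the two sets of eigenpairs generally differ.

```python
import numpy as np

# Hypothetical covariance matrix whose variables live on very different
# scales (illustration only, not the Weather Report data).
Sigma = np.array([
    [100.0, 4.0, 0.60],
    [  4.0, 1.0, 0.10],
    [  0.6, 0.1, 0.25],
])

# V^{1/2} is the diagonal matrix of standard deviations, and the correlation
# matrix is rho = (V^{1/2})^{-1} Sigma (V^{1/2})^{-1}.
sd = np.sqrt(np.diag(Sigma))
rho = Sigma / np.outer(sd, sd)

# Principal components of the standardized variables come from rho.
eigvals, eigvecs = np.linalg.eigh(rho)
order = np.argsort(eigvals)[::-1]
lam = eigvals[order]                     # eigenvalues of rho, descending
E = eigvecs[:, order]                    # column i is e_i

# Proposition 8.2.5: the variances of the standardized PCs sum to p ...
print(np.isclose(lam.sum(), rho.shape[0]))            # True

# ... and corr(Y_i, Z_k) = e_ik * sqrt(lambda_i). Check the formula against
# a direct computation of the correlations, using Cov(Y, Z) = E' rho.
corr_YZ = (E * np.sqrt(lam)).T                        # entry (i, k) = e_ik sqrt(lambda_i)
direct = (E.T @ rho) / np.sqrt(lam)[:, None]
print(np.allclose(corr_YZ, direct))                   # True

# The eigenvalues of Sigma and rho differ, which is why standardization
# matters when the measurement scales are not commensurate.
print(np.round(np.linalg.eigvalsh(Sigma)[::-1], 3))
print(np.round(lam, 3))
```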
8.3 Summarizing Sample Variation by Principal Components

We now study the problem of summarizing the variation in n measurements on p variables with a few judiciously chosen linear combinations. Suppose the data $x_1,\dots,x_n$ represent n independent drawings from some p-dimensional population with mean vector $\mu$ and covariance matrix $\Sigma$. These data yield the sample mean vector $\bar{x}$, the sample covariance matrix $S$ and the sample correlation matrix $R$.

The uncorrelated combinations with the largest variances will be called the sample principal components. The sample principal components (PC) are defined as those linear combinations which have maximum sample variance. Specifically,

First sample PC = linear combination $a_1'x_j$ that maximizes the sample variance $\mathrm{Var}(a_1'x_j)$ subject to $a_1'a_1 = 1$;

Second sample PC = linear combination $a_2'x_j$ that maximizes the sample variance $\mathrm{Var}(a_2'x_j)$ subject to $a_2'a_2 = 1$ and $\mathrm{Cov}(a_1'x_j, a_2'x_j) = 0$;

$\vdots$

$i$th sample PC = linear combination $a_i'x_j$ that maximizes the sample variance $\mathrm{Var}(a_i'x_j)$ subject to $a_i'a_i = 1$ and $\mathrm{Cov}(a_i'x_j, a_k'x_j) = 0$ for $k < i$.

Proposition 8.3.1. If $S = \{s_{ik}\}$ is the $p \times p$ sample covariance matrix with eigenvalue-eigenvector pairs $(\hat{\lambda}_1, \hat{e}_1),\dots,(\hat{\lambda}_p, \hat{e}_p)$, the $i$th sample principal component is given by

$$\hat{y}_i = \hat{e}_i'x = \hat{e}_{i1}x_1 + \dots + \hat{e}_{ip}x_p, \quad i = 1,\dots,p,$$

where $\hat{\lambda}_1 \geq \dots \geq \hat{\lambda}_p \geq 0$ and $x$ is any observation on the variables $X_1,\dots,X_p$. Also,

$$\text{sample variance}(\hat{y}_k) = \hat{\lambda}_k, \quad k = 1,\dots,p,$$
$$\text{sample covariance}(\hat{y}_i, \hat{y}_k) = 0, \quad i \neq k.$$

In addition,

$$\text{total sample variance} = \sum_{i=1}^{p} s_{ii} = \hat{\lambda}_1 + \dots + \hat{\lambda}_p$$

and

$$r(\hat{y}_i, x_k) = \frac{\hat{e}_{ik}\sqrt{\hat{\lambda}_i}}{\sqrt{s_{kk}}}, \quad i,k = 1,\dots,p.$$

Number of Principal Components

How many components should be retained? There is no definitive answer to this question. A useful visual aid for determining an appropriate number of principal components is a scree plot. With the eigenvalues ordered from largest to smallest, a scree plot is a plot of $\hat{\lambda}_i$ versus $i$. To determine the appropriate number of components, we look for an elbow (bend) in the scree plot. The number of components is taken to be the point at which the remaining eigenvalues are relatively small and all about the same size.

Remark. An unusually small value of the last eigenvalue from either the sample covariance or correlation matrix can indicate an unnoticed linear dependency in the data set and should therefore not be routinely ignored.

Example (Weather Report, p. 1-6). Figure 8.2 shows the results of the principal component analysis calculated with the correlation matrix.

Figure 8.1: Scatterplot matrix of some variables of the Weather Report, p. 1-6.

Figure 8.2: Screeplot of the principal components analysis (top left), scatterplot matrix of the scores calculated with the correlation matrix (top right) and the loadings of the variables (bottom). Data set: Weather Report, p. 1-6.

9 Canonical Correlation Analysis

Further reading. Read again the paper Xoplaki et al. (2000) to see how the canonical correlation analysis method can be applied in climate sciences.