Linear Regression in Matrix Form
16 pages · PDF · 1,020 KB
Recommended publications
A Generalized Linear Model for Principal Component Analysis of Binary Data
A Generalized Linear Model for Principal Component Analysis of Binary Data. Andrew I. Schein, Lawrence K. Saul, Lyle H. Ungar. Department of Computer and Information Science, University of Pennsylvania, Moore School Building, 200 South 33rd Street, Philadelphia, PA 19104-6389. {ais,lsaul,ungar}@cis.upenn.edu

Abstract: We investigate a generalized linear model for dimensionality reduction of binary data. The model is related to principal component analysis (PCA) in the same way that logistic regression is related to linear regression. Thus we refer to the model as logistic PCA. In this paper, we derive an alternating least squares method to estimate the basis vectors and generalized linear coefficients of the logistic PCA model. The resulting updates have a simple closed form and are guaranteed at each iteration to improve the model's likelihood. We evaluate the performance of logistic PCA as ...

... they are not generally appropriate for other data types. Recently, Collins et al. [5] derived generalized criteria for dimensionality reduction by appealing to properties of distributions in the exponential family. In their framework, the conventional PCA of real-valued data emerges naturally from assuming a Gaussian distribution over a set of observations, while generalized versions of PCA for binary and nonnegative data emerge respectively by substituting the Bernoulli and Poisson distributions for the Gaussian. For binary data, the generalized model's relationship to PCA is analogous to the relationship between logistic and linear regression [12]. In particular, the model exploits the log-odds as the natural parameter of the Bernoulli distribution and the logistic function as its canonical link.
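The abstract describes an alternating least squares fit. As a rough illustration of the underlying model only (low-rank logits under a Bernoulli likelihood with the logistic canonical link), here is a sketch that fits the same kind of model by plain gradient ascent rather than the paper's ALS updates; the rank, learning rate, and toy data are illustrative assumptions.

```python
# A minimal sketch of the idea behind logistic PCA: represent the logits
# (natural parameters of the Bernoulli) of a binary matrix by a low-rank
# product and fit by maximizing the Bernoulli log-likelihood. Plain gradient
# ascent here, not the paper's alternating least squares updates.
import numpy as np

rng = np.random.default_rng(0)

def logistic_pca(B, rank=2, lr=0.1, iters=500):
    """Fit low-rank logits Theta = U @ V.T to a binary matrix B."""
    n, d = B.shape
    U = 0.01 * rng.standard_normal((n, rank))
    V = 0.01 * rng.standard_normal((d, rank))
    for _ in range(iters):
        P = 1.0 / (1.0 + np.exp(-(U @ V.T)))   # logistic link
        G = B - P                              # gradient of log-likelihood wrt logits
        U, V = U + lr * (G @ V), V + lr * (G.T @ U)
    return U, V

# Toy binary data with planted low-rank structure.
true_logits = rng.standard_normal((50, 2)) @ rng.standard_normal((2, 10))
B = (rng.random((50, 10)) < 1.0 / (1.0 + np.exp(-true_logits))).astype(float)
U, V = logistic_pca(B)
print("mean reconstruction error:", np.mean(np.abs(B - (U @ V.T > 0))))
```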
Lecture 13: Simple Linear Regression in Matrix Format
See updates and corrections at http://www.stat.cmu.edu/~cshalizi/mreg/. Lecture 13: Simple Linear Regression in Matrix Format. 36-401, Section B, Fall 2015. 13 October 2015.

Contents: 1. Least Squares in Matrix Form (1.1 The Basic Matrices; 1.2 Mean Squared Error; 1.3 Minimizing the MSE); 2. Fitted Values and Residuals (2.1 Residuals; 2.2 Expectations and Covariances); 3. Sampling Distribution of Estimators; 4. Derivatives with Respect to Vectors (4.1 Second Derivatives; 4.2 Maxima and Minima); 5. Expectations and Variances with Vectors and Matrices; 6. Further Reading.

So far, we have not used any notions, or notation, that go beyond basic algebra and calculus (and probability). This has forced us to do a fair amount of book-keeping, as it were by hand. This is just about tolerable for the simple linear model, with one predictor variable. It will get intolerable if we have multiple predictor variables. Fortunately, a little application of linear algebra will let us abstract away from a lot of the book-keeping details, and make multiple linear regression hardly more complicated than the simple version. These notes will not remind you of how matrix algebra works. However, they will review some results about calculus with matrices, and about expectations and variances with vectors and matrices. Throughout, bold-faced letters will denote matrices, as $\mathbf{a}$ as opposed to a scalar $a$.

1. Least Squares in Matrix Form. Our data consist of $n$ paired observations of the predictor variable $X$ and the response variable $Y$, i.e., $(x_1, y_1), \ldots, (x_n, y_n)$.
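To make the matrix formulation concrete, a minimal numpy sketch of the least squares estimate $\hat{\beta} = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ on simulated data; the data and coefficients are invented for illustration, not taken from the notes.

```python
# Stack the n observations into a design matrix X and response y,
# then solve the normal equations (X'X) beta = X'y.
import numpy as np

rng = np.random.default_rng(1)
n = 100
x = rng.uniform(0, 10, n)
X = np.column_stack([np.ones(n), x])       # design matrix with intercept column
y = 2.0 + 0.5 * x + rng.normal(0, 1, n)    # true beta = (2, 0.5), hypothetical

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)   # (X'X)^{-1} X'y
residuals = y - X @ beta_hat
print("beta_hat:", beta_hat)
print("MSE:", np.mean(residuals ** 2))
```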
LECTURES 2-3: Stochastic Processes, Autocorrelation Function
LECTURES 2-3: Stochastic Processes, Autocorrelation Function, Stationarity.

Important points of Lecture 1: A time series $\{X_t\}$ is a series of observations taken sequentially over time: $x_t$ is an observation recorded at a specific time $t$. Characteristics of time series data: observations are dependent, become available at equally spaced time points, and are time-ordered. This is a discrete time series. The purposes of time series analysis are to model and to predict or forecast future values of a series based on the history of that series.

2.2 Some descriptive techniques (based on [BD] §1.3 and §1.4). Take a step backwards: how do we describe a r.v. or a random vector?

- For a r.v. $X$: distribution function $F_X(x) := P(X \le x)$, mean $\mu = EX$, and variance $\sigma^2 = \mathrm{Var}(X)$.
- For a random vector $(X_1, X_2)$: joint d.f. $F_{X_1,X_2}(x_1, x_2) := P(X_1 \le x_1, X_2 \le x_2)$, marginal d.f. $F_{X_1}(x_1) := P(X_1 \le x_1) \equiv F_{X_1,X_2}(x_1, \infty)$, mean vector $(\mu_1, \mu_2) = (EX_1, EX_2)$, variances $\sigma_1^2 = \mathrm{Var}(X_1)$ and $\sigma_2^2 = \mathrm{Var}(X_2)$, and covariance $\mathrm{Cov}(X_1, X_2) = E(X_1 - \mu_1)(X_2 - \mu_2) \equiv E(X_1 X_2) - \mu_1 \mu_2$. Often we use correlation, the normalized covariance: $\mathrm{Cor}(X_1, X_2) = \mathrm{Cov}(X_1, X_2)/(\sigma_1 \sigma_2)$.

To describe a process $X_1, X_2, \ldots$ we define:

(i) Distribution functions: the finite-dimensional (fi-di) d.f.'s $F_{t_1 \ldots t_n}(x_1, \ldots, x_n) = P(X_{t_1} \le x_1, \ldots, X_{t_n} \le x_n)$, i.e., the joint d.f. for the vector $(X_{t_1}, \ldots, X_{t_n})$.

(ii) First- and second-order moments:
- Mean: $\mu_X(t) = EX_t$.
- Variance: $\sigma_X^2(t) = E(X_t - \mu_X(t))^2 \equiv EX_t^2 - \mu_X^2(t)$.
- Autocovariance function: $\gamma_X(t, s) = \mathrm{Cov}(X_t, X_s) = E[(X_t - \mu_X(t))(X_s - \mu_X(s))] \equiv E(X_t X_s) - \mu_X(t)\mu_X(s)$ (note: this is an infinite matrix).
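A short sketch of how the mean, autocovariance, and autocorrelation defined above are estimated from a single observed series; the AR(1) simulation is an illustrative assumption, not part of the lecture.

```python
# Sample versions of the definitions above:
# gamma_hat(h) = (1/n) sum (x_t - xbar)(x_{t+h} - xbar), rho_hat(h) = gamma_hat(h)/gamma_hat(0).
import numpy as np

rng = np.random.default_rng(2)

# Simulate an AR(1) series X_t = 0.7 X_{t-1} + eps_t as toy data.
n = 1000
x = np.zeros(n)
for t in range(1, n):
    x[t] = 0.7 * x[t - 1] + rng.normal()

def sample_autocov(x, h):
    """Sample autocovariance at lag h."""
    xbar = x.mean()
    return np.sum((x[: len(x) - h] - xbar) * (x[h:] - xbar)) / len(x)

gamma0 = sample_autocov(x, 0)
for h in range(4):
    print(f"lag {h}: gamma = {sample_autocov(x, h):.3f}, "
          f"rho = {sample_autocov(x, h) / gamma0:.3f}")
```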
Introducing the Game Design Matrix: a Step-By-Step Process for Creating Serious Games
Air Force Institute of Technology, AFIT Scholar: Theses and Dissertations, Student Graduate Works, 3-2020.

Introducing the Game Design Matrix: A Step-by-Step Process for Creating Serious Games. Aaron J. Pendleton. Follow this and additional works at: https://scholar.afit.edu/etd. Part of the Educational Assessment, Evaluation, and Research Commons; the Game Design Commons; and the Instructional Media Design Commons.

Recommended citation: Pendleton, Aaron J., "Introducing the Game Design Matrix: A Step-by-Step Process for Creating Serious Games" (2020). Theses and Dissertations, 4347. https://scholar.afit.edu/etd/4347. This thesis is brought to you for free and open access by the Student Graduate Works at AFIT Scholar. It has been accepted for inclusion in Theses and Dissertations by an authorized administrator of AFIT Scholar. For more information, please contact [email protected].

INTRODUCING THE GAME DESIGN MATRIX: A STEP-BY-STEP PROCESS FOR CREATING SERIOUS GAMES. THESIS. Aaron J. Pendleton, Captain, USAF. AFIT-ENG-MS-20-M-054. Department of the Air Force, Air University, Air Force Institute of Technology, Wright-Patterson Air Force Base, Ohio. DISTRIBUTION STATEMENT A: APPROVED FOR PUBLIC RELEASE; DISTRIBUTION UNLIMITED. The views expressed in this document are those of the author and do not reflect the official policy or position of the United States Air Force, the United States Department of Defense, or the United States Government. This material is declared a work of the U.S. Government and is not subject to copyright protection in the United States.

INTRODUCING THE GAME DESIGN MATRIX: A STEP-BY-STEP PROCESS FOR CREATING LEARNING OBJECTIVE BASED SERIOUS GAMES. THESIS. Presented to the Faculty, Department of Electrical and Computer Engineering, Graduate School of Engineering and Management, Air Force Institute of Technology, Air University, Air Education and Training Command, in partial fulfillment of the requirements for the degree of Master of Science in Cyberspace Operations. Aaron J. Pendleton ...
Stat 5102 Notes: Regression
Stat 5102 Notes: Regression. Charles J. Geyer. April 27, 2007.

In these notes we do not use the "upper case letter means random, lower case letter means nonrandom" convention. Lower case normal weight letters (like $x$ and $\beta$) indicate scalars (real variables). Lower case bold weight letters (like $\mathbf{x}$ and $\boldsymbol{\beta}$) indicate vectors. Upper case bold weight letters (like $\mathbf{X}$) indicate matrices.

1. The Model. The general linear model has the form
$$y_i = \sum_{j=1}^{p} \beta_j x_{ij} + e_i \qquad (1.1)$$
where $i$ indexes individuals and $j$ indexes different predictor variables. Explicit use of (1.1) makes theory impossibly messy. We rewrite it as a vector equation
$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}, \qquad (1.2)$$
where $\mathbf{y}$ is a vector whose components are $y_i$, $\mathbf{X}$ is a matrix whose components are $x_{ij}$, $\boldsymbol{\beta}$ is a vector whose components are $\beta_j$, and $\mathbf{e}$ is a vector whose components are $e_i$. Note that $\mathbf{y}$ and $\mathbf{e}$ have dimension $n$, but $\boldsymbol{\beta}$ has dimension $p$. The matrix $\mathbf{X}$ is called the design matrix or model matrix and has dimension $n \times p$.

As always in regression theory, we treat the predictor variables as nonrandom. So $\mathbf{X}$ is a nonrandom matrix and $\boldsymbol{\beta}$ is a nonrandom vector of unknown parameters. The only random quantities in (1.2) are $\mathbf{e}$ and $\mathbf{y}$.

As always in regression theory, the errors $e_i$ are independent and identically distributed mean-zero normal. This is written as a vector equation $\mathbf{e} \sim \mathrm{Normal}(0, \sigma^2 \mathbf{I})$, where $\sigma^2$ is another unknown parameter (the error variance) and $\mathbf{I}$ is the identity matrix. This implies $\mathbf{y} \sim \mathrm{Normal}(\boldsymbol{\mu}, \sigma^2 \mathbf{I})$, where $\boldsymbol{\mu} = \mathbf{X}\boldsymbol{\beta}$.
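A brief sketch that mirrors model (1.2): simulate $\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{e}$ with $\mathbf{e} \sim \mathrm{Normal}(0, \sigma^2\mathbf{I})$, estimate $\boldsymbol{\beta}$, and estimate the covariance of the estimator as $\hat{\sigma}^2(\mathbf{X}^T\mathbf{X})^{-1}$. The dimensions and coefficient values are arbitrary illustrative choices.

```python
# Simulate the general linear model and recover beta plus an estimate of
# the sampling covariance of beta_hat.
import numpy as np

rng = np.random.default_rng(3)
n, p, sigma = 200, 3, 1.5
X = rng.standard_normal((n, p))            # nonrandom in the theory; fixed here
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(0, sigma, n)     # y ~ Normal(X beta, sigma^2 I)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
sigma2_hat = resid @ resid / (n - p)       # unbiased error-variance estimate
print("beta_hat:", beta_hat)
print("estimated Cov(beta_hat):\n", sigma2_hat * XtX_inv)
```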
Spatial Autocorrelation: Covariance and Semivariance
Spatial Autocorrelation: Covariance and Semivariance. Lily House-Peters, GEOG 593, November 10, 2009.

Quantitative Terrain Descriptors: covariance and the semivariogram are numeric methods used to describe the character of the terrain (e.g., terrain roughness, irregularity). Terrain character has important implications for: (1) the type of sampling strategy chosen, and (2) estimating DTM accuracy (after sampling and reconstruction).

Spatial Autocorrelation. The First Law of Geography: "Everything is related to everything else, but near things are more related than distant things." (Waldo Tobler). The value of a variable at one point in space is related to the value of that same variable in a nearby location. Examples: Moran's I, Geary's C, LISA. Positive spatial autocorrelation: neighbors are similar. Negative spatial autocorrelation: neighbors are dissimilar. $R(d)$ = correlation coefficient of all the points with horizontal interval $d$.

Covariance: the degree of similarity between pairs of surface points. The value of similarity is an indicator of the complexity of the terrain surface: the smaller the similarity, the more complex the terrain surface. $V$ = variance calculated from all $N$ points; $\mathrm{Cov}(d)$ = covariance of all points with horizontal interval $d$; $Z_i$ = height of point $i$; $M$ = average height of all points; $Z_{i+d}$ = elevation of the point with an interval of $d$ from $i$.

Semivariance expresses the degree of relationship between points on a surface; it is equal to half the variance of the differences between all possible points spaced a constant distance apart.
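The lag-$d$ covariance and semivariance statistics described above are straightforward to compute for a profile of heights; the following sketch applies the slide's definitions to a synthetic one-dimensional transect (the random-walk terrain is an assumption for illustration).

```python
# Lag-d covariance and semivariance for heights along a 1-D transect.
import numpy as np

rng = np.random.default_rng(4)
z = np.cumsum(rng.normal(0, 1, 500))   # synthetic terrain profile heights

def cov_at_lag(z, d):
    """Cov(d): covariance of all point pairs a horizontal interval d apart."""
    m = z.mean()
    return np.mean((z[:-d] - m) * (z[d:] - m))

def semivariance(z, d):
    """gamma(d): half the mean squared difference of points d apart."""
    return 0.5 * np.mean((z[:-d] - z[d:]) ** 2)

for d in (1, 5, 20):
    print(f"d={d:2d}  Cov={cov_at_lag(z, d):8.2f}  gamma={semivariance(z, d):8.2f}")
```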
Uncertainty of the Design and Covariance Matrices in Linear Statistical Model*
Acta Univ. Palacki. Olomuc., Fac. rer. nat., Mathematica 48 (2009) 61-71. Uncertainty of the design and covariance matrices in linear statistical model*. Lubomír Kubáček, Jaroslav Marek. Department of Mathematical Analysis and Applications of Mathematics, Faculty of Science, Palacký University, tř. 17. listopadu 12, 771 46 Olomouc, Czech Republic. E-mail: [email protected], [email protected]. (Received January 15, 2009)

Abstract: The aim of the paper is to determine the influence of uncertainties in design and covariance matrices on estimators in the linear regression model.

Key words: linear statistical model, uncertainty, design matrix, covariance matrix. 2000 Mathematics Subject Classification: 62J05.

1. Introduction. Uncertainties in the entries of design and covariance matrices influence the variance of estimators and cause their bias. The problem occurs mainly in the linearization of nonlinear regression models, where the design matrix is created by derivatives of some functions. The question is how precise these derivatives must be. Uncertainties of covariance matrices must be suppressed under some reasonable bound as well. The aim of the paper is to give simple rules which enable us to decide how many digits an entry of the mentioned matrices must consist of.

*Supported by the Council of Czech Government MSM 6 198 959 214.

2. Symbols used. In the following text a linear regression model (in more detail cf. [2]) is denoted as
$$\mathbf{Y} \sim_n (\mathbf{F}\boldsymbol{\beta}, \boldsymbol{\Sigma}), \quad \boldsymbol{\beta} \in \mathbb{R}^k, \qquad (1)$$
where $\mathbf{Y}$ is an $n$-dimensional random vector with the mean value $E(\mathbf{Y})$ equal to $\mathbf{F}\boldsymbol{\beta}$ and with the covariance matrix $\mathrm{Var}(\mathbf{Y}) = \boldsymbol{\Sigma}$. The symbol $\mathbb{R}^k$ means the $k$-dimensional linear vector space.
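In the spirit of the paper's question (how many digits the entries of the design matrix must carry), a small numerical experiment: round the design matrix to fewer digits and measure the shift it induces in the estimator. The model, dimensions, and noise level are illustrative assumptions, not the authors' setup.

```python
# Effect of rounding the design matrix F on the least squares estimator.
import numpy as np

rng = np.random.default_rng(5)
n, k = 50, 3
F = rng.uniform(1, 10, (n, k))
beta = np.array([2.0, -1.0, 0.5])
y = F @ beta + rng.normal(0, 0.1, n)

def ols(F, y):
    return np.linalg.solve(F.T @ F, F.T @ y)

exact = ols(F, y)
for digits in (4, 2, 1):
    F_rounded = np.round(F, digits)        # uncertainty in the design matrix
    shift = ols(F_rounded, y) - exact
    print(f"{digits} digits kept: max |shift in beta_hat| = {np.abs(shift).max():.2e}")
```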
Linear, Ridge Regression, and Principal Component Analysis
Linear, Ridge Regression, and Principal Component Analysis. Jia Li, Department of Statistics, The Pennsylvania State University. Email: [email protected]. http://www.stat.psu.edu/~jiali

Introduction to Regression:
- Input vector: $X = (X_1, X_2, \ldots, X_p)$.
- Output $Y$ is real-valued.
- Predict $Y$ from $X$ by $f(X)$ so that the expected loss function $E(L(Y, f(X)))$ is minimized.
- Square loss: $L(Y, f(X)) = (Y - f(X))^2$.
- The optimal predictor: $f^*(X) = \operatorname{argmin}_{f(X)} E(Y - f(X))^2 = E(Y \mid X)$.
- The function $E(Y \mid X)$ is the regression function.

Example: The number of active physicians in a Standard Metropolitan Statistical Area (SMSA), denoted by $Y$, is expected to be related to total population ($X_1$, measured in thousands), land area ($X_2$, measured in square miles), and total personal income ($X_3$, measured in millions of dollars). Data are collected for 141 SMSAs, as shown in the following table.

  i  :     1      2      3  ...   139   140   141
  X1 :  9387   7031   7017  ...   233   232   231
  X2 :  1348   4069   3719  ...  1011   813   654
  X3 : 72100  52737  54542  ...  1337  1589  1148
  Y  : 25627  15389  13326  ...   264   371   140

Goal: predict $Y$ from $X_1$, $X_2$, and $X_3$.

Linear Methods: the linear regression model is $f(X) = \beta_0 + \sum_{j=1}^{p} X_j \beta_j$.
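A compact sketch of the ridge estimator named in the title, $\hat{\beta}(\lambda) = (\mathbf{X}^T\mathbf{X} + \lambda\mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$, on synthetic data shaped like the SMSA example (three predictors, 141 cases); the numbers are simulated, not the real SMSA data.

```python
# Ridge regression closed form on standardized predictors, showing how the
# coefficients shrink as lambda grows.
import numpy as np

rng = np.random.default_rng(6)
n = 141
X = rng.lognormal(mean=[7.0, 7.0, 9.0], sigma=1.0, size=(n, 3))
y = 0.002 * X[:, 0] + 0.0001 * X[:, 1] + 0.003 * X[:, 2] + rng.normal(0, 50, n)

# Standardize predictors and center the response, as is usual before ridge.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
yc = y - y.mean()

for lam in (0.0, 10.0, 1000.0):
    beta = np.linalg.solve(Xs.T @ Xs + lam * np.eye(3), Xs.T @ yc)
    print(f"lambda={lam:7.1f}  beta={np.round(beta, 2)}")
```

At $\lambda = 0$ this reduces to ordinary least squares; increasing $\lambda$ trades bias for variance by shrinking the coefficients toward zero.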
The Concept of a Generalized Inverse for Matrices Was Introduced by Moore (1920)
J. Japanese Soc. Comp. Statist. 2 (1989), 1-7. THE MOORE-PENROSE INVERSE MATRIX FOR THE BALANCED ANOVA MODELS. Byung Chun Kim* and Ha Sik Sunwoo*.

ABSTRACT: Since the design matrix of the balanced linear model with no interactions has a special form, the general solution of the normal equations can be easily found. From the relationships between the minimum norm least squares solution and the Moore-Penrose inverse we can obtain the explicit form of the Moore-Penrose inverse $X^+$ of the design matrix of the model $y = X\beta + \varepsilon$ for the balanced model with no interaction.

1. Introduction. The concept of a generalized inverse for matrices was introduced by Moore (1920). He developed it in the context of linear transformations from $n$-dimensional to $m$-dimensional vector space over a complex field with the usual Euclidean norm. Unaware of Moore's work, Penrose (1955) established the existence and the uniqueness of the generalized inverse that satisfies Moore's conditions under the $L_2$ norm. It is commonly known as the Moore-Penrose inverse. In the early 1970s many statisticians, including Albert (1972), Ben-Israel and Greville (1974), Graybill (1976), Lawson and Hanson (1974), Pringle and Raynor (1971), Rao and Mitra (1971), Rao (1975), Schmidt (1976), and Searle (1971, 1982), worked on this subject and yielded several texts that treat the mathematics and its application to statistics in some depth and detail. In particular, Shinozaki et al. (1972a, 1972b) surveyed the general methods to find the Moore-Penrose inverse under two headings, direct methods and iterative methods, and tested their accuracy for some cases.
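The abstract's link between the minimum norm least squares solution and the Moore-Penrose inverse can be seen directly with numpy's `pinv` on a small overparameterized one-way ANOVA design; the 2-group, 3-replicate layout and the data values are illustrative choices.

```python
# For a rank-deficient design matrix X, the minimum-norm least squares
# solution of the normal equations is X^+ y, with X^+ the Moore-Penrose inverse.
import numpy as np

# Design matrix for y_ij = mu + alpha_i + e_ij (overparameterized: the
# intercept column is the sum of the two group columns, so rank is 2, not 3).
X = np.array([
    [1, 1, 0],
    [1, 1, 0],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 1],
    [1, 0, 1],
], dtype=float)
y = np.array([5.0, 6.0, 7.0, 10.0, 11.0, 12.0])

X_pinv = np.linalg.pinv(X)            # Moore-Penrose inverse
beta_min_norm = X_pinv @ y            # minimum-norm least squares solution
print("beta:", beta_min_norm)
print("fitted:", X @ beta_min_norm)   # matches the group means 6 and 11
```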
Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis
Fast Estimation of the Median Covariation Matrix with Application to Online Robust Principal Components Analysis. Hervé Cardot, Antoine Godichon-Baggioni. Institut de Mathématiques de Bourgogne, Université de Bourgogne Franche-Comté, 9, rue Alain Savary, 21078 Dijon, France. July 12, 2016. arXiv:1504.02852v5 [math.ST].

Abstract: The geometric median covariation matrix is a robust multivariate indicator of dispersion which can be extended without any difficulty to functional data. We define estimators, based on recursive algorithms, that can be simply updated at each new observation and are able to deal rapidly with large samples of high dimensional data without being obliged to store all the data in memory. Asymptotic convergence properties of the recursive algorithms are studied under weak conditions. The computation of the principal components can also be performed online, and this approach can be useful for online outlier detection. A simulation study clearly shows that this robust indicator is a competitive alternative to the minimum covariance determinant when the dimension of the data is small, and to robust principal components analysis based on projection pursuit and spherical projections for high-dimensional data. An illustration on a large sample, high-dimensional dataset consisting of individual TV audiences measured at a minute scale over a period of 24 hours confirms the interest of considering the robust principal components analysis based on the median covariation matrix. All studied algorithms are available in the R package Gmedian on CRAN.

Keywords: averaging, functional data, geometric median, online algorithms, online principal components, recursive robust estimation, stochastic gradient, Weiszfeld's algorithm.
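The keywords mention Weiszfeld's algorithm; below is a compact sketch of its classic batch form for the geometric median, the building block behind the median covariation matrix. This is not the paper's recursive online estimator, and the outlier-contaminated data are invented for illustration.

```python
# Weiszfeld iteration for the geometric median:
# m <- sum_i(x_i / ||x_i - m||) / sum_i(1 / ||x_i - m||).
import numpy as np

def geometric_median(X, iters=100, eps=1e-10):
    m = X.mean(axis=0)                       # start from the coordinate-wise mean
    for _ in range(iters):
        d = np.linalg.norm(X - m, axis=1)
        w = 1.0 / np.maximum(d, eps)         # guard against division by zero
        m_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(m_new - m) < 1e-12:
            break
        m = m_new
    return m

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 3))
X[:10] += 50.0                               # gross outliers barely move the median
print("mean:  ", np.round(X.mean(axis=0), 2))
print("median:", np.round(geometric_median(X), 2))
```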
Characteristics and Statistics of Digital Remote Sensing Imagery (1)
Characteristics and statistics of digital remote sensing imagery (1).

Digital images: with a raster data structure, each image is treated as an array of values of the pixels. Image data are organized as rows and columns (or lines and pixels), starting from the upper left corner of the image. Each pixel (picture element) is treated as a separate unit.

Statistics of digital images help to:
- look at the frequency of occurrence of individual brightness values in the image displayed;
- view individual pixel brightness values at specific locations or within a geographic area;
- compute univariate descriptive statistics to determine if there are unusual anomalies in the image data; and
- compute multivariate statistics to determine the amount of between-band correlation (e.g., to identify redundancy).

It is necessary to calculate fundamental univariate and multivariate statistics of the multispectral remote sensor data. This involves identification and calculation of: the maximum and minimum values; the range, mean, and standard deviation; the between-band variance-covariance matrix; the correlation matrix; and the frequencies of brightness values. The results can be used to produce histograms. Such statistics provide information necessary for processing and analyzing remote sensing data.

A "population" is an infinite or finite set of elements. A "sample" is a subset of the elements taken from a population, used to make inferences about certain characteristics of the population (e.g., training signatures). Large samples drawn randomly from natural populations usually produce a symmetrical frequency distribution: most values are clustered around the central value, and the frequency of occurrence declines away from this central point.
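The statistics listed above are one-liners once each band is flattened to a vector of pixel values; here is a sketch on a random stand-in image (real imagery would be loaded in its place).

```python
# Univariate and multivariate band statistics for a multispectral image.
import numpy as np

rng = np.random.default_rng(8)
rows, cols, bands = 100, 100, 3
img = rng.integers(0, 256, size=(rows, cols, bands)).astype(float)
pixels = img.reshape(-1, bands)   # one row per pixel, one column per band

print("min:  ", pixels.min(axis=0))
print("max:  ", pixels.max(axis=0))
print("mean: ", pixels.mean(axis=0))
print("std:  ", pixels.std(axis=0, ddof=1))
print("variance-covariance matrix:\n", np.cov(pixels, rowvar=False))
print("correlation matrix:\n", np.corrcoef(pixels, rowvar=False))

# Frequencies of brightness values for band 1 (the data behind a histogram).
freq, _ = np.histogram(pixels[:, 0], bins=256, range=(0, 256))
print("band-1 brightness frequencies (first 5 bins):", freq[:5])
```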
Simple Linear Regression with Least Square Estimation: an Overview
Aditya N More et al. / (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 7 (6), 2016, 2394-2396. Simple Linear Regression with Least Square Estimation: An Overview. Aditya N More (1), Puneet S Kohli (2), Kshitija H Kulkarni (3). (1, 2) Information Technology Department, (3) Electronics and Communication Department, College of Engineering Pune, Shivajinagar, Pune 411005, Maharashtra, India.

Abstract: Linear regression involves modelling a relationship amongst dependent and independent variables in the form of a linear equation. Least square estimation is a method to determine the constants in a linear model in the most accurate way without much complexity of solving. Metrics such as coefficient of determination and mean square error determine how good the estimation is. Statistical packages such as R and Microsoft Excel have built-in tools to perform least square estimation over a given data set.

Keywords: linear regression, machine learning, least squares estimation, R programming.

I. INTRODUCTION. Linear regression involves establishing linear relationships between dependent and independent variables. The quantity minimized by least squares is
$$\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 \qquad (2.1)$$
where $y_i$ is the $i$th value of the sample data point and $\hat{y}_i$ is the $i$th value of $y$ on the predicted regression line. The above equation can be geometrically depicted by Figure 2.1: if we draw a square at each point whose side length is equal to the absolute difference between the sample data point and the predicted value, each of the squares would then represent the residual error in placing the regression line. The aim of the least square method would be to place the regression line so as to minimize the sum of these squares.
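To round out the overview, a minimal sketch of least squares estimation for simple linear regression, reporting the two metrics named in the abstract; the data are simulated for illustration.

```python
# Closed-form slope and intercept for simple linear regression, plus the
# coefficient of determination (R^2) and mean square error.
import numpy as np

rng = np.random.default_rng(9)
x = rng.uniform(0, 10, 60)
y = 3.0 + 1.2 * x + rng.normal(0, 1.0, 60)

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
y_hat = b0 + b1 * x

ss_res = np.sum((y - y_hat) ** 2)     # sum of squared residuals, as in (2.1)
ss_tot = np.sum((y - y.mean()) ** 2)
print(f"intercept={b0:.3f}, slope={b1:.3f}")
print(f"R^2 = {1 - ss_res / ss_tot:.3f}, MSE = {ss_res / len(x):.3f}")
```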