Matlab Tutorial 6


MATLAB Tutorial
Amir massoud Farahmand
http://www.cs.ualberta.ca/~amir
[CMPUT 651] Probabilistic Graphical Models
Russ Greiner and Matt Brown
Version 0.6: September 24, 2008

Introduction to Matlab
Ela Pekalska, Marjolein van der Glas
Pattern Recognition Group, Faculty of Applied Sciences
Delft University of Technology
January 2002
Send comments to [email protected]

The MATLAB logo is a trademark of MathWorks, Inc.

What is MATLAB?
• A scripting/programming language for numerical calculations
• Pros
  • Fast prototyping for your [numerical] algorithms
  • Drawing graphs easily
  • Fast matrix computation
• Cons
  • Slow for many other things (e.g. for loops are terribly slow)
  • Not a general-purpose programming language
  • Not so cheap

How to Run MATLAB @ U of A?
Log in to bonanza, pipestone, eureka, etc. and start MATLAB, or run it without the graphical desktop:
matlab -nodesktop
Commands are then typed at the >> prompt.

I Will Talk About ...
• vectors and matrices in MATLAB
• several useful predefined functions
• graphics
• writing your own functions
• tricks for writing efficient code
• ...

Vectors
>> a = 2
a =
     2
>> b = [1 2 3]
b =
     1     2     3
>> c = [-1 1.1 2]'
c =
   -1.0000
    1.1000
    2.0000
>> a*b
ans =
     2     4     6
>> c*b
ans =
   -1.0000   -2.0000   -3.0000
    1.1000    2.2000    3.3000
    2.0000    4.0000    6.0000
>> b*c
ans =
    7.2000
>> sin(b)
ans =
    0.8415    0.9093    0.1411
>> exp(c)
ans =
    0.3679
    3.0042
    7.3891
>> b + c
??? Error using ==> plus
Matrix dimensions must agree.
>> b + c'
ans =
         0    3.1000    5.0000

Another way to generate vectors
>> x = -2*pi:0.01:2*pi;
>> y = sin(x);
>> y2 = sin(x) + cos(2*x) + 0.1*sin(10*x);
>> plot(x,y,'b')
>> hold
Current plot held
>> plot(x,y2,'k')
>> xlabel('Time'); ylabel('Amplitude'); title('My sinusoid wave')

Matrices
>> A = [1 2 3;4 5 6]
A =
     1     2     3
     4     5     6
>> B = ones(3,5)
B =
     1     1     1     1     1
     1     1     1     1     1
     1     1     1     1     1
>> C = zeros(3)
C =
     0     0     0
     0     0     0
     0     0     0
>> D = A*B
D =
     6     6     6     6     6
    15    15    15    15    15
>> size(D)
ans =
     2     5

Accessing Elements of Vectors and Matrices
>> a = [1 2 3 4 5 6 7 8 9 10]
a =
     1     2     3     4     5     6     7     8     9    10
>> A = [1 2 3 4;5 6 7 8;9 10 11 12;13 14 15 16]
A =
     1     2     3     4
     5     6     7     8
     9    10    11    12
    13    14    15    16
>> a(3)
ans =
     3
>> B = A(2:3,2:4)
B =
     6     7     8
    10    11    12

Random Number Generators
>> rand(5,1)        % Uniform
ans =
    0.0193
    0.7683
    0.4579
    0.0436
    0.0996
>> randn(5,1)       % Normal (Gaussian)
ans =
   -0.6223
   -0.8081
   -0.8724
    0.1395
    0.6093

Random Numbers and Histograms
>> X1 = randn(2000,1);
>> X2 = rand(2000,1);
>> hist(X1,20)

sums, means, var
>> X = randn(1000,1);
>> Y = rand(1000,1);
>> sum(X), mean(X), var(X)
ans =
  -14.6247
ans =
   -0.0146
ans =
    1.0325
>> min(X), max(X)
ans =
   -3.5851
ans =
    3.7476
>> sum(Y), mean(Y), var(Y)
ans =
  491.2878
ans =
    0.4913
ans =
    0.0863
>> min(Y), max(Y)
ans =
    0.0010
ans =
    0.9973

Matrices Again!
>> A = [1 2 3; 4 5 6; 7 8 9]
A =
     1     2     3
     4     5     6
     7     8     9
>> B = [1 -1 1;1 0 0;0 1 1]
B =
     1    -1     1
     1     0     0
     0     1     1
>> A*B
ans =
     3     2     4
     9     2    10
    15     2    16
>> A.*B
ans =
     1    -2     3
     4     0     0
     0     8     9
>> B./A
ans =
    1.0000   -0.5000    0.3333
    0.2500         0         0
         0    0.1250    0.1111
>> rank(A)
ans =
     2
>> eig(A)
ans =
   16.1168
   -1.1168
   -0.0000
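Since rank(A) is 2, one of the eigenvalues reported by eig(A) is numerically zero. As an aside that is not part of the original slides, here is a minimal sketch that double-checks these two results; the two-output form [V,D] = eig(A) returns eigenvectors and eigenvalues together.

% Aside (not from the original slides): sanity-checking rank(A) and eig(A).
A = [1 2 3; 4 5 6; 7 8 9];
[V, D] = eig(A);        % columns of V are eigenvectors, diag(D) the eigenvalues
norm(A*V - V*D)         % should be near machine precision
rank(A)                 % 2: the rows are dependent, since A(3,:) equals 2*A(2,:) - A(1,:)

The near-zero eigenvalue is exactly what the rank deficiency predicts.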
How to Write a Program?
• Script
  • Anything that you do at the command line, except that it lives in a single file
• Functions
  • Get inputs, return outputs

MATLAB for Machine Learning: Regression

Assume that y = f(x) + \varepsilon, where E\{\varepsilon\} = 0. The goal is to estimate the regressor f(\cdot) using samples \{(X_i, Y_i)\}, i = 1, \dots, n. There are books written on this topic, but for now we consider a simple (but efficient) method called the kernel regression estimator. Let h > 0 be a positive number called the bandwidth. The Nadaraya-Watson kernel estimator is defined by

\hat{f}(x) = \sum_{i=1}^{n} \frac{K\left( (x - x_i)/h \right)}{\sum_{j=1}^{n} K\left( (x - x_j)/h \right)} \, Y_i .

Kernels can have different forms. One example is the Gaussian kernel:

K(x) = \frac{1}{\sqrt{2\pi}} \exp(-x^2/2).

See L. Wasserman, All of Nonparametric Statistics (Section 5.4) for more information. Let's implement it!

What Do We Need?
• Some function f(x)
• Noisy samples from f(x)
• Kernel regressor
  • kernel function (e.g. Gaussian)

A Script File as the Skeleton of the Program
>> edit KernelRegressionTest

First Attempt

KernelRegressionTest.m:
% Number of samples
n = 1000;
% Noise variance
sigma = 1;
% Bandwidth
h = 0.05;

xSamples = (rand(1,n)-0.5)*8;
xGrid = linspace(-4,4,2000);
ySamples = 2*sin(2*xSamples) + 0.5*cos(5*xSamples) + sigma^2 * randn(1,n);
yTrue = 2*sin(2*xGrid) + 0.5*cos(5*xGrid);

plot(xSamples,ySamples,'.')
hold on;
plot(xGrid,yTrue,'r','Linewidth',3);

% Evaluating the regressor on a set of test points
for m=1:length(xGrid)
    yTest(m) = KernelRegressor(xGrid(:,m),xSamples,ySamples,h);
end
plot(xGrid,yTest,'g','Linewidth',2);

KernelRegressor.m:
function yhat = KernelRegressor(xQuery,XTrain,YTrain,h)

SizeOfTrainingSet = size(XTrain,2);
yhat = 0;
Normalizer = 0;
for m=1:SizeOfTrainingSet
    Kxi = exp( -((xQuery - XTrain(:,m))^2)/h );
    yhat = yhat + Kxi*YTrain(m);
    Normalizer = Normalizer + Kxi;
end
yhat = yhat/Normalizer;

Timing it at the command line:
>> tic; KernelRegressionTest; toc
Elapsed time is 31.262343 seconds.
>> profile on
>> KernelRegressionTest
>> profile report
Source of slowness: the for loop.

Vectorization
Avoid loops; vectorize!

FOR loop solution (Sin #1: the for loop; Sin #2: incremental growing of vectors):
>> tic; i = 1; for x=0:0.001:8*pi; y(i) = sin(x); i = i+1; end; toc
Elapsed time is 9.641997 seconds.

Vectorized solution:
>> tic; x = 0:0.001:8*pi; y = sin(x); toc
Elapsed time is 0.013867 seconds.

Related trick: repmat
>> A = [1 2 3;4 5 6]
A =
     1     2     3
     4     5     6
>> repmat(A,3,2)
ans =
     1     2     3     1     2     3
     4     5     6     4     5     6
     1     2     3     1     2     3
     4     5     6     4     5     6
     1     2     3     1     2     3
     4     5     6     4     5     6

Vectorization: what can be done for the previous code?
for m=1:SizeOfTrainingSet
    Kxi = exp( -((xQuery - XTrain(:,m))^2)/h );
    yhat = yhat + Kxi*YTrain(m);
    Normalizer = Normalizer + Kxi;
end
XTrain is originally a vector, but XTrain(:,m) de-vectorizes it to a scalar here. Let's vectorize over xQuery instead:
xQueryRepeated = repmat(xQuery,1,SizeOfTrainingSet);
and do all the calculations at once:
KX = exp( -((xQueryRepeated - XTrain).^2) / h);
yhat = sum(KX.*YTrain)/sum(KX);
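A small aside that is not part of the original slides: in this one-dimensional example xQuery is a scalar, and MATLAB expands a scalar against a vector automatically, so the repmat step is not strictly needed here; repmat earns its keep once xQuery becomes a multi-dimensional column vector. A minimal sketch with made-up toy values (XTrain, YTrain, h, and xQuery below are illustrative, not from the tutorial):

% Aside (not from the original slides): scalar expansion vs. repmat.
XTrain = [-1 0 2];  YTrain = [1 2 3];  h = 0.5;  xQuery = 0.3;   % toy values
KX1 = exp( -((repmat(xQuery,1,3) - XTrain).^2) / h );   % repmat version
KX2 = exp( -((xQuery - XTrain).^2) / h );               % scalar expansion, no repmat
isequal(KX1, KX2)                                       % returns 1: identical results
yhat = sum(KX2 .* YTrain) / sum(KX2)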
Second Attempt
>> tic; KernelRegressionTest; toc
Elapsed time is 0.514618 seconds.

KernelRegressor.m (both solutions are kept; the always-false test "if 1>2" switches the slow one off):

function yhat = KernelRegressor(xQuery,XTrain,YTrain,h)

SizeOfTrainingSet = size(XTrain,2);

% First solution (slow)
if 1>2
    yhat = 0;
    Normalizer = 0;
    for m=1:SizeOfTrainingSet
        Kxi = exp( -((xQuery - XTrain(:,m))^2)/h );
        yhat = yhat + Kxi*YTrain(m);
        Normalizer = Normalizer + Kxi;
    end
    yhat = yhat/Normalizer;
% Second solution (faster)
else
    xQueryRepeated = repmat(xQuery,1,SizeOfTrainingSet);
    KX = exp( -((xQueryRepeated - XTrain).^2) / h);
    yhat = sum(KX.*YTrain)/sum(KX);
end

Remarks on Kernel Regressor
• Easily extendable to multiple dimensions
• Selecting the bandwidth is important (see the short experiment at the end of this tutorial)
  • Model selection
• Theoretical results

Other Useful Functions, Commands, ...
• inv(A), pinv(A), det(A), cond(A), svd(A)
• & (logical AND), | (logical OR), ~ (logical NOT)
• clear, whos, load, save
• help
• surf, mesh, plot3, comet, ...
• Lots of other functions (ODE solvers, optimization, the control toolbox, etc.)

Resources
• MATLAB's help (the command-line help or the manual)
• E. Pekalska and M. van der Glas, Introduction to MATLAB, 2002. (http://www.cs.ualberta.ca/~dale/cmput466/w06/matlab_manual.pdf)
• MATLAB 7.6 demos (http://www.mathworks.com/products/matlab/demos.html)
• Many others.
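As a closing aside that is not part of the original slides, the bandwidth remark above can be made concrete with a quick experiment: fit the same kind of noisy data with a few values of h using the KernelRegressor function from the Second Attempt, and compare the fits against the true f. The specific values of n, sigma, and the bandwidths below are illustrative choices, not from the tutorial.

% Closing aside (not from the original slides): the effect of the bandwidth h.
% Requires KernelRegressor.m from the Second Attempt on the MATLAB path.
n = 500; sigma = 1;
XTrain = (rand(1,n) - 0.5) * 8;
YTrain = 2*sin(2*XTrain) + 0.5*cos(5*XTrain) + sigma*randn(1,n);
xGrid  = linspace(-4, 4, 400);
hs     = [0.01 0.05 0.5];                  % small h: wiggly fit; large h: oversmoothed
yhat   = zeros(length(hs), length(xGrid)); % preallocate (see the vectorization slides)
for k = 1:length(hs)
    for m = 1:length(xGrid)
        yhat(k,m) = KernelRegressor(xGrid(m), XTrain, YTrain, hs(k));
    end
end
plot(xGrid, 2*sin(2*xGrid) + 0.5*cos(5*xGrid), 'k', 'LineWidth', 2); hold on;
plot(xGrid, yhat);
legend('true f', 'h = 0.01', 'h = 0.05', 'h = 0.5');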