Lecture 12 Kernel Methods and Poisson Processes
Total Page:16
File Type:pdf, Size:1020Kb
LECTURE 12 KERNEL Methods AND POISSON Processes Dennis Sun Stanford UnivERSITY Stats 253 July 29, 2015 1 KERNEL Methods 2 Gaussian Processes 3 POINT Processes 4 Course LOGISTICS 1 KERNEL Methods 2 Gaussian Processes 3 POINT Processes 4 Course LOGISTICS KERNEL Methods KERNEL METHODS ARE POPULAR IN MACHINE learning. The IDEA IS THAT MANY METHODS ONLY DEPEND ON INNER PRODUCTS BETWEEN observations. Example (ridge regression): LikE OLS, BUT COEFfiCIENTS ARE SHRUNK TOWARDS zero: ^ridge 2 2 β = argminβ jjy − Xβjj + λjjβjj : = (XT X + λI)−1XT y Suppose I WANT TO PREDICT y0 AT A NEW POINT x0. Ridge PREDICTION IS T T −1 T y^0 = x0 (X X + λIp) X y T T T −1 = x0 X (XX + λIn) y −1 = K01(K11 + λI) y WHERE K01 = hx0; x1i · · · hx0; xni AND (K11)ij = hxi; xji. KERNEL Methods FOR NOW, THE KERNEL IS THE LINEAR KERNEL: K(xi; xj) = hxi; xji. WE CAN REPLACE THE LINEAR KERNEL BY ANY POSITIVe-semidefiNITE Kernel, SUCH AS 0 2 0 −θ2jjx−x jj K(x; x ) = θ1e : What WILL THIS do? KERNEL Regression: A Simple Example ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x KERNEL Regression: A Simple Example Linear KERNEL ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x KERNEL Regression: A Simple Example 0 2 K(x; x0) = e−(x−x ) ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x WhY DOES KERNEL REGRESSION work? TWO INTERPRETATIONS: 1 Implicit BASIS Expansion: Since THE KERNEL IS POSITIVE SEMIDEfinite, WE CAN DO AN eigendecomposition: 1 0 X 0 K(x; x ) = λi'i(x)'i(x ): k=1 WE CAN THINK OF THIS INFORMALLY as: K(x; x0) = hΦ(x); Φ(x0)i; def p p WHERE Φ(x) = λ1'1(x) λ2'2(x) ··· . ^ 2 Representer theorem: The SOLUTION f SATISfiES n ^ X f(x) = αiK(x; xi) i=1 FOR SOME COEFfiCIENTS αi. IllustrATION OF THE Representer Theorem 2 −(x−x30) K(x; x30) = e ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x IllustrATION OF THE Representer Theorem n 2 ^ X −(x−xi) f(x) = αie i=1 ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x IllustrATION OF THE Representer Theorem n 2 ^ X −(x−xi) f(x) = αie i=1 ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x KERNEL Methods AND Kriging Hopefully, You’vE REALIZED BY NOW THAT A POSITIVE SEMIDEfiNITE “kernel” IS JUST A COVARIANCE function. The PREDICTION EQUATION −1 y^0 = K01(K11 + λI) y SHOULD REMIND YOU OF THE KRIGING PREDICTION T ^GLS −1 ^GLS y^0 = x0 β + Σ01Σ11 (y − Xβ ): If WE DO NOT HAVE ANY PREDICTORS (the ONLY INFORMATION IS THE SPATIAL coordinates), THEN THIS REDUCES TO THE KERNEL PREDICTIONS (without THE λI. −1 y^0 = Σ01Σ11 y: Spatial Interpretation OF KERNEL Regression 0 2 Σ(x; x0) = e−(x−x ) ● 6 ● ● ● ● ● 4 ● ● ●● ● ● ● ● ● ● ● 2 ● ● ● ● ● ● ● ● ● y ● ● 0 ● ● ● ● ● ● ● ● ● ● ●● −2 ● ● ● ● ● −4 ● ● ● ● −6 ● −4 −2 0 2 4 x 1 KERNEL Methods 2 Gaussian Processes 3 POINT Processes 4 Course LOGISTICS Gaussian Process Model Suppose THE FUNCTION f IS Random, SAMPLED FROM A Gaussian PROCESS WITH MEAN 0 AND COVARIANCE FUNCTION Σ(x; x0). WE OBSERVE NOISY OBSERVATIONS yi = f(xi) + i, WHERE i ∼ N(0; λ). TO ESTIMATE f, WE SHOULD USE THE POSTERIOR MEAN f^ = E[fjy]. −1 f^(x0) = Σ01(Σ11 + λI) y; WHICH AGAIN IS EQUIVALENT TO KRIGING AND KERNEL methods! 1 KERNEL Methods 2 Gaussian Processes 3 POINT Processes 4 Course LOGISTICS POINT Processes In POINT PROCESSES, THE LOCATIONS THEMSELVES ARE Random! 37.80 37.79 Latitude 37.78 37.77 −122.46 −122.45 −122.44 −122.43 −122.42 −122.41 −122.40 −122.39 Longitude TYPICAL Question: Does THE PROCESS EXHIBIT clustering? POISSON POINT Process The (homogeneous) POISSON POINT PROCESS IS A BASELINE model. It HAS TWO DEfiNING CHARacteristics: 37.80 1. The NUMBER OF EVENTS IN A GIVEN REGION A IS Poisson, WITH 37.79 MEAN PROPORTIONAL TO THE SIZE Latitude A OF A: 37.78 N(A) ∼ Pois(λjAj) 37.77 −122.46 −122.45 −122.44 −122.43 −122.42 −122.41 −122.40 −122.39 Longitude POISSON POINT Process The (homogeneous) POISSON POINT PROCESS IS A BASELINE model. It HAS TWO DEfiNING CHARacteristics: 37.80 B 2. The NUMBER OF EVENTS IN TWO DISJOINT REGIONS A AND B ARE in- 37.79 Latitude A dependent. 37.78 N(A) ? N(B) 37.77 −122.46 −122.45 −122.44 −122.43 −122.42 −122.41 −122.40 −122.39 Longitude How WOULD YOU SIMULATE A POISSON POINT process? How TO TEST FOR homogeneity? QuadrAT test: Divide THE DOMAIN INTO DISJOINT regions. The NUMBER IN EACH REGION SHOULD BE INDEPENDENT Poisson, SO YOU CAN CALCULATE 2 X (Or − Er) X2 = E r r AND COMPARE TO A χ2 distribution. R Code: library(spatstat) data.ppp <- ppp(data$Longitude, data$Latitude, range(data$Longitude), range(data$Latitude)) quadrat.test(data.ppp) Chi-squared test of CSR using quadrat counts Pearson X2 statistic X2 = 863.05, df = 24, p-value < 2.2e-16 Quadrats: 5 by 5 grid of tiles How TO TEST FOR homogeneity? Ripley’s K function: Calculate THE NUMBER OF PAIRS LESS THAN A DISTANCE r APART FOR EVERY r > 0, GIVING US A FUNCTION K(r). Now SIMULATE THE DISTRIBUTION OF K∗(r) UNDER THE NULL HYPOTHESIS AND COMPARE IT TO K(r). plot(envelope(data.ppp, Kest)) ^ Kobs(r) K theo(r) 0.00015 ^ Khi(r) ^ Klo(r) 0.00010 ) r ( K 0.00005 0.00000 0.000 0.001 0.002 0.003 0.004 0.005 r So You’vE REJECTED THE null. Now what? If THE PROCESS ISN’t A HOMOGENEOUS POISSON process, WHAT IS it? It CAN BE AN INHOMOGENEOUS POISSON PROCESS. There’s SOME “density” λ(s) SUCH THAT N(A) IS POISSON WITH MEAN Z λ(s) ds: A Note THAT IF λ(s) ≡ λ, THEN THIS REDUCES TO A HOMOGENEOUS POISSON process. How DO WE ESTIMATE λ(s)? KERNEL DENSITY estimation! Choose A KERNEL SATISFYING Z K(s; s0) ds = 1: D 37.80 37.79 Latitude 37.78 37.77 −122.46 −122.45 −122.44 −122.43 −122.42 −122.41 −122.40 −122.39 Longitude How DO WE ESTIMATE λ(s)? ^ Pn The ESTIMATE IS JUST λ(s) = i=1 K(s; si). 1 KERNEL Methods 2 Gaussian Processes 3 POINT Processes 4 Course LOGISTICS Remaining Items • Homework 3 IS DUE THIS FRIDAY. • Please SUBMIT PROJECT PROPOSALS BY THIS FRIDAY AS well. I WILL RESPOND TO THEM OVER THE WEEKend. • Please RESUBMIT ALL QUIZZES BY NEXT ThursdaY. The GRADED QUIZZES CAN BE PICKED UP TOMORROW AT Jingshu’s OFfiCE hours. • WE MAY HAVE A GUEST LECTURE ON GeogrAPHIC Information Systems (GIS) AND MAPPING NEXT WEDNESDAY. I WILL KEEP YOU POSTED BY e-mail. • WE MAY OR MAY NOT HAVE fiNAL presentations. If WE do, THEY WOULD BE DURING THE SCHEDULED fiNAL EXAM TIME FOR THIS class, WHICH IS FRIDAY afternoon..