Kernel Smoothing Methods


Hanchen Wang ([email protected]), Ph.D. candidate in Information Engineering, University of Cambridge
September 29, 2019

Overview
1. 6.0 what is kernel smoothing?
2. 6.1 one-dimensional kernel smoothers
3. 6.2 selecting the width λ of the kernel
4. 6.3 local regression in R^p
5. 6.4 structured local regression models in R^p
6. 6.5 local likelihood and other models
7. 6.6 kernel density estimation and classification
8. 6.7 radial basis functions and kernels
9. 6.8 mixture models for density estimation and classification
10. 6.9 computational considerations
11. Q & A: relationship between kernel smoothing methods and kernel methods
12. one more thing: solution manual to these textbooks

6.0 What is kernel smoothing?
- a class of regression techniques that achieve flexibility in estimating the function f(X) over the domain R^p by fitting a different but simple model separately at each query point x_0
- the resulting estimated function \hat{f}(X) is smooth in R^p
- the fitting is done at evaluation time: these are memory-based methods that in principle require little or no training, similar to kNN (lazy learning)
- they require hyperparameter choices such as the metric window size λ
- here kernels are mostly used as a device for localization, rather than as the high-dimensional (implicit) feature extractor of kernel methods

6.1 One-dimensional kernel smoothers: overview
Simulated example: Y = sin(4X) + ε, with X ~ U[0, 1] and ε ~ N(0, 1/3).
(In the textbook figure: the red point is \hat{f}(x_0), the red circles are the observations contributing to the fit at x_0, and the solid yellow region shows the weights assigned to the observations.)

- k-nearest-neighbor average (discontinuous, equal weight within the neighborhood):
  \hat{f}(x) = \mathrm{Ave}(y_i \mid x_i \in N_k(x))
- Nadaraya-Watson kernel-weighted average:
  \hat{f}(x_0) = \frac{\sum_{i=1}^N K_\lambda(x_0, x_i)\, y_i}{\sum_{i=1}^N K_\lambda(x_0, x_i)}, \qquad K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{\lambda}\right)
- Epanechnikov quadratic kernel: D(t) = \frac{3}{4}(1 - t^2) if |t| \le 1, and 0 otherwise
- more generally, with an adaptive neighborhood width: K_\lambda(x_0, x) = D\left(\frac{|x - x_0|}{h_\lambda(x_0)}\right)
- tri-cube kernel: D(t) = (1 - |t|^3)^3 if |t| \le 1, and 0 otherwise
- questions to ask of a kernel: compact support or not? differentiable at the boundary?

6.1 One-dimensional kernel smoothers: local linear regression
Boundary issues arise with locally weighted averages → fit a locally weighted linear regression instead:
\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\, x_0, \qquad \min_{\alpha(x_0), \beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i) \left[ y_i - \alpha(x_0) - \beta(x_0) x_i \right]^2

In matrix form, with b(x)^T = (1, x), B the N \times 2 matrix whose i-th row is b(x_i)^T, and W(x_0) = \mathrm{diag}(K_\lambda(x_0, x_i)):
\min_{\alpha(x_0), \beta(x_0)} \left( y - B \begin{pmatrix} \alpha(x_0) \\ \beta(x_0) \end{pmatrix} \right)^T W(x_0) \left( y - B \begin{pmatrix} \alpha(x_0) \\ \beta(x_0) \end{pmatrix} \right)
\Rightarrow\ \hat{f}(x_0) = b(x_0)^T \left( B^T W(x_0) B \right)^{-1} B^T W(x_0)\, y = \sum_{i=1}^N l_i(x_0)\, y_i

\hat{f}(x_0) is linear in the y_i; the weights l_i(x_0) combine the weighting kernel K_\lambda(x_0, x_i) with the least squares operations and form the so-called equivalent kernel (a small numerical sketch follows below).
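Before turning to the bias analysis, here is a minimal numerical sketch of the two estimators above, written in Python/NumPy; the function names and the simulated sample are illustrative choices, not part of the original slides. It computes the Nadaraya-Watson average and the local linear fit, the latter via the equivalent-kernel weights l(x_0).

```python
import numpy as np

def epanechnikov(t):
    """Epanechnikov quadratic kernel D(t)."""
    return np.where(np.abs(t) <= 1, 0.75 * (1 - t**2), 0.0)

def nadaraya_watson(x0, x, y, lam):
    """Kernel-weighted average at the query point x0."""
    w = epanechnikov(np.abs(x - x0) / lam)
    return np.sum(w * y) / np.sum(w)

def local_linear(x0, x, y, lam):
    """Locally weighted linear fit at x0; returns (f_hat, equivalent kernel l(x0))."""
    w = epanechnikov(np.abs(x - x0) / lam)        # K_lambda(x0, x_i)
    B = np.column_stack([np.ones_like(x), x])     # i-th row is b(x_i)^T = (1, x_i)
    W = np.diag(w)
    b0 = np.array([1.0, x0])
    # l(x0)^T = b(x0)^T (B^T W B)^{-1} B^T W
    l = b0 @ np.linalg.solve(B.T @ W @ B, B.T @ W)
    return l @ y, l

# simulated example from the slides: Y = sin(4X) + eps, eps ~ N(0, 1/3)
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 100))
y = np.sin(4 * x) + rng.normal(0.0, np.sqrt(1 / 3), 100)

x0 = 0.05                                         # near the left boundary
print(nadaraya_watson(x0, x, y, lam=0.2))
print(local_linear(x0, x, y, lam=0.2)[0], np.sin(4 * x0))
```

On typical draws the local linear estimate sits closer to sin(4 x_0) near the boundary than the Nadaraya-Watson average does, which is exactly the first-order bias correction analyzed next.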
6.1 One-dimensional kernel smoothers: why "this bias is removed to first order"
Expanding f around x_0 in E\,\hat{f}(x_0) = \sum_i l_i(x_0) f(x_i):
E(\hat{f}(x_0)) = \sum_{i=1}^N l_i(x_0) f(x_i) = f(x_0) \sum_{i=1}^N l_i(x_0) + f'(x_0) \sum_{i=1}^N (x_i - x_0) l_i(x_0) + \frac{f''(x_0)}{2} \sum_{i=1}^N (x_i - x_0)^2 l_i(x_0) + R,
where R collects the third- and higher-order terms. For local linear regression it can be proved that \sum_{i=1}^N l_i(x_0) = 1 and \sum_{i=1}^N (x_i - x_0) l_i(x_0) = 0, so only the quadratic and higher-order terms contribute to the bias.
There is still room for improvement: quadratic fits outperform linear fits in regions of curvature.

6.1 One-dimensional kernel smoothers: local polynomial regression
We can fit local polynomials of any degree d:
\hat{f}(x_0) = \hat{\alpha}(x_0) + \sum_{j=1}^d \hat{\beta}_j(x_0) x_0^j, \qquad \min_{\alpha(x_0), \beta_j(x_0),\, j=1,\dots,d} \sum_{i=1}^N K_\lambda(x_0, x_i) \Big[ y_i - \alpha(x_0) - \sum_{j=1}^d \beta_j(x_0) x_i^j \Big]^2

- the bias of a degree-d fit provably has only components of degree d+1 and higher
- no free lunch → increased variance: with y_i = f(x_i) + \varepsilon_i and \varepsilon_i \sim N(0, \sigma_\varepsilon^2),
  \mathrm{Var}(\hat{f}(x_0)) = \sigma_\varepsilon^2 \| l(x_0) \|^2, and \| l(x_0) \| increases with d
- local linear fits decrease the bias at the boundaries at a modest cost in variance
- local quadratic fits do little for the bias at the boundaries and increase the variance a lot, but are most helpful in reducing the bias due to curvature in the interior of the domain

6.2 Selecting the width λ of the kernel
- a natural bias-variance tradeoff: a narrower window gives larger variance and smaller bias; a wider window gives smaller variance and larger bias
- the same intuition holds for local regression (linear or polynomial) estimates

6.3 Local regression in R^p
- p-dimensional ≠ p × 1-dimensional → interaction terms between dimensions are needed
- consider p = 3, so each point is a 3 × 1 vector x := (x_{(1)}, x_{(2)}, x_{(3)})^T; the general form of local kernel regression with a degree-d polynomial is
  \min_{\beta(x_0)} \sum_{i=1}^N K_\lambda(x_0, x_i) \left( y_i - B^{(d)}(x_i)^T \beta(x_0) \right)^2,
  where \beta(x_0) = \big( \beta^{(0)}(x_0), \beta^{(1)}_1(x_0), \beta^{(1)}_2(x_0), \dots, \beta^{(d)}_{d(d+1)/2}(x_0) \big)^T collects all polynomial coefficients, and
  K_\lambda(x_0, x) = D\left(\frac{\| x - x_0 \|}{\lambda}\right), \quad B^{(0)}(x) = 1, \quad B^{(1)}(x) = (1, x_{(1)}, x_{(2)}, x_{(3)})^T,
  B^{(2)}(x) = (1, x_{(1)}, x_{(2)}, x_{(3)}, x_{(1)}^2, x_{(2)}^2, x_{(3)}^2, x_{(1)}x_{(2)}, x_{(1)}x_{(3)}, x_{(2)}x_{(3)})^T
- boundary effects are a much bigger problem in higher dimensions, since the fraction of points on the boundary is larger; indeed, one manifestation of the curse of dimensionality is that the fraction of points close to the boundary increases to one as the dimension grows

6.4 Structured local regression models in R^p
Structured kernels:
- modify the kernel → standardize each dimension to unit standard deviation
- more generally, use a positive semidefinite matrix A → Mahalanobis metric (see e.g. http://contrib.scikit-learn.org/metric-learn/ for metric learning; a small worked sketch follows after this subsection):
  K_{\lambda, A}(x_0, x) = D\left(\frac{(x - x_0)^T A (x - x_0)}{\lambda}\right)
Structured regression functions:
- analysis-of-variance (ANOVA) decompositions:
  E(y \mid X) = f(X_1, X_2, \dots, X_p) = \alpha + \sum_j g_j(X_j) + \sum_{k < \ell} g_{k\ell}(X_k, X_\ell) + \cdots
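As a sketch of how 6.3 and 6.4 combine in practice, the following illustrative Python/NumPy code performs a local linear fit in R^3 with the structured Mahalanobis-metric kernel K_{λ,A}; the function names, the simulated data, and the particular choice of A are assumptions made for the example, not taken from the slides.

```python
import numpy as np

def tricube(t):
    """Tri-cube kernel D(t)."""
    return np.where(np.abs(t) <= 1, (1 - np.abs(t)**3)**3, 0.0)

def mahalanobis_kernel(x0, X, A, lam):
    """Structured kernel K_{lambda,A}(x0, x) = D((x - x0)^T A (x - x0) / lambda)."""
    diff = X - x0
    quad = np.einsum("ij,jk,ik->i", diff, A, diff)   # (x_i - x0)^T A (x_i - x0)
    return tricube(quad / lam)

def local_linear_rp(x0, X, y, A, lam):
    """Local linear fit at x0 in R^p, using B^{(1)}(x) = (1, x_(1), ..., x_(p))."""
    w = mahalanobis_kernel(x0, X, A, lam)
    B = np.column_stack([np.ones(len(X)), X])
    BtW = B.T * w                                    # B^T W(x0)
    coef = np.linalg.solve(BtW @ B, BtW @ y)
    return np.concatenate([[1.0], x0]) @ coef

# illustrative data in R^3
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(500, 3))
y = np.sin(2 * X[:, 0]) + X[:, 1] * X[:, 2] + rng.normal(0.0, 0.1, 500)

A = np.diag(1.0 / X.var(axis=0))     # standardizing metric: a diagonal special case of A
x0 = np.array([0.2, -0.3, 0.5])
print(local_linear_rp(x0, X, y, A, lam=0.5))
print(np.sin(2 * x0[0]) + x0[1] * x0[2])             # true f(x0) for comparison
```

Taking A diagonal with entries 1/σ_k^2, as here, recovers the "standardize each coordinate to unit standard deviation" special case; a metric learned with a package such as metric-learn would simply be substituted for A.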
6.4 Structured local regression models in R^p (continued): varying coefficient models
- condition on a variable Z (possibly a latent or conditioning variable) and let the coefficients vary with it:
  f(X) = \alpha(Z) + \beta_1(Z) X_1 + \cdots + \beta_q(Z) X_q,
  fitted by locally weighted least squares,
  \min_{\alpha(z_0), \beta(z_0)} \sum_{i=1}^N K_\lambda(z_0, z_i) \left[ y_i - \alpha(z_0) - x_{1i} \beta_1(z_0) - \cdots - x_{qi} \beta_q(z_0) \right]^2

6.5 Local likelihood and other models
- the concept of local regression and varying coefficient models is extremely broad
- local likelihood inference: the parameter associated with y_0 is \theta(x_0) = x_0^T \beta(x_0), fitted by maximizing the kernel-weighted log-likelihood
  l(\beta(x_0)) = \sum_{i=1}^N K_\lambda(x_0, x_i)\, l(y_i, x_i^T \beta(x_0)),
  or, more generally, with the parameter varying in another variable z,
  l(\theta(z_0)) = \sum_{i=1}^N K_\lambda(z_0, z_i)\, l(y_i, \eta(x_i, \theta(z_0))), \qquad \eta(x, \theta) = x^T \theta
- example: the autoregressive time series model y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2} + \cdots + \beta_k y_{t-k} + \varepsilon_t; with the lag set z_t = (y_{t-1}, y_{t-2}, \dots, y_{t-k})^T this is y_t = z_t^T \beta + \varepsilon_t
- recall the multiclass linear logistic regression model from ch. 4,
  \Pr(G = j \mid X = x) = \frac{e^{\beta_{j0} + \beta_j^T x}}{1 + \sum_{k=1}^{J-1} e^{\beta_{k0} + \beta_k^T x}};
  its local log-likelihood is
  l(\beta(x_0)) = \sum_{i=1}^N K_\lambda(x_0, x_i) \Big\{ \beta_{g_i 0}(x_0) + \beta_{g_i}(x_0)^T (x_i - x_0) - \log \Big[ 1 + \sum_{k=1}^{J-1} \exp\big( \beta_{k0}(x_0) + \beta_k(x_0)^T (x_i - x_0) \big) \Big] \Big\}

6.6 Kernel density estimation and classification
Kernel density estimation:
- suppose we have a random sample x_1, \dots, x_N drawn from a probability density f_X(x) and wish to estimate f_X at a point x_0; a crude local estimate is
  \hat{f}_X(x_0) = \frac{\#\{ x_i \in \mathcal{N}(x_0) \}}{N \lambda}
- its smooth kernel version, and the Gaussian-kernel special case in R^p, are
  \hat{f}_X(x_0) = \frac{1}{N\lambda} \sum_{i=1}^N K_\lambda(x_0, x_i) \quad\rightarrow\quad \hat{f}_X(x_0) = \frac{1}{N (2\lambda^2 \pi)^{p/2}} \sum_{i=1}^N e^{-\frac{1}{2} \left( \| x_i - x_0 \| / \lambda \right)^2}
Kernel density classification:
- nonparametric density estimates can be used for classification via Bayes' theorem: for a J-class problem, fit nonparametric density estimates \hat{f}_j(X), j = 1, \dots, J, along with estimates of the class priors \hat{\pi}_j (usually the sample proportions); then
  \hat{\Pr}(G = j \mid X = x_0) = \frac{\hat{\pi}_j \hat{f}_j(x_0)}{\sum_{k=1}^J \hat{\pi}_k \hat{f}_k(x_0)}
  (a short sketch appears at the end of this section)
The naive Bayes classifier:
- especially appropriate when the dimension p is high, making full density estimation unattractive; the naive Bayes model assumes that, given a class G = j, the features X_k are independent:
  f_j(X) = \prod_{k=1}^p f_{jk}(X_k)

6.7 Radial basis functions and kernels
OMITTED.

6.8 Mixture models for density estimation and classification
Gaussian mixture model (GMM), more in ch. 8:
f(x) = \sum_{m=1}^M \alpha_m \phi(x; \mu_m, \Sigma_m), \qquad \sum_{m=1}^M \alpha_m = 1, \qquad \hat{r}_{im} = \frac{\hat{\alpha}_m \phi(x_i; \hat{\mu}_m, \hat{\Sigma}_m)}{\sum_{k=1}^M \hat{\alpha}_k \phi(x_i; \hat{\mu}_k, \hat{\Sigma}_k)}

6.9 Computational considerations
- both local regression and density estimation are memory-based methods: the model is the entire training data set, and the fitting is done at evaluation or prediction time
- the computational cost to fit at a single observation x_0 is O(N) flops; for comparison, an expansion in M basis functions costs O(M) for one evaluation, and typically M ~ O(log N)
- basis-function methods also have an initial (training) cost of at least O(NM^2 + M^3)
- the smoothing parameter(s) λ for kernel methods are typically determined off-line, for example by cross-validation, at a cost of O(N^2) flops
- popular implementations of local regression (such as the loess function in S-PLUS and R) use optimization schemes that compute the fit exactly at only M carefully chosen locations, reducing the cost to O(NM)

Q & A: relationship between kernel smoothing methods and kernel methods
- the confusion comes from an overloaded use of the word "kernel": in kernel smoothing, K_\lambda is a localization device (a weighting function), whereas in kernel methods the kernel computes an inner product in an implicit high-dimensional feature space (cf. 6.0)
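To make 6.6 concrete, here is a compact illustrative sketch (Python/NumPy again; the two-class data and the function names are invented for the example) that combines a Gaussian kernel density estimate per class with Bayes' theorem to obtain class posteriors.

```python
import numpy as np

def gaussian_kde(x0, X, lam):
    """Gaussian kernel density estimate at x0 (X is an N x p sample matrix)."""
    N, p = X.shape
    sq = np.sum((X - x0) ** 2, axis=1) / lam**2
    return np.sum(np.exp(-0.5 * sq)) / (N * (2 * np.pi * lam**2) ** (p / 2))

def kde_classifier(x0, X_by_class, lam):
    """Class posteriors via Bayes' theorem with per-class KDEs and sample-proportion priors."""
    n_total = sum(len(Xj) for Xj in X_by_class)
    post = np.array([(len(Xj) / n_total) * gaussian_kde(x0, Xj, lam) for Xj in X_by_class])
    return post / post.sum()

# two illustrative classes in R^2
rng = np.random.default_rng(2)
X1 = rng.normal([0.0, 0.0], 1.0, size=(300, 2))
X2 = rng.normal([2.0, 2.0], 1.0, size=(200, 2))

x0 = np.array([1.0, 1.0])
# at the midpoint the class densities are similar, so the priors (0.6, 0.4) dominate
print(kde_classifier(x0, [X1, X2], lam=0.5))
```

Replacing each per-class joint density estimate with a product of one-dimensional KDEs, one per coordinate, gives the naive Bayes variant described above.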