Properly-Weighted Graph Laplacian for Semi-Supervised Learning
JEFF CALDER, Department of Mathematics, University of Minnesota
DEJAN SLEPČEV, Department of Mathematical Sciences, Carnegie Mellon University

Abstract. The performance of traditional graph Laplacian methods for semi-supervised learning degrades substantially as the ratio of labeled to unlabeled data decreases, due to a degeneracy in the graph Laplacian. Several approaches have been proposed recently to address this; however, we show that some of them remain ill-posed in the large-data limit. In this paper, we show how to correctly set the weights in Laplacian regularization so that the estimator remains well-posed and stable in the large-sample limit. We prove that our semi-supervised learning algorithm converges, in the infinite sample size limit, to the smooth solution of a continuum variational problem that attains the labeled values continuously. Our method is fast and easy to implement.

1. Introduction

For many applications of machine learning, such as medical image classification and speech recognition, labeling data requires human input and is expensive [13], while unlabeled data is relatively cheap. Semi-supervised learning aims to exploit this dichotomy by utilizing the geometric or topological properties of the unlabeled data, in conjunction with the labeled data, to obtain better learning algorithms. A significant portion of the semi-supervised literature concerns transductive learning, whereby a function is learned only at the unlabeled points, rather than as a parameterized function on an ambient space. In the transductive setting, graph-based algorithms, such as the graph Laplacian-based learning pioneered by [53], are widely used and have achieved great success [3, 26, 27, 44–48, 50, 52].
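To fix ideas, the following is a minimal sketch of the classical graph Laplacian (harmonic extension) approach in the style of [53], which the degeneracy discussed above afflicts at low labeling rates; the Gaussian similarity kernel, its bandwidth, and the toy data are illustrative assumptions, and this is not the properly-weighted method proposed in this paper.

```python
import numpy as np

def harmonic_extension(X, labeled_idx, labels, bandwidth=0.5):
    """Extend labels to unlabeled points by solving L_uu u_u = -L_ul y_l,
    i.e., requiring the learned function to be graph-harmonic off the
    labeled set (Laplacian regularization with hard label constraints)."""
    n = X.shape[0]
    # Gaussian similarity weights between all pairs of points.
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * bandwidth**2))
    np.fill_diagonal(W, 0.0)
    L = np.diag(W.sum(axis=1)) - W  # unnormalized graph Laplacian
    unlabeled_idx = np.setdiff1d(np.arange(n), labeled_idx)
    # Restrict the Laplacian to unlabeled rows and solve the linear system.
    L_uu = L[np.ix_(unlabeled_idx, unlabeled_idx)]
    L_ul = L[np.ix_(unlabeled_idx, labeled_idx)]
    u = np.zeros(n)
    u[labeled_idx] = labels
    u[unlabeled_idx] = np.linalg.solve(L_uu, -L_ul @ labels)
    return u

# Toy data: two well-separated clusters on the line, one label per cluster.
X = np.concatenate([np.linspace(0, 1, 20), np.linspace(3, 4, 20)])[:, None]
u = harmonic_extension(X, labeled_idx=np.array([0, 39]),
                       labels=np.array([0.0, 1.0]))
```

By the maximum principle for graph-harmonic functions, the extended values stay between the minimum and maximum label, and each cluster inherits the value of its labeled point; with many unlabeled points and very few labels, however, the harmonic solution degenerates toward a near-constant function with spikes at the labels, which is the failure mode this paper addresses.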