Statistical Machine Learning: Introduction


Dino Sejdinovic, Department of Statistics, University of Oxford
22-24 June 2015, Novi Sad
Slides available at: http://www.stats.ox.ac.uk/~sejdinov/talks.html

Introduction: What is Machine Learning?

Arthur Samuel, 1959: "Field of study that gives computers the ability to learn without being explicitly programmed."

Tom Mitchell, 1997: "Any computer program that improves its performance at some task through experience."

Kevin Murphy, 2012: "To develop methods that can automatically detect patterns in data, and then to use the uncovered patterns to predict future data or other outcomes of interest."

[Figure: the machine learning pipeline, turning data into information, structure, predictions, decisions and actions; from http://gureckislab.org. Accompanied by a remark of Larry Page about DeepMind's ML systems that can learn to play video games like humans.]

[Figure: machine learning at the intersection of statistics, computer science, mathematics, engineering and operations research, with application areas including business, finance, biology, genetics, cognitive science, psychology and physics.]

Introduction: What is Data Science?

"'Data Scientists' Meld Statistics and Software" (WSJ article).

Introduction: Information Revolution

Traditional statistical inference problems:
- A well-formulated question that we would like to answer.
- Expensive data gathering and/or expensive computation.
- Specially designed experiments to collect high-quality data.

The information revolution:
- Improvements in data processing and data storage.
- Powerful, cheap and easy data capturing.
- Lots of (low-quality) data with potentially valuable information inside.
Introduction: Statistics and Machine Learning in the Age of Big Data

ML is becoming a thorough blending of computer science, engineering and statistics:
- a unified framework of data, inferences, procedures, algorithms
- statistics taking computation seriously
- computing taking statistical risk seriously
- scale and granularity of data
- personalization, societal and business impact
- multidisciplinarity - and you are the interdisciplinary glue
- it's just getting started

(Michael Jordan: On the Computational and Statistical Interface and "Big Data".)

Introduction: Applications of Machine Learning

Spam filtering, recommendation systems, fraud detection, self-driving cars, stock market analysis, image recognition. (ImageNet: Krizhevsky et al, 2012; "Machine Learning is Eating the World": Forbes article.)

Introduction: Types of Machine Learning

Supervised learning: the data contains "labels", i.e. every example is an input-output pair (classification, regression). Goal: prediction on new examples.

Unsupervised learning: extract key features of the "unlabelled" data (clustering, signal separation, density estimation). Goal: representation, hypothesis generation, visualization.

Semi-supervised learning: a database of examples, only a small subset of which are labelled.

Multi-task learning: a database of examples, each of which has multiple labels corresponding to different prediction tasks.

Reinforcement learning: an agent acting in an environment, given rewards for performing appropriate actions, learns to maximize its reward.

Supervised Learning

We observe input-output pairs $\{(x_i, y_i)\}_{i=1}^n$, with $x_i \in \mathcal{X}$ and $y_i \in \mathcal{Y}$. Types of supervised learning:
- Classification: discrete responses, e.g. $\mathcal{Y} = \{+1, -1\}$ or $\mathcal{Y} = \{1, \dots, K\}$.
- Regression: a numerical value is observed and $\mathcal{Y} = \mathbb{R}$.
The goal is to accurately predict the response $Y$ on new observations of $X$, i.e., to learn a function $f : \mathbb{R}^p \to \mathcal{Y}$ (taking $\mathcal{X} = \mathbb{R}^p$) such that $f(X)$ will be close to the true response $Y$.

Decision Theory: Loss Functions

Suppose we make a prediction $\hat{Y} = f(X) \in \mathcal{Y}$ based on an observation of $X$. How good is the prediction? We can use a loss function $L : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_+$ to formalize the quality of the prediction. Typical loss functions:
- Misclassification loss (or 0-1 loss) for classification:
  $$L(Y, f(X)) = \begin{cases} 0 & f(X) = Y, \\ 1 & f(X) \neq Y. \end{cases}$$
- Squared loss for regression:
  $$L(Y, f(X)) = (f(X) - Y)^2.$$
Many other choices are possible, e.g., a weighted misclassification loss. In classification, a vector of estimated probabilities $(\hat{p}(k))_{k \in \mathcal{Y}}$ can be returned instead, in which case the log-likelihood loss (or log loss) $L(Y, \hat{p}) = -\log \hat{p}(Y)$ is often used.

Decision Theory: Risk

The paired observations $\{(x_i, y_i)\}_{i=1}^n$ are viewed as i.i.d. realizations of a random variable $(X, Y)$ on $\mathcal{X} \times \mathcal{Y}$ with joint distribution $P_{XY}$. For a given loss function $L$, the risk $R$ of a learned function $f$ is its expected loss,
$$R(f) = \mathbb{E}_{P_{XY}}[L(Y, f(X))],$$
where the expectation is with respect to the true (unknown) joint distribution of $(X, Y)$. The risk is unknown, but we can compute the empirical risk:
$$R_n(f) = \frac{1}{n} \sum_{i=1}^n L(y_i, f(x_i)).$$
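To make these definitions concrete, here is a minimal Python sketch (not part of the original slides; the toy data, the threshold classifier and the function names are invented for illustration) that evaluates the empirical risk $R_n(f)$ of a fixed classifier under the 0-1 loss:

```python
import numpy as np

def empirical_risk(loss, f, xs, ys):
    """Empirical risk R_n(f) = (1/n) * sum_i L(y_i, f(x_i))."""
    return np.mean([loss(yi, f(xi)) for xi, yi in zip(xs, ys)])

# Typical loss functions from the slides.
zero_one_loss = lambda y, pred: 0.0 if pred == y else 1.0  # misclassification loss
squared_loss = lambda y, pred: (pred - y) ** 2             # squared loss for regression

# Toy classification sample: label 2 tends to go with positive x.
xs = np.array([-1.3, -0.4, 0.2, 1.1, 1.8])
ys = np.array([1, 1, 2, 1, 2])

f = lambda x: 2 if x >= 0 else 1  # a fixed threshold classifier
print(empirical_risk(zero_one_loss, f, xs, ys))  # 0.2: one of five points misclassified
```

The same `empirical_risk` function works for regression by passing `squared_loss` and a real-valued predictor.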
Decision Theory: The Bayes Classifier

What is the optimal classifier if the joint distribution of $(X, Y)$ were known?

The density $g$ of $X$ can be written as a mixture of $K$ components (corresponding to each of the classes):
$$g(x) = \sum_{k=1}^K \pi_k g_k(x),$$
where, for $k = 1, \dots, K$, $\pi_k = P(Y = k)$ are the class probabilities and $g_k(x)$ is the conditional density of $X$ given $Y = k$.

The Bayes classifier $f_{\mathrm{Bayes}} : x \mapsto \{1, \dots, K\}$ is the one with minimum risk:
$$R(f) = \mathbb{E}[L(Y, f(X))] = \mathbb{E}_X\big[\mathbb{E}_{Y \mid X}[L(Y, f(X)) \mid X]\big] = \int_{\mathcal{X}} \mathbb{E}[L(Y, f(X)) \mid X = x]\, g(x)\, dx.$$
The minimum risk attained by the Bayes classifier is called the Bayes risk. Minimizing $\mathbb{E}[L(Y, f(X)) \mid X = x]$ separately for each $x$ suffices.

Consider the 0-1 loss. The conditional risk simplifies to
$$\mathbb{E}[L(Y, f(X)) \mid X = x] = \sum_{k=1}^K L(k, f(x))\, P(Y = k \mid X = x) = 1 - P(Y = f(x) \mid X = x).$$
The risk is minimized by choosing the class with the greatest posterior probability:
$$f_{\mathrm{Bayes}}(x) = \arg\max_{k=1,\dots,K} P(Y = k \mid X = x) = \arg\max_{k=1,\dots,K} \frac{\pi_k g_k(x)}{\sum_{j=1}^K \pi_j g_j(x)} = \arg\max_{k=1,\dots,K} \pi_k g_k(x).$$
The functions $x \mapsto \pi_k g_k(x)$ are called discriminant functions; the discriminant function with maximum value determines the predicted class of $x$.

Decision Theory: The Bayes Classifier, Example

A simple two-Gaussians example: suppose $X \sim \mathcal{N}(\mu_Y, 1)$, where $\mu_1 = -1$ and $\mu_2 = 1$, and assume equal priors $\pi_1 = \pi_2 = 1/2$. Then
$$g_1(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x+1)^2}{2}\right) \quad \text{and} \quad g_2(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{(x-1)^2}{2}\right).$$

[Figure: the two conditional densities and the resulting marginal density of $X$.]

The optimal classification is
$$f_{\mathrm{Bayes}}(x) = \arg\max_{k=1,2} \pi_k g_k(x) = \begin{cases} 1 & \text{if } x < 0, \\ 2 & \text{if } x \geq 0. \end{cases}$$

How do you classify a new observation $x$ if the standard deviation is still 1 for class 1 but is 1/3 for class 2?

[Figure: the two conditional densities, on a linear and on a logarithmic scale.]

Looking at the densities on a log scale, the optimal classification is to select class 2 if and only if $x \in [0.34, 2.16]$.

Decision Theory: Plug-in Classification

The Bayes classifier chooses the class with the greatest posterior probability,
$$f_{\mathrm{Bayes}}(x) = \arg\max_{k=1,\dots,K} \pi_k g_k(x).$$
But we know neither the conditional densities $g_k$ nor the class probabilities $\pi_k$! The plug-in classifier chooses the class
$$f(x) = \arg\max_{k=1,\dots,K} \hat{\pi}_k\, \hat{g}_k(x),$$
where we plug in estimates $\hat{\pi}_k$ of the class probabilities $\pi_k$ and estimates $\hat{g}_k(x)$ of the conditional densities, for $k = 1, \dots, K$.

Linear Discriminant Analysis

LDA is the simplest example of plug-in classification. Assume a multivariate normal conditional density $g_k(x)$ for each class $k$:
$$X \mid Y = k \sim \mathcal{N}(\mu_k, \Sigma), \qquad g_k(x) = (2\pi)^{-p/2} |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)\right),$$
i.e., each class can have a different mean $\mu_k$, but all classes share the same covariance $\Sigma$. For an observation $x$, the $k$-th log-discriminant function is
$$\log\big(\pi_k g_k(x)\big) = \mathrm{const} + \log \pi_k - \tfrac{1}{2}(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k) = \mathrm{const} + a_k + b_k^\top x,$$
so linear log-discriminant functions correspond to linear decision boundaries. The quantity $(x - \mu_k)^\top \Sigma^{-1} (x - \mu_k)$ is the squared Mahalanobis distance between $x$ and $\mu_k$. If $\Sigma = I_p$ and $\pi_k = \frac{1}{K}$, LDA simply chooses the class $k$ with the nearest (in the Euclidean sense) class mean.
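A minimal Python sketch of the LDA plug-in rule (illustrative, not part of the original slides; the function names and the toy data are invented): estimate the class probabilities $\pi_k$, the class means $\mu_k$ and the pooled covariance $\Sigma$, then classify by maximizing $\log \hat{\pi}_k - \frac{1}{2}(x - \hat{\mu}_k)^\top \hat{\Sigma}^{-1} (x - \hat{\mu}_k)$:

```python
import numpy as np

def lda_fit(X, y):
    """Plug-in estimates for LDA: class priors, class means, pooled covariance."""
    classes = np.unique(y)
    priors = np.array([np.mean(y == k) for k in classes])
    means = np.array([X[y == k].mean(axis=0) for k in classes])
    n, p = X.shape
    S = np.zeros((p, p))
    for k, mu in zip(classes, means):
        D = X[y == k] - mu          # centre each class at its own mean
        S += D.T @ D
    Sigma = S / (n - len(classes))  # pooled within-class covariance estimate
    return classes, priors, means, np.linalg.inv(Sigma)

def lda_predict(x, classes, priors, means, Sigma_inv):
    """Assign x to the class with the largest log-discriminant."""
    scores = [np.log(pk) - 0.5 * (x - mu) @ Sigma_inv @ (x - mu)
              for pk, mu in zip(priors, means)]
    return classes[int(np.argmax(scores))]

# Toy usage on made-up 2-D data with two classes:
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-1.0, 1.0, (50, 2)), rng.normal(1.0, 1.0, (50, 2))])
y = np.array([1] * 50 + [2] * 50)
print(lda_predict(np.array([0.8, 0.9]), *lda_fit(X, y)))  # almost surely class 2
```

With $\Sigma = I_p$ and equal priors, the score reduces to minus half the squared Euclidean distance to each class mean, recovering the nearest-class-mean rule mentioned above.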
Linear Discriminant Analysis: Iris Dataset

[Figure: scatter plot of the Iris dataset, Petal.Width against Petal.Length, with the classes forming clearly visible clusters.]

Statistical Learning Theory: Generative vs Discriminative Learning

Generative learning: find parameters which explain all the data available.