Graphical Models


Zoubin Ghahramani
University of Cambridge, UK
Uber AI Labs, San Francisco, USA
[email protected]
http://learning.eng.cam.ac.uk/zoubin/
MLSS 2017, Tübingen


Representing knowledge through graphical models

[Figure: a directed graph on five nodes A, B, C, D, E]

• Nodes correspond to random variables.
• Edges represent statistical dependencies between the variables.


Why do we need graphical models?

• Graphs are an intuitive way of representing and visualising the relationships between many variables. (Examples: family trees, electric circuit diagrams, neural networks.)
• A graph allows us to abstract out the conditional independence relationships between the variables from the details of their parametric forms. Thus we can answer questions like "Is A dependent on B given that we know the value of C?" just by looking at the graph.
• Graphical models allow us to define general message-passing algorithms that implement probabilistic inference efficiently. Thus we can answer queries like "What is p(A | C = c)?" without enumerating all settings of all variables in the model.

Graphical models = statistics × graph theory × computer science.


Directed Acyclic Graphical Models (Bayesian Networks)

[Figure: a DAG on A, B, C, D, E with edges A→C, B→C, B→D, C→D, C→E, D→E]

A DAG model / Bayesian network¹ corresponds to a factorization of the joint probability distribution:

p(A, B, C, D, E) = p(A) p(B) p(C | A, B) p(D | B, C) p(E | C, D)

In general:

p(X_1, ..., X_n) = ∏_{i=1}^n p(X_i | X_{pa(i)})

where pa(i) are the parents of node i.

¹ "Bayesian networks" can be, and often are, learned using non-Bayesian (i.e. frequentist) methods; Bayesian networks (i.e. DAGs) do not require parameter or structure learning using Bayesian methods. Also called "belief networks".


Directed Acyclic Graphical Models (Bayesian Networks)

Semantics: X ⊥⊥ Y | V if V d-separates X from Y.

Definition: V d-separates X from Y if every undirected path² between X and Y is blocked by V. A path is blocked by V if there is a node W on the path such that either:

1. W has converging arrows along the path (→ W ←)³ and neither W nor its descendants are observed (in V), or
2. W does not have converging arrows along the path (→ W → or ← W →) and W is observed (W ∈ V).

Corollary: the Markov boundary for X is {parents(X) ∪ children(X) ∪ parents-of-children(X)}.

² An undirected path ignores the direction of the edges.
³ Note that converging arrows along the path only refers to what happens on that path. Such a node is also called a collider.


Directed Graphs for Statistical Models: Plate Notation

Consider the following simple model. A data set of N points is generated i.i.d. from a Gaussian with mean µ and standard deviation σ:

p(x_1, ..., x_N, µ, σ) = p(µ) p(σ) ∏_{n=1}^N p(x_n | µ, σ)

This can be represented graphically as follows:

[Figure: plate notation, with µ and σ pointing to x_n inside a plate repeated N times; equivalent to the unrolled graph in which µ and σ point to each of x_1, x_2, ..., x_N]


Inference in a graphical model

Consider again the graph which represents

p(A, B, C, D, E) = p(A) p(B) p(C | A, B) p(D | B, C) p(E | C, D)

Inference: evaluate the probability distribution over some set of variables, given the values of another set of variables.

For example, how can we compute p(A | C = c)? Assume each variable is binary.

Naive method:

p(A, C = c) = Σ_{B,D,E} p(A, B, C = c, D, E)    [16 terms]
p(C = c) = Σ_A p(A, C = c)    [2 terms]
p(A | C = c) = p(A, C = c) / p(C = c)    [2 terms]

Total: 16 + 2 + 2 = 20 terms have to be computed and summed.

More efficient method:

p(A, C = c) = Σ_{B,D,E} p(A) p(B) p(C = c | A, B) p(D | B, C = c) p(E | C = c, D)
            = Σ_B p(A) p(B) p(C = c | A, B) Σ_D p(D | B, C = c) Σ_E p(E | C = c, D)
            = Σ_B p(A) p(B) p(C = c | A, B)    [4 terms]

Total: 4 + 2 + 2 = 8 terms.

Belief propagation methods use the conditional independence relationships in a graph to do efficient inference (for singly connected graphs, exponential gains in efficiency!).
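
To make the term counting concrete, here is a small Python sketch (not from the slides) that computes p(A | C = c) for this five-variable binary network in both ways. The CPT values are randomly generated placeholders; only the structure of the computation matters.

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_parents):
    """A random CPT for a binary child given n_parents binary parents.
    The last axis indexes the child and sums to 1."""
    t = rng.random((2,) * n_parents + (2,))
    return t / t.sum(axis=-1, keepdims=True)

# Placeholder CPTs for p(A), p(B), p(C|A,B), p(D|B,C), p(E|C,D)
pA, pB = random_cpt(0), random_cpt(0)
pC, pD, pE = random_cpt(2), random_cpt(2), random_cpt(2)  # pC[a,b,c], pD[b,c,d], pE[c,d,e]

def joint(a, b, c, d, e):
    """The factorized joint p(A, B, C, D, E)."""
    return pA[a] * pB[b] * pC[a, b, c] * pD[b, c, d] * pE[c, d, e]

c = 1  # observed value of C

# Naive method: sum the joint over B, D, E for each value of A (16 terms),
# then normalise (2 + 2 more terms).
p_Ac = np.zeros(2)
for a in range(2):
    for b, d, e in itertools.product(range(2), repeat=3):
        p_Ac[a] += joint(a, b, c, d, e)
p_A_given_c = p_Ac / p_Ac.sum()

# More efficient method: push the sums inward, as on the slide.
p_Ac_fast = np.zeros(2)
for a in range(2):
    for b in range(2):
        sum_D = sum(pD[b, c, d] * sum(pE[c, d, e] for e in range(2)) for d in range(2))
        p_Ac_fast[a] += pA[a] * pB[b] * pC[a, b, c] * sum_D  # sum_D equals 1
p_A_given_c_fast = p_Ac_fast / p_Ac_fast.sum()

assert np.allclose(p_A_given_c, p_A_given_c_fast)
print("p(A | C = %d) =" % c, p_A_given_c)
```

The inner sums over E and D each evaluate to 1 (they sum conditional distributions), which is exactly why the efficient method collapses to a sum over B alone; belief propagation automates this reordering of sums.
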
Factor graph propagation

Algorithmically and implementationally, it is often easier to convert directed and undirected graphs into factor graphs, and run factor graph propagation.

p(x) = p(x_1) p(x_2 | x_1) p(x_3 | x_2) p(x_4 | x_2) ≡ f_1(x_1, x_2) f_2(x_2, x_3) f_3(x_2, x_4)

[Figure: two factor graphs on x_1, ..., x_4: a singly connected (tree-structured) one with factors f_1, f_2, f_3, and a multiply connected one with an additional factor f_4 that creates a cycle]


Factor Graphs

In a factor graph, the joint probability distribution is written as a product of factors. Consider a vector of variables x = (x_1, ..., x_n):

p(x) = p(x_1, ..., x_n) = (1/Z) ∏_j f_j(x_{S_j})

where Z is the normalisation constant, S_j denotes the subset of {1, ..., n} which participate in factor f_j, and x_{S_j} = {x_i : i ∈ S_j}.

[Figure: a factor graph with variable nodes x_1, ..., x_4 and factor nodes f_1, ..., f_4]

Variable nodes: we draw an open circle for each variable x_i in the distribution.
Factor nodes: we draw a filled dot for each factor f_j in the distribution.


Propagation in Factor Graphs

Let n(x) denote the set of factor nodes that are neighbors of x. Let n(f) denote the set of variable nodes that are neighbors of f. We can compute probabilities in a factor graph by propagating messages from variable nodes to factor nodes and vice versa.

Message from variable x to factor f:

µ_{x→f}(x) = ∏_{h ∈ n(x)\{f}} µ_{h→x}(x)

Message from factor f to variable x:

µ_{f→x}(x) = Σ_{x_{n(f)} \ x} ( f(x_{n(f)}) ∏_{y ∈ n(f)\{x}} µ_{y→f}(y) )

where x_{n(f)} denotes the set of variables that factor f depends on, and the sum runs over all variables neighboring factor f except x.

If a variable has only one factor as a neighbor, it can initiate message propagation. Once a variable has received all messages from its neighboring factor nodes, one can compute the probability of that variable by multiplying all the messages and renormalising:

p(x) ∝ ∏_{h ∈ n(x)} µ_{h→x}(x)
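
As a minimal illustration of these two message equations, the following Python sketch runs sum-product on the small tree-structured factor graph above, p(x) ∝ f_1(x_1, x_2) f_2(x_2, x_3) f_3(x_2, x_4). The factor tables are arbitrary non-negative placeholders, and the message schedule is written out by hand rather than derived from a general graph data structure.

```python
import numpy as np

rng = np.random.default_rng(1)
K = 3  # number of states per variable (arbitrary for the example)

# Arbitrary non-negative factor tables for p(x) = (1/Z) f1(x1,x2) f2(x2,x3) f3(x2,x4)
f1 = rng.random((K, K))   # f1[x1, x2]
f2 = rng.random((K, K))   # f2[x2, x3]
f3 = rng.random((K, K))   # f3[x2, x4]

# Leaf variables x1, x3, x4 send the constant message 1 (an empty product).
mu_x1_f1 = np.ones(K)
mu_x3_f2 = np.ones(K)
mu_x4_f3 = np.ones(K)

# Factor-to-variable messages into x2: multiply by incoming messages and sum
# out every variable of the factor except x2.
mu_f1_x2 = f1.T @ mu_x1_f1        # sum over x1 of f1[x1, x2] * mu_x1_f1[x1]
mu_f2_x2 = f2 @ mu_x3_f2          # sum over x3 of f2[x2, x3] * mu_x3_f2[x3]
mu_f3_x2 = f3 @ mu_x4_f3          # sum over x4 of f3[x2, x4] * mu_x4_f3[x4]

# Marginal of x2: product of all incoming messages, renormalised.
p_x2 = mu_f1_x2 * mu_f2_x2 * mu_f3_x2
p_x2 /= p_x2.sum()

# Messages also flow outward, e.g. to obtain the marginal of x3:
mu_x2_f2 = mu_f1_x2 * mu_f3_x2    # product over n(x2) \ {f2}
mu_f2_x3 = f2.T @ mu_x2_f2        # sum over x2 of f2[x2, x3] * mu_x2_f2[x2]
p_x3 = mu_f2_x3 / mu_f2_x3.sum()

# Check against brute-force marginalisation of the (unnormalised) joint.
joint = np.einsum('ab,bc,bd->abcd', f1, f2, f3)   # axes: x1, x2, x3, x4
joint /= joint.sum()
assert np.allclose(p_x2, joint.sum(axis=(0, 2, 3)))
assert np.allclose(p_x3, joint.sum(axis=(0, 1, 3)))
print("p(x2) =", p_x2)
print("p(x3) =", p_x3)
```

Because this graph is a tree, one inward and one outward sweep of messages yields exact marginals; running the same updates on a multiply connected graph gives loopy belief propagation, which is only approximate.
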
Inference in Hidden Markov Models and Linear Gaussian State-Space Models

[Figure: a state-space model DAG, a chain of hidden states X_1 → X_2 → X_3 → ... → X_T, with each X_t emitting an observation Y_t]

p(X_{1:T}, Y_{1:T}) = p(X_1) p(Y_1 | X_1) ∏_{t=2}^T p(X_t | X_{t-1}) p(Y_t | X_t)

• In HMMs, the states X_t are discrete.
• In linear Gaussian SSMs, the states are real Gaussian vectors.
• Both HMMs and SSMs can be represented as singly connected DAGs.
• The forward-backward algorithm in hidden Markov models (HMMs) and the Kalman smoothing algorithm in SSMs are both instances of belief propagation / factor graph propagation.


Software for Graphical Models

• BUGS and WinBUGS: inference via Gibbs sampling, not very scalable
• HUGIN: widely used, commercial, focus on exact inference
• Kevin Murphy's Bayes Net Toolbox: Matlab, widely used
• Microsoft's Infer.NET: advanced scalable libraries implementing factor graph propagation, EP, and variational message passing
• Jeff Bilmes' GMTK: very good at HMMs and related time series models
• many others, see http://people.cs.ubc.ca/~murphyk/Software/bnsoft.html
• Much of this is subsumed by general probabilistic programming frameworks, e.g. Church, WebPPL, Turing, Anglican, Edward, Stan, ...


Summary

• Inference consists of the problem of computing p(variables of interest | observed variables).
• For singly connected graphs, belief propagation / factor graph propagation solves this problem exactly.
• Well-known algorithms such as Kalman smoothing and forward-backward are special cases of these general propagation algorithms.
• For multiply connected graphs, the junction tree algorithm solves the exact inference problem, but can be very slow (exponential in the cardinality of the largest clique).
• One approximate inference algorithm is "loopy belief propagation": run propagation as if the graph were singly connected; it often works well in practice.


Learning parameters

[Figure: a DAG with edges x_1 → x_2, x_1 → x_3, x_2 → x_4, together with an example CPT θ_2 = p(x_2 | x_1) whose rows (one per value of x_1) are (0.2, 0.3, 0.5) and (0.1, 0.6, 0.3)]

p(x_1) p(x_2 | x_1) p(x_3 | x_1) p(x_4 | x_2)

Assume each variable x_i is discrete and can take on K_i values. The parameters of this model can be represented as 4 tables: θ_1 has K_1 entries, θ_2 has K_1 × K_2 entries, etc. These are called conditional probability tables (CPTs) with the following semantics:

p(x_1 = k) = θ_{1,k}        p(x_2 = k' | x_1 = k) = θ_{2,k,k'}

If node i has M parents, θ_i can be represented either as an (M + 1)-dimensional table, or as a 2-dimensional table with (∏_{j ∈ pa(i)} K_j) × K_i entries, by collapsing all the states of the parents of node i. Note that Σ_{k'} θ_{i,k,k'} = 1.

Assume a data set D = {x^(n)}_{n=1}^N. How do we learn θ from D?

p(x | θ) = p(x_1 | θ_1) p(x_2 | x_1, θ_2) p(x_3 | x_1, θ_3) p(x_4 | x_2, θ_4)

Likelihood:

p(D | θ) = ∏_{n=1}^N p(x^(n) | θ)

Log likelihood:

log p(D | θ) = Σ_{n=1}^N Σ_i log p(x_i^(n) | x_{pa(i)}^(n), θ_i)

This decomposes into a sum of functions of θ_i, one per CPT, so each CPT can be estimated separately.
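
Because the log likelihood decomposes into one term per CPT, maximum likelihood estimation with fully observed discrete data reduces to counting and normalising, independently for each table. The following Python sketch illustrates this for the x_1 → x_2, x_1 → x_3, x_2 → x_4 network above; the "true" CPTs and the data set are synthetic placeholders so the example is self-contained.

```python
import numpy as np

rng = np.random.default_rng(2)

# Model: p(x1) p(x2|x1) p(x3|x1) p(x4|x2), with x_i taking K_i values.
K = {1: 2, 2: 3, 3: 2, 4: 2}
parents = {1: None, 2: 1, 3: 1, 4: 2}

def random_cpt(rows, cols):
    """Random CPT with `rows` parent configurations; each row sums to 1."""
    t = rng.random((rows, cols))
    return t / t.sum(axis=1, keepdims=True)

# Synthetic "true" CPTs, used only to generate a fully observed data set.
true_theta = {
    1: random_cpt(1, K[1]),
    2: random_cpt(K[1], K[2]),
    3: random_cpt(K[1], K[3]),
    4: random_cpt(K[2], K[4]),
}

N = 5000
data = np.zeros((N, 5), dtype=int)   # column i holds x_i (column 0 unused)
for n in range(N):
    x1 = rng.choice(K[1], p=true_theta[1][0])
    x2 = rng.choice(K[2], p=true_theta[2][x1])
    x3 = rng.choice(K[3], p=true_theta[3][x1])
    x4 = rng.choice(K[4], p=true_theta[4][x2])
    data[n] = [0, x1, x2, x3, x4]

def fit_cpt(child, parent):
    """ML estimate of one CPT: count (parent state, child state) pairs and
    normalise each row (assumes every parent state occurs in the data)."""
    rows = 1 if parent is None else K[parent]
    counts = np.zeros((rows, K[child]))
    for n in range(N):
        r = 0 if parent is None else data[n, parent]
        counts[r, data[n, child]] += 1
    return counts / counts.sum(axis=1, keepdims=True)

theta_hat = {i: fit_cpt(i, parents[i]) for i in K}

print("estimated p(x2 | x1):\n", np.round(theta_hat[2], 3))
print("true      p(x2 | x1):\n", np.round(true_theta[2], 3))
```

With sparse counts one would typically add Dirichlet pseudo-counts to the count tables to avoid zero probability estimates for rarely observed parent-child configurations.
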