Bayesian Programming
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
Bayes and the Law
Bayes and the Law Norman Fenton, Martin Neil and Daniel Berger [email protected] January 2016 This is a pre-publication version of the following article: Fenton N.E, Neil M, Berger D, “Bayes and the Law”, Annual Review of Statistics and Its Application, Volume 3, 2016, doi: 10.1146/annurev-statistics-041715-033428 Posted with permission from the Annual Review of Statistics and Its Application, Volume 3 (c) 2016 by Annual Reviews, http://www.annualreviews.org. Abstract Although the last forty years has seen considerable growth in the use of statistics in legal proceedings, it is primarily classical statistical methods rather than Bayesian methods that have been used. Yet the Bayesian approach avoids many of the problems of classical statistics and is also well suited to a broader range of problems. This paper reviews the potential and actual use of Bayes in the law and explains the main reasons for its lack of impact on legal practice. These include misconceptions by the legal community about Bayes’ theorem, over-reliance on the use of the likelihood ratio and the lack of adoption of modern computational methods. We argue that Bayesian Networks (BNs), which automatically produce the necessary Bayesian calculations, provide an opportunity to address most concerns about using Bayes in the law. Keywords: Bayes, Bayesian networks, statistics in court, legal arguments 1 1 Introduction The use of statistics in legal proceedings (both criminal and civil) has a long, but not terribly well distinguished, history that has been very well documented in (Finkelstein, 2009; Gastwirth, 2000; Kadane, 2008; Koehler, 1992; Vosk and Emery, 2014). -
Estimating the Accuracy of Jury Verdicts
Institute for Policy Research Northwestern University Working Paper Series WP-06-05 Estimating the Accuracy of Jury Verdicts Bruce D. Spencer Faculty Fellow, Institute for Policy Research Professor of Statistics Northwestern University Version date: April 17, 2006; rev. May 4, 2007 Forthcoming in Journal of Empirical Legal Studies 2040 Sheridan Rd. ! Evanston, IL 60208-4100 ! Tel: 847-491-3395 Fax: 847-491-9916 www.northwestern.edu/ipr, ! [email protected] Abstract Average accuracy of jury verdicts for a set of cases can be studied empirically and systematically even when the correct verdict cannot be known. The key is to obtain a second rating of the verdict, for example the judge’s, as in the recent study of criminal cases in the U.S. by the National Center for State Courts (NCSC). That study, like the famous Kalven-Zeisel study, showed only modest judge-jury agreement. Simple estimates of jury accuracy can be developed from the judge-jury agreement rate; the judge’s verdict is not taken as the gold standard. Although the estimates of accuracy are subject to error, under plausible conditions they tend to overestimate the average accuracy of jury verdicts. The jury verdict was estimated to be accurate in no more than 87% of the NCSC cases (which, however, should not be regarded as a representative sample with respect to jury accuracy). More refined estimates, including false conviction and false acquittal rates, are developed with models using stronger assumptions. For example, the conditional probability that the jury incorrectly convicts given that the defendant truly was not guilty (a “type I error”) was estimated at 0.25, with an estimated standard error (s.e.) of 0.07, the conditional probability that a jury incorrectly acquits given that the defendant truly was guilty (“type II error”) was estimated at 0.14 (s.e. -
General Probability, II: Independence and Conditional Proba- Bility
Math 408, Actuarial Statistics I A.J. Hildebrand General Probability, II: Independence and conditional proba- bility Definitions and properties 1. Independence: A and B are called independent if they satisfy the product formula P (A ∩ B) = P (A)P (B). 2. Conditional probability: The conditional probability of A given B is denoted by P (A|B) and defined by the formula P (A ∩ B) P (A|B) = , P (B) provided P (B) > 0. (If P (B) = 0, the conditional probability is not defined.) 3. Independence of complements: If A and B are independent, then so are A and B0, A0 and B, and A0 and B0. 4. Connection between independence and conditional probability: If the con- ditional probability P (A|B) is equal to the ordinary (“unconditional”) probability P (A), then A and B are independent. Conversely, if A and B are independent, then P (A|B) = P (A) (assuming P (B) > 0). 5. Complement rule for conditional probabilities: P (A0|B) = 1 − P (A|B). That is, with respect to the first argument, A, the conditional probability P (A|B) satisfies the ordinary complement rule. 6. Multiplication rule: P (A ∩ B) = P (A|B)P (B) Some special cases • If P (A) = 0 or P (B) = 0 then A and B are independent. The same holds when P (A) = 1 or P (B) = 1. • If B = A or B = A0, A and B are not independent except in the above trivial case when P (A) or P (B) is 0 or 1. In other words, an event A which has probability strictly between 0 and 1 is not independent of itself or of its complement. -
Propensities and Probabilities
ARTICLE IN PRESS Studies in History and Philosophy of Modern Physics 38 (2007) 593–625 www.elsevier.com/locate/shpsb Propensities and probabilities Nuel Belnap 1028-A Cathedral of Learning, University of Pittsburgh, Pittsburgh, PA 15260, USA Received 19 May 2006; accepted 6 September 2006 Abstract Popper’s introduction of ‘‘propensity’’ was intended to provide a solid conceptual foundation for objective single-case probabilities. By considering the partly opposed contributions of Humphreys and Miller and Salmon, it is argued that when properly understood, propensities can in fact be understood as objective single-case causal probabilities of transitions between concrete events. The chief claim is that propensities are well-explicated by describing how they fit into the existing formal theory of branching space-times, which is simultaneously indeterministic and causal. Several problematic examples, some commonsense and some quantum-mechanical, are used to make clear the advantages of invoking branching space-times theory in coming to understand propensities. r 2007 Elsevier Ltd. All rights reserved. Keywords: Propensities; Probabilities; Space-times; Originating causes; Indeterminism; Branching histories 1. Introduction You are flipping a fair coin fairly. You ascribe a probability to a single case by asserting The probability that heads will occur on this very next flip is about 50%. ð1Þ The rough idea of a single-case probability seems clear enough when one is told that the contrast is with either generalizations or frequencies attributed to populations asserted while you are flipping a fair coin fairly, such as In the long run; the probability of heads occurring among flips is about 50%. ð2Þ E-mail address: [email protected] 1355-2198/$ - see front matter r 2007 Elsevier Ltd. -
Conditional Probability and Bayes Theorem A
Conditional probability And Bayes theorem A. Zaikin 2.1 Conditional probability 1 Conditional probablity Given events E and F ,oftenweareinterestedinstatementslike if even E has occurred, then the probability of F is ... Some examples: • Roll two dice: what is the probability that the sum of faces is 6 given that the first face is 4? • Gene expressions: What is the probability that gene A is switched off (e.g. down-regulated) given that gene B is also switched off? A. Zaikin 2.2 Conditional probability 2 This conditional probability can be derived following a similar construction: • Repeat the experiment N times. • Count the number of times event E occurs, N(E),andthenumberoftimesboth E and F occur jointly, N(E ∩ F ).HenceN(E) ≤ N • The proportion of times that F occurs in this reduced space is N(E ∩ F ) N(E) since E occurs at each one of them. • Now note that the ratio above can be re-written as the ratio between two (unconditional) probabilities N(E ∩ F ) N(E ∩ F )/N = N(E) N(E)/N • Then the probability of F ,giventhatE has occurred should be defined as P (E ∩ F ) P (E) A. Zaikin 2.3 Conditional probability: definition The definition of Conditional Probability The conditional probability of an event F ,giventhataneventE has occurred, is defined as P (E ∩ F ) P (F |E)= P (E) and is defined only if P (E) > 0. Note that, if E has occurred, then • F |E is a point in the set P (E ∩ F ) • E is the new sample space it can be proved that the function P (·|·) defyning a conditional probability also satisfies the three probability axioms. -
CONDITIONAL EXPECTATION Definition 1. Let (Ω,F,P)
CONDITIONAL EXPECTATION 1. CONDITIONAL EXPECTATION: L2 THEORY ¡ Definition 1. Let (,F ,P) be a probability space and let G be a σ algebra contained in F . For ¡ any real random variable X L2(,F ,P), define E(X G ) to be the orthogonal projection of X 2 j onto the closed subspace L2(,G ,P). This definition may seem a bit strange at first, as it seems not to have any connection with the naive definition of conditional probability that you may have learned in elementary prob- ability. However, there is a compelling rationale for Definition 1: the orthogonal projection E(X G ) minimizes the expected squared difference E(X Y )2 among all random variables Y j ¡ 2 L2(,G ,P), so in a sense it is the best predictor of X based on the information in G . It may be helpful to consider the special case where the σ algebra G is generated by a single random ¡ variable Y , i.e., G σ(Y ). In this case, every G measurable random variable is a Borel function Æ ¡ of Y (exercise!), so E(X G ) is the unique Borel function h(Y ) (up to sets of probability zero) that j minimizes E(X h(Y ))2. The following exercise indicates that the special case where G σ(Y ) ¡ Æ for some real-valued random variable Y is in fact very general. Exercise 1. Show that if G is countably generated (that is, there is some countable collection of set B G such that G is the smallest σ algebra containing all of the sets B ) then there is a j 2 ¡ j G measurable real random variable Y such that G σ(Y ). -
ST 371 (VIII): Theory of Joint Distributions
ST 371 (VIII): Theory of Joint Distributions So far we have focused on probability distributions for single random vari- ables. However, we are often interested in probability statements concerning two or more random variables. The following examples are illustrative: • In ecological studies, counts, modeled as random variables, of several species are often made. One species is often the prey of another; clearly, the number of predators will be related to the number of prey. • The joint probability distribution of the x, y and z components of wind velocity can be experimentally measured in studies of atmospheric turbulence. • The joint distribution of the values of various physiological variables in a population of patients is often of interest in medical studies. • A model for the joint distribution of age and length in a population of ¯sh can be used to estimate the age distribution from the length dis- tribution. The age distribution is relevant to the setting of reasonable harvesting policies. 1 Joint Distribution The joint behavior of two random variables X and Y is determined by the joint cumulative distribution function (cdf): (1.1) FXY (x; y) = P (X · x; Y · y); where X and Y are continuous or discrete. For example, the probability that (X; Y ) belongs to a given rectangle is P (x1 · X · x2; y1 · Y · y2) = F (x2; y2) ¡ F (x2; y1) ¡ F (x1; y2) + F (x1; y1): 1 In general, if X1; ¢ ¢ ¢ ;Xn are jointly distributed random variables, the joint cdf is FX1;¢¢¢ ;Xn (x1; ¢ ¢ ¢ ; xn) = P (X1 · x1;X2 · x2; ¢ ¢ ¢ ;Xn · xn): Two- and higher-dimensional versions of probability distribution functions and probability mass functions exist. -
The Semantics of Subroutines and Iteration in the Bayesian Programming Language Probt
1 The semantics of Subroutines and Iteration in the Bayesian Programming language ProBT R. LAURENT∗, K. MEKHNACHA∗, E. MAZERy and P. BESSIERE` z ∗ProbaYes S.A.S, Grenoble, France yUniversity of Grenoble-Alpes, CNRS/LIG, Grenoble, France zUniversity Pierre et Marie Curie, CNRS/ISIR, Paris, France Abstract—Bayesian models are tools of choice when solving 8 8 8 problems with incomplete information. Bayesian networks pro- > > >Variables > > <> vide a first but limited approach to address such problems. > <> Decomposition <> Specification (π) For real world applications, additional semantics is needed to Description > P arametric construct more complex models, especially those with repetitive > :>F orms > > P rogram structures or substructures. ProBT, a Bayesian a programming > :> > Identification (based on δ) language, provides a set of constructs for developing and applying :> complex models with substructures and repetitive structures. Question The goal of this paper is to present and discuss the semantics associated to these constructs. Figure 1. A Bayesian program is constructed from a Description and a Question. The Description is given by the programmer who writes a Index Terms—Probabilistic Programming semantics , Bayesian Specification of a model π and an Identification of its parameter values, Programming, ProBT which can be set in the program or obtained through a learning process from a data set δ. The Specification is constructed from a set of relevant variables, a decomposition of the joint probability distribution over these variables and a set of parametric forms (mathematical models) for the terms I. INTRODUCTION of this decomposition. ProBT [1] was designed to translate the ideas of E.T. Jaynes [2] into an actual programming language. -
Teaching Bayesian Behaviours to Video Game Characters
Robotics and Autonomous Systems 47 (2004) 177–185 Teaching Bayesian behaviours to video game characters Ronan Le Hy∗, Anthony Arrigoni, Pierre Bessière, Olivier Lebeltel GRAVIR/IMAG, INRIA Rhˆone-Alpes, ZIRST, 38330 Montbonnot, France Abstract This article explores an application of Bayesian programming to behaviours for synthetic video games characters. We address the problem of real-time reactive selection of elementary behaviours for an agent playing a first person shooter game. We show how Bayesian programming can lead to condensed and easier formalisation of finite state machine-like behaviour selection, and lend itself to learning by imitation, in a fully transparent way for the player. © 2004 Published by Elsevier B.V. Keywords: Bayesian programming; Video games characters; Finite state machine; Learning by imitation 1. Introduction After listing our practical objectives, we will present our Bayesian model. We will show how we use it to Today’s video games feature synthetic characters specify by hand a behaviour, and how we use it to involved in complex interactions with human players. learn a behaviour. We will tackle learning by exam- A synthetic character may have one of many different ple using a high-level interface, and then the natural roles: tactical enemy, partner for the human, strategic controls of the game. We will show that it is possible opponent, simple unit amongst many, commenter, etc. to map the player’s actions onto bot states, and use In all of these cases, the game developer’s ultimate this reconstruction to learn our model. Finally, we will objective is for the synthetic character to act like a come back to our objectives as a conclusion. -
Two-Phase Auto-Piloted Synchronous Motors and Actuators
IMPERIAL COLLEGE OF SCIENCE AND TECHNOLOGY DEPARTMENT OF ELECTRICAL ENGINEERING TWO-PHASE AUTO-PILOTED SYNCHRONOUS MOTORS AND ACTUATORS it Thesis submitted for the degree of Doctor of Philosophy of the University of London. it by Amadeu Leao Santos Rodrigues, M.Sc. (D.I.C.) July 1983 The thesis is concerned with certain aspects of the design and per- formance of drives based on variable speed synchronous motors fed from variable frequency sources, controlled by means of rotor-position sensors. This auto-piloted or self-synchronised form of control enables the machine to operate either with an adjustable load angle or an adjustable torque angle depending on whether a voltage or a current source feed is used. D.c. machine-type characteristics can thus be obtained from the system. The thesis commences with an outline of some fundamental design aspects and a summary of torque production mechanisms in electrical machines. The influence of configuration and physical size on machine performance is discussed and the advantages of the use for servo applic- ations of direct-drive as opposed to step-down transmissions are explained. Developments in permanent magnet materials have opened the way to permanent magnet motors of improved performance, and a brief review of the properties of the various materials presently available is given. A finite-difference method using magnetic scalar potential for calculating the magnetic field of permanent magnets in the presence of iron cores and macroscopic currents is described. A comparison with the load line method is made for some typical cases. Analogies between the mechanical commutator of a d.c. -
The Logical Basis of Bayesian Reasoning and Its Application on Judicial Judgment
2018 International Workshop on Advances in Social Sciences (IWASS 2018) The Logical Basis of Bayesian Reasoning and Its Application on Judicial Judgment Juan Liu Zhengzhou University of Industry Technology, Xinzheng, Henan, 451100, China Keywords: Bayesian reasoning; Bayesian network; judicial referee Abstract: Bayesian inference is a law that corrects subjective judgments of related probabilities based on observed phenomena. The logical basis is that when the sample's capacity is close to the population, the probability of occurrence of events in the sample is close to the probability of occurrence of the population. The basic expression is: posterior probability = prior probability × standard similarity. Bayesian networks are applications of Bayesian inference, including directed acyclic graphs (DAGs) and conditional probability tables (CPTs) between nodes. Using the Bayesian programming tool to construct the Bayesian network, the ECHO model is used to analyze the node structure of the proposition in the first trial of von Blo, and the jury can be simulated by the insertion of the probability value in the judgment of the jury in the first instance, but find and set The difficulty of all conditional probabilities limits the effectiveness of its display of causal structures. 1. Introduction The British mathematician Thomas Bayes (about 1701-1761) used inductive reasoning for the basic theory of probability theory and created Bayesian statistical theory, namely Bayesian reasoning. After the continuous improvement of scholars in later generations, a scientific methodology system has gradually formed, "applied to many fields and developed many branches." [1] Bayesian reasoning needs to reason about the estimates and hypotheses to be made based on the sample information observed by the observer and the relevant experience of the inferencer. -
Debugging Probabilistic Programs
Debugging Probabilistic Programs Chandrakana Nandi Adrian Sampson Todd Mytkowicz Dan Grossman Cornell University, Ithaca, USA Microsoft Research, Redmond, USA University of Washington, Seattle, USA [email protected] [email protected] fcnandi, [email protected] Kathryn S. McKinley Google, USA [email protected] Abstract ity of being true in a given execution. Even though these assertions Many applications compute with estimated and uncertain data. fail when the program’s results are unexpected, they do not give us While advances in probabilistic programming help developers build any information about the cause of the failure. To help determine such applications, debugging them remains extremely challenging. the cause of failure, we identify three types of common probabilistic New types of errors in probabilistic programs include 1) ignoring de- programming defects. pendencies and correlation between random variables and in training Modeling errors and insufficient evidence. Probabilistic pro- data, 2) poorly chosen inference hyper-parameters, and 3) incorrect grams may use incorrect statistical models, e.g., using Gaussian statistical models. A partial solution to prevent these errors in some (0.0, 1.0) instead of Gaussian (1.0, 1.0), where Gaussian (µ, s) rep- languages forbids developers from explicitly invoking inference. resents a Gaussian distribution with mean µ and standard deviation While this prevents some dependence errors, it limits composition s. On the other hand,even if the statistical model is correct, their and control over inference, and does not guarantee absence of other input data (e.g., training data) may be erroneous, insufficient or inappropriate for performing a given statistical task. types of errors.