Problem Set 2


MAS 622J/1.126J: Pattern Recognition and Analysis
Due: 5:00 p.m. on September 30

[Note: All instructions to plot data or write a program should be carried out using Matlab. In order to maintain a reasonable level of consistency and simplicity we ask that you do not use other software tools.]

If you collaborated with other members of the class, please write their names at the end of the assignment. Moreover, you will need to write and sign the following statement: "In preparing my solutions, I did not look at any old homeworks, copy anybody's answers or let them copy mine."

Problem 1: [10 Points]

In many pattern classification problems we have the option to either assign a pattern to one of c classes, or reject it as unrecognizable, if the cost to reject is not too high. Let the cost of classification be defined as:

\lambda(\omega_i \mid \omega_j) =
\begin{cases}
0 & \text{if } \omega_i = \omega_j \quad \text{(i.e. correct classification)} \\
\lambda_r & \text{if } \omega_i = \omega_0 \quad \text{(i.e. rejection)} \\
\lambda_s & \text{otherwise} \quad \text{(i.e. substitution error)}
\end{cases}

Show that for minimum-risk classification, the decision rule should associate a test vector x with class \omega_i if P(\omega_i \mid x) \ge P(\omega_j \mid x) for all j and P(\omega_i \mid x) \ge 1 - \lambda_r/\lambda_s, and reject otherwise.

Solution: The average risk of choosing class \omega_i is

R(\omega_i \mid x) = \sum_{j=1}^{c} \lambda(\omega_i \mid \omega_j) P(\omega_j \mid x)
= 0 \cdot P(\omega_i \mid x) + \sum_{j=1,\, j \ne i}^{c} \lambda_s P(\omega_j \mid x)

where \lambda(\omega_i \mid \omega_j) denotes the cost of choosing class \omega_i when the true class is \omega_j. Hence:

R(\omega_i \mid x) = \lambda_s \left( 1 - P(\omega_i \mid x) \right)

Associate x with the class \omega_i that has the highest posterior probability, provided the average risk does not exceed the cost of rejection:

\lambda_s \left( 1 - P(\omega_i \mid x) \right) \le \lambda_r
P(\omega_i \mid x) \ge 1 - \lambda_r/\lambda_s

Problem 2: [16 points]

In a particular binary hypothesis testing application, the conditional density for a scalar feature x given class w_1 is

p(x \mid w_1) = k_1 \exp(-x^2/20)

Given class w_2 the conditional density is

p(x \mid w_2) = k_2 \exp(-(x-6)^2/12)

a. Find k_1 and k_2, and plot the two densities on a single graph using Matlab.
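The threshold rule of Problem 1 is easy to sanity-check numerically. The assignment calls for Matlab, but purely as an illustration, here is a minimal Python sketch of the reject rule (the function name `classify_with_reject` is ours, not part of the assignment):

```python
def classify_with_reject(posteriors, lambda_r, lambda_s):
    """Minimum-risk rule with a reject option: choose the class with the
    largest posterior, but reject when that posterior falls below
    1 - lambda_r / lambda_s (then rejecting costs less than deciding)."""
    i = max(range(len(posteriors)), key=lambda k: posteriors[k])
    if posteriors[i] >= 1 - lambda_r / lambda_s:
        return i       # accept: risk lambda_s * (1 - P) <= lambda_r
    return None        # reject

# With lambda_r / lambda_s = 0.3 the acceptance threshold is 0.7:
print(classify_with_reject([0.75, 0.25], lambda_r=0.3, lambda_s=1.0))  # -> 0
print(classify_with_reject([0.60, 0.40], lambda_r=0.3, lambda_s=1.0))  # -> None
```

Note that when lambda_r >= lambda_s the threshold drops to zero or below, so the rule never rejects, as expected: rejecting can never be cheaper than a substitution error.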
Solution: We solve for the parameters k_1 and k_2 by recognizing that the two equations are in the form of the normal Gaussian distribution:

k_1 e^{-x^2/20} = \frac{1}{\sqrt{2\pi\sigma_1^2}} e^{-(x-\mu_1)^2 / (2\sigma_1^2)}

Matching exponents,

\frac{-x^2}{20} = \frac{-(x-\mu_1)^2}{2\sigma_1^2}
\mu_1 = 0, \quad \sigma_1^2 = 10, \quad k_1 = \frac{1}{\sqrt{20\pi}}

[Figure 1: The pdfs p(x|w_1) and p(x|w_2), with the decision boundaries separating regions R_1, R_2, R_1.]

In a similar fashion, we find that

\mu_2 = 6, \quad \sigma_2^2 = 6, \quad k_2 = \frac{1}{\sqrt{12\pi}}

These distributions are plotted in Figure 1.

b. Assume that the prior probabilities of the two classes are equal, and that the cost for choosing correctly is zero. If the costs for choosing incorrectly are C_{12} = \sqrt{3} and C_{21} = \sqrt{5} (where C_{ij} corresponds to predicting class i when it belongs to class j), what is the expression for the conditional risk?

Solution: The conditional risk for two-category classification is discussed in Section 2.2.1 of D.H.S., where we can see that the risk associated with the action \alpha_i of classifying a feature vector x as class \omega_i is different for each action:

R(\alpha_1 \mid x) = \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x)
R(\alpha_2 \mid x) = \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x)

The prior probabilities of the two classes are equal:

P(\omega_1) = P(\omega_2) = \frac{1}{2}

The assumption that the cost for choosing correctly is zero gives \lambda_{11} = \lambda_{22} = 0, and the costs for choosing incorrectly are \lambda_{12} = C_{12} = \sqrt{3} and \lambda_{21} = C_{21} = \sqrt{5}. Thus the expressions for the conditional risks are:

R(\alpha_1 \mid x) = 0 \cdot P(\omega_1 \mid x) + \sqrt{3}\, P(\omega_2 \mid x) = \sqrt{3}\, P(\omega_2 \mid x)
R(\alpha_2 \mid x) = \sqrt{5}\, P(\omega_1 \mid x) + 0 \cdot P(\omega_2 \mid x) = \sqrt{5}\, P(\omega_1 \mid x)

c. Find the decision regions which minimize the Bayes risk, and indicate them on the plot you made in part (a).

Solution: The Bayes risk is the integral of the conditional risk when we use the optimal decision regions, R_1 and R_2.
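The normalizing constants and the conditional risks of part (b) can be checked pointwise. Again this is a Python sketch rather than the Matlab the assignment asks for, and the helper names (`p1`, `p2`, `risks`) are ours:

```python
import math

def p1(x):  # p(x | w1): Gaussian with mean 0, variance 10
    return math.exp(-x**2 / 20) / math.sqrt(20 * math.pi)

def p2(x):  # p(x | w2): Gaussian with mean 6, variance 6
    return math.exp(-(x - 6)**2 / 12) / math.sqrt(12 * math.pi)

def risks(x):
    """Conditional risks R(a1|x) and R(a2|x) under equal priors."""
    evidence = 0.5 * p1(x) + 0.5 * p2(x)      # p(x)
    post1 = 0.5 * p1(x) / evidence            # P(w1|x)
    post2 = 0.5 * p2(x) / evidence            # P(w2|x)
    return math.sqrt(3) * post2, math.sqrt(5) * post1

# Near x = 0 class w1 dominates, so action a1 carries the smaller risk;
# near x = 10 the situation reverses:
r1, r2 = risks(0.0)
print(r1 < r2)   # -> True
r1, r2 = risks(10.0)
print(r1 > r2)   # -> True
```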
So, solving for the optimal decision boundary is a matter of solving for the roots of the equation:

R(\alpha_1 \mid x) = R(\alpha_2 \mid x)
\sqrt{3}\, P(\omega_2 \mid x) = \sqrt{5}\, P(\omega_1 \mid x)
\sqrt{3}\, \frac{p(x \mid \omega_2) P(\omega_2)}{p(x)} = \sqrt{5}\, \frac{p(x \mid \omega_1) P(\omega_1)}{p(x)}

Given the priors are equal, this simplifies to:

\sqrt{3}\, p(x \mid \omega_2) = \sqrt{5}\, p(x \mid \omega_1)

Next, using the values k_1 and k_2 from part (a), we have expressions for p(x \mid \omega_1) and p(x \mid \omega_2):

\sqrt{3}\, \frac{1}{\sqrt{12\pi}}\, e^{-(x-6)^2/12} = \sqrt{5}\, \frac{1}{\sqrt{20\pi}}\, e^{-x^2/20}
e^{-(x-6)^2/12} = e^{-x^2/20}
\frac{-(x-6)^2}{12} = \frac{-x^2}{20}
20(x^2 - 12x + 36) = 12x^2
8x^2 - 240x + 720 = 0

The decision boundary is found by solving for the roots of this quadratic: x_1 = 15 - 3\sqrt{15} \approx 3.381 and x_2 = 15 + 3\sqrt{15} \approx 26.62.

d. For the decision regions in part (c), what is the numerical value of the Bayes risk?

Solution: For x < 15 - 3\sqrt{15} the decision rule will choose \omega_1; for 15 - 3\sqrt{15} < x < 15 + 3\sqrt{15} it will choose \omega_2; and for x > 15 + 3\sqrt{15} it will again choose \omega_1. Thus the decision region R_1 is x < 15 - 3\sqrt{15} together with x > 15 + 3\sqrt{15}, and the decision region R_2 is 15 - 3\sqrt{15} < x < 15 + 3\sqrt{15}.

Risk = \int_{R_1} \left[ \lambda_{11} P(\omega_1 \mid x) + \lambda_{12} P(\omega_2 \mid x) \right] p(x)\, dx
     + \int_{R_2} \left[ \lambda_{21} P(\omega_1 \mid x) + \lambda_{22} P(\omega_2 \mid x) \right] p(x)\, dx
     = \int_{R_1} \lambda_{12}\, p(x \mid \omega_2) P(\omega_2)\, dx + \int_{R_2} \lambda_{21}\, p(x \mid \omega_1) P(\omega_1)\, dx
     = \frac{\sqrt{3}}{2} \left[ \int_{-\infty}^{15-3\sqrt{15}} N(6,6)\, dx + \int_{15+3\sqrt{15}}^{\infty} N(6,6)\, dx \right]
     + \frac{\sqrt{5}}{2} \int_{15-3\sqrt{15}}^{15+3\sqrt{15}} N(0,10)\, dx
     = \frac{\sqrt{3}}{2} \left[ \frac{1}{2} + \frac{1}{2}\operatorname{erf}\!\left( \frac{(15-3\sqrt{15})-6}{\sqrt{12}} \right) + \frac{1}{2} - \frac{1}{2}\operatorname{erf}\!\left( \frac{(15+3\sqrt{15})-6}{\sqrt{12}} \right) \right]
     + \frac{\sqrt{5}}{2} \left[ \frac{1}{2}\operatorname{erf}\!\left( \frac{15+3\sqrt{15}}{\sqrt{20}} \right) - \frac{1}{2}\operatorname{erf}\!\left( \frac{15-3\sqrt{15}}{\sqrt{20}} \right) \right]
     = 0.2827

Problem 3: [16 points]

Let's consider a simple communication system. The transmitter sends out messages m = 0 or m = 1, occurring with a priori probabilities 3/4 and 1/4 respectively. The message is contaminated by a noise n, which is independent from m and takes on the values -1, 0, 1 with probabilities 1/8, 5/8, and 2/8 respectively. The received signal, or the observation, can be represented as r = m + n. From r, we wish to infer what the transmitted message m was (the estimated state), denoted \hat{m}. \hat{m} also takes values 0 or 1.
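The boundaries from part (c) and the Bayes risk from part (d) can be reproduced numerically. A Python sketch (the Matlab requirement aside; `Phi` is our helper name for the standard normal CDF):

```python
import math

# Roots of 8x^2 - 240x + 720 = 0 give the two decision boundaries.
a, b, c = 8.0, -240.0, 720.0
disc = math.sqrt(b * b - 4 * a * c)
x1, x2 = (-b - disc) / (2 * a), (-b + disc) / (2 * a)
print(round(x1, 3), round(x2, 2))  # -> 3.381 26.62

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Bayes risk = sqrt(3)/2 * P(x in R1 | w2) + sqrt(5)/2 * P(x in R2 | w1),
# with w2 ~ N(6, 6) and w1 ~ N(0, 10) (variances 6 and 10).
mass2_R1 = Phi((x1 - 6) / math.sqrt(6)) + 1 - Phi((x2 - 6) / math.sqrt(6))
mass1_R2 = Phi(x2 / math.sqrt(10)) - Phi(x1 / math.sqrt(10))
risk = math.sqrt(3) / 2 * mass2_R1 + math.sqrt(5) / 2 * mass1_R2
print(round(risk, 4))  # -> 0.2827
```

This matches the closed-form erf expression above: the two tail masses of N(6,6) over R_1 and the central mass of N(0,10) over R_2 each come out to about 0.1425.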
When m = \hat{m}, the detector correctly receives the original message; otherwise an error occurs.

a. Find the decision rule that achieves the maximum probability of correct decision. Compute the probability of error for this decision rule.

Solution: This is equivalent to finding the decision rule that achieves the minimum probability of error.

[Figure 2: A simple receiver]

The receiver decides the transmitted message is 1, i.e., \hat{m} = 1, if

P(r \mid m=1) \cdot Pr(m=1) \ge P(r \mid m=0) \cdot Pr(m=0)
P(r \mid m=1) \ge 3 \cdot P(r \mid m=0)    (1)

Otherwise the receiver decides \hat{m} = 0. The likelihood functions for these two cases are

P(r \mid m=1) = 1/8 if r = 0;   5/8 if r = 1;   2/8 if r = 2;   0 otherwise
P(r \mid m=0) = 1/8 if r = -1;  5/8 if r = 0;   2/8 if r = 1;   0 otherwise

Eq. 1 holds only when r = 2. Therefore, the decision rule can be summarized as

\hat{m} = 1 if r = 2, and \hat{m} = 0 otherwise

The probability of error is as follows:

Pr(e) = Pr(\hat{m}=1 \mid m=0) \cdot Pr(m=0) + Pr(\hat{m}=0 \mid m=1) \cdot Pr(m=1)
      = Pr(r=2 \mid m=0) \cdot Pr(m=0) + Pr(r \ne 2 \mid m=1) \cdot Pr(m=1)
      = 0 + \left( \frac{1}{8} + \frac{5}{8} \right) \cdot \frac{1}{4} = \frac{3}{16} = 0.1875

b. Now let the noise n be a continuous random variable, uniformly distributed between -3/4 and 2/4, and still statistically independent of m. First, plot the pdf of n. Then, find a decision rule that achieves the minimum probability of error, and compute the probability of error.

Solution: The uniform distribution is plotted in Figure 3.

[Figure 3: The pdf of n]

The decision rule is still determined by using Eq. 1. The likelihood functions become continuous, instead of discrete as in (a):

P(r \mid m=1) = 4/5 if 1/4 < r \le 6/4, and 0 otherwise
P(r \mid m=0) = 4/5 if -3/4 < r \le 2/4, and 0 otherwise

The interesting region is where the two pdfs overlap with each other, namely when 1/4 < r \le 2/4.
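The error probability found in part (a) can be confirmed by enumerating the finitely many (m, n) pairs. A Python sketch using exact fractions (illustrative only, given the assignment's Matlab requirement):

```python
from fractions import Fraction as F

priors = {0: F(3, 4), 1: F(1, 4)}          # Pr(m = 0), Pr(m = 1)
noise = {-1: F(1, 8), 0: F(5, 8), 1: F(2, 8)}  # Pr(n)

def m_hat(r):
    """Decision rule from part (a): decide 1 only when r = 2."""
    return 1 if r == 2 else 0

# P(error) = sum over (m, n) of Pr(m) Pr(n) [m_hat(m + n) != m]
p_err = sum(pm * pn
            for m, pm in priors.items()
            for n, pn in noise.items()
            if m_hat(m + n) != m)
print(p_err)  # -> 3/16
```

The only error events are m = 1 with n in {-1, 0}, contributing (1/8 + 5/8)(1/4) = 3/16; whenever m = 0 the received r is at most 1, so \hat{m} = 0 is always correct.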