Common Distributions in Stochastic Analysis and Inverse Modeling
Presented by Gordon A. Fenton

Common Discrete Distributions

Bernoulli trials form the basis for six very common distributions (binomial, geometric, negative binomial, Poisson, exponential, and gamma).

Fundamental assumptions:
1. Each trial has two possible outcomes (0/1, true/false, success/failure, on/off, yes/no, etc.)
2. Trials are independent (which allows us to easily compute probabilities)
3. The probability of "success" remains constant from trial to trial

Note: if we relax assumptions 1 and 2 we get Markov chains, another suite of powerful stochastic models.

Bernoulli Trials

Let X_i = 1 if the i'th trial is a "success" and X_i = 0 if not. Then the Bernoulli probability distribution is

P[X_i = 1] = p,  P[X_i = 0] = 1 - p = q

Its mean and variance are

E[X_i] = \sum_{j=0}^{1} j P[X_i = j] = 0(q) + 1(p) = p
E[X_i^2] = \sum_{j=0}^{1} j^2 P[X_i = j] = 0^2(q) + 1^2(p) = p
Var[X_i] = E[X_i^2] - E^2[X_i] = p - p^2 = pq

Bernoulli Family of Distributions

Discrete trials:
1) Binomial: N_n = number of "successes" in n trials
2) Geometric: T_1 = number of trials until the first "success"
3) Negative binomial: T_k = number of trials until the k'th "success"

Every "instant" becomes a trial:
4) Poisson: N_t = number of "successes" in time t
5) Exponential (continuous): T_1 = time to the first "success"
6) Gamma (continuous): T_k = time to the k'th "success"

Binomial Distribution

If N_n = number of "successes" in n trials, then

N_n = X_1 + X_2 + ... + X_n = \sum_{i=1}^{n} X_i

since X_i = 1 if the i'th trial results in a "success" and 0 if not.

Mean: E[N_n] = E[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} E[X_i] = \sum_{i=1}^{n} p = np

Variance: Var[N_n] = Var[\sum_{i=1}^{n} X_i] = \sum_{i=1}^{n} Var[X_i] (since independent) = \sum_{i=1}^{n} pq = npq

Consider P[N_5 = 2]:

P[N_5 = 2] = P[{SSFFF} \cup {SFSFF} \cup ... \cup {FFFSS}]
           = P[SSFFF] + P[SFSFF] + ... + P[FFFSS]
           = P[S]P[S]P[F]P[F]P[F] + P[S]P[F]P[S]P[F]P[F] + ... + P[F]P[F]P[F]P[S]P[S]
           = p^2 q^3 + p^2 q^3 + ... + p^2 q^3
           = \binom{5}{2} p^2 q^3

In general the binomial distribution is

P[N_n = k] = \binom{n}{k} p^k q^{n-k}

(Plot of the binomial PMF for n = 10, p = 0.4.)

Binomial Distribution: Example

Suppose that the probability of failure, p_f, of a slope is investigated using Monte Carlo simulation. That is, a series of n = 5000 independent realizations of the slope strength are simulated from a given probability distribution, and the number of failures is counted to be n_f = 120. What is the estimated failure probability, and what is the 'standard error' (standard deviation) of this estimate?

Solution: each realization is an independent trial with two possible outcomes, success or failure, having constant probability of failure p_f. Thus each realization is a Bernoulli trial, and the number of failures is a realization of N_n, which follows a binomial distribution.

We have n = 5000 and n_f = 120. The failure probability is estimated by

\hat{p}_f = (number of failures)/(number of trials) = 120/5000 = 0.024

In general \hat{p}_f = N_n / n, so that

\sigma^2_{\hat{p}_f} = Var[\hat{p}_f] = Var[N_n / n] = (1/n^2) Var[N_n] = npq/n^2 = pq/n

which, for \hat{p}_f = 0.024, gives us

\sigma_{\hat{p}_f} = \sqrt{0.024(1 - 0.024)/5000} = 0.0022
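For illustration, here is a minimal Python sketch (assuming NumPy is available) of this calculation. It recomputes the estimate and its standard error from the example's values, then checks the result by repeating the whole Monte Carlo experiment many times with an assumed "true" failure probability of 0.024.

```python
import numpy as np

# Estimate of the slope failure probability and its standard error,
# using the example's values: n = 5000 realizations, n_f = 120 failures.
n, n_f = 5000, 120
p_hat = n_f / n                          # estimated failure probability
se = np.sqrt(p_hat * (1.0 - p_hat) / n)  # standard error, sqrt(pq/n)
print(f"p_hat = {p_hat:.3f}, standard error = {se:.4f}")

# Check by simulation: repeat the whole Monte Carlo experiment many times
# with an assumed "true" failure probability of 0.024; the spread of the
# resulting estimates should be close to the analytical standard error.
rng = np.random.default_rng(0)
estimates = rng.binomial(n, 0.024, size=20_000) / n
print(f"empirical std of p_hat = {estimates.std():.4f}")
```

The empirical spread of the repeated estimates should come out close to the analytical value of about 0.0022.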
Geometric Distribution

Let T_1 = number of trials until the first "success".

Consider P[T_1 = 4] = P[FFFS] = q^3 p. In general,

P[T_1 = k] = p q^{k-1}

E[T_1] = \sum_{k=1}^{\infty} k p q^{k-1} = 1/p

Var[T_1] = E[T_1^2] - E^2[T_1] = \sum_{k=1}^{\infty} k^2 p q^{k-1} - 1/p^2 = q/p^2

Because all trials are independent, the geometric distribution is "memoryless". That is, it doesn't matter when we start looking at a sequence of trials; the distribution of the number of trials to the next "success" remains the same. That is,

P[T_1 > j + k | T_1 > j] = P[T_1 > j + k] / P[T_1 > j]
                         = (\sum_{m=j+k+1}^{\infty} p q^{m-1}) / (\sum_{n=j+1}^{\infty} p q^{n-1}) = q^k

and

P[T_1 > k] = \sum_{m=k+1}^{\infty} p q^{m-1} = p q^k \sum_{m=0}^{\infty} q^m = p q^k / (1 - q) = q^k

(Plot of the geometric PMF P[T_1 = k] = p q^{k-1} for p = 0.4.)

Geometric Distribution: Example

A series of piles have been designed to be able to withstand a certain test load with probability p = 0.8. If the resulting piles are sequentially tested at that design load, what is the probability that the first pile to fail the test is the 7'th pile tested?

Solution: If piles can be assumed to fail independently with constant probability, then this is again a sequence of Bernoulli trials. The "success" we are waiting for is a pile failing the test, which occurs with probability 1 - p = 0.2 on each trial. We thus want to compute

P[T_1 = 7] = (0.2)(0.8)^{7-1} = 0.052

Negative Binomial Distribution

Let T_k = the number of trials until the k'th "success".

Consider

P[T_3 = 5] = P[{SSFFS} \cup {SFSFS} \cup ... \cup {FFSSS}]
           = P[SSFFS] + P[SFSFS] + ... + P[FFSSS]
           = \binom{4}{2} p^3 q^2

In general,

P[T_k = m] = \binom{m-1}{k-1} p^k q^{m-k}

The negative binomial distribution is a generalization of the geometric distribution (the k'th "success" rather than the first). Its mean and variance can be obtained by realizing that T_k is the sum of k independent T_1's. That is,

T_k = T_{1_1} + T_{1_2} + ... + T_{1_k}

so that

E[T_k] = E[\sum_{j=1}^{k} T_{1_j}] = \sum_{j=1}^{k} E[T_{1_j}] = k/p

and

Var[T_k] = Var[\sum_{j=1}^{k} T_{1_j}] = \sum_{j=1}^{k} Var[T_{1_j}] (since independent) = k Var[T_1] = kq/p^2

(Plot of the negative binomial PMF P[T_k = m] = \binom{m-1}{k-1} p^k q^{m-k} for p = 0.4, k = 3.)

Negative Binomial Distribution: Example

A series of piles have been designed to be able to withstand a certain test load with probability p = 0.8. If the resulting piles are sequentially tested at that design load, what is the probability that the third pile to fail the test is the 7'th pile tested?

Solution: If piles can be assumed to fail independently with constant probability, then this is again a sequence of Bernoulli trials. Again the "success" being counted is a pile failing the test, with probability 0.2 per trial. We thus want to compute

P[T_3 = 7] = \binom{7-1}{3-1} (0.2)^3 (0.8)^{7-3} = \binom{6}{2} (0.2)^3 (0.8)^4 = 15(0.008)(0.4096) = 0.049
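For illustration, here is a minimal Python sketch of both pile-testing calculations. It assumes, as in the solutions above, that the event being waited for is a pile failing the test (probability 0.2 per trial), and it adds a quick simulation check of the geometric result.

```python
import numpy as np
from math import comb

# Pile-testing example: each load test is a Bernoulli trial, and the event
# we are waiting for is assumed to be a pile *failing* the test, which
# (given a design survival probability of 0.8) occurs with probability 0.2.
p_fail = 0.2

# Geometric: probability that the first failure occurs on the 7th test.
p_first_fail_at_7 = p_fail * (1 - p_fail) ** 6
print(f"P[T1 = 7] = {p_first_fail_at_7:.4f}")     # ~0.052

# Negative binomial: probability that the 3rd failure occurs on the 7th test.
k, m = 3, 7
p_third_fail_at_7 = comb(m - 1, k - 1) * p_fail**k * (1 - p_fail) ** (m - k)
print(f"P[T3 = 7] = {p_third_fail_at_7:.4f}")     # ~0.049

# Quick simulation check of the geometric result.
rng = np.random.default_rng(1)
trials = rng.random((200_000, 7)) < p_fail        # True where a pile fails
first_fail = trials.argmax(axis=1) + 1            # 1-based index of first failure
hit = (first_fail == 7) & trials.any(axis=1)      # first failure exactly on test 7
print(f"simulated P[T1 = 7] = {hit.mean():.4f}")
```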
Poisson Distribution

We now let every instant in time (or space) become a Bernoulli trial (i.e., a sequence of independent trials). "Successes" can now occur at any instant in time.

PROBLEM: in any time interval there are now an infinite number of Bernoulli trials. If the probability of "success" of a trial is, say, 0.2, then in any time interval we get an infinite number of "successes". This is not a very useful model!

SOLUTION: we can no longer talk about the probability of "success" of individual trials, p (which becomes zero). We must now talk about the mean rate of "success", \lambda.

The Poisson distribution governs many rate-dependent processes (arrivals of vehicles at an intersection, 'packets' through an internet gateway, earthquakes exceeding magnitude M, floods, load extremes, etc.).

Let N_t = number of "successes" (arrivals) in time interval t. The distribution of N_t is arrived at in the limit as the number of trials goes to infinity, assuming the probability of "success" is proportional to the number of trials (see the course notes for details). We get

P[N_t = k] = (\lambda t)^k e^{-\lambda t} / k!

(Plot of the Poisson PMF for \lambda = 0.9, t = 4.5.)

Poisson Distribution: Example

Many research papers suggest that the arrivals of earthquakes follow a Poisson process over time. Suppose that the mean time between earthquakes is 50 years at a particular location.

1. How long must the time period be so that the probability that no earthquakes occur during that period is at most 0.1?

Solution: \lambda = 1/E[T_1] = 1/50 = 0.02. Let N_t be the number of earthquakes over the time interval t. We want to find t such that

P[N_t = 0] = (\lambda t)^0 e^{-\lambda t} / 0! = e^{-\lambda t} \le 0.1

This gives t \ge -ln(0.1)/\lambda = -ln(0.1)/0.02 = 115 years.

2. Suppose that 50 years pass without any earthquakes occurring. What is the probability that another 50 years will pass without any earthquakes occurring?

Solution:

P[N_100 = 0 | N_50 = 0] = P[N_100 = 0 \cap N_50 = 0] / P[N_50 = 0] = P[N_100 = 0] / P[N_50 = 0]
                        = e^{-100\lambda} / e^{-50\lambda} = e^{-50\lambda} = e^{-1} = 0.368

Note that, due to memorylessness,

P[N_50 = 0] = e^{-50\lambda} = e^{-1} = 0.368

Simulating a Poisson Distribution

Since the Poisson process derives from an infinite number of independent and equilikely Bernoulli trials over time (or space), its simulation is simple: the "success" (arrival) times are uniformly distributed (equilikely) over any interval, so, given the number of arrivals in an interval, their times can be simulated as independent uniformly distributed points on that interval.

Common Continuous Distributions

The Bernoulli family extends to two continuous distributions when every instant in time (space) becomes a Bernoulli trial:

1. Exponential: T_1 = time until the next "success" (arrival). This is the continuous-time analog of the geometric distribution.
2. Gamma: T_k = time until the k'th "success" (or arrival). This is the continuous-time analog of the negative binomial distribution.

Exponential Distribution

Consider the Poisson distribution, which governs the number of "successes" in time interval t when every instant is an independent Bernoulli trial. We know that the Poisson distribution specifies that

P[N_t = k] = (\lambda t)^k e^{-\lambda t} / k!

What is the probability that the time to the first "success" is greater than t? If the first "success" occurs later than t, then the number of successes in the time interval t must be zero, i.e.

P[T_1 > t] = P[N_t = 0] = e^{-\lambda t}
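For illustration, here is a minimal Python sketch (assuming NumPy) of the earthquake example. It computes the required period from e^{-\lambda t} \le 0.1 with the example's rate \lambda = 1/50 per year, evaluates the 50-year no-earthquake probability, and checks the latter by simulating exponential times to the first earthquake.

```python
import numpy as np

# Earthquake example: Poisson arrivals with mean inter-arrival time 50 years,
# so the mean rate is lam = 1/50 = 0.02 per year (value from the example).
lam = 1 / 50

# 1) Shortest period t for which P[no earthquakes in t] <= 0.1:
#    e^{-lam t} <= 0.1  =>  t >= -ln(0.1)/lam
t_min = -np.log(0.1) / lam
print(f"t >= {t_min:.0f} years")                     # ~115 years

# 2) Memorylessness: P[N_100 = 0 | N_50 = 0] = e^{-50 lam} = P[N_50 = 0]
print(f"P[N_50 = 0] = {np.exp(-50 * lam):.3f}")      # ~0.368

# Simulation check: exponential time to the first earthquake, mean 1/lam.
rng = np.random.default_rng(2)
first_quake = rng.exponential(scale=1 / lam, size=100_000)
print(f"simulated P[no quake in 50 yr] = {(first_quake > 50).mean():.3f}")
```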