Joint Probability Distributions

ST 370 Probability and Statistics for Engineers

In many random experiments, more than one quantity is measured, so there is more than one random variable.

Example: Cell phone flash unit
A flash unit is chosen randomly from a production line; its recharge time $X$ (seconds) and flash intensity $Y$ (watt-seconds) are measured.

Example: Bernoulli trials
$X_1$ is the indicator of success on the first trial:

$$X_1 = \begin{cases} 1 & \text{success on first trial} \\ 0 & \text{otherwise} \end{cases}$$

and $X_2, X_3, \dots$, the indicators for the other trials, are all random variables.

Two or More Random Variables

To make probability statements about several random variables, we need their joint probability distribution.

Discrete random variables
If $X$ and $Y$ are discrete random variables, they have a joint probability mass function

$$f_{XY}(x_i, y_j) = P(X = x_i \text{ and } Y = y_j).$$

Example: Mobile response time
A mobile web site is accessed from a smart phone; $X$ is the signal strength, in number of bars, and $Y$ is the response time, to the nearest second.

                         x = Number of bars
  y = Response time        1       2       3
  4+                     0.15    0.10    0.05
  3                      0.02    0.10    0.05
  2                      0.02    0.03    0.20
  1                      0.01    0.02    0.25

Continuous random variables
If $X$ and $Y$ are continuous random variables, they have a joint probability density function $f_{XY}(x, y)$, with the interpretation

$$P(a \le X \le b \text{ and } c \le Y \le d) = \int_a^b \int_c^d f_{XY}(x, y)\, dy\, dx.$$

If one random variable is discrete and the other is continuous, the joint distribution is more complex.
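As a small worked illustration of the double-integral interpretation of a joint density, consider a hypothetical density that is not part of the original notes: $f_{XY}(x, y) = x + y$ on the unit square and zero elsewhere. The sketch first checks that it integrates to 1, then evaluates one rectangle probability.

```latex
% Hypothetical joint density, chosen only for illustration (not from the source):
%   f_{XY}(x, y) = x + y  for 0 <= x <= 1, 0 <= y <= 1, and 0 elsewhere.
\begin{aligned}
\int_0^1 \int_0^1 (x + y)\, dy\, dx
  &= \int_0^1 \left(x + \tfrac{1}{2}\right) dx = 1, \\
P\left(0 \le X \le \tfrac{1}{2} \text{ and } 0 \le Y \le \tfrac{1}{2}\right)
  &= \int_0^{1/2} \int_0^{1/2} (x + y)\, dy\, dx \\
  &= \int_0^{1/2} \left(\tfrac{x}{2} + \tfrac{1}{8}\right) dx \\
  &= \tfrac{1}{16} + \tfrac{1}{16} = \tfrac{1}{8}.
\end{aligned}
```

The inner integral is taken over $y$ first, matching the order $dy\,dx$ in the interpretation of the joint density.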
In all cases, they have a joint cumulative distribution function

$$F_{XY}(x, y) = P(X \le x \text{ and } Y \le y).$$

Marginal probability distributions
Since $X$ is a random variable, it also has its own probability distribution, ignoring the value of $Y$, called its marginal probability distribution.

Discrete case:

$$\begin{aligned}
f_X(x_i) &= P(X = x_i) \\
         &= P(X = x_i \text{ and } Y \text{ takes any value}) \\
         &= \sum_j P(X = x_i, Y = y_j) \\
         &= \sum_j f_{XY}(x_i, y_j),
\end{aligned}$$

and similarly

$$f_Y(y_j) = \sum_i f_{XY}(x_i, y_j).$$

Example: Mobile response time
Marginal distributions of $X$ and $Y$:

                         x = Number of bars
  y = Response time        1       2       3     Marginal
  4+                     0.15    0.10    0.05      0.30
  3                      0.02    0.10    0.05      0.17
  2                      0.02    0.03    0.20      0.25
  1                      0.01    0.02    0.25      0.28
  Marginal               0.20    0.25    0.55

Continuous case:

$$f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dy$$

and

$$f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x, y)\, dx.$$

Cumulative distribution:

$$\begin{aligned}
F_X(x) &= P(X \le x) \\
       &= P(X \le x, Y \text{ takes any value}) \\
       &= P(X \le x, Y < \infty) \\
       &= F_{XY}(x, \infty),
\end{aligned}$$

and

$$F_Y(y) = F_{XY}(\infty, y).$$

Conditional probability distributions
Suppose that $X$ and $Y$ are discrete random variables, and that we observe the value of $X$: $X = x_i$ for one of its values $x_i$. What does that tell us about $Y$? Recall conditional probability:

$$P(Y = y_j \mid X = x_i) = \frac{P(Y = y_j \cap X = x_i)}{P(X = x_i)} = \frac{f_{XY}(x_i, y_j)}{f_X(x_i)}.$$

This is the conditional probability mass function of $Y$ given $X = x_i$, written $f_{Y|X}(y_j \mid x_i)$.
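The marginal and conditional formulas above can be checked numerically for the mobile response time table. The following is a minimal R sketch; the object names (`joint`, `f_X`, `f_Y`, `cond_Y_given_X`) are illustrative choices, not part of the original notes.

```r
# Joint pmf from the mobile response time example.
# Rows: response time y; columns: number of bars x.
joint <- matrix(
  c(0.15, 0.10, 0.05,
    0.02, 0.10, 0.05,
    0.02, 0.03, 0.20,
    0.01, 0.02, 0.25),
  nrow = 4, byrow = TRUE,
  dimnames = list(y = c("4+", "3", "2", "1"), x = c("1", "2", "3"))
)

# A valid joint pmf sums to 1 (up to floating-point error).
stopifnot(isTRUE(all.equal(sum(joint), 1)))

f_X <- colSums(joint)  # marginal pmf of X: sum over y
f_Y <- rowSums(joint)  # marginal pmf of Y: sum over x

# Conditional pmf of Y given X = x: divide each column by f_X(x).
cond_Y_given_X <- sweep(joint, 2, f_X, "/")

print(f_X)
print(f_Y)
print(round(cond_Y_given_X, 3))
```

Each column of `cond_Y_given_X` sums to 1, as a conditional pmf must. Values computed this way may differ from a hand-rounded table in the last printed digit.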
Example: Mobile response time
Conditional distributions of $Y$ given $X$:

                         x = Number of bars
  y = Response time        1        2        3
  4+                     0.750    0.400    0.091
  3                      0.100    0.400    0.091
  2                      0.100    0.120    0.364
  1                      0.050    0.080    0.454
  Total                  1        1        1

When $X$ and $Y$ are continuous random variables, the conditional probability density function of $Y$ given $X$ is also defined as a ratio:

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)},$$

but the reason is less clear: $P(X = x) = 0$, so we cannot simply divide the joint probability by the marginal probability. One approach is to condition on $X$ being near $x$, say $x - \delta x \le X \le x + \delta x$ for some small $\delta x > 0$, and take the limit as $\delta x \downarrow 0$.

Independent random variables
In some situations, knowing the value of $X$ gives no information about the value of $Y$. So the conditional distribution of $Y$ given $X$ is the same as the marginal distribution of $Y$:

$$f_{Y|X}(y \mid x) = f_Y(y).$$

In this case, $X$ and $Y$ are said to be independent random variables.

But

$$f_{Y|X}(y \mid x) = \frac{f_{XY}(x, y)}{f_X(x)},$$

so when $X$ and $Y$ are independent

$$\frac{f_{XY}(x, y)}{f_X(x)} = f_Y(y),$$

or

$$f_{XY}(x, y) = f_X(x)\, f_Y(y).$$

This is true for either the probability density function or the probability mass function.

So for independent random variables, it is enough to know the marginal probability distributions: the joint probability distribution is just the product of the marginal functions.
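The product rule gives a direct numerical check for independence. As a hedged sketch, the R code below compares the joint pmf of the mobile response time example with the outer product of its marginals; the variable names are illustrative only.

```r
# Joint pmf from the mobile response time example.
# Rows: response time y; columns: number of bars x.
joint <- matrix(
  c(0.15, 0.10, 0.05,
    0.02, 0.10, 0.05,
    0.02, 0.03, 0.20,
    0.01, 0.02, 0.25),
  nrow = 4, byrow = TRUE,
  dimnames = list(y = c("4+", "3", "2", "1"), x = c("1", "2", "3"))
)

f_X <- colSums(joint)  # marginal pmf of X
f_Y <- rowSums(joint)  # marginal pmf of Y

# If X and Y were independent, the joint pmf would equal f_Y(y) * f_X(x)
# in every cell; outer() builds that table of products.
product <- outer(f_Y, f_X)

# Largest absolute discrepancy between the joint pmf and the product.
max_gap <- max(abs(joint - product))

# Compare the two tables numerically, ignoring dimnames.
independent <- isTRUE(all.equal(joint, product, check.attributes = FALSE))

print(round(product, 4))
print(max_gap)
print(independent)
```

For this table the comparison fails: for instance $f_X(1)\, f_Y(4+) = 0.20 \times 0.30 = 0.06$, while $f_{XY}(1, 4+) = 0.15$, so signal strength and response time are not independent.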
Example: Cell phone flash unit
The recharge time $X$ and flash intensity $Y$ may not be independent: they are both affected by the quality of components such as capacitors, and a defective component may cause both a long recharge time and a low flash intensity.

Example: Bernoulli trials
We assume that the trials are independent, so the indicator variables $X_1, X_2, \dots$ are also independent.

Designed experiments
When you carry out a designed experiment, such as the replicated two-factor case

$$Y_{i,j,k} = \mu + \tau_i + \beta_j + (\tau\beta)_{i,j} + \epsilon_{i,j,k},$$

good technique will ensure that the result of any one run is unaffected by results of other runs. You would then assume that the responses

$$Y_{i,j,k}, \quad i = 1, \dots, a; \; j = 1, \dots, b; \; k = 1, \dots, n$$

are independent random variables.

Equivalently, you could assume that the random noise terms

$$\epsilon_{i,j,k}, \quad i = 1, \dots, a; \; j = 1, \dots, b; \; k = 1, \dots, n$$

are independent. We always assume that the noise terms have zero expected value:

$$E(\epsilon_{i,j,k}) = 0,$$

and usually also a common variance:

$$V(\epsilon_{i,j,k}) = \sigma^2.$$

In order to find the probability distributions of statistics like the t-ratio and the F-ratio, we shall also assume that the noise terms have Gaussian distributions; that is,

$$\epsilon_{i,j,k}, \quad i = 1, \dots, a; \; j = 1, \dots, b; \; k = 1, \dots, n$$

are independent random variables, each distributed as $N(0, \sigma^2)$. The joint distribution of these $a \times b \times n$ random variables is determined by their common $N(0, \sigma^2)$ marginal distribution and the assumption of independence.
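To make the last statement concrete: because the noise terms are independent, their joint density is the product of the $N(0, \sigma^2)$ marginal densities. A sketch of that product, using $N = a \times b \times n$ as shorthand introduced here for the number of noise terms:

```latex
% N = a * b * n is shorthand introduced for this sketch only.
\begin{aligned}
f(\epsilon_{1,1,1}, \dots, \epsilon_{a,b,n})
  &= \prod_{i=1}^{a} \prod_{j=1}^{b} \prod_{k=1}^{n}
     \frac{1}{\sqrt{2\pi\sigma^2}}
     \exp\left(-\frac{\epsilon_{i,j,k}^2}{2\sigma^2}\right) \\
  &= (2\pi\sigma^2)^{-N/2}
     \exp\left(-\frac{1}{2\sigma^2}
       \sum_{i=1}^{a} \sum_{j=1}^{b} \sum_{k=1}^{n} \epsilon_{i,j,k}^2\right).
\end{aligned}
```

The joint density depends on the noise terms only through their sum of squares, which is one reason sums of squares play a central role in the t-ratio and the F-ratio.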
Residual Plots

The probability distributions of statistics like the t-ratio and the F-ratio are derived under these assumptions about the random noise terms $\epsilon$, so we should try to verify that the assumptions actually hold.

We observe the responses $Y$, but the parameters $\mu$ and so on are unknown, so we cannot compute the noise terms $\epsilon$. The best we can do is replace the parameters by their estimates, and compute the residuals

$$e_{i,j,k} = y_{i,j,k} - \left(\hat{\mu} + \hat{\tau}_i + \hat{\beta}_j + \widehat{(\tau\beta)}_{i,j}\right) = y_{i,j,k} - \hat{y}_{i,j,k}.$$

Four plots of the residuals are often used to look for departures from the assumptions:

Residuals vs Fitted values: If $E(\epsilon) = 0$, the residuals should vary around 0, with no pattern; curvature would suggest that second-order terms are needed.

Normal quantile-quantile plot: If the noise terms are Gaussian, the quantile-quantile plot should be close to a straight line; outliers or non-Gaussian behavior, especially longer tails, will show up.

Scale-Location plot: The y-axis in this plot is $\sqrt{|\text{residual}|}$, and, if the noise terms have constant variance, the plot should show no trend.

Residuals vs Factor Levels: This plot can detect particular factor levels that change either the expected value of $\epsilon$ or its variance.

Example: Aircraft paint
A replicated two-factor case:

    # Fit the two-factor model with interaction and draw the residual plots
    paint <- read.csv("Data/Table-14-05.csv")
    plot(aov(Adhesion ~ factor(Primer) * Method, paint))

Example: Wire bonds
A one-predictor regression case:

    # Fit a simple linear regression and draw the residual plots
    wireBond <- read.csv("Data/Table-01-02.csv")
    plot(lm(Strength ~ Length, wireBond))

In regression analyses, the fourth plot is replaced by:

Residuals vs Leverage: This plot can reveal individual observations that strongly influence the analysis (Section 12-5).
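The data files used in the examples above are not reproduced here, so the following R sketch uses simulated data instead. Everything about the simulation (factor names `A` and `B`, the effect sizes, $\sigma = 1$, the seed) is an assumption made for illustration; only the `aov()` and `plot()` workflow mirrors the examples.

```r
# Avoid writing a plot file when this is run as a non-interactive script.
if (!interactive()) pdf(NULL)

set.seed(1)

# Hypothetical balanced layout: a = 3 levels of A, b = 2 levels of B,
# n = 4 replicates per cell (24 runs in total).
sim <- expand.grid(rep = 1:4, A = factor(1:3), B = factor(c("lo", "hi")))

# Response = assumed mean structure + independent N(0, sigma^2) noise, sigma = 1.
sim$y <- 10 + as.numeric(sim$A) + 2 * (sim$B == "hi") +
  rnorm(nrow(sim), mean = 0, sd = 1)

# Two-factor model with interaction, as in the replicated two-factor case.
fit <- aov(y ~ A * B, data = sim)

# Residuals are observed responses minus fitted values.
e <- sim$y - fitted(fit)
stopifnot(isTRUE(all.equal(unname(e), unname(residuals(fit)))))

# Draw the four diagnostic plots in a 2 x 2 grid, then restore the layout.
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)
```

Because the simulated noise really is independent, Gaussian, and of constant variance, these plots give a rough reference for what "no departure from the assumptions" looks like at this sample size.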