Binary and Multinomial Variables

Binary and multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution Binary and multinomial variables Probability distributions Francesco Corona Binary and Outline multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution 1 Binary variables The beta distribution 2 Multinomial variables The Dirichlet distribution Binary and Probability distributions multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables We explore now some probability distributions and their properties The beta distribution • Multinomial variables Of great interest in their own right The Dirichlet distribution • Building blocks for complex models Definition One role for these distributions is to model the probability distribution p(x) of a random variable x, given some finite set x1,..., xN of observations • This problem is known as density estimation A problem that is fundamentally ill-posed, because there are infinitely many probability distributions that could have given rise to the observed finite data • Any p(x) that is nonzero at each of x1,..., xN is a potential candidate Binary and Probability distributions (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 We begin by considering specific examples of parametric distributions Binary variables The beta distribution Multinomial variables • Binomial and multinomial distribution for discrete variables The Dirichlet distribution • The Gaussian distribution for continuous random variables Parametric distributions because governed by a number of parameters To use such models in density estimation problems, we need a procedure • Determine the values for the model parameters, given observations Remark In a frequentist treatment, we set the parameters by optimising some criterion • For instance, the likelihood function In a Bayesian treatment we introduce prior distributions over the parameters • Bayes’ theorem to get the posterior Binary and Probability distributions (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables We introduce the important concept of conjugate prior The Dirichlet distribution • It is a prior that leads to a formally special posterior • A posterior with the same functional form as the prior The conjugate prior for the parameters of a multinomial distribution • A Dirichlet distribution The conjugate prior for the mean of a Gaussian distribution • A Gaussian distribution All these distributions are members of the exponential family Binary and Probability distributions (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution The parametric approach assumes a specific functional form for the distro • It may turn out to be inappropriate for a particular application An alternative approach is given by nonparametric density estimation • the form of the distribution often depends on the size of the data Such models still contain parameters, but they control model complexity Nonparametric methods: Histograms, near-neighbours, and kernels Binary and multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution Binary variables Probability distributions Binary and Binary variables multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Consider a single binary variable x ∈ {0, 1} Multinomial variables The Dirichlet distribution Example Think of an unfair coin, in which probability of tails and heads is different • x describes the outcome of flipping the coin • x = 1 represents heads • x = 0 represents tails The probability of x = 1 is denoted by the parameter µ, with 0 ≤ µ ≤ 1 • 1p(x = |µ) = µ • 0p(x = |µ) = 1 − p(x = 1|µ) = 1 − µ Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The probability distro over x can be written as Bern(x|µ) = µx (1 − µ)1−x The Dirichlet distribution x = 0, µ0(1 − µ)1−0 = (1 − µ) x x 1−x Bern( |µ) = µ (1 − µ) =⇒ 1 1−1 (1) (x = 1, µ (1 − µ) = µ This is the Bernoulli distribution, so it is normalised x Bern(x|µ) = 1 (⋆) • E E 2 E 2 with mean [x] = x xBern(x|µ) and variance var[P x] = [x ] − [x] P E[x] = µ (2) var[x] = µ(1 − µ) (3) Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Now suppose we have a data set D = {x1, ..., xN } of observed values of x Binary variables The beta distribution We can construct the likelihood function of the data Multinomial variables • The Dirichlet distribution It is a function of µ Under the assumption of iid observations from p(x|µ) N N xn 1−xn p(D|µ) = p(xn|µ) = µ (1 − µ) (4) n n Y=1 Y=1 Remark We can estimate the value for µ by maximising the likelihood function • Equivalently, we can maximise the log likelihood function N N ln p(D|µ) = ln p(xn|µ) = xn ln µ + (1 − xn) ln (1 − µ) (5) n n X=1 X=1 It depends on the N observations only through their sum n xn P Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution If we set the derivative of ln p(D|µ) with respect to µ equal to zero, we get 1 N µML = xn (6) N n X=1 The maximum likelihood estimator of the mean of the Bernoulli distribution • It is known as the sample mean, as always Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Definition Multinomial variables Denoting the number of observations x = 1 (heads) in the data set by m The Dirichlet distribution m µML = (7) N The probability of landing heads is the fraction of heads in the data set If we toss 3 times and observe heads 3 times, N = m = 3 and µML = 1 The maximum likelihood result would predict all future observations as heads • Common sense suggests that this is unreasonable • It is an extreme case of over-fitting Setting a prior over µ and using Bayes to get a posterior give sensible results Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables We can work out the distribution of the number m of observations of x = 1 The beta distribution • given that the data has size N Multinomial variables The Dirichlet distribution This is the binomial distribution, it is proportional to µm(1 − µ)N−m N Bin(m|N, µ) = µm(1 − µ)N−m (8) m • It considers all possible ways of obtaining m heads out of N flips • The probability of any dataset of N flips of which m are heads is given, assuming independence, by µm(1 − µ)N−m N The term m (verbally, ‘N choose m’) gives the total number of ways of choosing m objects out of a total of N identical objects and it equals (⋆) N N! ≡ (9) m !(N − m) m! Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables Example The beta distribution Consider the case where N = 3 and m = 2 Multinomial variables The Dirichlet distribution N=3 There are m=2 = 3 ways of getting two heads out of three coin flips • HHH TTT THH TTH THT HTT HHT HTH { , , , , , , , } p(D|µ) p(D|µ) p(D|µ) • |{z}N N |{z} |{z} |p(D µ) = n=1 p(xn|µ) = n=1 Bern(xn|µ) Q Q Each of the three outcomes has the same probability of occurring 3 p(D|µ) = µxn (1 − µ)1−xn n Y=1 = µx1 (1 − µ)1−x1 µx2 (1 − µ)1−x2 µx3 (1 − µ)1−x3 h ih ih i = µx1+x2+x3 (1 − µ)(1+1+1)−(x1 +x2+x3) = µ2(1 − µ)3−2 = µm(1 − µ)N−m Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables The Dirichlet distribution 0.3 The binomial distribution 0.2 • N = 10 • 0µ = .25 0.1 N Bin(m|N, µ) = µm(1−µ)N−m m 0 0 1 2 3 4 5 6 7 8 9 10 m Binary and Binary variables (cont.) multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 Binary variables The beta distribution Multinomial variables ( ⋆) For iid events, the mean and variance of the binomial distribution are The Dirichlet distribution N E[m] ≡ mBin(m|N, µ) = Nµ (10) m X=0 N var[m] ≡ (m − E[m])2Bin(m|N, µ) = Nµ(1 − µ) (11) m X=0 m = x1 + ··· + xN and for each xn the mean is µ and variance is µ(1 − µ) • The mean of the sum is the sum of the means • The variance of the sum is the sum of variances Binary and The beta distribution multinomial variables UFC/DC ATAI-I (CK0146) 2017.1 The maximum likelihood setting for parameter µ in the Bernoulli distribution Binary variables (and binomial distribution) is the fraction of the observations having x = 1 The beta distribution • Multinomial variables Severe overfitting for small datasets The Dirichlet distribution To go Bayesian, we need to set a prior distribution p(µ) over parameter µ • Here we consider a special form of this prior distribution The likelihood function takes the form of product of factors µx (1 − µ)1−x • We can choose a prior proportional to powers of µ and (1 − µ) The posterior will be proportional to the product of prior and likelihood • The posterior will have the same functional form as the prior Definition Having a posterior with the same functional form of the prior: Conjugacy Binary and The beta distribution (cont.) multinomial variables UFC/DC We choose a prior distribution called the beta distribution ATAI-I (CK0146) 2017.1 Γ(a + b) Beta(µ|a, b) = µa−1(1 − µ)b−1 (12) Binary variables Γ( a)Γ(b) The beta distribution a

Binary and Multinomial Variables

On Two-Echelon Inventory Systems with Poisson Demand and Lost Sales

Distance Between Multinomial and Multivariate Normal Models

The Exciting Guide to Probability Distributions – Part 2

556: MATHEMATICAL STATISTICS I 1 the Multinomial Distribution the Multinomial Distribution Is a Multivariate Generalization of T

Probability Distributions and Related Mathematical Constructs

Binomial and Multinomial Distributions

Discrete Probability Distributions

Statistical Distributions

5. the Multinomial Distribution

Chapter 5: Multivariate Distributions

A Multivariate Beta-Binomial Model Which Allows to Conduct Bayesian Inference Regarding Θ Or Transformations Thereof

Review of Probability Distributions for Modeling Count Data