Randomness Characterization Through Bayesian Model Selection


Randomness Characterization through Bayesian Model Selection (1/16)
Rafael Díaz Hernández Rojas (Sapienza University of Rome), Isaac Pérez Castillo (IF-UNAM), Jorge Hirsch, Alfred U'Ren, Aldo Solís, Alí Angulo (ICN-UNAM), Matteo Marsili (ICTP, Italy). AISIS, UNAM, October 2019.

How to tell if a number sequence is random? (2/16)
Candidate sequences arise from dynamical-systems mappings, spin systems, correlated photons, or particle decays, e.g.
ŝ = HHTTTHTHT...THTT,  ŝ = LLRRRLRLR...RLRR,  ŝ = 110001010...0100,
and their randomness matters for Monte Carlo methods, cryptography, and probabilistic algorithms.

(Maximally) random sequences (3/16)
The Shannon entropy is a measure of randomness:
H[X] = -\sum_x p_x \log_2 p_x, \qquad 0 \le H[X] \le \log_2|X|, \qquad H_{\max} = \log_2|X| \iff p_x = 1/|X|.
For a bit sequence ŝ = 01001101...0010, what if the coin is biased? For a single binary variable,
H[X] = -p_0 \log_2 p_0 - (1 - p_0)\log_2(1 - p_0).
[Figure: binary entropy H as a function of p_0, maximal at p_0 = 1/2.]

Pragmatic approach: NIST battery of tests (4/16)
Given ŝ = 01001101...0010, check a list of properties: same frequency of '0' and '1' (k_0 ≈ k_1), longest run of consecutive 0's, Fourier transform consistent with white noise, and so on. Each property is analysed as a hypothesis test, yielding a p-value (a sketch of the simplest such test is given below).
Caveat: if the sequence is random, then it has the properties examined by the tests, but having those properties does not imply randomness. This is a frequentist approach based on p-values; see R. L. Wasserstein and N. A. Lazar, "The ASA's statement on p-values: context, process, and purpose", The American Statistician, 129–133 (2016), and M. Baker, "Statisticians issue warning on p-values", Nature 531, 151 (2016).
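As an illustration of how one of these properties becomes a p-value, here is a minimal sketch of the monobit (frequency) test in the spirit of NIST SP 800-22; the function name `monobit_pvalue` and the randomly generated example sequence are our own choices, not part of the slides.

```python
import math
import random

def monobit_pvalue(bits: str) -> float:
    """Frequency (monobit) test: are the counts of '0' and '1' compatible
    with a fair, independent source? Returns a two-sided p-value."""
    n = len(bits)
    # Map '1' -> +1 and '0' -> -1; for a fair source the partial sum stays small.
    s = sum(1 if b == "1" else -1 for b in bits)
    s_obs = abs(s) / math.sqrt(n)
    # Under the null hypothesis, s / sqrt(n) is approximately standard normal.
    return math.erfc(s_obs / math.sqrt(2))

if __name__ == "__main__":
    random.seed(1)
    s_hat = "".join(random.choice("01") for _ in range(10**6))
    print(f"monobit p-value: {monobit_pvalue(s_hat):.4f}")
```

A p-value below the chosen significance level (e.g. 0.01) rejects the fairness hypothesis; a large p-value only says that this particular property was not violated.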
Is π random? (5/16)
The NIST Random Number Test Suite applied to 10^6 digits of π.
[Figure: (a) binary representation of the first 302,500 digits of π; (b) p-values of the 15 NIST tests (monobit frequency, block frequency, runs, longest runs of ones, binary matrix rank, spectral, template matching, Maurer's universal statistic, linear complexity, serial, approximate entropy, cumulative sums, random excursions, Lempel–Ziv compression, ...), compared against the 0.01 significance level.]

Randomness as "incompressibility" (6/16)
Algorithmic Information Theory (AIT): ŝ is random iff the "shortest" algorithm that generates it is print(ŝ). AIT (Chaitin, Kolmogorov, Solomonoff) is a mathematically formal theory that identifies (computationally) random with incompressible. There is NO general algorithm capable of assessing whether an arbitrary sequence is random... but Borel's normality criterion can still be checked. Reference: C. S. Calude, Information and Randomness: An Algorithmic Perspective, 2nd Edition (Springer, 2010).
π itself is highly compressible, e.g.
\pi = 4\sum_{n=0}^{\infty}\frac{(-1)^n}{2n+1}, \qquad \pi = 2\cdot\frac{2}{\sqrt{2}}\cdot\frac{2}{\sqrt{2+\sqrt{2}}}\cdot\frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}}\cdots

Primer on statistical inference for bit sequences (7/16)
Let |ŝ| = M bits, with k_0 and k_1 the numbers of 0's and 1's, so k_0 + k_1 = M. The model ℳ: p_0 = θ, p_1 = 1 - θ gives
P(\hat{s} \mid \theta, \mathcal{M}) = \theta^{k_0}(1-\theta)^{k_1}.
Maximizing P(ŝ | θ, ℳ) over θ yields θ* = k_0 / M; a fair "coin" (RNG) corresponds to θ* = 0.5.
[Figure: log_{10} P(ŝ | θ, ℳ) as a function of θ for M = 1000, with k_0 = 400 (θ* = 0.4) and k_0 = 900 (θ* = 0.9).]
What happens if θ* ≈ 0.5? ... p-values.
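A short numerical sketch of this maximization step, assuming the Bernoulli model above (the helper names and printout are ours; the values M = 1000, k_0 = 400, 900 mirror the two curves on the slide):

```python
import math

def log10_likelihood(theta: float, k0: int, k1: int) -> float:
    """log10 P(s | theta, M) for the model p0 = theta, p1 = 1 - theta."""
    return k0 * math.log10(theta) + k1 * math.log10(1.0 - theta)

def mle_theta(k0: int, M: int) -> float:
    """Maximum-likelihood estimate theta* = k0 / M."""
    return k0 / M

if __name__ == "__main__":
    M = 1000
    for k0 in (400, 900):  # the two cases shown in the slide's plot
        t_star = mle_theta(k0, M)
        print(f"k0 = {k0}: theta* = {t_star:.2f}, "
              f"log10 P = {log10_likelihood(t_star, k0, M - k0):.1f}")
```

On its own, finding θ* ≈ 0.5 does not certify randomness; that is what motivates comparing models rather than merely fitting one.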
Model selection as hypothesis test (8/16)
A model ℳ defines a family of probability distributions, including its dependence on parameters, P(ŝ | θ, ℳ), and their distribution P(θ | ℳ).
Once a model ℳ is chosen, the usual question is: how well does it describe the set of observations ŝ? That is, P(ŝ | ℳ, θ) or P(ŝ | ℳ).
The right question is: given the observations ŝ, how likely is it that ℳ is the true model? That is, P(ℳ, θ | ŝ) or P(ℳ | ŝ).
Bayes' theorem,
P(\mathcal{M} \mid \hat{s}) = \frac{P(\hat{s} \mid \mathcal{M})\,P(\mathcal{M})}{P(\hat{s})} = \frac{P(\hat{s} \mid \mathcal{M})\,P(\mathcal{M})}{\sum_i P(\hat{s} \mid \mathcal{M}_i)\,P(\mathcal{M}_i)},
is a recipe for updating our (un)certainty about a model – a hypothesis – given some data. Comparing N candidate models,
\{\mathcal{M}_\alpha\}_{\alpha=1}^{N} \xrightarrow[\hat{s}]{\text{Bayes}} \{P(\mathcal{M}_\alpha \mid \hat{s})\}_{\alpha=1}^{N} \;\Longrightarrow\; \mathcal{M}^* = \arg\max_{\mathcal{M}_\alpha} P(\mathcal{M}_\alpha \mid \hat{s}).
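To make the recipe concrete on the bit-sequence example, here is a minimal sketch, our own illustration rather than the talk's full scheme: it compares ℳ₀ (a fair coin, θ = 1/2) against ℳ₁ (unknown bias with a uniform prior on θ), whose evidence is P(ŝ | ℳ₁) = ∫₀¹ θ^{k₀}(1−θ)^{k₁} dθ = B(k₀+1, k₁+1).

```python
import math

def log_evidence_fair(M: int) -> float:
    """log P(s | M0): under the fair-coin model every bit has probability 1/2."""
    return -M * math.log(2.0)

def log_evidence_biased(k0: int, k1: int) -> float:
    """log P(s | M1) = log B(k0 + 1, k1 + 1) for a uniform prior on theta,
    computed with log-gamma functions to avoid overflow."""
    return math.lgamma(k0 + 1) + math.lgamma(k1 + 1) - math.lgamma(k0 + k1 + 2)

def posterior_fair(k0: int, k1: int) -> float:
    """P(M0 | s), assuming equal prior model probabilities P(M0) = P(M1) = 1/2."""
    log_bf = log_evidence_biased(k0, k1) - log_evidence_fair(k0 + k1)
    # Cap the exponent: beyond ~700 the posterior is numerically 0 anyway.
    return 1.0 / (1.0 + math.exp(min(log_bf, 700.0)))

if __name__ == "__main__":
    print(posterior_fair(500, 500))  # balanced counts: the fair model is favoured
    print(posterior_fair(400, 600))  # a marked bias: M1 takes essentially all the weight
```

The point of this construction is that the answer comes back as a posterior probability over models, rather than as a p-value for a single null hypothesis.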