Randomness Characterization Through Bayesian Model Selection
Rafael Díaz Hernández Rojas (Sapienza University of Rome), Isaac Pérez Castillo (IF-UNAM), Jorge Hirsch, Alfred U’Ren, Aldo Solís, Alí Angulo (ICN-UNAM), Matteo Marsili (ICTP, Italy). AISIS, UNAM, October 2019.

Randomness characterization through Bayesian model selection, 1/16

How to tell if a number sequence is random?

Apparently random sequences arise in many physical settings: mappings of dynamical systems, spin systems, correlated photons, particle decays. The same data can be written in different alphabets,

ŝ = HHTTTHTHT...THTT
ŝ = LLRRRLRLR...RLRR
ŝ = 110001010...0100

and the answer matters wherever randomness is consumed: Monte Carlo methods, cryptography, probabilistic algorithms.

(Maximally) random sequences

ŝ = 01001101...0010

The Shannon entropy H[X] serves as a measure of randomness:

H[X] = -\sum_x p_x \log_2 p_x, \qquad 0 \le H[X] \le \log_2 |X|,

where the maximum H_max = \log_2 |X| is attained if and only if p_x = 1/|X| for every outcome x. What if the coin is biased?
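The entropy definition above is easy to check numerically. A minimal Python sketch (the helper name `shannon_entropy` is ours, not from the talk):

```python
from math import log2

def shannon_entropy(probs):
    """H[X] = -sum_x p_x * log2(p_x), in bits; outcomes with p_x = 0 contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)

# A fair coin attains the maximum H = log2|X| = 1 bit; any bias lowers it.
print(shannon_entropy([0.5, 0.5]))  # 1.0
print(shannon_entropy([0.9, 0.1]))  # ~0.469: a biased coin is less random
```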
For a binary source,

H[X] = -p_0 \log_2 p_0 - (1 - p_0) \log_2 (1 - p_0),

which reaches its maximum of 1 bit at p_0 = 1/2. [Figure: binary entropy H as a function of p_0.]

Pragmatic approach: NIST battery of tests

ŝ = 01001101...0010  ⟹  test properties such as:
- same frequency of ‘0’ and ‘1’ (k_0 ≈ k_1)
- longest string of consecutive 0’s
- Fourier transform ~ white noise
- ...

Each property is analysed as a hypothesis test, yielding a p-value. But the logic only runs one way: if the sequence is random, the properties examined by the tests hold; the properties do not imply randomness. This is a frequentist approach based on p-values:

R. L. Wasserstein and N. A. Lazar, “The ASA’s statement on p-values: context, process, and purpose”, The American Statistician 70, 129–133 (2016)
M. Baker, “Statisticians issue warning on p-values”, Nature 531, 151 (2016)
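As an illustration of how one such test turns a property into a p-value, here is a sketch of the frequency (monobit) test in the spirit of NIST SP 800-22; the helper name `monobit_p_value` is ours:

```python
from math import erfc, sqrt

def monobit_p_value(bits):
    """Frequency (monobit) test: p-value under H0 'bits are i.i.d. fair'.
    The statistic is |k1 - k0| / sqrt(M), referred to a half-normal distribution."""
    m = len(bits)
    s = sum(1 if b else -1 for b in bits)  # k1 - k0
    return erfc(abs(s) / sqrt(m) / sqrt(2))

print(monobit_p_value([0, 1] * 500))           # 1.0: perfectly balanced counts
print(monobit_p_value([1] * 600 + [0] * 400))  # tiny: clearly biased, H0 rejected
```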
Is π random?

NIST Random Number Test Suite applied to 10^6 digits of π. [Figure: (a) binary representation of the first 302,500 digits of π; (b) results of the 15 NIST tests, p-values per test, all above the significance level.]

Yet π is perfectly compressible:

\pi = 4 \sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1}, \qquad
\pi = 2 \cdot \frac{2}{\sqrt{2}} \cdot \frac{2}{\sqrt{2+\sqrt{2}}} \cdot \frac{2}{\sqrt{2+\sqrt{2+\sqrt{2}}}} \cdots

Randomness as “incompressibility”

Algorithmic Information Theory: ŝ is random iff the “shortest” algorithm to generate it is print(ŝ). AIT (Chaitin, Kolmogorov, Solomonoff) is a mathematically formal theory that identifies (computational) randomness with incompressibility. There is NO general algorithm capable of assessing whether an arbitrary sequence is random... but Borel’s normality criterion provides a computable necessary condition.

C. S. Calude, Information and Randomness: An Algorithmic Perspective, 2nd Edition (Springer, 2010)
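Kolmogorov complexity is uncomputable, but an off-the-shelf compressor gives a crude upper-bound proxy for the incompressibility idea above. This is an illustration only, not the method of the talk:

```python
import random
import zlib

def compression_ratio(data: bytes) -> float:
    """len(compressed) / len(raw); a ratio near (or above) 1 means the
    compressor found no structure, a crude proxy for incompressibility."""
    return len(zlib.compress(data, 9)) / len(data)

random.seed(0)
noise = bytes(random.randrange(256) for _ in range(10_000))  # pseudo-random bytes
pattern = b"01" * 5_000                                      # highly regular, same length

print(compression_ratio(noise))    # ~1: essentially incompressible
print(compression_ratio(pattern))  # <<1: the regularity is squeezed out
```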
Primer on statistical inference for bit sequences

|ŝ| = M bits, with k_0 and k_1 the frequencies of ‘0’s and ‘1’s; k_0 + k_1 = M.

Model M: p_0 = \theta, \; p_1 = 1 - \theta \;\Longrightarrow\; P(\hat{s} \mid \theta, M) = \theta^{k_0} (1 - \theta)^{k_1}.

Maximizing P(ŝ|θ, M) over θ gives the maximum-likelihood estimate θ* = k_0 / M. [Figure: \log_{10} P(\hat{s} \mid \theta, M) versus θ for M = 1000, with k_0 = 400 (peak at θ* = 0.4) and k_0 = 900 (peak at θ* = 0.9).]

A fair “coin” (RNG) ⟹ θ* = 0.5. But what happens if θ* ≈ 0.5: back to p-values?
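The maximization on this slide can be checked numerically with a simple grid search; the function name `log_likelihood` is ours:

```python
from math import log

def log_likelihood(theta, k0, k1):
    """log P(s | theta, M) = k0 * log(theta) + k1 * log(1 - theta)."""
    return k0 * log(theta) + k1 * log(1 - theta)

# Grid search reproduces theta* = k0 / M for the slide's example k0 = 400, M = 1000.
k0, k1 = 400, 600
grid = [t / 1000 for t in range(1, 1000)]
theta_star = max(grid, key=lambda t: log_likelihood(t, k0, k1))
print(theta_star)  # 0.4 = k0 / M
```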
Model selection as hypothesis test

A model M defines a family of probability distributions, including its dependence on parameters, P(ŝ|θ, M), and their distribution P(θ|M).

Once a model M is chosen, the usual question is: how well does it describe the set of observations ŝ? That is, P(ŝ|M, θ) or P(ŝ|M). The right question is the reverse: given the observations ŝ, how likely is it that M is the true model? That is, P(M, θ|ŝ) or P(M|ŝ).

Bayes’ theorem,

P(M \mid \hat{s}) = \frac{P(\hat{s} \mid M) \, P(M)}{P(\hat{s})} = \frac{P(\hat{s} \mid M) \, P(M)}{\sum_i P(\hat{s} \mid M_i) \, P(M_i)},

is a recipe for how to update our (un)certainty about a model, a hypothesis, given some data. Model selection then reads

\{M_\alpha\}_{\alpha=1}^{N} \xrightarrow{\;\text{Bayes},\,\hat{s}\;} \{P(M_\alpha \mid \hat{s})\}_{\alpha=1}^{N} \;\Longrightarrow\; M^{*} = \underset{M_\alpha}{\arg\max}\, P(M_\alpha \mid \hat{s}).
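To make the recipe concrete, here is a hedged sketch comparing two models for a bit sequence with counts (k_0, k_1): M_fair fixes θ = 1/2, while M_biased puts a uniform prior on θ, so its evidence is the Beta integral ∫₀¹ θ^{k_0}(1−θ)^{k_1} dθ = B(k_0+1, k_1+1). The function names are ours, and this two-model comparison illustrates the mechanics rather than reproducing the full method of the talk:

```python
from math import exp, lgamma, log

def log_evidence_fair(k0, k1):
    """log P(s | M_fair): theta fixed at 1/2, so P(s | M_fair) = 2**-(k0 + k1)."""
    return -(k0 + k1) * log(2)

def log_evidence_biased(k0, k1):
    """log P(s | M_biased) with a uniform prior on theta:
    integral_0^1 theta**k0 * (1 - theta)**k1 dtheta = B(k0 + 1, k1 + 1)."""
    return lgamma(k0 + 1) + lgamma(k1 + 1) - lgamma(k0 + k1 + 2)

def posterior_fair(k0, k1):
    """P(M_fair | s) via Bayes' theorem, with equal model priors."""
    delta = log_evidence_biased(k0, k1) - log_evidence_fair(k0, k1)
    return 1.0 / (1.0 + exp(delta))

print(posterior_fair(500, 500))  # > 0.5: balanced counts favour the fair coin
print(posterior_fair(900, 100))  # ~0: heavy bias favours M_biased
```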