Lecture 3: Randomness in Computation
Total Page:16
File Type:pdf, Size:1020Kb
Load more
Recommended publications
-
The Randomized Quicksort Algorithm
Outline The Randomized Quicksort Algorithm K. Subramani1 1Lane Department of Computer Science and Electrical Engineering West Virginia University 7 February, 2012 Subramani Sample Analyses Outline Outline 1 The Randomized Quicksort Algorithm Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. Subramani Sample Analyses The Randomized Quicksort Algorithm The Sorting Problem Problem Statement Given an array A of n distinct integers, in the indices A[1] through A[n], permute the elements of A, so that A[1] < A[2]... A[n]. Note The assumption of distinctness simplifies the analysis. It has no bearing on the running time. Subramani Sample Analyses The Randomized Quicksort Algorithm The Partition subroutine Function PARTITION(A,p,q) 1: {We partition the sub-array A[p,p + 1,...,q] about A[p].} 2: for (i = (p + 1) to q) do 3: if (A[i] < A[p]) then 4: Insert A[i] into bucket L. -
1 Perfect Secrecy of the One-Time Pad
1 Perfect secrecy of the one-time pad In this section, we make more a more precise analysis of the security of the one-time pad. First, we need to define conditional probability. Let’s consider an example. We know that if it rains Saturday, then there is a reasonable chance that it will rain on Sunday. To make this more precise, we want to compute the probability that it rains on Sunday, given that it rains on Saturday. So we restrict our attention to only those situations where it rains on Saturday and count how often this happens over several years. Then we count how often it rains on both Saturday and Sunday. The ratio gives an estimate of the desired probability. If we call A the event that it rains on Saturday and B the event that it rains on Sunday, then the intersection A ∩ B is when it rains on both days. The conditional probability of A given B is defined to be P (A ∩ B) P (B | A)= , P (A) where P (A) denotes the probability of the event A. This formula can be used to define the conditional probability of one event given another for any two events A and B that have probabilities (we implicitly assume throughout this discussion that any probability that occurs in a denominator has nonzero probability). Events A and B are independent if P (A ∩ B)= P (A) P (B). For example, if Alice flips a fair coin, let A be the event that the coin ends up Heads. If Bob rolls a fair six-sided die, let B be the event that he rolls a 3. -
5. the Student T Distribution
Virtual Laboratories > 4. Special Distributions > 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 5. The Student t Distribution In this section we will study a distribution that has special importance in statistics. In particular, this distribution will arise in the study of a standardized version of the sample mean when the underlying distribution is normal. The Probability Density Function Suppose that Z has the standard normal distribution, V has the chi-squared distribution with n degrees of freedom, and that Z and V are independent. Let Z T= √V/n In the following exercise, you will show that T has probability density function given by −(n +1) /2 Γ((n + 1) / 2) t2 f(t)= 1 + , t∈ℝ ( n ) √n π Γ(n / 2) 1. Show that T has the given probability density function by using the following steps. n a. Show first that the conditional distribution of T given V=v is normal with mean 0 a nd variance v . b. Use (a) to find the joint probability density function of (T,V). c. Integrate the joint probability density function in (b) with respect to v to find the probability density function of T. The distribution of T is known as the Student t distribution with n degree of freedom. The distribution is well defined for any n > 0, but in practice, only positive integer values of n are of interest. This distribution was first studied by William Gosset, who published under the pseudonym Student. In addition to supplying the proof, Exercise 1 provides a good way of thinking of the t distribution: the t distribution arises when the variance of a mean 0 normal distribution is randomized in a certain way. -
Randomized Algorithms, Quicksort and Randomized Selection Carola Wenk Slides Courtesy of Charles Leiserson with Additions by Carola Wenk
CMPS 2200 – Fall 2014 Randomized Algorithms, Quicksort and Randomized Selection Carola Wenk Slides courtesy of Charles Leiserson with additions by Carola Wenk CMPS 2200 Intro. to Algorithms 1 Deterministic Algorithms Runtime for deterministic algorithms with input size n: • Best-case runtime Attained by one input of size n • Worst-case runtime Attained by one input of size n • Average runtime Averaged over all possible inputs of size n CMPS 2200 Intro. to Algorithms 2 Deterministic Algorithms: Insertion Sort for j=2 to n { key = A[j] // insert A[j] into sorted sequence A[1..j-1] i=j-1 while(i>0 && A[i]>key){ A[i+1]=A[i] i-- } A[i+1]=key } • Best case runtime? • Worst case runtime? CMPS 2200 Intro. to Algorithms 3 Deterministic Algorithms: Insertion Sort Best-case runtime: O(n), input [1,2,3,…,n] Attained by one input of size n • Worst-case runtime: O(n2), input [n, n-1, …,2,1] Attained by one input of size n • Average runtime : O(n2) Averaged over all possible inputs of size n •What kind of inputs are there? • How many inputs are there? CMPS 2200 Intro. to Algorithms 4 Average Runtime • What kind of inputs are there? • Do [1,2,…,n] and [5,6,…,n+5] cause different behavior of Insertion Sort? • No. Therefore it suffices to only consider all permutations of [1,2,…,n] . • How many inputs are there? • There are n! different permutations of [1,2,…,n] CMPS 2200 Intro. to Algorithms 5 Average Runtime Insertion Sort: n=4 • Inputs: 4!=24 [1,2,3,4]0346 [4,1,2,3] [4,1,3,2] [4,3,2,1] [2,1,3,4]1235 [1,4,2,3] [1,4,3,2] [3,4,2,1] [1,3,2,4]1124 [1,2,4,3] [1,3,4,2] [3,2,4,1] [3,1,2,4]2455 [4,2,1,3] [4,3,1,2] [4,2,3,1] [3,2,1,4]3244 [2,1,4,3] [3,4,1,2] [2,4,3,1] [2,3,1,4]2333 [2,4,1,3] [3,1,4,2] [2,3,4,1] • Runtime is proportional to: 3 + #times in while loop • Best: 3+0, Worst: 3+6=9, Average: 3+72/24 = 6 CMPS 2200 Intro. -
6.045J Lecture 13: Pseudorandom Generators and One-Way Functions
6.080/6.089 GITCS April 4th, 2008 Lecture 16 Lecturer: Scott Aaronson Scribe: Jason Furtado Private-Key Cryptography 1 Recap 1.1 Derandomization In the last six years, there have been some spectacular discoveries of deterministic algorithms, for problems for which the only similarly-efficient solutions that were known previously required randomness. The two most famous examples are • the Agrawal-Kayal-Saxena (AKS) algorithm for determining if a number is prime or composite in deterministic polynomial time, and • the algorithm of Reingold for getting out of a maze (that is, solving the undirected s-t con nectivity problem) in deterministic LOGSPACE. Beyond these specific examples, mounting evidence has convinced almost all theoretical com puter scientists of the following Conjecture: Every randomized algorithm can be simulated by a deterministic algorithm with at most polynomial slowdown. Formally, P = BP P . 1.2 Cryptographic Codes 1.2.1 Caesar Cipher In this method, a plaintext message is converted to a ciphertext by simply adding 3 to each letter, wrapping around to A after you reach Z. This method is breakable by hand. 1.2.2 One-Time Pad The “one-time pad” uses a random key that must be as long as the message we want to en crypt. The exclusive-or operation is performed on each bit of the message and key (Msg ⊕ Key = EncryptedMsg) to end up with an encrypted message. The encrypted message can be decrypted by performing the same operation on the encrypted message and the key to retrieve the message (EncryptedMsg⊕Key = Msg). An adversary that intercepts the encrypted message will be unable to decrypt it as long as the key is truly random. -
1 One Parameter Exponential Families
1 One parameter exponential families The world of exponential families bridges the gap between the Gaussian family and general dis- tributions. Many properties of Gaussians carry through to exponential families in a fairly precise sense. • In the Gaussian world, there exact small sample distributional results (i.e. t, F , χ2). • In the exponential family world, there are approximate distributional results (i.e. deviance tests). • In the general setting, we can only appeal to asymptotics. A one-parameter exponential family, F is a one-parameter family of distributions of the form Pη(dx) = exp (η · t(x) − Λ(η)) P0(dx) for some probability measure P0. The parameter η is called the natural or canonical parameter and the function Λ is called the cumulant generating function, and is simply the normalization needed to make dPη fη(x) = (x) = exp (η · t(x) − Λ(η)) dP0 a proper probability density. The random variable t(X) is the sufficient statistic of the exponential family. Note that P0 does not have to be a distribution on R, but these are of course the simplest examples. 1.0.1 A first example: Gaussian with linear sufficient statistic Consider the standard normal distribution Z e−z2=2 P0(A) = p dz A 2π and let t(x) = x. Then, the exponential family is eη·x−x2=2 Pη(dx) / p 2π and we see that Λ(η) = η2=2: eta= np.linspace(-2,2,101) CGF= eta**2/2. plt.plot(eta, CGF) A= plt.gca() A.set_xlabel(r'$\eta$', size=20) A.set_ylabel(r'$\Lambda(\eta)$', size=20) f= plt.gcf() 1 Thus, the exponential family in this setting is the collection F = fN(η; 1) : η 2 Rg : d 1.0.2 Normal with quadratic sufficient statistic on R d As a second example, take P0 = N(0;Id×d), i.e. -
3 Autocorrelation
3 Autocorrelation Autocorrelation refers to the correlation of a time series with its own past and future values. Autocorrelation is also sometimes called “lagged correlation” or “serial correlation”, which refers to the correlation between members of a series of numbers arranged in time. Positive autocorrelation might be considered a specific form of “persistence”, a tendency for a system to remain in the same state from one observation to the next. For example, the likelihood of tomorrow being rainy is greater if today is rainy than if today is dry. Geophysical time series are frequently autocorrelated because of inertia or carryover processes in the physical system. For example, the slowly evolving and moving low pressure systems in the atmosphere might impart persistence to daily rainfall. Or the slow drainage of groundwater reserves might impart correlation to successive annual flows of a river. Or stored photosynthates might impart correlation to successive annual values of tree-ring indices. Autocorrelation complicates the application of statistical tests by reducing the effective sample size. Autocorrelation can also complicate the identification of significant covariance or correlation between time series (e.g., precipitation with a tree-ring series). Autocorrelation implies that a time series is predictable, probabilistically, as future values are correlated with current and past values. Three tools for assessing the autocorrelation of a time series are (1) the time series plot, (2) the lagged scatterplot, and (3) the autocorrelation function. 3.1 Time series plot Positively autocorrelated series are sometimes referred to as persistent because positive departures from the mean tend to be followed by positive depatures from the mean, and negative departures from the mean tend to be followed by negative departures (Figure 3.1). -
Random Variables and Probability Distributions 1.1
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS 1. DISCRETE RANDOM VARIABLES 1.1. Definition of a Discrete Random Variable. A random variable X is said to be discrete if it can assume only a finite or countable infinite number of distinct values. A discrete random variable can be defined on both a countable or uncountable sample space. 1.2. Probability for a discrete random variable. The probability that X takes on the value x, P(X=x), is defined as the sum of the probabilities of all sample points in Ω that are assigned the value x. We may denote P(X=x) by p(x) or pX (x). The expression pX (x) is a function that assigns probabilities to each possible value x; thus it is often called the probability function for the random variable X. 1.3. Probability distribution for a discrete random variable. The probability distribution for a discrete random variable X can be represented by a formula, a table, or a graph, which provides pX (x) = P(X=x) for all x. The probability distribution for a discrete random variable assigns nonzero probabilities to only a countable number of distinct x values. Any value x not explicitly assigned a positive probability is understood to be such that P(X=x) = 0. The function pX (x)= P(X=x) for each x within the range of X is called the probability distribution of X. It is often called the probability mass function for the discrete random variable X. 1.4. Properties of the probability distribution for a discrete random variable. -
The Bayesian Approach to Statistics
THE BAYESIAN APPROACH TO STATISTICS ANTHONY O’HAGAN INTRODUCTION the true nature of scientific reasoning. The fi- nal section addresses various features of modern By far the most widely taught and used statisti- Bayesian methods that provide some explanation for the rapid increase in their adoption since the cal methods in practice are those of the frequen- 1980s. tist school. The ideas of frequentist inference, as set out in Chapter 5 of this book, rest on the frequency definition of probability (Chapter 2), BAYESIAN INFERENCE and were developed in the first half of the 20th century. This chapter concerns a radically differ- We first present the basic procedures of Bayesian ent approach to statistics, the Bayesian approach, inference. which depends instead on the subjective defini- tion of probability (Chapter 3). In some respects, Bayesian methods are older than frequentist ones, Bayes’s Theorem and the Nature of Learning having been the basis of very early statistical rea- Bayesian inference is a process of learning soning as far back as the 18th century. Bayesian from data. To give substance to this statement, statistics as it is now understood, however, dates we need to identify who is doing the learning and back to the 1950s, with subsequent development what they are learning about. in the second half of the 20th century. Over that time, the Bayesian approach has steadily gained Terms and Notation ground, and is now recognized as a legitimate al- ternative to the frequentist approach. The person doing the learning is an individual This chapter is organized into three sections. -
Random Processes
Chapter 6 Random Processes Random Process • A random process is a time-varying function that assigns the outcome of a random experiment to each time instant: X(t). • For a fixed (sample path): a random process is a time varying function, e.g., a signal. – For fixed t: a random process is a random variable. • If one scans all possible outcomes of the underlying random experiment, we shall get an ensemble of signals. • Random Process can be continuous or discrete • Real random process also called stochastic process – Example: Noise source (Noise can often be modeled as a Gaussian random process. An Ensemble of Signals Remember: RV maps Events à Constants RP maps Events à f(t) RP: Discrete and Continuous The set of all possible sample functions {v(t, E i)} is called the ensemble and defines the random process v(t) that describes the noise source. Sample functions of a binary random process. RP Characterization • Random variables x 1 , x 2 , . , x n represent amplitudes of sample functions at t 5 t 1 , t 2 , . , t n . – A random process can, therefore, be viewed as a collection of an infinite number of random variables: RP Characterization – First Order • CDF • PDF • Mean • Mean-Square Statistics of a Random Process RP Characterization – Second Order • The first order does not provide sufficient information as to how rapidly the RP is changing as a function of timeà We use second order estimation RP Characterization – Second Order • The first order does not provide sufficient information as to how rapidly the RP is changing as a function -
IBM Research Report Derandomizing Arthur-Merlin Games And
H-0292 (H1010-004) October 5, 2010 Computer Science IBM Research Report Derandomizing Arthur-Merlin Games and Approximate Counting Implies Exponential-Size Lower Bounds Dan Gutfreund, Akinori Kawachi IBM Research Division Haifa Research Laboratory Mt. Carmel 31905 Haifa, Israel Research Division Almaden - Austin - Beijing - Cambridge - Haifa - India - T. J. Watson - Tokyo - Zurich LIMITED DISTRIBUTION NOTICE: This report has been submitted for publication outside of IBM and will probably be copyrighted if accepted for publication. It has been issued as a Research Report for early dissemination of its contents. In view of the transfer of copyright to the outside publisher, its distribution outside of IBM prior to publication should be limited to peer communications and specific requests. After outside publication, requests should be filled only by reprints or legally obtained copies of the article (e.g. , payment of royalties). Copies may be requested from IBM T. J. Watson Research Center , P. O. Box 218, Yorktown Heights, NY 10598 USA (email: [email protected]). Some reports are available on the internet at http://domino.watson.ibm.com/library/CyberDig.nsf/home . Derandomization Implies Exponential-Size Lower Bounds 1 DERANDOMIZING ARTHUR-MERLIN GAMES AND APPROXIMATE COUNTING IMPLIES EXPONENTIAL-SIZE LOWER BOUNDS Dan Gutfreund and Akinori Kawachi Abstract. We show that if Arthur-Merlin protocols can be deran- domized, then there is a Boolean function computable in deterministic exponential-time with access to an NP oracle, that cannot be computed by Boolean circuits of exponential size. More formally, if prAM ⊆ PNP then there is a Boolean function in ENP that requires circuits of size 2Ω(n). -
11. Parameter Estimation
11. Parameter Estimation Chris Piech and Mehran Sahami May 2017 We have learned many different distributions for random variables and all of those distributions had parame- ters: the numbers that you provide as input when you define a random variable. So far when we were working with random variables, we either were explicitly told the values of the parameters, or, we could divine the values by understanding the process that was generating the random variables. What if we don’t know the values of the parameters and we can’t estimate them from our own expert knowl- edge? What if instead of knowing the random variables, we have a lot of examples of data generated with the same underlying distribution? In this chapter we are going to learn formal ways of estimating parameters from data. These ideas are critical for artificial intelligence. Almost all modern machine learning algorithms work like this: (1) specify a probabilistic model that has parameters. (2) Learn the value of those parameters from data. Parameters Before we dive into parameter estimation, first let’s revisit the concept of parameters. Given a model, the parameters are the numbers that yield the actual distribution. In the case of a Bernoulli random variable, the single parameter was the value p. In the case of a Uniform random variable, the parameters are the a and b values that define the min and max value. Here is a list of random variables and the corresponding parameters. From now on, we are going to use the notation q to be a vector of all the parameters: Distribution Parameters Bernoulli(p) q = p Poisson(l) q = l Uniform(a,b) q = (a;b) Normal(m;s 2) q = (m;s 2) Y = mX + b q = (m;b) In the real world often you don’t know the “true” parameters, but you get to observe data.