EE-731: ADVANCED TOPICS IN DATA SCIENCES
LABORATORY FOR INFORMATION AND INFERENCE SYSTEMS, SPRING 2016
INSTRUCTOR: VOLKAN CEVHER    SCRIBE: HARSHIT GUPTA
PROBABILITY IN HIGH DIMENSIONS
Index terms: Concentration inequalities, Bounded difference function, Cramer-Chernoff bound, Johnson-Lindenstrauss Theorem, Hoeffding’s Lemma, Sub-Gaussian random variable, Entropy, Herbst’s trick
1 Introduction
Measuring the concentration of a random variable around some value has many applications in statistics, information theory, optimization, and other fields. In probability theory, this concentration of measure can be quantified as follows:
Definition 1. Given a random variable $Y$ and a constant $m$, $Y$ is said to be concentrated around $m$ if
\[ P(|Y - m| > t) \le D(t), \]
where $D(t)$ decreases rapidly to $0$ as $t$ grows.
These inequalities are called concentration of measure inequalities [1], [2].

Example 1.1. A simple example is $Y_n = \frac{1}{n}\sum_{i=1}^{n} X_i$, where the $X_i$ are independent random variables with mean $\mu$ and variance $\sigma^2$; then $Y_n$ tends to concentrate around $\mu$ as $n \to \infty$. The concentration in this case can be quantified by the following results:
• Law of Large Numbers: $P(|Y_n - \mu| > \epsilon) \to 0$ as $n \to \infty$.
• Central Limit Theorem: $P(|Y_n - \mu| > \frac{\alpha}{\sqrt{n}}) \to 2\Phi(-\frac{\alpha}{\sigma})$ as $n \to \infty$, where $\Phi$ is the standard normal CDF.
• Large Deviations: under some technical assumptions, $P(|Y_n - \mu| > \epsilon) \le \exp(-n\, c(\epsilon))$.
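For numerical intuition, the following Python sketch (an added illustration, not from the original notes; the Bernoulli(1/2) choice and sample sizes are arbitrary) estimates the tail probability $P(|Y_n - \mu| > \epsilon)$ by Monte Carlo and compares it with the exponential bound $2\exp(-2n\epsilon^2)$ that Hoeffding's inequality gives for variables in $[0, 1]$:

    import numpy as np

    rng = np.random.default_rng(0)
    eps, trials = 0.05, 100_000

    for n in [100, 400, 1600]:
        # Y_n = average of n Bernoulli(1/2) variables; `trials` independent copies
        Y = rng.binomial(n, 0.5, size=trials) / n
        empirical = np.mean(np.abs(Y - 0.5) > eps)
        bound = 2 * np.exp(-2 * n * eps**2)  # Hoeffding: c(eps) = 2*eps^2
        print(f"n={n:5d}  empirical tail={empirical:.4f}  bound={bound:.4f}")

Both columns decay geometrically in $n$, consistent with the large deviations behavior $\exp(-n\, c(\epsilon))$.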
It is quite interesting to note that these concentration properties are not specific to averages: they hold in the more general scenario where a function of independent random variables that is not too sensitive to any single variable concentrates around its expectation. Formally:
Proposition 1.1. If $X_1, \cdots, X_n$ are independent random variables, then any function $f(X_1, \cdots, X_n)$ that is not too sensitive to any one coordinate will concentrate around its mean:
\[ P(| f(X_1, \cdots, X_n) - E[f(X_1, \cdots, X_n)]| > t) \le e^{-t^2/c(t)}. \]
Here, $c(t)$ quantifies the sensitivity of the function to its variables.
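As a concrete (illustrative, not from the notes) instance of such a function, take $f(X_1, \cdots, X_n) = \max_i X_i$ for i.i.d. Uniform(0, 1) variables: changing any single coordinate moves $f$ by at most $1$, and the fluctuations of $f$ around its mean $\frac{n}{n+1}$ shrink as $n$ grows. A minimal Python sketch:

    import numpy as np

    rng = np.random.default_rng(0)
    trials = 5_000

    for n in [10, 100, 1000]:
        # f(X_1, ..., X_n) = max_i X_i; each coordinate changes f by at most 1
        f = rng.random((trials, n)).max(axis=1)
        print(f"n={n:4d}  E[f]={n / (n + 1):.4f}  empirical std={f.std():.4f}")

The empirical standard deviation falls roughly like $1/n$, far faster than the averaging rate $1/\sqrt{n}$.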
2 Concentration Inequalities
Now we define several classes of functions and their associated concentration inequalities:
2.1 Bounded Difference Function

Definition 2. A function $f : \mathcal{X}^n \to \mathbb{R}$ is called a bounded difference function if, for some non-negative constants $c_1, \cdots, c_n$,
\[ \sup_{x_1, \ldots, x_n,\, x_i' \in \mathcal{X}} | f(x_1, \cdots, x_i, \cdots, x_n) - f(x_1, \cdots, x_i', \cdots, x_n)| \le c_i. \]
Example 2.1 (Chromatic number of a random graph). Let $V = \{1, \cdots, n\}$ and let $G$ be a random graph on $V$ in which each pair $i, j \in V$ is independently connected with probability $p$. Let $X_{ij} = 1$ if $(i, j)$ are connected and $0$ otherwise. The chromatic number of $G$ is the minimum number of colors needed to color the vertices such that no two connected vertices have the same color. Writing
\[ \text{Chromatic number} = f(X_{11}, \cdots, X_{ij}, \cdots, X_{nn}), \]
it can be shown that $f$ is a bounded difference function with $c_{ij} = 1$, since adding or removing a single edge changes the chromatic number by at most one.
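The concentration of the chromatic number can be observed numerically. The sketch below (an added illustration; note that networkx's greedy coloring only gives an upper bound on the true chromatic number, so it is used here as a cheap proxy) samples many graphs $G(n, p)$ and reports how tightly the number of colors clusters:

    import networkx as nx
    import numpy as np

    n, p, trials = 60, 0.5, 200
    colors_used = []
    for seed in range(trials):
        G = nx.gnp_random_graph(n, p, seed=seed)
        # Greedy coloring: an upper bound on the chromatic number
        coloring = nx.coloring.greedy_color(G, strategy="largest_first")
        colors_used.append(max(coloring.values()) + 1)  # colors are 0-indexed

    colors_used = np.array(colors_used)
    print(f"mean = {colors_used.mean():.2f}, std = {colors_used.std():.2f}")

The standard deviation is small relative to the mean, consistent with the bounded difference property with $c_{ij} = 1$.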
Theorem 2.1 (Bounded difference inequality). Let $f : \mathcal{X}^n \to \mathbb{R}$ satisfy the bounded difference property with constants $c_i$ and let $X_1, \cdots, X_n$ be independent random variables. Then:
\[ P(| f(X_1, \cdots, X_n) - E[f(X_1, \cdots, X_n)]| > t) \le 2 \exp\left( \frac{-2t^2}{\sum_{i=1}^{n} c_i^2} \right). \]
Proof. To prove this result we need to exploit some common probability bounds. As these bounds are essential to the proof, we discuss them in detail with examples while keeping the flow of the proof:
• Cramer-Chernoff Bound
– Markov’s Inequality We start by stating Markov’s inequality and its proof:
Theorem 2.2 (Markov's Inequality). Let $Z$ be a non-negative random variable and $t > 0$; then $P(Z \ge t) \le \frac{E[Z]}{t}$.
Proof. This can be proven by showing that:
\[ P(Z \ge t) = \int_t^{\infty} p_Z(z)\, dz \le \int_t^{\infty} \frac{z}{t}\, p_Z(z)\, dz \le \int_0^{\infty} \frac{z}{t}\, p_Z(z)\, dz = \frac{E[Z]}{t}. \]
– Chebyshev's Inequality We extend this result to any non-decreasing and non-negative function $\varphi$ of the non-negative random variable $Z$; then,
\[ P(Z \ge t) \le P(\varphi(Z) \ge \varphi(t)) \le \frac{E[\varphi(Z)]}{\varphi(t)}. \]
On choosing $\varphi(Z) = Z^2$ and substituting $|Z - E[Z]|$ for $Z$, we get Chebyshev's inequality:
\[ P(|Z - E[Z]| \ge t) \le \frac{\operatorname{Var}[Z]}{t^2}. \]
– Chernoff Bound Similarly, by choosing $\varphi(Z) = e^{\lambda Z}$, where $\lambda \ge 0$, and taking $\psi_Z(\lambda) = \log E[e^{\lambda Z}]$, we get the Chernoff bound:
\[ P(Z \ge t) \le \inf_{\lambda \ge 0} e^{-\lambda t} E[e^{\lambda Z}] = \exp\left( -\sup_{\lambda \ge 0} (\lambda t - \psi_Z(\lambda)) \right). \]
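To compare these three bounds on a concrete case, consider $Z \sim \text{Exp}(1)$, for which $E[Z] = \operatorname{Var}[Z] = 1$ and $E[e^{\lambda Z}] = \frac{1}{1-\lambda}$ for $\lambda < 1$. The Python sketch below (an added illustration; a grid over $\lambda$ approximates the infimum numerically) evaluates each bound against the exact tail $e^{-t}$:

    import numpy as np

    # Z ~ Exp(1): E[Z] = Var[Z] = 1, and E[e^{lambda*Z}] = 1/(1-lambda) for lambda < 1
    lambdas = np.linspace(0.0, 0.999, 10_000)

    for t in [2.0, 5.0, 10.0]:
        markov = 1.0 / t                          # E[Z]/t
        chebyshev = 1.0 / (t - 1.0) ** 2          # P(Z >= t) <= P(|Z - 1| >= t - 1)
        chernoff = np.min(np.exp(-lambdas * t) / (1.0 - lambdas))
        print(f"t={t:4.1f}  exact={np.exp(-t):.2e}  Markov={markov:.2e}  "
              f"Chebyshev={chebyshev:.2e}  Chernoff={chernoff:.2e}")

For large $t$ the Chernoff bound decays exponentially (in closed form it equals $t\, e^{1-t}$ here), while the Markov and Chebyshev bounds only decay polynomially.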
– Cramer Transform To get the Cramer-Chernoff inequality from this Chernoff bound, we first define the Cramer transform of the random variable $Z$:
\[ \psi_Z^{*}(t) = \sup_{\lambda \ge 0} (\lambda t - \psi_Z(\lambda)). \]
Thus, by putting the Cramer transform of $Z$ in the Chernoff bound, we derive: