
Algorithms, Combinatorics, Information, and Beyond∗

Wojciech Szpankowski Purdue University W. Lafayette, IN 47907

August 2, 2011

NSF STC Center for Science of Information

Plenary ISIT, St. Petersburg, 2011 Dedicated to PHILIPPE FLAJOLET

∗Research supported by the NSF Science & Technology Center and the Humboldt Foundation.

Outline

1. Shannon Legacy

2. Analytic Combinatorics + IT = Analytic Information Theory

3. The Redundancy Rate Problem
   (a) Universal Memoryless Sources
   (b) Universal Renewal Sources
   (c) Universal Markov Sources

4. Post-Shannon Challenges: Beyond Shannon

5. Science of Information: NSF Science and Technology Center

Algorithms: at the heart of virtually all technologies.
Combinatorics: provides indispensable tools for finding patterns and structures.
Information: permeates every corner of our lives and shapes our universe.

Shannon Legacy: Three Theorems of Shannon

Theorem 1 & 3. [Shannon 1948; Lossless & Lossy Data Compression]

lossless compression rate ≥ source entropy H(X); for distortion level D: lossy compression rate ≥ rate distortion function R(D).

Theorem 2. [Shannon 1948; Channel Coding] In Shannon’s words: It is possible to send information at the capacity through the channel with as small a frequency of errors as desired by proper (long) encoding. This statement is not true for any rate greater than the capacity.

Post-Shannon Challenges

1. Back off from infinity (Ziv’97): Extend Shannon findings to finite size data structures (i.e., sequences, graphs), that is, develop information theory of various data structures beyond first-order asymptotics.

Claim: Many interesting information-theoretic phenomena appear in the second-order terms.

2. Science of Information: Information Theory needs to meet new challenges of current applications in

biology, knowledge extraction, economics, ... to understand new aspects of information in:

structure, time, space, and semantics; and in dynamic information, limited resources, representation-invariant information, and cooperation & dependency.

Outline Update

1. Shannon Legacy
2. Analytic Information Theory
3. Source Coding: The Redundancy Rate Problem
4. Post-Shannon Information
5. NSF Science and Technology Center

Analytic Combinatorics + Information Theory = Analytic Information Theory

• In his 1997 Shannon Lecture, Jacob Ziv presented compelling arguments for “backing off” from first-order asymptotics in order to predict the behavior of real systems with finite length descriptions.

• To overcome these difficulties, one may replace first-order analyses by non-asymptotic analysis; we propose instead to develop full asymptotic expansions and more precise analyses (e.g., large deviations, CLT).

• Following Hadamard’s precept¹, we study information theory problems using techniques of complex analysis such as generating functions, combinatorial calculus, Rice’s formula, the Mellin transform, Fourier series, sequences distributed modulo 1, saddle point methods, analytic poissonization and depoissonization, and singularity analysis.

• This program, which applies complex-analytic tools to information theory, constitutes analytic information theory².

¹ The shortest path between two truths on the real line passes through the complex plane.
² Andrew Odlyzko: “Analytic methods are extremely powerful and when they apply, they often yield estimates of unparalleled precision.”

Some Successes of Analytic Information Theory

• Wyner–Ziv Conjecture concerning the longest match in the WZ’89 compression scheme (W.S., 1993).
• Ziv’s Conjecture on the distribution of the number of phrases in the LZ’78 scheme (Jacquet & W.S., 1995, 2011).
• Redundancy of LZ’78 (Savari, 1997; Louchard & W.S., 1997).
• Steinberg–Gutman Conjecture regarding lossy pattern matching compression (Łuczak & W.S., 1997; Kieffer & Yang, 1998; Kontoyiannis, 2003).
• Precise redundancy of Huffman’s code (W.S., 2000) and redundancy of fixed-to-variable codes without prefix constraints (W.S. & Verdú, 2010).
• Minimax redundancy for memoryless sources (Xie & Barron, 1997; W.S., 1998; W.S. & Weinberger, 2010), Markov sources (Rissanen, 1998; Jacquet & W.S., 2004), and renewal sources (Flajolet & W.S., 2002; Drmota & W.S., 2004).
• Analysis of variable-to-fixed codes such as Tunstall and Khodak codes (Drmota, Reznik, Savari, & W.S., 2006, 2008, 2010).
• Entropy of hidden Markov processes and the noisy constrained capacity (Jacquet, Seroussi, & W.S., 2004, 2007, 2010; Han & Marcus, 2007).
• ...

Outline Update

1. Shannon Legacy
2. Analytic Information Theory
3. Source Coding: The Redundancy Rate Problem
   (a) Universal Memoryless Sources
   (b) Universal Markov Sources
   (c) Universal Renewal Sources

Source Coding and Redundancy


Source coding aims at finding codes $C : \mathcal{A}^* \to \{0,1\}^*$ of the shortest length $L(C, x)$, either on average or for individual sequences.

Known Source P : The pointwise and maximal redundancy are:

$$R_n(C_n, P; x_1^n) = L(C_n, x_1^n) + \log P(x_1^n), \qquad R_n^*(C_n, P) = \max_{x_1^n}\bigl[L(C_n, x_1^n) + \log P(x_1^n)\bigr],$$

where $P(x_1^n)$ is the probability of $x_1^n = x_1 \cdots x_n$.

Unknown Source $P$: Following Davisson, the maximal minimax redundancy $R_n^*(\mathcal{S})$ for a family of sources $\mathcal{S}$ is

$$R_n^*(\mathcal{S}) = \min_{C_n} \sup_{P \in \mathcal{S}} \max_{x_1^n} \bigl[L(C_n, x_1^n) + \log P(x_1^n)\bigr].$$

Shtarkov’s Bound:

$$d_n(\mathcal{S}) := \log \underbrace{\sum_{x_1^n \in \mathcal{A}^n} \sup_{P \in \mathcal{S}} P(x_1^n)}_{D_n(\mathcal{S})} \;\le\; R_n^*(\mathcal{S}) \;\le\; \log \sum_{x_1^n \in \mathcal{A}^n} \sup_{P \in \mathcal{S}} P(x_1^n) + 1.$$

Learnable Information and Redundancy


1. Let $\mathcal{S} := \mathcal{M}_k = \{P_\theta : \theta \in \Theta\}$ be a set of $k$-dimensional parameterized distributions, and let $\hat\theta(x^n) = \arg\max_{\theta \in \Theta} P_\theta(x^n)$ be the ML estimator.

(Figure: $\Theta$ covered by $C_n(\Theta)$ balls of radius $\sim 1/\sqrt{n}$; $\log C_n(\Theta)$ useful bits; models within a ball are indistinguishable.)

2. Two models, say $P_\theta(x^n)$ and $P_{\theta'}(x^n)$, are indistinguishable if the ML estimator $\hat\theta$ with high probability declares both models to be the same.

3. The number of distinguishable distributions (i.e., $\theta$’s), $C_n(\Theta)$, summarizes the learnable information: $I(\Theta) = \log_2 C_n(\Theta)$.

4. Consider the following expansion of the Kullback-Leibler (KL) divergence:

$$D(P_{\hat\theta} \,\|\, P_\theta) := \mathbf{E}[\log P_{\hat\theta}(X^n)] - \mathbf{E}[\log P_\theta(X^n)] \sim \tfrac{1}{2}(\theta - \hat\theta)^T I(\hat\theta)(\theta - \hat\theta) \asymp d_I^2(\theta, \hat\theta),$$

where $I(\theta) = \{I_{ij}(\theta)\}_{ij}$ is the Fisher information matrix and $d_I(\theta, \hat\theta)$ is a rescaled Euclidean distance known as the Mahalanobis distance.

5. Balasubramanian proved that the number of distinguishable balls $C_n(\Theta)$ of radius $O(1/\sqrt{n})$ is asymptotically equal to the minimax redundancy:

$$\text{Learnable Information} = \log C_n(\Theta) = \inf_{\theta \in \Theta} \max_{x^n} \log \frac{P_{\hat\theta}}{P_\theta} = R_n^*(\mathcal{M}_k).$$
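To make item 4 concrete, here is a minimal numerical sketch (ours, not from the talk) for a single Bernoulli parameter, where the Fisher information of one observation is $I(\theta) = 1/(\theta(1-\theta))$ and the KL divergence is measured in nats:

```python
from math import log

def kl_bernoulli(a, b):
    """D(Bern(a) || Bern(b)) in nats."""
    return a * log(a / b) + (1 - a) * log((1 - a) / (1 - b))

def fisher_bernoulli(theta):
    """Fisher information of a single Bernoulli observation."""
    return 1.0 / (theta * (1.0 - theta))

if __name__ == "__main__":
    theta_hat = 0.3
    for theta in (0.31, 0.35, 0.40):
        exact = kl_bernoulli(theta_hat, theta)
        quadratic = 0.5 * fisher_bernoulli(theta_hat) * (theta - theta_hat) ** 2
        # the quadratic (Mahalanobis) approximation is accurate near theta_hat
        print(f"theta={theta}: exact={exact:.6f}, quadratic={quadratic:.6f}")
```

The agreement degrades as $\theta$ moves away from $\hat\theta$, which is the $1/\sqrt{n}$-ball picture above: within a ball of radius $O(1/\sqrt{n})$ the per-observation divergence stays $O(1/n)$ and the models cannot be told apart.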

Outline Update

1. Shannon Legacy

2. Analytic Information Theory

3. Source Coding: The Redundancy Rate Problem
   (a) Universal Memoryless Sources
   (b) Universal Renewal Sources
   (c) Universal Markov Sources

Maximal Minimax for Memoryless Sources

For a memoryless source over the alphabet $\mathcal{A} = \{1, 2, \ldots, m\}$ we have
$$P(x_1^n) = p_1^{k_1} \cdots p_m^{k_m}, \qquad k_1 + \cdots + k_m = n.$$
Then

$$D_n(\mathcal{M}_0) := \sum_{x_1^n} \sup_{P} P(x_1^n) = \sum_{x_1^n} \sup_{p_1,\ldots,p_m} p_1^{k_1} \cdots p_m^{k_m} = \sum_{k_1+\cdots+k_m=n} \binom{n}{k_1,\ldots,k_m} \sup_{p_1,\ldots,p_m} p_1^{k_1} \cdots p_m^{k_m} = \sum_{k_1+\cdots+k_m=n} \binom{n}{k_1,\ldots,k_m} \left(\frac{k_1}{n}\right)^{k_1} \cdots \left(\frac{k_m}{n}\right)^{k_m},$$

since the (unnormalized) likelihood distribution is
$$\sup_{P} P(x_1^n) = \sup_{p_1,\ldots,p_m} p_1^{k_1} \cdots p_m^{k_m} = \left(\frac{k_1}{n}\right)^{k_1} \cdots \left(\frac{k_m}{n}\right)^{k_m}.$$
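The Shtarkov sum $D_n(\mathcal{M}_0)$ above can be checked directly for small $n$ and $m$ by summing over all compositions $k_1 + \cdots + k_m = n$; a brute-force sketch (function names are ours):

```python
from itertools import combinations
from math import factorial, log2

def compositions(n, m):
    """Yield all (k_1, ..., k_m) with k_i >= 0 and k_1 + ... + k_m = n."""
    for cuts in combinations(range(n + m - 1), m - 1):
        parts, prev = [], -1
        for c in cuts:
            parts.append(c - prev - 1)
            prev = c
        parts.append(n + m - 2 - prev)
        yield tuple(parts)

def multinomial(n, ks):
    out = factorial(n)
    for k in ks:
        out //= factorial(k)
    return out

def shtarkov_sum(n, m):
    """D_n(M_0) = sum over types of multinomial(n; k) * prod_i (k_i/n)^{k_i}."""
    total = 0.0
    for ks in compositions(n, m):
        ml = 1.0
        for k in ks:
            if k:
                ml *= (k / n) ** k
        total += multinomial(n, ks) * ml
    return total

if __name__ == "__main__":
    for n in (10, 100):
        # log2 D_n(M_0): within 1 bit of R*_n by Shtarkov's bound
        print(n, log2(shtarkov_sum(n, 2)))
```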

Generating Function for $D_n(\mathcal{M}_0)$

We write

$$D_n(\mathcal{M}_0) = \sum_{k_1+\cdots+k_m=n} \binom{n}{k_1,\ldots,k_m} \left(\frac{k_1}{n}\right)^{k_1} \cdots \left(\frac{k_m}{n}\right)^{k_m} = \frac{n!}{n^n} \sum_{k_1+\cdots+k_m=n} \frac{k_1^{k_1}}{k_1!} \cdots \frac{k_m^{k_m}}{k_m!}.$$
Let us introduce a tree-generating function

$$B(z) = \sum_{k=0}^{\infty} \frac{k^k}{k!} z^k = \frac{1}{1 - T(z)}, \qquad T(z) = \sum_{k=1}^{\infty} \frac{k^{k-1}}{k!} z^k,$$
where $T(z) = z e^{T(z)}$ ($= -W(-z)$, Lambert’s $W$-function) enumerates all rooted labeled trees. Let now

$$D_m(z) = \sum_{n=0}^{\infty} z^n \frac{n^n}{n!} D_n(\mathcal{M}_0).$$
Then, by the convolution formula,

$$D_m(z) = [B(z)]^m.$$
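A quick consistency check of this generating-function identity: extract $[z^n][B(z)]^m$ by truncated series multiplication and recover $D_n(\mathcal{M}_0)$ as $\frac{n!}{n^n}[z^n][B(z)]^m$ (a sketch with our own helper names; exact rational arithmetic is used to avoid roundoff):

```python
from fractions import Fraction
from math import factorial

def B_coeffs(N):
    """Taylor coefficients of B(z) = sum_{k>=0} k^k z^k / k! up to z^N (0^0 = 1)."""
    return [Fraction(k ** k if k else 1, factorial(k)) for k in range(N + 1)]

def series_power(coeffs, m, N):
    """Coefficients of (sum_k coeffs[k] z^k)^m, truncated at z^N."""
    out = [Fraction(1)] + [Fraction(0)] * N
    for _ in range(m):
        new = [Fraction(0)] * (N + 1)
        for i, a in enumerate(out):
            if a:
                for j, b in enumerate(coeffs[: N - i + 1]):
                    new[i + j] += a * b
        out = new
    return out

def D_n_via_gf(n, m):
    """D_n(M_0) = (n!/n^n) [z^n] B(z)^m."""
    coeff = series_power(B_coeffs(n), m, n)[n]
    return float(coeff * factorial(n)) / n ** n

if __name__ == "__main__":
    # m = 1: a single sequence of each length with sup P = 1, so D_n = 1 exactly.
    print(D_n_via_gf(8, 1), D_n_via_gf(8, 2), D_n_via_gf(8, 3))
```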

Asymptotics for FINITE m

The function $B(z)$ has an algebraic singularity at $z = e^{-1}$, and
$$\beta(z) = B(z/e) = \frac{1}{\sqrt{2(1-z)}} + \frac{1}{3} + O(\sqrt{1-z}).$$
By Cauchy’s coefficient formula

$$D_n(\mathcal{M}_0) = \frac{n!}{n^n} [z^n] [B(z)]^m = \sqrt{2\pi n}\,(1 + O(1/n)) \cdot \frac{1}{2\pi i} \oint \frac{\beta(z)^m}{z^{n+1}}\, dz.$$

For finite m, the singularity analysis of Flajolet and Odlyzko implies

$$[z^n](1-z)^{-\alpha} \sim \frac{n^{\alpha-1}}{\Gamma(\alpha)}, \qquad \alpha \notin \{0, -1, -2, \ldots\},$$

that finally yields (cf. Clarke & Barron, 1990, W.S., 1998)

$$R_n^*(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{2} + \log\frac{\sqrt{\pi}}{\Gamma(\frac{m}{2})} + \frac{\Gamma(\frac{m}{2})\, m}{3\,\Gamma(\frac{m}{2}-\frac{1}{2})}\cdot\frac{\sqrt{2}}{\sqrt{n}} + \left(\frac{3 + m(m-2)(2m+1)}{36} - \frac{\Gamma^2(\frac{m}{2})\, m^2}{9\,\Gamma^2(\frac{m}{2}-\frac{1}{2})}\right)\cdot\frac{1}{n} + \cdots$$
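For $m = 2$ the Shtarkov sum collapses to a single binomial sum, so the expansion above is easy to check numerically; a sketch comparing the exact $\log_2 D_n(\mathcal{M}_0)$ with the first two terms (helper names are ours):

```python
from math import comb, lgamma, log, log2, pi

def shtarkov_binary(n):
    """Exact D_n(M_0) for the binary memoryless class (0^0 handled by (0/n)**0 == 1)."""
    return sum(comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k) for k in range(n + 1))

def two_leading_terms(n, m=2):
    """(m-1)/2 * log(n/2) + log(sqrt(pi)/Gamma(m/2)), in bits."""
    return (m - 1) / 2 * log2(n / 2) + 0.5 * log2(pi) - lgamma(m / 2) / log(2)

if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, round(log2(shtarkov_binary(n)), 4), round(two_leading_terms(n), 4))
```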

Redundancy for LARGE m

Now assume that m is unbounded and may vary with n. Then

$$D_{n,m}(\mathcal{M}_0) = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint \frac{\beta(z)^m}{z^{n+1}}\,dz = \sqrt{2\pi n}\,\frac{1}{2\pi i}\oint e^{g(z)}\,dz,$$
where $g(z) = m \ln \beta(z) - (n+1)\ln z$.

The saddle point $z_0$ is a solution of $g'(z_0) = 0$, so that locally

$$g(z) = g(z_0) + \frac{1}{2}(z - z_0)^2 g''(z_0) + O\bigl(g'''(z_0)(z - z_0)^3\bigr).$$

Under mild conditions satisfied by our g(z) (e.g., z0 is real and unique), the saddle point method leads to:

$$D_{n,m}(\mathcal{M}_0) = \frac{e^{g(z_0)}}{\sqrt{2\pi\,|g''(z_0)|}} \times \left(1 + O\!\left(\frac{g'''(z_0)}{(g''(z_0))^{\rho}}\right)\right)$$
for some $\rho < 3/2$.

(Three regimes arise: $m = o(n)$, $m = \Theta(n)$, and $n = o(m)$.)

Theorem 1 (Orlitsky and Santhanam, 2004, and W.S. and Weinberger, 2010). (i) For $m = o(n)$

$$R_{n,m}^*(\mathcal{M}_0) = \frac{m-1}{2}\log\frac{n}{m} + \frac{m}{2}\log e + \frac{m\log e}{3}\sqrt{\frac{m}{n}} - \frac{1}{2} + O\!\left(\sqrt{\frac{m}{n}}\right)$$

(ii) For m = αn + ℓ(n), where α is a positive constant and ℓ(n) = o(n),

$$R_{n,m}^*(\mathcal{M}_0) = n \log B_\alpha + \ell(n) \log C_\alpha - \log\sqrt{A_\alpha} + O(\ell(n)^2/n)$$
where $C_\alpha := \tfrac{1}{2} + \tfrac{1}{2}\sqrt{1 + 4/\alpha}$, $A_\alpha := C_\alpha + 2/\alpha$, and $B_\alpha := \alpha\, C_\alpha^{\alpha+2}\, e^{-1/C_\alpha}$.

(iii) For $n = o(m)$
$$R_{n,m}^*(\mathcal{M}_0) = n\log\frac{m}{n} + \frac{3}{2}\,\frac{n^2}{m}\log e - \frac{3}{2}\,\frac{n}{m}\log e + O\!\left(\frac{1}{\sqrt{n}} + \frac{n^3}{m^2}\right).$$
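The constants in case (ii) are explicit and easy to tabulate; a small sketch (helper name is ours) evaluating $C_\alpha$, $A_\alpha$, $B_\alpha$ and the per-symbol leading rate $\log_2 B_\alpha$ for $m = \alpha n$:

```python
from math import exp, log2, sqrt

def phase_constants(alpha):
    """C_alpha, A_alpha, B_alpha from Theorem 1(ii), for m = alpha * n."""
    C = 0.5 + 0.5 * sqrt(1.0 + 4.0 / alpha)
    A = C + 2.0 / alpha
    B = alpha * C ** (alpha + 2) * exp(-1.0 / C)
    return C, A, B

if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0, 10.0):
        C, A, B = phase_constants(alpha)
        print(f"alpha={alpha}: C={C:.4f}  A={A:.4f}  B={B:.4f}  log2(B)={log2(B):.4f} bits/symbol")
```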

Outline Update

1. Shannon Legacy

2. Analytic Information Theory

3. Source Coding: The Redundancy Rate Problem
   (a) Universal Memoryless Sources
   (b) Universal Renewal Sources
   (c) Universal Markov Sources

Renewal Sources (Virtual Large Alphabet)


The renewal process class $\mathcal{R}_0$ (introduced in 1996 by Csiszár and Shields) is defined as follows:

• Let $T_1, T_2, \ldots$ be a sequence of i.i.d. positive-valued random variables with distribution $Q(j) = \Pr\{T_i = j\}$.
• In a binary renewal sequence the positions of the 1’s are at the renewal epochs $T_0, T_0 + T_1, \ldots$ with runs of zeros of lengths $T_1 - 1, T_2 - 1, \ldots$.

For a sequence
$$x_1^n = 1\,0^{\alpha_1}\,1\,0^{\alpha_2}\,1 \cdots 1\,0^{\alpha_n}\,\underbrace{0 \cdots 0}_{k^*}$$
define $k_m$ as the number of $i$ such that $\alpha_i = m$. Then
$$P(x_1^n) = [Q(0)]^{k_0}[Q(1)]^{k_1} \cdots [Q(n-1)]^{k_{n-1}}\, \Pr\{T_1 > k^*\}.$$

Theorem 2 (Flajolet and W.S., 1998). Consider the class $\mathcal{R}_0$ of renewal processes. Then
$$R_n^*(\mathcal{R}_0) = \frac{2}{\log 2}\sqrt{cn} + O(\log n), \qquad c = \frac{\pi^2}{6} - 1 \approx 0.645.$$
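Numerically, the leading term of Theorem 2 reads as follows (helper name is ours); note the $\sqrt{n}$ growth, in contrast with the logarithmic redundancy of finite-alphabet memoryless classes:

```python
from math import log, pi, sqrt

def renewal_redundancy_leading(n):
    """Leading term (2/log 2) * sqrt(c*n) of R*_n(R_0), in bits, with c = pi^2/6 - 1."""
    c = pi ** 2 / 6.0 - 1.0          # ~ 0.6449
    return 2.0 / log(2.0) * sqrt(c * n)

if __name__ == "__main__":
    for n in (10 ** 3, 10 ** 6):
        print(n, round(renewal_redundancy_leading(n), 1))
```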


Maximal Minimax Redundancy

It can be proved that
$$r_{n+1} - 1 \;\le\; D_n(\mathcal{R}_0) \;\le\; \sum_{m=0}^{n} r_m,$$
$$r_n = \sum_{k=0}^{n} r_{n,k}, \qquad r_{n,k} = \sum_{\mathcal{I}(n,k)} \binom{k}{k_0, k_1, \ldots, k_{n-1}} \left(\frac{k_0}{k}\right)^{k_0}\left(\frac{k_1}{k}\right)^{k_1} \cdots \left(\frac{k_{n-1}}{k}\right)^{k_{n-1}},$$
where $\mathcal{I}(n,k)$ is the set of integer partitions of $n$ into $k$ terms, i.e.,

$$n = k_0 + 2k_1 + \cdots + nk_{n-1}, \qquad k = k_0 + \cdots + k_{n-1}.$$
But we shall study $s_n = \sum_{k=0}^{n} s_{n,k}$, where

$$s_{n,k} = e^{-k} \sum_{\mathcal{I}(n,k)} \frac{k^{k_0}}{k_0!} \cdots \frac{k^{k_{n-1}}}{k_{n-1}!},$$
since $S(z, u) = \sum_{k,n} s_{n,k}\, u^k z^n = \prod_{i=1}^{\infty} \beta(z^i u)$.

$$s_n = [z^n] S(z, 1) = [z^n] \exp\left(\frac{c}{1-z} + a \log\frac{1}{1-z}\right).$$

Theorem 3 (Flajolet and W.S., 1998). We have the following asymptotics:
$$s_n \sim \exp\left(2\sqrt{cn} - \frac{7}{8}\log n + O(1)\right), \qquad \log r_n = \frac{2}{\log 2}\sqrt{cn} - \frac{5}{8}\log n + \frac{1}{2}\log\log n + O(1).$$

Outline Update

1. Shannon Legacy

2. Analytic Information Theory

3. Source Coding: The Redundancy Rate Problem
   (a) Universal Memoryless Sources
   (b) Universal Renewal Sources
   (c) Universal Markov Sources

Maximal Minimax for Markov Sources


Markov source over an alphabet of size $|\mathcal{A}| = m$ (FINITE) with transition matrix $P = \{p_{ij}\}_{i,j=1}^{m}$:
$$D_n(\mathcal{M}_1) = \sum_{x_1^n} \sup_{P} p_{11}^{k_{11}} \cdots p_{mm}^{k_{mm}} = \sum_{\mathbf{k} \in \mathcal{P}_n} |\mathcal{T}_n(\mathbf{k})| \left(\frac{k_{11}}{k_1}\right)^{k_{11}} \cdots \left(\frac{k_{mm}}{k_m}\right)^{k_{mm}},$$
where $\mathcal{P}_n(m)$ denotes the set of Markov types, while $\mathcal{T}_n(\mathbf{k}) := \mathcal{T}_n(x_1^n)$ denotes the number of sequences of type $\mathbf{k} = [k_{ij}]$, with $k_{ij}$ representing the number of pairs $(ij)$ in $x_1^n$.

For circular strings (i.e., after the $n$th symbol we revisit the first symbol of $x_1^n$), the matrix $\mathbf{k} = [k_{ij}]$ satisfies the following constraints, denoted $\mathcal{F}_n(m)$:

$$\sum_{1 \le i,j \le m} k_{ij} = n, \qquad \sum_{j=1}^{m} k_{ij} = \sum_{j=1}^{m} k_{ji} \quad \text{for each } i.$$

For example, for $m = 3$:
$$k_{11} + k_{12} + k_{13} + k_{21} + k_{22} + k_{23} + k_{31} + k_{32} + k_{33} = n,$$
$$k_{12} + k_{13} = k_{21} + k_{31}, \qquad k_{12} + k_{32} = k_{21} + k_{23}, \qquad k_{13} + k_{23} = k_{31} + k_{32}.$$

Matrices satisfying $\mathcal{F}_n(m)$ are called balanced matrices, and $|\mathcal{F}_n(m)|$ denotes the number of balanced matrices over $\mathcal{A}$.

Markov Types and Eulerian Cycles


Example: Let $\mathcal{A} = \{0, 1\}$ and

$$\mathbf{k} = \begin{pmatrix} 1 & 2 \\ 2 & 2 \end{pmatrix}$$

$\mathcal{P}_n(m)$ – Markov types, but also the set of all connected Eulerian digraphs $G = (V(G), E(G))$ such that $V(G) \subseteq \mathcal{A}$ and $|E(G)| = n$.
$\mathcal{E}_n(m)$ – set of connected Eulerian digraphs on $\mathcal{A}$.
$\mathcal{F}_n(m)$ – balanced matrices, but also the set of (not necessarily connected) Eulerian digraphs on $\mathcal{A}$.

Asymptotic equivalence: $|\mathcal{P}_n(m)| = |\mathcal{F}_n(m)| + O(n^{m^2 - 3m + 3}) \sim |\mathcal{E}_n(m)|$.

Markov Types – Main Results


Theorem 4 (Knessl, Jacquet, and W.S., 2010). (i) For fixed $m$ and $n \to \infty$
$$|\mathcal{P}_n(m)| = d(m)\, \frac{n^{m^2 - m}}{(m^2 - m)!} + O(n^{m^2 - m - 1}),$$
$$d(m) = \frac{1}{(2\pi)^{m-1}} \underbrace{\int_{-\infty}^{\infty} \cdots \int_{-\infty}^{\infty}}_{(m-1)\text{-fold}} \prod_{j=1}^{m-1} \frac{1}{1 + \varphi_j^2} \cdot \prod_{k \ne \ell} \frac{1}{1 + (\varphi_k - \varphi_\ell)^2}\; d\varphi_1\, d\varphi_2 \cdots d\varphi_{m-1}.$$
(ii) When $m \to \infty$, provided that $m^4 = o(n)$,
$$|\mathcal{P}_n(m)| \sim \frac{\sqrt{2}\, m^{3m/2}\, e^{m^2}}{m^{2m^2}\, 2^m\, \pi^{m/2}} \cdot n^{m^2 - m}.$$

Some numerics: $|\mathcal{P}_n(3)| \sim \frac{1}{12}\frac{n^6}{6!}$, $\quad |\mathcal{P}_n(4)| \sim \frac{1}{96}\frac{n^{12}}{12!}$, $\quad |\mathcal{P}_n(5)| \sim \frac{37}{34560}\frac{n^{20}}{20!}$.
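For small $n$ one can brute-force the balanced-matrix counts $|\mathcal{F}_n(m)|$ (which, by the asymptotic equivalence above, match $|\mathcal{P}_n(m)|$ to leading order) and compare with $d(m)\, n^{m^2-m}/(m^2-m)!$; for $m = 2$ the integral in Theorem 4 evaluates to $d(2) = 1/2$. A sketch with our own helper name:

```python
def count_balanced(n, m):
    """Count m x m nonnegative integer matrices with entries summing to n
    whose row sums equal the corresponding column sums, i.e. |F_n(m)|."""
    count = 0

    def fill(entries, remaining):
        nonlocal count
        if len(entries) == m * m - 1:
            mat = entries + (remaining,)
            rows = [sum(mat[i * m:(i + 1) * m]) for i in range(m)]
            cols = [sum(mat[j::m]) for j in range(m)]
            count += rows == cols
            return
        for v in range(remaining + 1):
            fill(entries + (v,), remaining - v)

    fill((), n)
    return count

if __name__ == "__main__":
    for n in (10, 20, 40):
        # leading term d(2) * n^2 / 2! = n^2 / 4; convergence is slow (O(n) lower-order terms)
        print(n, count_balanced(n, 2), n * n / 4)
```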

Open Question:

Counting types and sequences of a given type in a Markov field. (Answer: $|\mathcal{P}_n(2)| \sim \frac{1}{12}\frac{n^5}{5!}$?)

Markov Redundancy: Main Results

Theorem 5 (Rissanen, 1996; Jacquet & W.S., 2004). Let $\mathcal{M}_1$ be a Markov source over an $m$-ary alphabet. Then

$$R_n^*(\mathcal{M}_1) = \frac{m(m-1)}{2}\log\frac{n}{2\pi} + \log A_m + O(1/n)$$
with

$$A_m = \int_{\mathcal{K}(1)} \sqrt{F_m(y_{ij})}\; \prod_{i} \frac{\sqrt{\sum_j y_{ij}}}{\prod_j \sqrt{y_{ij}}}\; d[y_{ij}],$$
where $\mathcal{K}(1) = \{y_{ij} : \sum_{ij} y_{ij} = 1\}$ and $F_m(\cdot)$ is a polynomial.

In particular, for $m = 2$, $A_2 = 16 \times \text{Catalan}$, where Catalan is Catalan’s constant $\sum_i \frac{(-1)^i}{(2i+1)^2} \approx 0.915965594$.

Theorem 6. Let $\mathcal{M}_r$ be a Markov source of order $r$. Then
$$R_n^*(\mathcal{M}_r) = \frac{m^r(m-1)}{2}\log\frac{n}{2\pi} + \log A_m^r + O(1/n),$$
where $A_m^r$ is a constant defined in a similar fashion to $A_m$ above.
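For the binary case the constant is explicit, $A_2 = 16 \times \text{Catalan}$, so the leading Markov redundancy is easy to evaluate; a sketch (helper names are ours):

```python
from math import log2, pi

def catalan_constant(terms=200000):
    """Catalan's constant, sum_i (-1)^i / (2i+1)^2 ~ 0.915965594."""
    return sum((-1) ** i / (2 * i + 1) ** 2 for i in range(terms))

def markov_redundancy_binary(n):
    """Leading terms of R*_n(M_1) for m = 2 (Theorem 5), in bits:
    m(m-1)/2 * log2(n / (2*pi)) + log2(A_2), with A_2 = 16 * Catalan."""
    A2 = 16.0 * catalan_constant()
    return 1.0 * log2(n / (2.0 * pi)) + log2(A2)

if __name__ == "__main__":
    print(round(catalan_constant(), 6))              # ~ 0.915966
    for n in (10 ** 3, 10 ** 6):
        print(n, round(markov_redundancy_binary(n), 3))
```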

Outline Update

1. Shannon Information Theory
2. Source Coding
3. The Redundancy Rate Problem
4. Post-Shannon Information (Science of Information)
5. NSF Science and Technology Center

Post-Shannon Challenges: Time & Space


We need to extend information theory to include structure, time & space, and ....

1. Coding Rate for Finite Blocklength: Polyanskiy, Poor & Verdu, 2010

$$\frac{1}{n}\log M^*(n, \varepsilon) \approx C - \sqrt{\frac{V}{n}}\, Q^{-1}(\varepsilon),$$
where $C$ is the capacity, $V$ is the channel dispersion, $n$ is the block length, $\varepsilon$ the error probability, and $Q$ is the complementary Gaussian distribution function.
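As an illustration, the normal approximation is easy to evaluate for a binary symmetric channel, using the standard BSC expressions $C = 1 - h(p)$ and dispersion $V = p(1-p)\log_2^2\frac{1-p}{p}$ (a sketch with our own helper names, not the paper's code):

```python
from math import log2, sqrt
from statistics import NormalDist

def bsc_normal_approximation(n, p, eps):
    """(1/n) log2 M*(n, eps) ≈ C - sqrt(V/n) * Q^{-1}(eps) for BSC(p)."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)        # binary entropy h(p)
    C = 1.0 - h                                      # capacity, bits per channel use
    V = p * (1 - p) * log2((1 - p) / p) ** 2         # channel dispersion
    q_inv = NormalDist().inv_cdf(1.0 - eps)          # Q^{-1}(eps)
    return C - sqrt(V / n) * q_inv

if __name__ == "__main__":
    for n in (200, 1000, 5000):
        print(n, round(bsc_normal_approximation(n, p=0.11, eps=1e-3), 4))
```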

Time & Space: Classical Information Theory is at its weakest in dealing with problems of delay (e.g., information arriving late may be useless or have less value).

2. Speed of Information in DTN: Jacquet et al., 2008.

3. Real-Time Communication over Unreliable Wireless Channels with Hard Delay Constraints: P.R. Kumar & Hou, 2011.

4. The Quest for Fundamental Limits in MANET: Goldsmith et al., 2011.

Structure


Structure: Measures are needed for quantifying information embodied in structures (e.g., material structures, nanostructures, biomolecules, gene regulatory networks, protein interaction networks, social networks, financial transactions).

Information Content of Unlabeled Graphs: A random structure model $\mathcal{S}$ of a graph $\mathcal{G}$ is defined for an unlabeled version. Some labeled graphs have the same structure.

(Figure: labeled graphs $G_1, \ldots, G_8$ on vertices $\{1, 2, 3\}$ and their corresponding unlabeled structures $S_1, \ldots, S_4$.)

$$H_{\mathcal{G}} = \mathbf{E}[-\log P(G)] = -\sum_{G \in \mathcal{G}} P(G)\log P(G), \qquad H_{\mathcal{S}} = \mathbf{E}[-\log P(S)] = -\sum_{S \in \mathcal{S}} P(S)\log P(S).$$

Automorphism and Erdős–Rényi Graph Model


Graph Automorphism: for a graph $G$, an automorphism is an adjacency-preserving permutation of the vertices of $G$. (Figure: example graph on vertices $a, b, c, d, e$.)

Erdős–Rényi model: $\mathcal{G}(n, p)$ generates graphs with $n$ vertices, where edges are chosen independently with probability $p$. If $G$ has $k$ edges, then

$$P(G) = p^k (1-p)^{\binom{n}{2} - k}.$$

Theorem 7 (Y. Choi and W.S., 2008). For large $n$ and all $p$ satisfying $p \gg \frac{\ln n}{n}$ and $1 - p \gg \frac{\ln n}{n}$ (i.e., the graph is connected w.h.p.),
$$H_{\mathcal{S}} = \binom{n}{2} h(p) - \log n! + o(1) = \binom{n}{2} h(p) - n\log n + n\log e - \frac{1}{2}\log n + O(1),$$
where $h(p) = -p\log p - (1-p)\log(1-p)$ is the binary entropy.

AEP for structures: $\; 2^{-\binom{n}{2}(h(p)+\varepsilon) + \log n!} \le P(S) \le 2^{-\binom{n}{2}(h(p)-\varepsilon) + \log n!}$.
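A direct numerical reading of Theorem 7 (structural entropy in bits, dropping the $o(1)$ term; helper name is ours), before the proof sketch below:

```python
from math import comb, lgamma, log, log2

def structural_entropy(n, p):
    """H_S ≈ C(n,2) h(p) - log2(n!) for G(n, p), per Theorem 7 (o(1) term dropped)."""
    h = -p * log2(p) - (1 - p) * log2(1 - p)    # binary entropy h(p)
    log2_factorial = lgamma(n + 1) / log(2)      # log2(n!)
    return comb(n, 2) * h - log2_factorial

if __name__ == "__main__":
    n, p = 1000, 0.5
    labeled = comb(n, 2) * 1.0                   # H_G = C(n,2) h(0.5) bits
    print(round(labeled), round(structural_entropy(n, p)))   # ~ n log2 n bits saved
```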

Sketch of Proof:
1. $H_{\mathcal{S}} = H_{\mathcal{G}} - \log n! + \sum_{S \in \mathcal{S}} P(S)\log|\mathrm{Aut}(S)|$.
2. $\sum_{S \in \mathcal{S}} P(S)\log|\mathrm{Aut}(S)| = o(1)$ by asymmetry of $\mathcal{G}(n, p)$.

Structural Zip (SZIP)

Asymptotic Optimality of SZIP


Theorem 8 (Choi, W.S., 2008). Let L(S) be the length of the code.

(i) For large $n$,
$$\mathbf{E}[L(S)] \le \binom{n}{2} h(p) - n\log n + n\,(c + \Phi(\log n)) + o(n),$$
where $h(p) = -p\log p - (1-p)\log(1-p)$, $c$ is an explicitly computable constant, and $\Phi(x)$ is a fluctuating function with a small amplitude or zero.

(ii) Furthermore, for any ε > 0,

$$P\bigl(L(S) - \mathbf{E}[L(S)] \le \varepsilon n \log n\bigr) \ge 1 - o(1).$$
(iii) The algorithm runs in $O(n + e)$ time on average, where $e$ is the number of edges.

Table 1: The length of encodings (in bits)

Networks                          # of nodes   # of edges   our algorithm   adjacency matrix   adjacency list     coding
US Airports                              332        2,126           8,118             54,946           38,268     12,991
Protein interaction (Yeast)            2,361        6,646          46,912          2,785,980          159,504     67,488
Collaboration ()                       6,167       21,535         115,365         19,012,861          559,910    241,811
Collaboration (Erdős)                  6,935       11,857          62,617         24,043,645          308,282    147,377
Genetic interaction (Human)            8,605       26,066         221,199         37,018,710          729,848    310,569
Real-world Internet (AS level)        25,881       52,407         301,148        334,900,140        1,572,210    396,060

Outline Update

1. Shannon Information Theory
2. Source Coding
3. The Redundancy Rate Problem
4. Post-Shannon Information
5. NSF Science and Technology Center on Science of Information

NSF Center for Science of Information


In 2010 the National Science Foundation established a $25M

Science and Technology Center for Science of Information (http://soihub.org) to advance science and technology through a new quantitative understanding of the representation, communication and processing of information in biological, physical, social and engineering systems.

The center is located at Purdue University, and partner institutions include: Berkeley, MIT, Princeton, Stanford, UIUC, UCSD, and Bryn Mawr & Howard U.

The Center’s Specific Goals:

• define core theoretical principles governing transfer of information.
• develop meters and methods for information.
• apply to problems in physical and social sciences, and engineering.
• offer a venue for multi-disciplinary long-term collaborations.
• transfer advances in research to education and industry.

Acknowledgments


My French Connection:

Philippe Flajolet (1948-2011) Analytic Combinatorics

Philippe Jacquet: my copain and accomplice for the last thirty-something years in the analytic information theory adventure.

My Palo Alto Connection:

Finally: The Master! ... and ALL my co-authors. That’s IT