<<

COMMON , EFFICIENCY, AND ACTIONS

A DISSERTATION SUBMITTED TO THE DEPARTMENT OF ELECTRICAL ENGINEERING AND THE COMMITTEE ON GRADUATE STUDIES OF STANFORD UNIVERSITY IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

Lei Zhao August 2011

© 2011 by Lei Zhao. All Rights Reserved. Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution- Noncommercial 3.0 United States License. http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/bn436fy2758

ii I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Thomas Cover, Primary Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Itschak Weissman, Co-Adviser

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Abbas El-Gamal

Approved for the Stanford University Committee on Graduate Studies. Patricia J. Gumport, Vice Provost Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

iii Preface

The source coding theorem and channel coding theorem, first established by Shannon in 1948, are the two pillars of information theory. The insight obtained from Shan- non’s work greatly changed the way modern communication systems were thought and built. As the original ideas of Shannon were absorbed by researchers, the mathe- matical tools in information theory were put to great use in , portfolio theory, complexity theory, and probability theory. In this work, we explore the area of common randomness generation, where remote nodes use nature’s correlated random resource and communication to generate a in common. In particular, we investigate the initial efficiency of common randomness generation as the communication rate goes down to zero, and the saturation efficiency as the communication exhausts nature’s randomness. We also consider the setting where some of the nodes can generate action sequences to influence part of nature’s randomness. At last, we consider actions in the framework of source coding. The tools from channel coding and distributed source coding are combined to establish the funda- mental limit of compression with actions.

iv Acknowledgements

The five years I spent at Stanford doing my Ph.D. have been a very pleasant and fulfilling journey. And it is my advisor Thomas Cover, who made it possible. His weekly round-robin group meeting was the best place for research discussion and was also full of interesting puzzles and stories. He revealed the pearls of information theory as well as statistics through all those beautiful examples and always encouraged me on every small findings I obtained. It is a privilege to work with him and I would like to thank him for his support, and guidance. I am also truly grateful to Professor Tsachy Weissman, who taught me amazing universal schemes in information theory and was always willing to let me do “random drawing” on his white boards. I really like his way of asking have-we-convinced- ourselves questions, which often led to surprisingly simple yet insightful discoveries. Professor Abbas El Gamal is of great influence on me. I would like to extend my sincere thanks to him. His broad knowledge on network information theory and his teaching of EE478 were invaluable to my research. I would like to thank my colleagues at Stanford, especially, Himanshu Asnani, Bernd Bandemer, Yeow Khiang Chia, Paul Cuff, Shirin Jalali, Gowtham Kumar, Vinith Misra, Alexandros Manolakos, Taesup Moon, Albert No, Idoia Ochoa, Haim Permuter, Han-I Su, and Kartik Venkat. Last but not least, I am grateful to my family. I thank my parents for their constant support and love. I thank my wife for her love, and for completing my life.

v Contents

Preface iv

Acknowledgements v

1 Introduction 1

2 Hirschfeld-Gebelein-Renyi maximal correlation 4 2.1 HGRcorrelation ...... 4 2.2 Examples ...... 6 2.2.1 Doubly symmetric binary source ...... 7 2.2.2 Z-Channel with Bern(1/2)input...... 7 2.2.3 ErasureChannel ...... 8

3 Common randomness generation 11 3.1 Commonrandomnessandefficiency ...... 11 3.1.1 Commonrandomness andcommoninformation ...... 13 3.1.2 Continuity at R =0 ...... 14 3.1.3 Initial Efficiency (R 0) ...... 14 ↓ 3.1.4 Efficiency at R H(X Y )(saturationefficiency) ...... 16 ↑ | 3.2 Examples ...... 18 3.2.1 DBSC(p)example...... 18 3.2.2 Gaussianexample...... 20 3.2.3 Erasureexample ...... 23 3.3 Extensions...... 24

vi 3.3.1 CRperunitcost ...... 24 3.3.2 Secretkeygeneration...... 24 3.3.3 Non-degenerate V ...... 26 3.3.4 Broadcastsetting ...... 26

4 Common randomness generation with actions 28 4.1 Commonrandomnesswithaction ...... 28 4.2 Example...... 32 4.3 Efficiency ...... 34 4.3.1 Initial Efficiency ...... 34 4.3.2 Saturationefficiency ...... 35 4.4 Extensions...... 35

5 Compression with actions 37 5.1 Introduction...... 37 5.2 Definitions...... 39 5.2.1 Losslesscase...... 39 5.2.2 Lossycase...... 39 5.2.3 Causalobservationsofstatesequence ...... 40 5.3 Losslesscase...... 40 5.3.1 Lossless, noncausal compression with action ...... 40 5.3.2 Lossless, causal compression with action ...... 46 5.3.3 Examples ...... 48 5.4 Lossycompressionwithactions ...... 51

6 Conclusions 54

A Proofs of Chapter 2 56

A.1 Proof of the convexity of ρ(PX PY X ) in PY X ...... 56 ⊗ | | B Proofs of Chapter 3 59 B.1 Proof of the continuity of C(R) at R =0...... 59

vii C Proofs of Chapter 4 61 C.1 ConverseproofofTheorem5...... 61 C.2 Proof for initial efficiency with actions ...... 63 C.3 ProofofLemma 5 ...... 66

D Proofs of Chapter 5 68 D.1 ProofofLemma6...... 68 D.2 ProofofLemma7...... 70

Bibliography 71

viii List of Tables

ix List of Figures

n n 1.1 Generate common randomness: K = K(X ), K′ = K′(Y ) satisfy-

ing P(K = K′) 1 as n . What is the maximum common → → ∞ 1 randomness per symbol, i.e. what is sup n H(K)? ...... 2

2.1 ...... 6 2.2 ...... 6 2.3 X Bern(1/2) ...... 7 ∼ 2.4 ρ(X; Y )=1 2 min p, 1 p ...... 7 − { − } 2.5 Z-channel ...... 8 1 p 2.6 ρ(X; Y )= 1+−p ...... 8 2.7 ErasureChannelq ...... 8

3.1 Common Randomness Capacity: (Xi,Yi) are i.i.d.. Node 1 generates a r.v. K based on the Xn sequence it observes. It also generates a message M and transmits the message to Node 2 under rate constraint n R. Node 2 generates a r.v. K′ based on the Y sequence it observes

and M. We require that P(K = K′) approaches 1 as n goes to infinity. The entropy of K measures the amount of common randomness those two nodes can generate. What is the maximum entropy of K? ... 12

3.2 The probability structure of Un...... 17

3.3 DBSC example: X Bern(1/2), pY X (x x)=(1 p), pY X (1 x x)= p. 18 ∼ | | − | − | 3.4 C(R) for p =0.08...... 19 3.5 GaussianExample ...... 21 3.6 Auxiliary r.v. U inGaussianexample...... 21

x 3.7 Gaussian example: C(R) for N =0.5...... 22 3.8 Erasureexample ...... 23 3.9 Erasure example: C R curve...... 23 − 3.10 Commonrandomnessperunitcost...... 24 3.11 SecretKeyGeneration ...... 25 3.12CRbroadcastsetup...... 26

4.1 Common Randomness Capacity: X is an i.i.d. source. Node { i}i=1,... 1 generates a r.v. K based on the Xn sequence it observes. It also generates a message M and transmits the message to Node 2 under rate constraint R. Node 2 first generates an action sequence An as a function of M and receives a sequence of side information Y n, where n n n Y (A ,X ) p(y a, x). Then Node 2 generates a r.v. K′ based on | ∼ | n both M and Y sequence it observes and M. We require P(K = K′) to be close to 1. The entropy of K measures the amount of common randomness those two nodes can generate. What is the maximum entropy of K? ...... 29 4.2 CRwithActionexample ...... 33 4.3 Correlate A with X ...... 33 4.4 CR with action example: option one: set A X; option two: correlate ⊥ A with X...... 34

5.1 Compression with actions. The Action encoder first observes the state sequence Sn and then generates an action sequence An. The ith out- put Y is the output of a channel p(y a, s) when a = A and s = S . i | i i The compressor generates a description M of 2nR bits to describe Y n. The remote decoder generates Yˆ n based on M and it’s available side information Zn as a reconstruction of Y n...... 38 5.2 Binary example with side information Z = ...... 48 ∅ H2(b) H2(p) dH2 5.3 The threshold b∗ solves − = , b [0, 1/2]...... 50 b db ∈ 5.4 Comparison between the non-causal and causal rate-cost functions. TheparameteroftheBernoullinoiseissetat0.1...... 51

xi Chapter 1

Introduction

Given a pair of random variables (X,Y ) with joint distribution p(x, y), what do they have in common? Different quantities can be justified as the right measure of “com- mon” in different settings. For example, in linear estimation, correlation determines the minimum square error (MMSE) when we use one random variable to esti- mate the other. And the MMSE suggests that the larger the absolute value of the correlation, the more “commonness” X and Y have. In information theory, insight about p(x, y) can often be gained when independent and identically distributed (i.i.d.) copies, (Xi,Yi), i = 1, ..., n, are considered. In source coding with side information, the celebrated Slepian-Wolf Theorem [21] shows that when compressing X n loss- { i}i=1 lessly, the rate reduction by having side information Y n is the mutual information { i}i=1 I(X; Y ) between X and Y . It makes a lot of sense that a large rate reduction suggests a lot in common between X and Y , which indicates that I(X; Y ) is a good measure. A more direct attempt addressing the commonness was first considered by G´acs and K¨orner in [10]. In their setting, illustrated in Fig. 1.1, nature generates (Xn,Y n) ∼ i.i.d. p(x, y). Node 1 observes Xn, and Node 2 observes Y n. The task is for the two nodes to generate common randomness (CR), i.e., a random variable K in common. The entropy of the common random variable is the number of common bits gener- ated by nature’s resource at either node. The supremum of the normalized entropy, 1 n H(K), is defined as the common information between X and Y . It would be an extremely interesting measure of commonness if not for the fact that it is zero for a

1 CHAPTER 1. INTRODUCTION 2

Xn Y n

Node 1 Node 2

K K′

n n Figure 1.1: Generate common randomness: K = K(X ), K′ = K′(Y ) satisfying P(K = K′) 1 as n . What is the maximum common randomness per symbol, → 1 →∞ i.e. what is sup n H(K)? large class of joint distributions. Witsenhausen [28] used Hirschfeld-Gebelein-Renyi maximal correlation (HGR correlation) to sharpen the result by G´acs and K¨orner. Surprisingly, if the HGR correlation between X and Y is strictly less than 1, not a single bit in common can be generated by the two nodes. In this thesis, we investigate the role of HGR correlation in common randomness generation with a rate-limited communication link between Node 1 and Node 2, with and without actions. In particular, we link the HGR correlation with initial efficiency, i.e., the initial rate of common randomness unlocked by communication, thus giving an operational justification of using HGR correlation as a measure of commonness. Furthermore, we extend common randomness generation to the setting where one node can take actions to affect the side information. A single letter expression for common randomness capacity is obtained, based on which the initial efficiency and saturation efficiency are derived. The maximum HGR correlation conditioned on a fixed action determines the initial efficiency. In the last chapter we consider the problem of compression with actions. While traditionally in source coding, nature fixes the source distribution, in our setting, we introduce the idea of using actions to affect nature’s source. Notation: We use capital letter X to denote a random variable, small letter x to denote the corresponding realization, calligraphic letter to denote the alphabet X of X, and to denote the cardinality of the alphabet. The subscripts in joint |X| CHAPTER 1. INTRODUCTION 3

distributions are mostly omitted. For example pXY (x, y) is written as p(x, y). But to emphasize the probability structure, we sometimes write the joint distribution as

PX PY X , where PX is the marginal of X and PY X as the conditional distribution of ⊗ | | Y given X. We use X Y to indicate that X and Y are independent, and X Y Z ⊥ − − to indicate that X and Z are conditionally independent given Y . Subscripts and n j superscripts are used in the standard way: X =(X1, ..., Xn) and Xi =(Xi, ..., Xj). Most of the notations follow [8]. Chapter 2

Hirschfeld-Gebelein-Renyi maximal correlation

2.1 HGR correlation

We focus on random variables with finite alphabet.

Definition 1. The HGR correlation [12,14,18] between two random variables (r.v.) X and Y , denoted as ρ(X; Y ), is defined as

ρ(X; Y ) = max Eg(X)f(Y ) (2.1) subject to Eg(X)=0, Ef(Y )=0, Eg2(X) 1, Ef 2(Y ) 1. ≤ ≤

If neither X nor Y is degenerate, i.e., a constant, then the inequalities can be replaced by equality in the constraints. An equivalent characterization was proved by R´enyi in [18]:

ρ2(X; Y ) = sup E E2(g(Y ) X) (2.2) Eg(Y )=0,Eg2(Y ) 1 | ≤   Note that HGR correlation is a function of the joint distribution p(x, y) and does not dependent on the support of X and Y . We sometimes use ρ(p(x, y)) or ρ(PX PY X ) ⊗ | 4 CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION 5

to emphasize the joint . The HGR correlation shares quite a few properties with mutual information ρ(X; Y ).

Positivity [18]: 0 ρ(X; Y ) 1 • ≤ ≤ ρ(X; Y ) = 0 iff X Y . ◦ ⊥ ρ(X; Y ) = 1 iff there exists a non-degenerate random variable V such that ◦ V is both a function of X and a function of Y .

Data processing inequality: If X,Y and Z form a Markov chain X Y Z, • − − then ρ(X; Y )= ρ(X; Y,Z) ρ(X; Z). ≥ Proof. Consider any function g such that Eg(X)=0,Eg2(X) = 1. By the Markovity X Y Z, E [E2(g(X) Y )] = E [E2(g(X) Y,Z)]. Thus using the − − | | alternative characterization Eq.(2.2), we have

ρ(X; Y )= ρ(X; Y,Z) ρ(X; Z). ≥

2 Convexity: Fixing PX , ρ (PX PY X ) is convex in PY X . • ⊗ | | 1, w.p. λ; Proof. Consider r.v.’s X,Y1,Y2. Let0 <λ< 1, and let Θ = , ( 2, w.p. 1 λ − where Θ is independent of (X,Y1,Y2). Let Y = YΘ. We have

ρ2(X; Y ) ρ2(X; Y , Θ) Θ ≤ Θ λρ2(X; Y )+(1 λ)ρ2(X; Y ), ≤ 1 − 2

where last inequality comes from the following lemma.

Lemma 1. Assume X Z, where Z has a finite alphabet . Let ρ(X; Y,Z) be ⊥ Z the R´enyi correlation between X and (Y,Z).

ρ2(X; Y,Z) P (z)ρ2(X; Y Z = z), (2.3) ≤ Z | v X CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION 6

where ρ(X; Y Z = z)= ρ(PX PY X,Z=z). | ⊗ |

Proof. See Appendix A.1.

2 However, we note here that ρ (PX PY X ) is not concave in PX when fixing PY X , ⊗ | | which differs from mutual information. We provide a numerical example: Consider P = [1/2, 1/4, 1/4]T and P = [1/3, 1/3, 1/3]. Let P = θP +(1 θ)P . 1 2 θ 1 − 2 2 We show the plots of ρ (Pθ PY X ) as a function of θ for two different PY X matrices ⊗ | | in the following figures:

0.107 0.098

0.106 0.097

0.105 0.096 0.104 0.095 0.103 0.094 0.102 (X;Y) (X;Y)

2 2 0.093 ρ 0.101 ρ 0.092 0.1 0.091 0.099

0.098 0.09

0.097 0.089 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Time sharing θ Time sharing θ

Figure 2.1: Figure 2.2:

0.0590 0.4734 0.4677

For the left figure, PY X =  0.3252 0.2415 0.4333 , and for the right figure, |  0.1778 0.6230 0.1992    0.3162 0.6139 0.0699 

PY X =  0.6351 0.2702 0.0948 . |  0.5519 0.3570 0.0911      2.2 Examples

In this section, we calculate the HGR correlation for a few simple examples. CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION 7

2.2.1 Doubly symmetric binary source

Let X be a Bern(1/2) r.v. and Y be the output of a binary symmetric channel with cross probability p and input X. Since X and Y are binary, the HGR correlation can be easily computed as [9]

ρ(X; Y )=1 2 min p, 1 p . − { − }

When p = 0, i.e., X and Y are independent, ρ = 0; when p = 1, X and Y are ±

1

0.9 X Y 0.8 1 p 0.7 − ) 0 0 0.6 Y p ; 0.5

X 0.4 (

ρ 0.3 p 0.2 0.1

0 1 1 0 0.2 0.4 0.6 0.8 1 1 p p − Figure 2.3: X Bern(1/2) Figure 2.4: ρ(X; Y )=1 2 min p, 1 p ∼ − { − } essentially identical and ρ achieves its maximum value 1. These values agrees with one’s intuition about the commonness measure on X and Y .

2.2.2 Z-Channel with Bern(1/2) input

Let X be a Bern(1/2). And let us consider the Z-channel 2.5 of with probability p, that relates X and Y . The HGR correlation can be computed as

1 p ρ(X; Y )= − 1+ p r Note that ρ2(X; Y )= 1+ 2 , which is a convex function in p. − 1+p CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION 8

X 1 1 Y 0.9 0.8

0 0 0.7 )

0.6 Y

; 0.5

X 0.4 ( 0.3 p ρ 0.2

0.1

1 1 0 1 p 0 0.2 0.4 0.6 0.8 1 − p Figure 2.5: Z-channel 1 p Figure 2.6: ρ(X; Y )= 1+−p q 2.2.3 Erasure Channel

Let (X,Y ) be two random variables with some general distributions p(x, y) and Y be an erased version of Y with erasure probability q, shown in Fig. 2.7. e

X p(x, y) Y Y

e e

Figure 2.7: Erasure Channel

It is shown that an erasure erases a portion q of the information between X and Y , i.e., I(X; Y ) = (1 q)I(X; Y ) [4]. Interestingly, for HGR correlation (HGR − correlation squared to be precise), a similar property holds as proved in the following e lemma:

Lemma 2. ρ2(X; Y )=(1 q)ρ2(X; Y ) − Proof. If either X or Y is degenerate,e the proof is trivial. Thus, we only consider the case where neither X nor Y is degenerate. Define Θ = 1 Y =e . Note that Θ is { e } independent of X. For any f and g such that Ef(X) = 0, Ef 2(X) = 1, Eg(Y )=0

e CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION 9

and Eg2(Y ) = 1, we have

Ege (X)f(Y ) = E E[g(X)f(Y ) Θ] Θ e | = qE[g(X)f(e) Θ=1]+(1 q)E[g(X)f(Y ) Θ = 0] |e − | (a) = qf(e)Eg(X)+(1 q)E[g(X)f(Y )] − (b) = (1 q)E[g(X)f(Y )] − = (1 q)E[g(X)]E[f(Y )]+(1 q)E (g(X) E[g(X)])(f(Y ) E[f(Y )]) − − − − (c) n o = (1 q)E[(g(X) E[g(X)])(f(Y ) E[f(Y )])] − − − (d) (1 q) Var(g(X)) Var(f(Y ))ρ(X; Y ) ≤ − (e) 1 (1 q)p1 ρ(X; Y ) ≤ − 1 q r − = ρ(X; Y ) 1 q, − p where (a) is due to the independence between Θ and X; (b) and (c) is due to the fact Ef(X) = 0; (d) comes from the definition of HGR correlation between X and Y ; (e) is because

1 = Eg2(Y ) = E[g2(Y ) Θ] e | = qEg2(e)+(1 q)Eg2(Y ) e − (1 q)Eg2(Y ) ≥ − (1 q) Var(g(Y )); ≥ −

1 √1 q g∗(y), if y , Equality can be achieved by setting f = f ∗ and g(y)= − ∈Y ( 0, y = e. where f and g are the functions achieving the HGR correlation betweene eX and Y . ∗ ∗ e Thus ρ2(X; Y )=(1 q)ρ2(X; Y ). e − There aree some other interesting non-trivial examples in the literature: CHAPTER 2. HIRSCHFELD-GEBELEIN-RENYI MAXIMAL CORRELATION10

Gaussian: [16] If (X,Y ) joint Gaussian distribution with correlation r, then • ∼

ρ(X; Y )= r . | |

Partial sums: [7] Let Yi, i = 1, ..., n be i.i.d. r.v.’s with finite . Let • k Sk = i=1 Yi, k =1, .., n be the partial sums. Then we have P ρ(Sk,Sn)= k/n. p Chapter 3

Common randomness generation

3.1 Common randomness and efficiency

If the HGR correlation between X and Y is close to 1, intuitively there is a lot in common between the two. The obstacle of generating common randomness in Fig. 1.1 is the lack of communication between the two nodes. It turns out that a communication link can be of great help facilitating common randomness generation. This setting was first considered by Ahlswede and Csisz´ar [2]. The system has two nodes. Node 1 observes Xn and Node 2 observes Y n, where (Xn,Y n) i.i.d. p(x, y). ∼ A (n, 2nR) scheme, shown in Fig. 3.1, consists of

a message encoding function f : n [1 : 2nR], M = f (Xn), • m X 7→ m a CR encoding function at Node 1 f : K = f (Xn), • 1 1 n a CR encoding function at Node 2 f : K′ = f (M,Y ). • 2 2 Definition 2. A common randomness-rate pair (C, R) is said to be achievable if there exists a sequence of schemes at rate R such that

P(K = K′) 1 as n , • → →∞ 1 lim infn H(K) C, • →∞ n →

11 CHAPTER 3. COMMON RANDOMNESS GENERATION 12

1 1 lim supn n H(K K′) 0 . • →∞ | → In words, the entropy of K measures the amount of common randomness those two nodes can generate.

Xn Y n

M [1 : 2nR] Node 1 ∈ Node 2

K K′

Figure 3.1: Common Randomness Capacity: (Xi,Yi) are i.i.d.. Node 1 generates a r.v. K based on the Xn sequence it observes. It also generates a message M and transmits the message to Node 2 under rate constraint R. Node 2 generates a r.v. K′ n based on the Y sequence it observes and M. We require that P(K = K′) approaches 1 as n goes to infinity. The entropy of K measures the amount of common randomness those two nodes can generate. What is the maximum entropy of K?

Definition 3. The supremum of all the common randomness achievable at rate R is defined as the common randomness capacity at rate R. That is

C(R) = sup C :(C, R) is achievable { }

Theorem 1. [2] The common randomness capacity at rate R is

C(R) = max I(X; U) (3.1) p(u x):R I(X;U) I(Y ;U) | ≥ − If private randomness generation is allowed at Node 1, then

maxp(u x):I(U:X) I(U;Y ) R I(U; X), R H(X Y ); C(R)= | − ≤ ≤ | (3.2) ( R + I(X; Y ), R>H(X Y ). | 1This is a technical condition to constrain the cardinality of K. Mathematically, the condition guarantees that the converse proof works out CHAPTER 3. COMMON RANDOMNESS GENERATION 13

The C(R) curve is thus a straight line with slope 1 for R>H(X Y ). Although we | focus on 0 R H(X Y ) in this thesis, in most of the figures we plot the straight ≤ ≤ | line part for completeness. We note here that computing C(R) is highly related with the information bottle neck method developed in [25]. The usage of common randomness in generating coordinated actions is discussed in detail in [6].

3.1.1 Common randomness and common information

Let us clarify the relation between common randomness and common information.

Definition 4. [10] The maximum common r.v. V between X and Y satisfies:

There exists functions g and f such that V = g(X)= f(Y ). •

For any V ′ such that V ′ = g′(X) = f ′(Y ) for some deterministic functions f ′ • and g′, V ′ is a function of V .

Definition 5. [10] The common information between X and Y is defined as H(V ) where V is the maximum common r.v. between X and Y .

It turns out that common information is equal to common randomness at rate 0, i.e., H(V )= C(0) [5].

Lemma 3. It is without loss of optimality to assume that V is a function of U when optimizing maxp(u x):R I(X;U) I(Y ;U) I(X; U). | ≥ − Proof. Because for any U such that I(X; U) I(Y ; U) R and U X Y hold, we − ≤ − − can construct a new auxiliary r.v. U ′ = [U, V ]. Note that

Markov chain U ′ X Y holds. • − − CHAPTER 3. COMMON RANDOMNESS GENERATION 14

The rate constraint: •

I(X; U ′) I(Y ; U ′) − = I(X; U, V ) I(Y ; U, V ) − = I(X; U V ) I(Y ; U V ) | − | = I(X; U, V ) I(X; V ) I(Y ; U, V )+ I(Y ; V ) − − = I(X; U) I(Y ; U) − R ≤

The common randomness generated: I(X; U ′) I(X; U). • ≥

Thus using U ′ as the new auxiliary r.v. preserves the rate and does not decrease the common randomness.

3.1.2 Continuity at R =0

The C(R) curve is concave for R 0 thus continuous for R > 0. The following ≥ theorem establishes the continuity at R = 0.

Theorem 2. The common randomness capacity as a function of the communication rate R is continuous at R =0, i.e., limR 0 C(R)= C(0). ↓ Proof. See Appendix.

The value of C(0) is equal to the common information defined in [10]. We note here that C(0) > 0 if and only if ρ(X; Y ) = 1.

3.1.3 Initial Efficiency (R 0) ↓ If the commonness between X and Y is large, then it is natural to expect that the first few bits of communication should be able to unlock a huge amount of common randomness. It is indeed the case as shown in the following theorem. Furthermore, the HGR correlation ρ plays the key role in the characterization of the initial efficiency. CHAPTER 3. COMMON RANDOMNESS GENERATION 15

Theorem 3. The initial efficiency of common randomness generation is characterized as

C(R) 1 lim = . R 0 R 1 ρ2(X; Y ) ↓ − In words, the initial efficiency is the initial number of bits of common randomness unlocked by the first few bits of communications. Comments:

Since ρ(X; Y )= ρ(Y ; X), the slope is symmetric in X and Y . Thus if we reverse • the direction of the communication link, i.e., the message is sent from Node 2 to Node 1 in Fig. 3.1, the initial efficiency remains the same.

The initial efficiency increases with the HGR correlation ρ between X and Y . • Without communication, as long as ρ< 1, the common randomness capacity is 0. But with communication, the first few bits can “unlock” a huge amount of common randomness if ρ(X; Y ) is close to 1.

Proof. If ρ(X; Y ) = 1, then C(0) > 0 which yields the + slope. For the case ∞ ρ(X; Y ) < 1, we have

C(R) (a) I(X; U) lim = sup (3.3) R 0 R p(u x) I(X; U) I(Y ; U) ↓ | − (b) 1 = I(Y ;U) 1 supp(u x) I(X;U) − | (c) 1 = 1 ρ2(X; Y ) − 1 where (a) comes from the fact C(R) is a concave function; (b) is because function 1 x − is monotonically increasing for x [0, 1); (c) comes from the following lemma [9]. ∈ I(Y ;U) 2 Lemma 4. [9] supp(u x) I(X;U) = ρ (X,Y ) | CHAPTER 3. COMMON RANDOMNESS GENERATION 16

3.1.4 Efficiency at R H(X Y ) (saturation efficiency) ↑ | At R = H(X Y ), C(R) reaches its maximum value 2 H(X). That is the point where | Xn is losslessly known at Node 2. In other words, nature’s resource is exhausted by the system. It is of interest to check the slope of C(R) when R goes up to H(X Y ). | A natural guess is 1, since one pure random bit (which is independent of nature’s (Xn,Y n)) sent over the communication link can yield 1 bit in common between the two nodes. As shown in the erasure example in the next section, this guess is not correct in general. Here, we provide a sufficient condition for the saturation slope to be 1.

Theorem 4. The efficiency of common randomness generation at R = H(X Y ) is 1 | if there exist x , x such that for all y , if p(x ,y) > 0, then p(x ,y) > 03. 1 2 ∈ X ∈Y 1 2 Proof. We have

C(H(X Y )) C(R) lim | − (3.4) R H(X Y ) H(X Y ) R ↑ | | − (a) C(H(X Y )) I(X; U) = inf | − p(u x) H(X Y ) (I(X; U) I(U; Y )) | | − − (b) H(X) I(X; U) = inf − p(u x) H(X Y ) (I(X; U) I(U; Y )) | | − − H(X U) = inf | p(u x) H(X U) (H(Y U) H(Y X)) | | − | − | 1 = inf (H(Y U) H(Y X)) p(u x) | − | | 1 H(X U) − | (c) 1 = H(Y U) H(Y X) | − | 1 infp(u x) H(X U) − | | where (a) comes from the concavity of C(R); (b) is because C(H(X Y )) = H(X); | 1 And (c) is because of the monotonicity of function 1 x for x [0, 1). − ∈ 2If private randomness is allowed, C(R) is a straight line with slope 1 for R>H(X Y ) [2]. The result in this section thus give a sufficient condition for the slope at R = H(X Y ) to be| continuous. 3If two input letters are of the same conditional distribution p(y x), then| we view them as one letter. Also, the letters with zero probability are discarded. | CHAPTER 3. COMMON RANDOMNESS GENERATION 17

H(Y U) H(Y X) | − | The next step is to show infp(u x) H(X U) = 0 under the condition given | | in the theorem. First note that H(Y U) H(Y X) 0 because of U X Y . H(Y U) H(Y X) | − | ≥ − − | − | Thus infp(u x) H(X U) 0. Without loss of generality, we can assume that | | ≥ = 1, 2, ..., M , = 1, 2, ..., N and that P(x =1,y) > 0 implies P(x =2,y) > 0. X { } Y { } Choose a sequence of positive numbers ǫn converging to 0. Construct a sequence of U ’s with cardinality 1, ..., M such that n { } P (1) = P (1) ǫn P (2), Un X 1 ǫn X • − − P (2) = 1 P (2), Un 1 ǫn X • − P (u)= P (u), u =3, ..., M, • Un X which is illustrated in Fig. 3.2. Note that these are valid distributions because we preserve the marginal distribution of X.

Un X Y 1 ǫn PX (1) PX (2) 1 1 1 − 1 ǫn − ǫn 1 ǫ 1 P (2) − n 2 2 1 ǫn X 2 − 1 3 PX (3) 3 3 ...... 1 PX (M) M M N

Figure 3.2: The probability structure of Un.

As n goes to infinity, it can be shown that

The denominator H(X U ) behaves as ǫ log ǫ , i.e. H(X U ) = Θ(ǫ log ǫ ); • | n n n | n ∼ n n The numerator H(Y U) H(Y X) behaves linearly, i.e., H(Y U) H(Y X) = • | − | | − | ∼ Θ(ǫn).

H(Y Un) H(Y X) Thus lim | − | = 0, which completes the proof. n H(X Un) →∞ | For convenience, we introduce saturation efficiency in the following way:

Definition 6. The slope of C(R) when R approaches Rm from below is defined as the saturation efficiency, where Rm is the threshold such that C(Rm)= H(X). CHAPTER 3. COMMON RANDOMNESS GENERATION 18

3.2 Examples

3.2.1 DBSC(p) example

Let X be a Bernoulli (1/2) random variable and let Y be the output of a BSC channel with cross probability p< 1/2, and with X as the input, shown in Fig. 3.3.

X Y 1 p 0 − 0 p

p 1 1 1 p −

Figure 3.3: DBSC example: X Bern(1/2), pY X (x x)=(1 p), pY X (1 x x)= p. ∼ | | − | − |

H(X U) = H (α) for some α [0, 1/2]. Mrs Gerber’s Lemma [29]provides the | 2 ∈ following lower bound on H(Y U): |

1 H(Y U) H (H− (H(X U)) p), (3.5) | ≥ 2 2 | ∗ = H (α p) 2 ∗ where (α p)= α(1 p)+(1 α)p. Thus ∗ − −

I(X; U) = H(X) H(X U)=1 H (α), − | − 2 I(X; U) I(Y ; U) = H(X) H(X U) H(Y )+ H(Y U) − − | − | = H(Y U) H(X U) | − | H (α p) H (α). ≥ 2 ∗ − 2

x, w.p. 1 α; Equality can be achieved by setting p(u x)= − , | ( 1 x, w.p. α − as shown in Fig. 3.2.1. CHAPTER 3. COMMON RANDOMNESS GENERATION 19

U X 1 α − 0 0 α

α

1 1 1 α −

We can write C(R) in parametric form:

C = 1 H (α) (3.6) − 2 R = H (α p) H (α), (3.7) 2 ∗ − 2

for α [0, 1/2]. Fig. 3.4 shows C(R) for p =0.08. ∈

1.5

1 C

0.5

0 0 0.2 0.4 0.6 0.8 1 R

Figure 3.4: C(R) for p =0.08.

The initial efficiency:

C(R) lim (3.8) R 0 ↓ R 1 H (α) = lim − 2 α 1/2 H (α p) H (α) ↑ 2 ∗ − 2 CHAPTER 3. COMMON RANDOMNESS GENERATION 20

1 α log − = lim − 2 α 1 (1 2p)α p 1 α α 1/2 − − − ↑ (1 2p) log2 (1 2p)α+p log2 −α − − − 1 1 1 α + α = lim − (1 2p) 1 2p 1 1 α 1/2 − − ↑ (1 2p) 1 (1 2p)α p (1 2−p)α+p +( 1 α + α ) − − − − − − − 1 =  1 (1 2p)2 − − Note that the HGR correlation between X and Y is (1 2p)2. − The saturation efficiency, C′(R−) as R approaches H(X Y ): | C(H(X Y )) C(R) lim | − R H(X Y ) H(X Y ) R ↑ | | − 1 (1 H (α)) = lim − − 2 α 0 H (p) (H (α p) H (α)) ↓ 2 − 2 ∗ − 2 H (α)) = lim 2 α 0 H (p) H (α p)+ H (α) ↓ 2 2 2 − ∗ 1 α log − = lim 2 α 1 (1 2p)α p 1 α α 0 − − − ↓ (1 2p) log2 (1 2p)α+p + log2 −α − − − log α = lim 2 α 0 log α ↓ 2 = 1

3.2.2 Gaussian example

Although we mainly consider discrete random variables with finite alphabet, the results can be extended to continuous random variables as well. In this section, we consider a Gaussian example. Let Y = X + Z, where X (0, 1), Z (0, N), ∼ N ∼ N and X and Z are independent, illustrated in Fig. 3.5. Let h(X U)= 1 log (2πeα) for some 0 <α 1. The entropy power inequality [19] | 2 2 ≤ CHAPTER 3. COMMON RANDOMNESS GENERATION 21

Z (0, N) ∼N

X (0, 1) Y ∼N L Figure 3.5: Gaussian Example

gives the following lower bound on h(Y U): |

1 2h(X U) 2h(Z U) h(Y U) log 2 | +2 | (3.9) | ≥ 2 2 1 2 1 log (2πeα) 2 1 log (2πeN) = log 2 2 2 +2 2 2 2 2 1   = log (2πe(α + N)) 2 2

Equality can be achieved by X = U +U ′ where U U ′, U (0, 1 α), U ′ (0,α), ⊥ ∼N − ∼N shown in Fig. 3.6

V (0,α) ∼N

U (0, 1 α) X ∼N − L Figure 3.6: Auxiliary r.v. U in Gaussian example.

We write C(R) in a parametric form:

1 C = log α (3.10) −2 2 1 α + N R = log (3.11) 2 (1 + N)α for α (0, 1]. Fig. 3.7 shows the case N =0.5. ∈ The initial efficiency is calculated in the following way:

C(R) 1 log α lim = lim − 2 2 (3.12) R 0 α 1 1 α+N ↓ R ↑ 2 log (1+N)α CHAPTER 3. COMMON RANDOMNESS GENERATION 22

2

1.8

1.6

1.4

1.2 C 1

0.8

0.6

0.4

0.2

0.2 0.4 0.6 0.8 1 1.2 1.4 R

Figure 3.7: Gaussian example: C(R) for N =0.5

1 = lim − 2α α 1 1 1 ↑ 2(α+N) − 2α 1 = 1+ N

Note the ordinary correlation between X and Y is 1/√1+ N. For a pair of joint Gaussian random variables the HGR correlation is equal to the ordinary correlation [16]. One can use Theorem 3 to obtain the same expression. Asymptotic saturation efficiency

dC(R) dC(α) lim = lim dα R α 0 dR ↑∞ dR ↓ dα 1 log α = lim − 2 2 α 0 1 α+N ↓ 2 log (1+N)α 1 = lim − 2α α 0 1 1 ↓ 2(α+N) − 2α = 1

For continuous r.v.’s, nature’s randomness is not exhausted at any finite R. It is always more efficient to generate common randomness from nature’s resources than from communicating private randomness generated locally. CHAPTER 3. COMMON RANDOMNESS GENERATION 23

3.2.3 Erasure example

X, w.p. 1 q Let Y be an randomly erased version of X, i.e., Y = − , shown in ( e, w.p. q. Fig. 3.8.

X Y

e

Figure 3.8: Erasure example

For any U such that U X Y holds, I(Y ; U) = (1 q)I(X; U), I(X; U) − − − − I(X; Y ) = qI(X; U). Thus C(R) = R for 0 R H(X Y ), where H(X Y ) = q ≤ ≤ | | q log , shown in Fig. 3.9. 2 |X| C

H(X)

R pH(X)

Figure 3.9: Erasure example: C R curve −

1 The initial efficiency is therefore q . Since ρ(X; X) = 1 and Y is an erased version 1 1 of X, we have ρ(X; Y )= √1 q. Note that q = 1 ρ2(X;Y ) . − − C(R) 1 The saturation efficiency is limR H(X Y ) = , which is not equal to 1. ↑ | R p CHAPTER 3. COMMON RANDOMNESS GENERATION 24

3.3 Extensions

3.3.1 CR per unit cost

The communication link between Node 1 and Node 2 in Fig. 3.1 is a bit pipe, which is essentially a noisyless channel. It turns out that the common randomness capacity remains unchanged when we replace the bit pipe with a noisy channel with the same capacity [2]. More interestingly, one may consider the case where the channel inputs are subject to some cost constraints β. The initial efficiency of channel capacity as C a function of β is solved in the seminal paper [26]. The initial efficiency of the overall system, illustrated in Fig. 3.10, is thus the product of the initial efficiency of common randomness generation and the capacity per unit cost of the channel.

Xn Y n

Node 1 (β) Node 2 C

K K′

Figure 3.10: Common randomness per unit cost.

Corollary 1. The initial efficiency of Fig. 3.10 (common randomness per unit cost) is equal to C(β) 1 (β) lim = lim C β 0 β 1 ρ2(X; Y ) · β 0 β ↓ − ↓ (β) We refer to [26] the calculation of limβ 0 C . ↓ β

3.3.2 Secret key generation

Common randomness generation is closely related to secret key generation [1]. Sup- pose there is an eavesdropper listening to the communication link (Fig. 3.11). We CHAPTER 3. COMMON RANDOMNESS GENERATION 25

would like the common randomness generated by Node 1 and Node 2 to be kept away from the eavesdropper. One commonly used secrecy constraint is that

1 lim sup I(M; K)=0, n n →∞ where M [1 : 2nR] is the message Node 1 sends to Node 2. ∈ Xn Y n

M [1 : 2nR] Node 1 ∈ Node 2 K′

K 1 Eavesdropper I(M; K) ǫ n ≤

Figure 3.11: Secret Key Generation

The secret key capacity is shown to be [1]: C(R) = maxR I(X;U) I(Y ;U) I(Y ; U). ≥ − We can calculate the initial efficiency of the secret key capacity in the following way:

C(R) I(Y ; U) lim = sup R 0 R I(X; U) I(Y ; U) ↓ − I(X; U) = sup 1 I(X; U) I(Y ; U) − − 1 = 1, 1 ρ2(X; Y ) − − which is the initial efficiency without the secrecy constraint minus one. It makes sense, because the eavesdropper observers every bit Node 1 communicates to Node 2. CHAPTER 3. COMMON RANDOMNESS GENERATION 26

3.3.3 Non-degenerate V

If the maximum common r.v. V is not a constant, the slope of C(R) as R 0 (It C(R) ↓ differs from limR 0 ) can be calculated in the following way: ↓ R C(R) C(0) I(Y ; U) H(V ) lim − = sup − R 0 R I(X; U) I(Y ; U) ↓ − (a) I(Y ; U, V ) H(V ) = sup − I(X; U, V ) I(Y ; U, V ) − I(Y ; U V ) = sup | I(X; U V ) I(Y ; U V ) | − | 1 = I(X;U V ) | 1 sup I(Y ;U V ) − | 1 = 1 max ρ2(X; Y V = v) − v | where (a) is due to Lemma 3.

3.3.4 Broadcast setting

The common randomness generation setup can be generalized to multiple nodes. A broadcast setting was considered in [2], shown in Fig. 3.12. The goal is for all three nodes to generate a random variable K in common. The common randomness

n Y1 Xn

Node 2 K R Node 1

Node 3 K

K n Y2

Figure 3.12: CR broadcast setup CHAPTER 3. COMMON RANDOMNESS GENERATION 27

capacity is proved [2] to be

C(R) = max I(X; A, U) p(u x):R I(X;U) I(Yi;U) R,i=1,2 | ≥ − ≤ We provide a conjecture that deals with the initial efficiency in the broadcast setting:

Conjecture 1. The initial efficiency of the setup in Fig. 3.12 is:

C(R) 1 lim = , R 0 R 1 Ψ2(X; Y,Z) ↓ − where Ψ(X; Y,Z) is a modified HGR correlation between X and Y,Z, defined in the following way:

Ψ(X; Y,Z) = max min Eg(X)f(Y ),Eg(X)h(Z) { } where the maximization is among all functions g f and h such that Eg(X)=0, Ef(Y )= 0, Eh(Z)=0,Eg2(X) 1, Ef 2(Y ) 1, Eh2(Z) 1. ≤ ≤ ≤ Proof. Achievability: Similar to the HGR correlation, there is an alternative characterization of Ψ(X; Y,Z):

Ψ2(X; Y,Z) = max min E(E[g(X) Y ])2, E(E[g(X) Z])2 { | | } where the maximization is among function g such that Eg(X)=0,Eg2(X) 1. ≤ Applying the maximizer g∗( ) in the achievability scheme in [9], one can show that · 1 the initial efficiency 1 Ψ2(X;Y,Z) is achievable. − Chapter 4

Common randomness generation with actions

4.1 Common randomness with action

Recently, in the line of work by Weissman, et al. [27], action was introduced as a feature that one node can explore to boost the performance of lossy compression. We adopt their setting but consider common randomness generation. The setup is shown in Fig. 4.1. Comparing with the no action case, the key difference is that after receiving the message M, Node 2 first generates an action sequence An based on M, i.e., An = f (M). It then gets the side information Y n according to p(y x, a), i.e., a | Y n (An,Xn) n p(y a , x ). One scenario where this setting applies is that Node | ∼ i=1 i| i i 2 requests sideQ information from some center through actions. The ith action determines the type of the side information correlated with Xi that the data center n sends back to Node 2. Node 2 then generates K′ based both on the Y sequence n nR n it observes and the message M(X ) [1 : 2 ], K′ = f (Y , M). The common ∈ 2 randomness capacity at rate R is defined in the same way as in the no action case.

28 CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 29

Xn An Y n

M [1 : 2nR] Node 1 ∈ Node 2

K K′

Figure 4.1: Common Randomness Capacity: Xi i=1,... is an i.i.d. source. Node 1 generates a r.v. K based on the Xn sequence{ } it observes. It also generates a message M and transmits the message to Node 2 under rate constraint R. Node 2 first generates an action sequence An as a function of M and receives a sequence of side information Y n, where Y n (An,Xn) p(y a, x). Then Node 2 generates a r.v. n | ∼ | K′ based on both M and Y sequence it observes and M. We require P(K = K′) to be close to 1. The entropy of K measures the amount of common randomness those two nodes can generate. What is the maximum entropy of K?

Theorem 5. The common randomness action capacity at rate R is

C(R) = max I(X; A, U)

where the joint distribution is of the form p(a, u x)p(x)p(y a, x), and the maximization | | is among all p(a, u x) such that |

I(X; A)+ I(X; U A) I(Y ; U A) R. | − | ≤

Cardinality of U can be bounded by +1. |U| ≤ |X||A| Setting A = , we recover the no action result. ∅

Achievability proof

Codebook generation

Generate 2n(I(X;A)+ǫ) An(l ) sequences according to n p (a ), l 2n(I(X;A)+ǫ). • 1 i=1 A i 1 ∈ Q CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 30

n n(I(X;U A)+ǫ) n For each A (l ) sequence, generate 2 | U (l ,l ) sequences according • 1 1 2 n n(I(X;U A)+ǫ) to pU A(ui ai), l2 [1 : 2 | ]. i=1 | | ∈ Q n n(I(X;U A,Y )+2ǫ) For each A (l ) sequence, partition the set of l indices into 2 | • 1 2 equal sized bins, (l ). B 3

Encoding

For simplicity, we will assume that the encoder is allowed to randomize, but the can be readily absorbed into the codebook generation stage, and hence, does not use up the encoder’s private randomization.

Given xn, the encoder selects the index L [1 : 2n(I(X;A)+ǫ)] of the an(L ) • A ∈ A n n (n) sequence such that (x , a (L )) ǫ . If there is none, it selects an index A ∈ T uniformly at random from [1 : 2n(I(X;A)+ǫ)]. If there is more than one such index, it selects an index uniformly at random from the set if indices such that n n (n) (x , a (l)) ǫ . ∈ T Given xn and the selected an(L ), the encoder then selects an index L [1 : • A U ∈ n(I(X;U A)+ǫ) n n n (n) 2 | ] such that (x , a (L ),u (L , L )) ǫ . A A U ∈ T n(I(X;U A,Y )+2ǫ) The encoder sends out L and L [1 : 2 | ] such that L (L ). • A B ∈ U ∈ B B

Decoding

n The decoder first takes actions based on the transmitted A (LA) sequence. Therefore, Y n is generated based on Y n n p(y x , a (L )). Given an and side information ∼ i=1 i| i i A n ˆ y , the decoder then tries to decodeQ the LU index. That is, it looks for the unique LU n n n (n) index in bin (L ) such that (y , a (L ),u (L , Lˆ )) ǫ . Finally, the decoder B B A A U ∈ T declares LA, LˆU as the common indices.

Analysis of probability of error

The analysis of probability of error follows standard analysis. An error occurs if any of the following two events occur. CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 31

n n n n (n) 1. (a (L ),u (L , L ),X ,Y ) / ǫ . A A U ∈ T n n n 2. There exists more than one LˆU (LB) such that (Y , a (LA),u (LA, LˆU )) (n) ∈ B ∈ ǫ . T The probability of the first error goes to zero as n goes to infinity since we generated enough sequences to cover Xn in the codebook generation stage. The fact that the probability of error for the second error event goes to zero as n follows from → ∞ standard Wyner-Ziv analysis.

Analysis of common randomness rate

We analyze the common randomness rate averaged over codebooks.

H(L , L )= H(L , L ,Xn ) H(Xn , L , L ) A U |C A U |C − |C A U H(Xn ) H(Xn , L , L , An(L ), U n(L , L )) ≥ |C − |C A U A A U nH(X) H(Xn U n, An). (4.1) ≥ − |

The second step follows from the fact that Xn is independent of the codebook and the third step follows conditioning reduces entropy. We now proceed to upper bound n n n n n n (n) H(X U , A ). Define E := 1 if (X , U , A ) / ǫ and 0 otherwise. | ∈ T

H(Xn U n, An) H(Xn, E U n, An) | ≤ | = H(E)+ H(Xn E, U n, An) | 1+P(E = 0)H(Xn E =0, U n, An)+P(E = 1)H(Xn E =1, U n, An) ≤ | | (a) 1+ n(H(X U, A)+ δ(ǫ)) + nP(E = 1) log ≤ | |X| = n(H(X U, A)+ δ′(ǫ)). (4.2) |

n n n (n) (a) follows from the fact that when E =0, (U , A ,X ) ǫ . Hence, there are at ∈ T n(H(X U,A)+δ(ǫ)) n most 2 | possible X sequences. The last step follows from P(E = 1) 0 → as n , which in turn follows from the encoding scheme. Combining (4.1) with → ∞ CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 32

(4.2) then gives the desired lower bound on the achievable common randomness rate.

1 H(L , L ) H(X) H(X U, A) δ′(ǫ) n A U |C ≥ − | −

= I(X; U, A) δ′(ǫ). −

Converse: See Appendix C.1.

4.2 Example

By correlating the action sequence An with Xn and communicating the action se- quence with Node 2, we incur a communication rate cost I(X; A). That only gen- erates I(X; A) in the rate of CR generation. Using 1 bit of communication to get 1 bit common randomness is of course sub-optimal, but the benefit comes in the second stage where conditioned on the An sequence, U n is sent to Node 2. The com- munication rate required is I(X; U A) I(Y ; U A) and the rate of CR generated is | − | I(X; U A). | One greedy scheme is to simply fix the action and just repeat it (so there is no need to communicate An). We use the following example to show explicitly that in general this kind of scheme is suboptimal. Let X be a r.v. uniformly distributed over the set 1, 2, 3, 4 . There are two { } actions A = 1 and A = 2. The probability structure conditioned on each sequence is shown in 4.2.

Lemma 5. For the setup in Fig. 4.2,

setting A X: the optimal achievable (C, R) pair is given as • ⊥ 3 R C(R)= + , R [0,p/2] 2 p ∈ CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 33

A =1 A =2

X Y X Y

1 1 1 1 1 p −

2 2 2 2 p e e p

3 3 3 3

1 p 4 − 4 4 4

Figure 4.2: CR with Action example

correlating A with X as shown in Fig. 4.3, the following (C, R) pair is achiev- • able:

C(α) = 2 α − R(α) = 1 H (α), α [0, 1/2]. − 2 ∈

Proof. See Appendix

X A 1 α 1, 2 − 0 α

α 3, 4 1 1 α − Figure 4.3: Correlate A with X

It can be shown that the (C, R) pair achieved by setting A X cannot be the ⊥ CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 34

optimal one for all R in general. We illustrate this by a numerical example p = 0.6 with the results plotted in Fig. 4.4.

1.9 Option one Option two 1.85

1.8

1.75

1.7 C

1.65

1.6

1.55

1.5 0 0.05 0.1 0.15 0.2 R

Figure 4.4: CR with action example: option one: set A X; option two: correlate A with X. ⊥

4.3 Efficiency

4.3.1 Initial Efficiency

For simplicity, we assume that ρ(PX PY X,A=a) < 1, a . ⊗ | ∀ ∈A C(R) lim (4.3) R 0 R ↓ I(X; A, U) = sup p(a,u x) I(X; A)+ I(X; U A) I(Y ; U A) | | − | 1 = I(Y ;U A) | 1 supp(a,u x) I(X;A)+I(X;U A) − | | 1 = 2 1 maxa ρ (X,Y A = a) − ∈A |

where ρ(X,Y A = a) = ρ(PX PY X,A=a) and the last step is proved in Ap- | ⊗ | pendix. C.2. CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 35

4.3.2 Saturation efficiency

Similar to the no action case, when the communication rate reaches the threshold such that Xn can be losslessly reconstructed at Node 2, nature’s randomness Xn is ex- hausted by the system. Thus the maximum CR H(X) (without private randomness) is achieved. This threshold Rm can be computed as [27]:

Rm = min I(X; A)+ H(X A, Y ). p(a x) | | The following theorem consider the slope of CR generation when R R . ↑ m Theorem 6. If there exists a p(a x) such that | I(X; A)+ H(X A, Y )= R • | m For each action a, P(A = a) > 0, there exist x , x such that P(X = x A = • 1 2 ∈ X 1| a)) > 0, P(X = x A = a) > 0, if p(y, x A = a) > 0, then p(y, x A = a) > 0, 2| 1| 2| y . ∀ ∈Y then dC(R) lim =1 R Rm dR ↑ Essentially, we require the condition in the no action setting to hold for each active action when R R . ↑ m

4.4 Extensions

Theorem 5 extends to the case where there is a cost function Λ and a cost constraint Γ on the action sequence, i.e., Λ(An)= 1 n Λ(A ) Γ. n i=1 i ≤ Corollary 2. The common randomnessP capacity with rate constraint R and cost constraint Γ is

C(R, Γ) = max I(X; A, U) p(a, u x):Λ(A) Γ | ≤ R I(X; A)+ I(X; U A, Y ) ≥ | CHAPTER 4. COMMON RANDOMNESS GENERATION WITH ACTIONS 36

Proof. Simply note that the achievablity proof and converse carry over to this setting directly.

Theorem 5 also extends naturally to the case where there are multiple receivers with different side information (Fig. 4.4).

n n A Y1 Xn

Node 2 K2 R Node 1

Node 3 K3

K1 n n A Y2

Corollary 3. The common randomness capacity with rate constraint R with two receivers and side information structure (Y n,Y n) Xn, An i.i.d. p(y ,y x, a) is 1 2 | ∼ 1 2| given by

C(R) = max I(X; A, U) p(a, u x):Λ(A) Γ | ≤ R I(X; A)+ I(X; U A, Y ), i =1, 2 ≥ | i Because the action sequence of each node is a function of the same message both receives, one node knows the action sequence of the other node. Therefore we do

not lose optimality by setting Ai = (A1i, A2i), where A1i and A2i are the individual actions.

Proof. We may simply repeat the achievablity proof for each receiver, and recognize

that the auxiliary random variable UQ in the converse proof C.1 works for both receivers. Chapter 5

Compression with actions

5.1 Introduction

Consider an independent, identically distributed (i.i.d) binary sequence Sn, S ∼ Bern(1/2). From standard source coding theory [19], we need at least one bit per source symbol to describe the sequence for lossless compression. But suppose now that we are allowed to make some modifications, subject to cost constraints, to the sequence before compressing it, and we are only interested in describing the modified sequence losslessly. The problem then becomes one of choosing the modifications so that the rate required to describe the modified sequence is reduced, while staying within our cost constraints. More concretely, for the binary sequence Sn, if we are allowed to flip more than n/2 ones to zero, then the rate required to describe the modified sequence is essentially zero. But what happens when we are allowed to flip fewer than n/2 ones? As a potentially more practical example, imagine we have a number of robots working on a factory floor and the positions of all the robots need to be reported to a remote location. Letting S represent the positions of the robots, we would expect to send H(S) bits to the remote location. However, this ignores the fact that the robots can also take actions to change their positions. A local command center can first “take a picture” of the position sequence and then send out action commands to the robots based on the picture so that they move in cooperative way such that

37 CHAPTER 5. COMPRESSION WITH ACTIONS 38

the final position sequence requires fewer bits to describe. The command center may face two issues in general: cost constraints and uncertainty. A cost constraint occurs because each robot should save its power and not move too far away from its current location. The uncertainty is a result of the robots not moving exactly as instructed by the local command center. Motivated by the preceding examples, we consider the problem illustrated in Fig. 5.1 (Formal definitions will be given in the next section). Sn is our observed state sequence. We model the constraint as a general cost function Λ( , , ) and the · · · uncertainty in the final output Y by a channel p(y a, s). | n Sn Z n n nR A Y 2 n PY X,S Yˆ | Action Encoder Compressor Decoder Figure 5.1: Compression with actions. The Action encoder first observes the state n n sequence S and then generates an action sequence A . The ith output Yi is the output of a channel p(y a, s) when a = A and s = S . The compressor generates a | i i description M of 2nR bits to describe Y n. The remote decoder generates Yˆ n based on M and it’s available side information Zn as a reconstruction of Y n.

Our problem setup is closely related to the channel coding problem when the state information is available at the encoder. The case where the state information is causally available was first solved by Shannon in [20]. When the state information is non-causally known at the encoder, the channel capacity result was derived in [11] and [13]. Various interesting extensions can be found in [15, 17, 22–24]. The difference in our approach described here is that we make the output of the channel as compressible as possible. We give formal definitions for our problem are given in the next section. Our main results when the decoder requires lossless reconstruction are given in section 5.3, where we characterize the rate-cost tradeoff function for the setting in Fig. 5.1. We also characterize the rate-cost function when Sn is only causally known at the action encoder. In section 5.4, we extend the setting to the lossy case where the decoder requires a lossy version of Y n. CHAPTER 5. COMPRESSION WITH ACTIONS 39

5.2 Definitions

We give formal definitions for the setups under consideration in this section. We will follow the notation of [8]. Sources (Sn,Zn) are assumed to be i.i.d.; i.e. (Sn,Zn) n ∼ i=1 pS,Z(si, zi). Q 5.2.1 Lossless case

Referring to Figure 5.1, a (n, 2nR) code for this setup consists of

an action encoding function f : n n; • e S →A a compression function f : n M [1 : 2nR]; • c Y → ∈ a decoding function f : [1 : 2nR] n Yˆ n. • d × Z → n n n , 1 n The average cost of the system is E Λ(A ,S ,Y ) n i=1 EΛ(Ai,Si,Yi). A rate- cost tuple (R, B) is said to be achievable if there existsP a sequence of codes such that

n n n lim sup Pr(Y = fd(fc(Y ),Z ))=0, (5.1) n 6 →∞ lim sup EΛ(An,Sn,Y n) B, (5.2) n ≤ →∞

n n n n where Λ(A ,S ,Y ) = i=1 Λ(Ai,Si,Yi)/n. Given cost B, the rate-cost function, R(B), is then the infimumP of rates R such that (R, B) is achievable.

5.2.2 Lossy case

We also consider the setup where the decoder requires a lossy version of Y n. The definitions remain largely the same, with the exception that the probability of error constraint, inequality (5.1), is replaced by the following distortion constraint:

n n n 1 lim sup E d(Y , Yˆ ) = lim sup E d(Yi, Yˆi) D. (5.3) n n n ≤ i →∞ →∞ X CHAPTER 5. COMPRESSION WITH ACTIONS 40

A rate R is said to be achievable if there exists a sequence of (n, 2nR) codes satisfying both the cost constraint (inequality 5.2) and the distortion constraint (inequality 5.3). Given cost B and distortion D, the rate-cost-distortion function, R(B,D), is then the infimum of rates R such that the tuple (R,B,D) is achievable.

5.2.3 Causal observations of state sequence

In both the lossless and lossy case, we will also consider the setup when the state sequence is only causally known at the action encoder. The definitions remain the same, except for the action encoding function which is now restricted to the following form: For each i [1 : n], f : i . ∈ e,i S →A

5.3 Lossless case

In this section, we present our main results for the lossless case. Theorem 7 gives the rate-cost function when the state sequence is noncausally available at the action encoder, while Theorem 8 gives the rate-cost function when the state sequence is causally available.

5.3.1 Lossless, noncausal compression with action

Theorem 7 (Rate-cost function for lossless, noncausal case). The rate-cost function for the compression with action setup when state sequence Sn is noncausally available at the action encoder is given by

R(B) = min I(V ; S Z)+ H(Y V,Z), (5.4) p(v s),a=f(s,v):EΛ(S,A,Y ) B | | | ≤ where the joint distribution is of the form p(s,v,a,y)= p(s)p(v s)1 f(s,v)=a p(y a, s). | { } | The cardinality of the auxiliary random variable V is upper bounded by +2. |V| ≤ |S| Remarks

Replacing a = f(s, v) by a general distribution p(a s, v) does not decrease the • | minimum in (5.4). For any joint distribution p(s)p(s v)p(a s, v), we can always | | CHAPTER 5. COMPRESSION WITH ACTIONS 41

find a random variable W and a function f such that W is independent of S,V

and Y , and A = f(V,W,X). Consider V ′ = (V, W ). The Markov condition

V ′ (A, S) (Y,Z) still holds. Thus H(Y V ′,Z)+ I(V ′; S Z) is achievable. − − | | Furthermore,

I(V ′; S Z)+ H(Y V ′,Z) | | = I(V, W ; S Z)+ H(Y V,W,Z) | | I(V, W ; S Z)+ H(Y V,Z) ≤ | | = I(V ; S Z)+ H(Y V,Z). | |

R(B) is a convex function in B. • For each cost function Λ(s,a,y), we can replace it with a new cost function • involving only s and a by defining Λ′(s, a)= E[Λ(S,A,Y ) S = s, A = a]. Note | that Y is distributed as p(y s, a) given S = s, A = a. | Achievability of Theorem 7 involves an interesting observation in the decoding oper- ation, but before proving the theorem, we first state a corollary of Theorem 7, the case when side information is absent (Z = ). We will also sketch an alternative ∅ achievability proof for the corollary, which will serve as a contrast to the achievability scheme for Theorem 7.

Corollary 4 (Side information is absent). If Z = , then rate-cost function is given ∅ by

R(B) = min I(V ; S)+ H(Y V ) p(v s),a=f(s,v):EΛ(S,A,Y ) B | | ≤ for some p(s,v,a,y)= p(s)p(v s)1 f(s,v)=a p(y a, s). | { } |

Achievability for Corollary 1

Code book generation: Fix p(v s) and f(s, v) and ǫ> 0. | CHAPTER 5. COMPRESSION WITH ACTIONS 42

Generate 2n(I(S;V )+ǫ) vn(l) sequences independently, l [1 : 2n(I(V ;S)+ǫ)], each • ∈ n according to pV (vi) to cover S .

For each V n Qsequence, the Y n sequences that are jointly typical with V n are • (n(H(Y V )+ǫ) indexed by 2 | numbers.

Encoding and Decoding:

The action encoder looks for a V n in the code book that is jointly typical with • n S and generates Ai = f(Si,Vi), i =1, ..., n.

The compressor looks for a Vˆ n in the codebook that is jointly typical with the • channel output Y n and sends the index of that Vˆ n sequence to the decoder. The compressor then sends the index of Y n as described in the second part of code book generation.

• The decoder simply uses both indices from the compressor to reconstruct Y^n.

Using standard typicality arguments, we can show that the encoding succeeds with high probability and that the probability of error can be made arbitrarily small.

Remark: Note that V̂^n is not necessarily equal to V^n. That is, the V^n codeword chosen by the action encoder can be different from the V̂^n codeword chosen by the compressor. But this is not an error event, since we still recover the same Y^n even if a different V^n codeword was used.

This scheme, however, does not extend to the case when side information is available at the decoder. The term H(Y|Z, V) in Theorem 7 requires us to bin the set of Y^n sequences according to the side information available at the decoder. If we were to extend the above achievability scheme, we would bin the set of Y^n sequences into 2^{n(H(Y|Z,V)+ε)} bins. The compressor would find a V̂^n sequence that is jointly typical with Y^n, send its index to the decoder using a rate of I(V; S|Z) + ε, and then send the index of the bin which contains Y^n. The decoder would then look for the unique Y^n sequence in the bin that is jointly typical with V̂^n and Z^n. Unfortunately, while the V̂^n codeword is jointly typical with Y^n with high probability, it is not necessarily jointly typical with Z^n, since V̂^n may not be equal to V^n (V^n is jointly typical with Z^n with high probability because V^n is jointly typical with S^n with high probability and V − S − Z). One could try to overcome this problem by insisting that the compressor find the same V^n sequence as the action encoder, but this requirement imposes additional constraints on the achievable rate.

Instead of requiring the compressor to find a jointly typical V^n sequence, we use an alternative approach to prove Theorem 7. We simply bin the set of all Y^n sequences into 2^{n(I(V;S|Z)+H(Y|Z,V)+ε)} bins and send the bin index to the decoder. The decoder looks for the unique Y^n sequence in bin M such that (V^n(l), Y^n, Z^n) are jointly typical for some l ∈ [1 : 2^{n(I(V;S)+ε)}]. Note that there can be more than one V^n(l) sequence that is jointly typical with (Y^n, Z^n), but this is not an error event as long as the Y^n sequence in bin M is unique. We now give the details of this achievability scheme.

Proof of achievability for Theorem 7

Codebook generation

• Generate 2^{n(I(V;S)+δ(ε))} V^n codewords, each according to ∏_{i=1}^n p(v_i).

• For the entire set of possible Y^n sequences, bin them uniformly at random into 2^{nR} bins B(M), where R > I(V; S) − I(V; Z) + H(Y|Z, V).

Encoding

• Given s^n, the encoder looks for a v^n sequence in the codebook such that (v^n, s^n) ∈ T_ε^(n). If there is more than one, it picks one uniformly at random from the set of jointly typical candidates. If there is none, it picks an index uniformly at random from [1 : 2^{n(I(V;S)+δ(ε))}].

• It then generates a^n according to a_i = f(v_i, s_i) for i ∈ [1 : n].

• The second encoder (the compressor) takes the output sequence y^n and sends out the bin index M such that y^n ∈ B(M).

Decoding

• The decoder looks for the unique ŷ^n sequence such that (v^n(l), ŷ^n, z^n) ∈ T_ε^(n) for some l ∈ [1 : 2^{n(I(V;S)+δ(ε))}] and ŷ^n ∈ B(M). If there is none or more than one, it declares an error.
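Before analyzing the error probability, here is a toy illustration (our own, with hypothetical helper names) of the two operations just described: uniform random binning of all output sequences, and a decoder that searches the received bin for the unique sequence passing a joint-typicality test. The typicality test itself is passed in as a callable, since its exact form depends on the distributions at hand; enumerating all sequences is only feasible for very small n.

    import itertools
    import random

    def make_random_bins(n, num_bins, alphabet=(0, 1), seed=0):
        # Assign every length-n sequence over `alphabet` to a bin chosen
        # uniformly at random.
        rng = random.Random(seed)
        bin_of, bins = {}, [[] for _ in range(num_bins)]
        for y in itertools.product(alphabet, repeat=n):
            b = rng.randrange(num_bins)
            bin_of[y] = b
            bins[b].append(y)
        return bin_of, bins

    def decode(bins, m, codebook, z, is_jointly_typical):
        # Return the unique y in bin m that is jointly typical with some
        # codeword v and the side information z; otherwise declare an error.
        candidates = [y for y in bins[m]
                      if any(is_jointly_typical(v, y, z) for v in codebook)]
        return candidates[0] if len(candidates) == 1 else None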

Analysis of probability of error

Define the following error events

E_0 := {(V^n(L), Z^n, Y^n) ∉ T_ε^(n)},
E_l := {(V^n(l), Z^n, Ŷ^n) ∈ T_ε^(n) for some Ŷ^n ≠ Y^n, Ŷ^n ∈ B(M)}.

By the symmetry of the codebook generation, it suffices to consider M = 1. The probability of error is upper bounded by

P(E) ≤ P(E_0) + Σ_{l=1}^{2^{n(I(V;S)+δ(ε))}} P(E_l).

P(E_0) → 0 as n → ∞ by the standard analysis of the probability of error. It remains to analyze the second error term. Consider P(E_l) and define

E_l(v^n, z^n) := {(V^n(l), Z^n, Ŷ^n) ∈ T_ε^(n) for some Ŷ^n ≠ Y^n, Ŷ^n ∈ B(1)}.

We have

P(E_l) = P(E_l(V^n, Z^n))
= Σ_{(v^n,z^n)∈T_ε^(n)} P(V^n(l) = v^n, Z^n = z^n) P(E_l(v^n, z^n)|v^n, z^n)
= Σ_{(v^n,z^n)∈T_ε^(n)} P(V^n(l) = v^n, Z^n = z^n) ( Σ_{y^n} P(Y^n = y^n|v^n, z^n) P(E_l(v^n, z^n)|v^n, z^n, y^n) )
≤(a) Σ_{(v^n,z^n)∈T_ε^(n)} P(V^n(l) = v^n, Z^n = z^n) ( Σ_{y^n} P(Y^n = y^n|v^n, z^n) 2^{n(H(Y|Z,V)+δ(ε)−R)} )
=(b) Σ_{(v^n,z^n)∈T_ε^(n)} P(V^n(l) = v^n) P(Z^n = z^n) 2^{n(H(Y|Z,V)+δ(ε)−R)}
≤ 2^{n(H(V,Z)+δ(ε))} 2^{−n(H(V)−δ(ε))} 2^{−n(H(Z)−δ(ε))} 2^{n(H(Y|Z,V)+δ(ε)−R)}
= 2^{n(H(Y|V,Z)−I(V;Z)−R+4δ(ε))}.

(a) follows since the set of Y^n sequences is binned uniformly at random, independently of the other Y^n sequences, and from the fact that there are at most 2^{n(H(Y|Z,V)+δ(ε))} Y^n sequences that are jointly typical with a given typical (v^n, z^n). (b) follows from the fact that the codebook generation is independent of (S^n, Z^n); therefore, for any fixed l, V^n(l) is independent of Z^n. Hence, if R ≥ I(V; S) − I(V; Z) + H(Y|Z, V) + 6δ(ε),

Σ_{l=1}^{2^{n(I(V;S)+δ(ε))}} P(E_l) ≤ 2^{−nδ(ε)} → 0

as n → ∞. We now turn to the proof of the converse for Theorem 7.

Proof of converse for Theorem 7

Given a (n, 2^{nR}) code for which the probability of error goes to zero with n and which satisfies the cost constraint, define V_i = (Z^{n\i}, S_{i+1}^n, Y^{i−1}). We have

nR ≥ H(M|Z^n)
= H(M, Y^n|Z^n) − H(Y^n|M, Z^n)
≥(a) H(M, Y^n|Z^n) − nε_n
= H(Y^n|Z^n) − nε_n
= Σ_{i=1}^n H(Y_i|Y^{i−1}, Z^n) − nε_n
= Σ_{i=1}^n [H(Y_i|Y^{i−1}, S_{i+1}^n, Z^n) + I(Y_i; S_{i+1}^n|Y^{i−1}, Z^n)] − nε_n
=(b) Σ_{i=1}^n H(Y_i|Y^{i−1}, S_{i+1}^n, Z^n) + Σ_{i=1}^n I(Y^{i−1}; S_i|S_{i+1}^n, Z^n) − nε_n
=(c) Σ_{i=1}^n H(Y_i|Y^{i−1}, S_{i+1}^n, Z^n) + Σ_{i=1}^n I(Y^{i−1}, S_{i+1}^n, Z^{n\i}; S_i|Z_i) − nε_n
=(d) Σ_{i=1}^n H(Y_i|V_i, Z_i) + Σ_{i=1}^n I(V_i; S_i|Z_i) − nε_n
= nH(Y_Q|V_Q, Q, Z_Q) + nI(V_Q; S_Q|Q, Z_Q) − nε_n,

where (a) is due to Fano's inequality, (b) follows from the Csiszár sum identity, and (c) holds because (S^n, Z^n) is an i.i.d. source. Note that the Markov conditions V_i − (S_i, A_i) − Y_i and V_i − S_i − Z_i hold. Finally, we introduce Q as the time-sharing random variable, i.e., Q ∼ Unif[1 : n], and set V = (V_Q, Q), Y = Y_Q, and S = S_Q, which completes the proof.

5.3.2 Lossless, causal compression with action

Our next result gives the rate-cost function for the case of lossless, causal compression with action.

Theorem 8 (Rate-cost function for the lossless, causal case). The rate-cost function for compression with action when the state information is causally available at the action encoder is given by

R(B) = min H(Y|V, Z),      (5.5)

where the minimum is over p(v) and a = f(s, v) such that EΛ(S, A, Y) ≤ B, and the joint distribution is of the form p(s, v, a, y) = p(s)p(v)1{f(s, v) = a}p(y|a, s).

Achievability sketch: Here V simply serves as a time-sharing random variable. Fix p(v) and f(s, v). We first generate a V^n sequence and reveal it to the action encoder, the compressor, and the decoder. The action encoder generates A_i = f(S_i, V_i). The compressor simply bins the set of Y^n sequences into 2^{n(H(Y|V,Z)+ε)} bins and sends the index of the bin which contains Y^n. The decoder recovers Y^n by finding the unique Y^n sequence in bin M such that (V^n, Z^n, Y^n) are jointly typical.

Remark: Just as the achievability in the non-causal case is closely related to the channel coding strategy in [11], our achievability in this section uses the "Shannon strategy" of [20]. In both cases, the optimal channel coding strategy yields the most compressible output when the message rate goes to zero.

Proof of converse: Given a (n, 2^{nR}) code that satisfies the constraints, define V_i = (S^{i−1}, Z^{n\i}). We have

nR ≥ H(M|Z^n)
= H(M, Y^n|Z^n) − H(Y^n|M, Z^n)
≥(a) H(M, Y^n|Z^n) − nε_n
= H(Y^n|Z^n) − nε_n
= Σ_{i=1}^n H(Y_i|Y^{i−1}, Z_i, Z^{n\i}) − nε_n
≥ Σ_{i=1}^n H(Y_i|Y^{i−1}, A^{i−1}, S^{i−1}, Z_i, Z^{n\i}) − nε_n
=(b) Σ_{i=1}^n H(Y_i|A^{i−1}, S^{i−1}, Z_i, Z^{n\i}) − nε_n
=(c) Σ_{i=1}^n H(Y_i|V_i, Z_i) − nε_n
=(d) nH(Y_Q|V_Q, Q, Z_Q) − nε_n,

where (a) is due to Fano's inequality; (b) follows from the Markov chain Y_i − (S^{i−1}, A^{i−1}, Z^n) − Y^{i−1}; and (c) follows since A^{i−1} is a function of S^{i−1}. Note that A_i is now a function of S_i and V_i. Finally, we introduce Q as the time-sharing random variable, i.e., Q ∼ Unif[1 : n]. Thus, by setting V = (V_Q, Q) and Y = Y_Q, we have completed the proof.

5.3.3 Examples

In this subsection, we consider an example with state sequence S^n i.i.d. ∼ Bern(1/2) and Z = ∅. There are two actions available, A = 0 and A = 1. The cost constraint is on the expected frequency of the action A = 1, namely EA ≤ B. The channel output is Y_i = S_i ⊕ A_i ⊕ S_{N,i}, where ⊕ is the modulo-2 sum and {S_{N,i}} is an i.i.d. Bern(p) noise sequence with p < 1/2. The example is illustrated in Fig. 5.2.

Figure 5.2: Binary example with side information Z = ∅. The state sequence S^n is i.i.d. Bern(1/2) and the noise sequence S_N^n is i.i.d. Bern(p). The action encoder observes S^n and produces A^n subject to EA ≤ B; the compressor observes the channel output Y^n and sends a message M ∈ {1, ..., 2^{nR}} to the decoder, which reconstructs Y^n.
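As a quick sanity check on this model (our own illustration, not part of the thesis), the following sketch simulates Y_i = S_i ⊕ A_i ⊕ S_{N,i} under one simple, not necessarily optimal, randomized policy: with probability 2B the action cancels the state (A = S), otherwise A = 0. Its expected cost is B, and the output is a mixture of a Bern(p) and a Bern(1/2) source.

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, B = 100_000, 0.1, 0.2              # block length, noise parameter, cost budget

    s = rng.integers(0, 2, n)                # state  S^n ~ Bern(1/2)
    s_n = (rng.random(n) < p).astype(int)    # noise  S_N^n ~ Bern(p)

    # Simple policy: with probability 2B set A = S (cancels the state), else A = 0.
    # Expected cost E[A] = 2B * P(S = 1) = B.
    active = rng.random(n) < 2 * B
    a = np.where(active, s, 0)

    y = s ^ a ^ s_n                          # channel output Y = S xor A xor S_N

    print("empirical cost E[A]:", a.mean())  # should be close to B
    print("empirical P(Y = 1):", y.mean())   # mixture of Bern(p) and Bern(1/2)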

We use the following lemma to simplify the optimization problem in Eq. (5.4) applied to the binary example.

Lemma 6. For the binary example, it is without loss of optimality to impose the following constraints when solving the optimization problem of Eq. (5.4):

• V = {0, 1, 2} and P(V = 0) = P(V = 1) = θ/2 for some θ ∈ [0, 1].

• The function a = f(s, v) is of the form f(s, 0) = s, f(s, 1) = 1 − s, and f(s, 2) = 0.

• P(S = 0|V = 1) = P(S = 1|V = 0) = ∆ and P(S = 0|V = 2) = 1/2.

• ∆θ ≤ B.

Note that these constraints guarantee that P(S = 0) = P(S = 1) = 1/2.

Proof. See Appendix D.1.

Using Lemma 6, we can simplify the objective function in Eq. (5.4) in the following way:

H(Y|V) + I(V; S)
= H(Y|V) − H(S|V) + H(S)
= H(S ⊕ A ⊕ S_N|V) − H(S|V) + 1
= (θ/2)(H(0 ⊕ S_N|V = 0) − H_2(∆)) + (θ/2)(H(1 ⊕ S_N|V = 1) − H_2(∆)) + (1 − θ)(H(S ⊕ S_N|V = 2) − 1) + 1
= θ(H_2(p) − H_2(∆)) + 1,

where H_2(·) is the binary entropy function, i.e., H_2(δ) = −δ log δ − (1 − δ) log(1 − δ).
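This identity is easy to verify numerically. The sketch below (ours, with arbitrary test values of θ, ∆, and p) builds the joint distribution implied by Lemma 6 and compares H(Y|V) + I(V; S) computed from it against θ(H_2(p) − H_2(∆)) + 1.

    import numpy as np

    def h2(x):
        return 0.0 if x in (0.0, 1.0) else -x*np.log2(x) - (1 - x)*np.log2(1 - x)

    theta, delta, p = 0.6, 0.2, 0.1                      # arbitrary test values

    p_v = {0: theta/2, 1: theta/2, 2: 1 - theta}
    p_s_given_v = {0: {0: 1 - delta, 1: delta},          # P(S=1|V=0) = delta
                   1: {0: delta, 1: 1 - delta},          # P(S=0|V=1) = delta
                   2: {0: 0.5, 1: 0.5}}
    f = {0: lambda s: s, 1: lambda s: 1 - s, 2: lambda s: 0}

    joint = np.zeros((3, 2, 2))                          # indices (v, s, y)
    for v in range(3):
        for s in range(2):
            a = f[v](s)
            for y in range(2):
                p_y = (1 - p) if y == (s ^ a) else p     # Y = S xor A xor S_N
                joint[v, s, y] = p_v[v] * p_s_given_v[v][s] * p_y

    def H(pvec):
        pvec = pvec[pvec > 0]
        return float(-(pvec * np.log2(pvec)).sum())

    p_vs = joint.sum(axis=2); p_vy = joint.sum(axis=1)
    p_v_marg = joint.sum(axis=(1, 2)); p_s_marg = joint.sum(axis=(0, 2))
    lhs = (H(p_vy.ravel()) - H(p_v_marg)) + (H(p_s_marg) + H(p_v_marg) - H(p_vs.ravel()))
    rhs = theta * (h2(p) - h2(delta)) + 1
    print(lhs, rhs)                                      # the two values should agree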

R(B) = min_{θ∈[2B,1], θ∆≤B} θ(H_2(p) − H_2(∆)) + 1
= 1 + min_{∆∈[B,1/2]} (B/∆)(H_2(p) − H_2(∆))
= 1 − B max_{∆∈[B,1/2]} (H_2(∆) − H_2(p))/∆
= { 1 − B(H_2(b*) − H_2(p))/b*,  if 0 ≤ B < b*;
    1 − H_2(B) + H_2(p),          if b* ≤ B ≤ 1/2,      (5.6)

where b* is the solution of the following equation:

(H_2(b) − H_2(p))/b = dH_2(b)/db,  b ∈ [0, 1/2],      (5.7)

which is illustrated in Fig. 5.3.

Figure 5.3: The threshold b* solves (H_2(b) − H_2(p))/b = dH_2(b)/db, b ∈ [0, 1/2]. (The original plot shows H_2(b) versus b with the points (p, H_2(p)) and b* marked.)
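Numerically, b* and the non-causal rate-cost function in (5.6) are easy to evaluate. The sketch below is our own and assumes SciPy is available; it uses the fact that the derivative of the binary entropy is dH_2/db = log_2((1 − b)/b).

    import numpy as np
    from scipy.optimize import brentq

    def h2(x):
        x = np.clip(x, 1e-12, 1 - 1e-12)
        return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

    def b_star(p):
        # Solve (H2(b) - H2(p))/b = dH2/db = log2((1-b)/b) on (p, 1/2).
        g = lambda b: (h2(b) - h2(p)) / b - np.log2((1 - b) / b)
        return brentq(g, p + 1e-9, 0.5 - 1e-9)

    def rate_cost_noncausal(B, p):
        bs = b_star(p)
        if B < bs:
            return 1 - B * (h2(bs) - h2(p)) / bs
        return 1 - h2(min(B, 0.5)) + h2(p)

    print(b_star(0.1), rate_cost_noncausal(0.25, 0.1))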

Now let us shift our attention to the causal case of the binary example, i.e., S_i is only causally available at the action encoder.

Lemma 7. For the causal case of the binary example, it is without loss of optimality to impose the following constraints when solving the optimization problem in Eq. (5.5):

• V = {0, 1} and P(V = 0) = θ for some θ ∈ [0, 1].

• The function a = f(s, v) is of the form f(s, 0) = s, f(s, 1) = 0.

• θ/2 ≤ B.

Proof. See Appendix D.2.

Using Lemma 7, we can simplify the objective function in Eq. (5.5) in the following way:

R(B) = min_{θ∈[0,1], θ/2≤B} H(Y|V)
= min_{θ∈[0,1], θ/2≤B} [θH(Y|V = 0) + (1 − θ)H(Y|V = 1)]
= min_{θ∈[0,1], θ/2≤B} [θH(S_N|V = 0) + (1 − θ)H(S ⊕ S_N|V = 1)]
= min_{θ∈[0,1], θ/2≤B} [θH_2(p) + (1 − θ)]
= { 2B H_2(p) + (1 − 2B),  if 0 ≤ B ≤ 1/2;
    H_2(p),                 if B ≥ 1/2.

For the binary example with p = 0.1, we plot the rate-cost functions R(B) for both cases in the following figure.

Figure 5.4: Comparison between the non-causal and causal rate-cost functions R(B). The parameter of the Bernoulli noise is set at p = 0.1. (The original plot shows both curves over the cost constraint B ∈ [0, 0.5], with the level H_2(0.1) marked.)
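The comparison in Fig. 5.4 can be reproduced qualitatively with a few lines of code (our own sketch); the non-causal curve is evaluated directly from the max form in (5.6) on a grid, and the causal curve from its closed form.

    import numpy as np

    def h2(x):
        x = np.clip(x, 1e-12, 1 - 1e-12)
        return -x * np.log2(x) - (1 - x) * np.log2(1 - x)

    def R_noncausal(B, p):
        # Eq. (5.6) in its max form: 1 - B * max_{D in [B,1/2]} (H2(D) - H2(p)) / D.
        if B <= 0:
            return 1.0
        grid = np.linspace(B, 0.5, 2001)
        return 1 - B * float(np.max((h2(grid) - h2(p)) / grid))

    def R_causal(B, p):
        # 2B*H2(p) + (1 - 2B) for B <= 1/2, and H2(p) beyond that.
        B = min(B, 0.5)
        return 2 * B * h2(p) + (1 - 2 * B)

    p = 0.1
    for B in np.linspace(0.0, 0.5, 6):
        print(f"B={B:.1f}  non-causal={R_noncausal(B, p):.3f}  causal={R_causal(B, p):.3f}")

One expects the non-causal values to lie on or below the causal ones, with the two curves meeting at B = 0 (where both equal 1) and at B = 1/2 (where both equal H_2(p)).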

5.4 Lossy compression with actions

In this section, we extend our setup to the lossy case. We give an achievable rate-cost-distortion region when S^n is available noncausally at the action encoder and characterize the rate-cost-distortion function when S^n is available causally at the encoder and Z = ∅.

Theorem 9. An upper bound on the rate-cost function for the case with non-causal

state information is given by

R(B) ≤ min I(V; S|Z) + I(Ŷ; Y|V, Z),      (5.8)

where the minimum is over distributions satisfying EΛ(S, A, Y) ≤ B and Ed(Y, Ŷ) ≤ D, and the joint distribution is of the form

p(s, v, a, y, ŷ, z) = p(s, z)p(v|s)1{f(s, v) = a}p(y|a, s)p(ŷ|y, v).

Sketch of achievability: The codebook generation and the encoding at the action encoder are largely the same as in the lossless case. We generate 2^{n(I(V;S)+ε)} V^n sequences according to ∏_{i=1}^n p_V(v_i), and for each v^n we generate 2^{n(I(Ŷ;Y|V)+ε)} ŷ^n sequences according to ∏_{i=1}^n p(ŷ_i|v_i). The set of v^n sequences is partitioned into 2^{n(I(V;S|Z)+2ε)} equal-sized bins B(m_0), and for each m_0 the set of ŷ^n sequences is partitioned into 2^{n(I(Ŷ;Y|V,Z)+2ε)} equal-sized bins B(m_0, m_1). Given a sequence s^n, the action encoder finds the v^n sequence which is jointly typical with s^n and takes actions according to A_i = f(s_i, v_i) for i ∈ [1 : n]. At the compressor, we first find a v̂^n that is jointly typical with Y^n and then a ŷ^n such that (v̂^n, ŷ^n, y^n) ∈ T_ε^(n). The compressor then sends the indices M_0, M_1 such that the selected v̂^n ∈ B(M_0) and ŷ^n ∈ B(M_0, M_1). The decoder first recovers v̂^n by looking for the unique v̂^n ∈ B(m_0) such that (v̂^n, z^n) ∈ T_ε^(n). Next, it recovers ŷ^n by looking for the unique ŷ^n ∈ B(m_0, m_1) such that (v̂^n, z^n, ŷ^n) ∈ T_ε^(n). From the rates given, it is easy to see that all encoding and decoding steps succeed with high probability as n → ∞.

We now turn to the case when s^n is causally known at the action encoder. In this case, we are able to characterize the rate-cost-distortion function when no side information is available, Z = ∅.

Theorem 10. The rate-cost-distortion function for the case with causal state information and no side information is given by

R(B) = min I(Y; Ŷ|V),      (5.9)

where the minimum is over p(v), a = f(s, v), and p(ŷ|y, v) such that EΛ(S, A, Y) ≤ B and Ed(Y, Ŷ) ≤ D, and the joint distribution is of the form p(s, v, a, y, ŷ) = p(s)p(v)1{a = f(s, v)}p(y|a, s)p(ŷ|y, v).

The achievability is straightforward, with V as a time-sharing random variable known to all parties, and follows an analysis similar to that of Theorem 8.
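As with the lossless case, the objective in Theorem 10 can be evaluated numerically for small alphabets. The sketch below (our own construction, with hypothetical argument names) computes I(Y; Ŷ|V) and the expected distortion Ed(Y, Ŷ) for one fixed choice of p(v), f(s, v), p(y|a, s), and test channel p(ŷ|y, v); minimizing over such choices subject to the cost and distortion constraints approximates R(B).

    import itertools
    import numpy as np

    def entropy(p):
        p = np.asarray(p, dtype=float)
        p = p[p > 0]
        return float(-(p * np.log2(p)).sum())

    def theorem10_objective(p_s, p_v, f, p_y_given_as, p_yh_given_yv, dist):
        # Joint p(s)p(v)1{a=f(s,v)}p(y|a,s)p(yhat|y,v); returns (I(Y;Yhat|V), E d(Y,Yhat)).
        S, V = len(p_s), len(p_v)
        A, _, Y = p_y_given_as.shape
        Yh = p_yh_given_yv.shape[2]           # p_yh_given_yv[y, v, yhat]
        p_vyyh = np.zeros((V, Y, Yh))
        exp_d = 0.0
        for s, v, y, yh in itertools.product(range(S), range(V), range(Y), range(Yh)):
            a = f[s][v]
            pr = p_s[s] * p_v[v] * p_y_given_as[a, s, y] * p_yh_given_yv[y, v, yh]
            p_vyyh[v, y, yh] += pr
            exp_d += pr * dist[y][yh]
        # I(Y; Yhat | V) = H(Y|V) + H(Yhat|V) - H(Y, Yhat|V)
        p_v_marg = p_vyyh.sum(axis=(1, 2))
        p_vy = p_vyyh.sum(axis=2); p_vyh = p_vyyh.sum(axis=1)
        h_v = entropy(p_v_marg)
        mi = (entropy(p_vy.ravel()) - h_v) + (entropy(p_vyh.ravel()) - h_v) \
             - (entropy(p_vyyh.ravel()) - h_v)
        return mi, exp_d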

Converse: Given a (n, 2^{nR}) code satisfying the cost and distortion conditions, we have

nR ≥ H(M)
≥ I(M; Y^n)
= Σ_{i=1}^n I(M; Y_i|Y^{i−1})
=(a) Σ_{i=1}^n I(M; Y_i|V_i)
≥(b) Σ_{i=1}^n I(Ŷ_i; Y_i|V_i)
=(c) nI(Ŷ_Q; Y_Q|V_Q, Q),

where in (a) we set V_i = Y^{i−1}; (b) holds because Ŷ^n is a function of M; note that V_i is independent of S_i. In (c) we introduce Q as the time-sharing random variable, i.e., Q ∼ Unif[1 : n]. Thus, by setting V = (V_Q, Q) and Y = Y_Q, we have shown that R(B, D) ≥ I(Y; Ŷ|V) with V ⊥ S. This is, however, equivalent to the expression in the theorem because of the following:

• Replacing p(ŷ|y, v) by a general distribution p(ŷ|a, y, v, s) does not decrease the minimum in (5.9), since the mutual information term I(Y; Ŷ|V) depends only on the marginal distribution p(ŷ, y, v).

• Replacing a = f(s, v) by a general distribution p(a|s, v) does not decrease the minimum in (5.9), because for any joint distribution p(s)p(v)p(a|s, v)p(y|a, s)p(ŷ|y, v), I(Ŷ; Y|V = v) is a concave function of p(y|s, v), which is a linear function of p(a|s, v).

Chapter 6

Conclusions

In this thesis, we first revisited Gács and Körner's definition of common information. It is equal to the common randomness that two remote nodes, with access to X and Y respectively, can generate without communication. The fact that this quantity is degenerate in most cases motivated us to investigate the initial efficiency of common randomness generation when the communication rate goes to zero. It turned out that the initial efficiency is equal to 1/(1 − ρ²(X; Y)), where ρ is the Hirschfeld–Gebelein–Rényi maximal correlation between X and Y. This result gave the Hirschfeld–Gebelein–Rényi maximal correlation an operational justification as a measure of commonness between two random variables. The result also indicated that communication is the key to unlocking common randomness. We then turned to the saturation efficiency as the communication exhausts nature's randomness. We provided a sufficient condition for the saturation efficiency to be 1, which implies the continuity of the slope of common randomness generation at that point. An example was given to show that the slope is not continuous in general.

In the next part of the thesis, we introduced common randomness generation with actions, in which a node can take actions to influence the random variables received from nature. A single-letter expression of the common randomness-rate function was obtained. We showed through an example that the greedy approach of fixing the "best" action is not optimal in general when the communication rate is strictly positive. But as the rate goes down to zero, the initial efficiency in the action setting was proved


to be 1/(1 − max_{a∈A} ρ²(X, Y|A = a)), i.e., the reciprocal of one minus the square of the Hirschfeld–Gebelein–Rényi maximal correlation conditioned on the best action. The saturation efficiency with actions was analyzed similarly to the no-action setting.

In the last part of the thesis, we kept the action feature but shifted our focus to source coding. The idea that one could modify a source subject to a cost constraint before compression was formulated in an information-theoretic setting. Techniques from both channel coding and source coding were combined to obtain a single-letter expression of the rate-cost function. In our achievability scheme, modification of the source sequence is essentially equivalent to setting up cloud centers for the source sequence. Compression of the modified sequence is carried out via a classic binning approach. Interestingly, this approach does not require correct decoding of the cloud center.

Appendix A

Proofs of Chapter 2

A.1 Proof of the convexity of ρ(P_X ⊗ P_{Y|X}) in P_{Y|X}

Eg(X)f(Y,Z) (A.1) = Eg(X)(f(Y,Z) µ(Z)+ µ(Z)) − = Eg(X)(f(Y,Z) µ(Z)) + Eg(X)µ(Z) − (a) = Eg(X)(f(Y,Z) µ(Z)) + Eg(X)Eµ(Z) − (b) = Eg(X)(f(Y,Z) µ(Z)), −

2 where (a) is because X Z and (b) is due to Eg(X) = 0. Define η = Ef (Y,Z) . E(f(Y,Z) µ(Z))2 ⊥ − q


Note that

E(f(Y, Z) − μ(Z))²      (A.2)
= E_Z[ E[(f(Y, Z) − μ(Z))²|Z] ]
≤ E_Z[ E[f²(Y, Z)|Z] ]
= Ef²(Y, Z).

Thus η ≥ 1. Consider a new function f′ = η[f(Y, Z) − μ(Z)]. Note that Ef′(Y, Z) = η[Ef(Y, Z) − Eμ(Z)] = 0 and E(f′(Y, Z))² = 1. Furthermore, Eg(X)f′(Y, Z) = ηEg(X)f(Y, Z) ≥ Eg(X)f(Y, Z). Thus it is sufficient to consider f with the property that E[f(Y, Z)|Z = z] = 0, which enables us to write the optimization problem for ρ(X; Y, Z) in the following equivalent form:

max Eg(X)f(Y, Z)      (A.3)
subject to Eg(X) = 0,
E_{Y|Z=z} f(Y, z) = 0 for all z,
Eg²(X) = Ef²(Y, Z) = 1.

Define s_z = √(E[f²(Y, Z)|Z = z]). To simplify the notation, let p_z = P_Z(z) and ρ_z = ρ(P_X ⊗ P_{Y|X,Z=z}). We have the constraint

Σ_z p_z s_z² = 1.      (A.4)

Note that

max Eg(X)f(Y, Z)      (A.5)
= max E_Z[ E[g(X)f(Y, Z)|Z] ]
= max Σ_z p_z E[g(X)f(Y, Z)|Z = z]
≤(a) Σ_z p_z s_z ρ(P_X ⊗ P_{Y|X,Z=z})
= Σ_z p_z s_z ρ_z
≤(b) √(Σ_z p_z ρ_z²),

where (a) is due to the fact that, given Z = z, (X, Y) has joint distribution P_X ⊗ P_{Y|X,Z=z}, and (b) is based on the following argument. Consider the optimization problem with optimization variables s_z, z ∈ Z:

max Σ_z p_z ρ_z s_z
subject to Σ_z p_z s_z² = 1,  s_z ≥ 0 for all z.

Using the method of Lagrange multipliers, we construct L(s, λ) = Σ_z p_z ρ_z s_z − λ Σ_z p_z s_z². Solving ∂L/∂s_z = p_z ρ_z − 2λ p_z s_z = 0, we obtain s_z = ρ_z/(2λ) for all z ∈ Z. Using the constraint Σ_z p_z s_z² = 1, we have λ = √(Σ_z p_z ρ_z²)/2, which yields √(Σ_z p_z ρ_z²) as the maximum. This completes the proof of Lemma 1.
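The Lagrangian step above is easy to sanity-check numerically (our own sketch with arbitrary p_z and ρ_z): the closed-form maximum √(Σ_z p_z ρ_z²) should upper-bound, and be approached by, the objective evaluated at random feasible points.

    import numpy as np

    rng = np.random.default_rng(0)
    p = np.array([0.2, 0.3, 0.5])       # hypothetical p_z
    rho = np.array([0.9, 0.4, 0.6])     # hypothetical rho_z
    closed_form = np.sqrt(np.sum(p * rho**2))

    # Random nonnegative points, projected onto the constraint sum_z p_z s_z^2 = 1.
    s = np.abs(rng.normal(size=(200_000, 3)))
    s /= np.sqrt((p * s**2).sum(axis=1, keepdims=True))
    values = (p * rho * s).sum(axis=1)

    print(closed_form, values.max())    # the maximum approaches, but never exceeds, the closed form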

Appendix B

Proofs of Chapter 3

B.1 Proof of the continuity of C(R) at R =0

Fix an arbitrary ε > 0.

ε ≥ I(X; U) − I(Y; U)      (B.1)
=(a) I(X; U|Y)
= Σ_y p(y) I(X; U|Y = y)
= Σ_y p(y) D(p(x, u|Y = y) ‖ p(x|Y = y)p(u|Y = y)).

Thus D(p(x, u|y) ‖ p(x|y)p(u|y)) ≤ ε / min_{y∈Y} p(y) for all y ∈ Y. Via Pinsker's inequality, Σ_x |p(x) − q(x)| ≤ √((2/ln 2) D(p‖q)), we obtain

Σ_{x,u} |p(x, u|y) − p(x|y)p(u|y)| ≤ ε′      (B.2)
⇒ Σ_{x,u} p(x|y) |p(u|x, y) − p(u|y)| ≤ ε′
⇒ Σ_{x,u} p(x|y) |p(u|x) − p(u|y)| ≤ ε′,


where ε′ = √((2/ln 2) ε / min_{y∈Y} p(y)) and the last step is due to the Markov chain U − X − Y. Thus for each (x, y) pair such that p(x, y) > 0, we have |p(u|x) − p(u|y)| ≤ δ, where δ = ε′ / min_{(x,y):p(x,y)>0} p(x, y).

Let V be the maximum common random variable of p(x, y). There exist deterministic functions g and f such that V = g(X) = f(Y). For each v, pick an arbitrary y* from the set of y's such that f(y) = v. Thus we create a mapping y* = y*(v).

We claim that if x and y are in the same block, then |p(u|x) − p(u|y)| ≤ δ′, where δ′ = (2|X| + 1)δ. This is due to the fact that if x and y satisfy g(x) = f(y), then there exists a sequence (x, y_1), (x_1, y_1), (x_1, y_2), ..., (x_n, y) such that the probability of each pair is strictly positive [10]. Using the triangle inequality:

|p(u|x) − p(u|y)|
≤ |p(u|x) − p(u|y_1)| + |p(u|y_1) − p(u|y)|
≤ δ + |p(u|y_1) − p(u|y)|
≤ δ + |p(u|x_1) − p(u|y_1)| + |p(u|x_1) − p(u|y)|
≤ 2δ + |p(u|x_1) − p(u|y)|
...
≤ (2n + 1)δ
≤ (2|X| + 1)δ
= δ′.

Consider a new distribution p*(x, y, u) = p(x, y)p(u|y*(f(y))). Note that ‖p − p*‖_1 goes to zero as ε goes to 0. Therefore lim_{ε→0} |I(X; U|V) − I*(X; U|V)| = 0. Furthermore, under the distribution p*, the Markov chain X − V − U holds. Thus

lim_{ε→0} I(X; U) = lim_{ε→0} I(X; U, V)
= I(X; V) + lim_{ε→0} I*(X; U|V)
= I(X; V)
= H(V).

Appendix C

Proofs of Chapter 4

C.1 Converse proof of Theorem 5

We bound the rate R as follows:

nR ≥ H(M)
= I(X^n; M)
= I(X^n; M, Y^n) − I(X^n; Y^n|M)
= Σ_{i=1}^n I(X_i; M, Y^n|X^{i−1}) − I(X^n; Y^n|M)
=(a) Σ_{i=1}^n I(X_i; M, Y^n, X^{i−1}) − I(X^n; Y^n|M)
=(b) Σ_{i=1}^n I(X_i; M, A_i, Y^n, X^{i−1}) − I(X^n; Y^n|M)
= Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; M, Y^{n\i}, X^{i−1}|A_i, Y_i) + Σ_{i=1}^n I(X_i; Y_i|A_i) − I(X^n; Y^n|M)
= Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; M, Y^{n\i}, X^{i−1}|A_i, Y_i) + Σ_{i=1}^n [I(Y_i; X_i|A_i) − I(Y_i; X^n|Y^{i−1}, M)]


≥(c) Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; M, Y^{n\i}, X^{i−1}|A_i, Y_i)
= Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, Y^{n\i}, X^{i−1}|A_i, Y_i) − Σ_{i=1}^n I(X_i; K|M, Y^n, X^{i−1}, A_i)
=(d) Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, Y^{n\i}, X^{i−1}|A_i, Y_i) − Σ_{i=1}^n I(X_i; K|M, Y^n, X^{i−1})
= Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, Y^{n\i}, X^{i−1}|A_i, Y_i) − I(X^n; K|M, Y^n)
≥ Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, Y^{n\i}, X^{i−1}|A_i, Y_i) − H(K|M, Y^n)
≥(e) Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, Y^{n\i}, X^{i−1}|A_i, Y_i) − H(K|K′)
≥ Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; K, M, X^{i−1}|A_i, Y_i) − H(K|K′)
=(f) Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; U_i|A_i, Y_i) − H(K|K′)
=(g) Σ_{i=1}^n I(X_i; A_i) + Σ_{i=1}^n I(X_i; U_i|A_i) − Σ_{i=1}^n I(Y_i; U_i|A_i) − H(K|K′)
= Σ_{i=1}^n I(X_i; A_i, U_i) − Σ_{i=1}^n I(Y_i; U_i|A_i) − H(K|K′)
=(h) n[ I(X_Q; A_Q, U_Q|Q) − I(Y_Q; U_Q|A_Q, Q) − H(K|K′)/n ]
= n[ I(X_Q; A_Q, U_Q, Q) − I(Y_Q; U_Q|A_Q, Q) − H(K|K′)/n ]
≥ n[ I(X_Q; A_Q, U_Q, Q) − I(Y_Q; U_Q, Q|A_Q) − H(K|K′)/n ],

where (a) is because the X_i's are i.i.d.; (b) and (d) are due to the fact that A^n is a function of M; and (c) comes from the following chain of inequalities:

I(Y_i; X_i|A_i) − I(Y_i; X^n|Y^{i−1}, M)      (C.1)
= I(Y_i; X_i|A_i) − I(Y_i; X^n|Y^{i−1}, M, A^n)
= H(Y_i|A_i) − H(Y_i|Y^{i−1}, M, A^n) − H(Y_i|X_i, A_i) + H(Y_i|X^n, Y^{i−1}, M, A^n)
= H(Y_i|A_i) − H(Y_i|Y^{i−1}, M, A^n) − H(Y_i|X_i, A_i) + H(Y_i|X_i, A_i)
≥ 0,

where the third equality comes from the Markov chain Y_i − (X_i, A_i) − (X^{n\i}, Y^{i−1}, M, A^{n\i}); (e) is because K′ is a function of M and Y^n; in (f), we set U_i = (K, M, X^{i−1}). Note that U_i − (X_i, A_i) − Y_i, which justifies (g). In (h), we introduce a time-sharing random variable Q, which is uniformly distributed on {1, ..., n} and independent of (X^n, K, M, Y^n). We bound the entropy of K as follows:

H(K) =(a) I(X^n; K)      (C.2)
= Σ_{i=1}^n I(X_i; K|X^{i−1})
= Σ_{i=1}^n I(X_i; K, X^{i−1})
≤ Σ_{i=1}^n I(X_i; U_i)
= nI(X_Q; U_Q|Q)
= nI(X_Q; U_Q, Q),

where (a) is due to the fact that K is a function of X^n. Setting X = X_Q, Y = Y_Q, and U = (U_Q, Q) finishes the proof.

C.2 Proof for initial efficiency with actions

The goal is to prove that

sup_{p(a,u|x)} I(Y; U|A) / (I(X; A) + I(X; U|A)) = max_{a∈A} ρ_m²(X, Y|A = a).

Define

∆_1(P_A) = { p(a, u|x) : Σ_x p(a|x)p_X(x) = P_A(a) for all a ∈ A },

∆_2(δ) = { p(a, u|x) : I(X; A) + I(X; U|A) ≤ δ }.

That is, ∆_1(P_A) is the set of conditional distributions p(a, u|x) such that the induced marginal distribution of A is P_A, and ∆_2(δ) is the set of conditional distributions p(a, u|x) such that I(X; A) + I(X; U|A) does not exceed δ. Then

sup_{p(a,u|x)} I(Y; U|A) / (I(X; A) + I(X; U|A))
= sup_{P_A} sup_{δ≥0} sup_{∆_1(P_A) ∩ ∆_2(δ)} I(Y; U|A) / (I(X; A) + I(X; U|A))
= sup_{P_A} lim_{δ↓0} sup_{∆_1(P_A) ∩ ∆_2(δ)} I(Y; U|A) / (I(X; A) + I(X; U|A)),

where the last step can be proved by the following concavity argument.

Lemma 8. Fixing an arbitrary marginal distribution P_A, define

f(δ) = sup_{∆_1(P_A) ∩ ∆_2(δ)} I(Y; U|A).

Then f(δ) is concave in δ.

Proof. Fixing the marginal distribution P_A, consider any p(a_1, u_1|x) ∈ ∆_1(P_A) ∩ ∆_2(δ_1) and p(a_2, u_2|x) ∈ ∆_1(P_A) ∩ ∆_2(δ_2). Construct p(a, u|x) = λp(a_1, u_1|x) + (1 − λ)p(a_2, u_2|x), and introduce a time-sharing random variable Q which equals 1 with probability λ and 2 with probability 1 − λ. We have

I(X; A, U, Q) = I(X; A, U|Q)
= λI(X; A_1, U_1) + (1 − λ)I(X; A_2, U_2)
≤ λδ_1 + (1 − λ)δ_2,

and

I(Y; U, Q|A) ≥ I(Y; U|A, Q)
= λI(Y; U_1|A_1) + (1 − λ)I(Y; U_2|A_2).

Note that (A, U′) corresponds to a valid choice of p(a, u|x), where U′ = (U, Q). Thus

f(λδ_1 + (1 − λ)δ_2) ≥ λf(δ_1) + (1 − λ)f(δ_2),

which completes the proof of the concavity of f.

Note that

I(Y; U|A) / (I(X; A) + I(X; U|A)) ≤ max_{a: P_A(a)>0} I(Y; U|A = a) / I(X; U|A = a)
≤ max_{a: P_A(a)>0} ρ²(P_{X|A=a} ⊗ P_{Y|X,A=a}),

where the first inequality is a consequence of [4, Lemma 16.7.1] and the last inequality comes from Lemma 4. Therefore

sup_{P_A} lim_{δ↓0} sup_{∆_1(P_A) ∩ ∆_2(δ)} I(Y; U|A) / (I(X; A) + I(X; U|A))
≤ sup_{P_A} lim_{δ↓0} sup_{∆_1(P_A) ∩ ∆_2(δ)} max_{a: P_A(a)>0} ρ²(P_{X|A=a} ⊗ P_{Y|X,A=a})
=(a) sup_{P_A} max_{a∈A: P_A(a)>0} ρ²(P_X ⊗ P_{Y|X,A=a})
= max_{a∈A} ρ²(P_X ⊗ P_{Y|X,A=a}),

where (a) can be proved by observing that, for a fixed marginal distribution P_A, δ ↓ 0 implies that ‖P_X − P_{X|A=a}‖_{l_1} ↓ 0 for all a ∈ A with P_A(a) > 0, and that ρ(P_X′ ⊗ P_{Y|X,A=a}) as a function of P_X′ is uniformly continuous around P_X′ = P_X. This upper bound is actually achievable: we can simply fix the action a that maximizes ρ²(P_X ⊗ P_{Y|X,A=a}) over a ∈ A and use Lemma 4 to complete the proof.
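For finite alphabets, the right-hand side of this initial-efficiency formula can be computed directly using the standard singular-value characterization of maximal correlation: ρ_m(X; Y) equals the second-largest singular value of the matrix with entries p(x, y)/√(p(x)p(y)). The sketch below is our own, with a hypothetical pair of action-dependent binary symmetric channels; it evaluates max_{a∈A} ρ_m²(X, Y|A = a).

    import numpy as np

    def maximal_correlation(p_xy):
        # HGR maximal correlation of a finite joint pmf: the second largest
        # singular value of Q[x, y] = p(x, y)/sqrt(p(x) p(y)).
        p_x = p_xy.sum(axis=1)
        p_y = p_xy.sum(axis=0)
        q = p_xy / np.sqrt(np.outer(p_x, p_y))
        sv = np.linalg.svd(q, compute_uv=False)
        return sv[1]                      # sv[0] == 1 corresponds to constant functions

    # Hypothetical example: for each action a, a binary symmetric channel p(y|x, a).
    p_x = np.array([0.5, 0.5])
    channels = {0: 0.1, 1: 0.3}           # crossover probability under each action
    best = 0.0
    for a, eps in channels.items():
        p_y_given_x = np.array([[1 - eps, eps], [eps, 1 - eps]])
        p_xy = p_x[:, None] * p_y_given_x
        best = max(best, maximal_correlation(p_xy) ** 2)
    print(best)                           # max_a rho_m^2(X, Y|A = a); here (1 - 2*0.1)^2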

C.3 Proof of Lemma 5

Set A ⊥ X

By symmetry, it is without loss of optimality to set A = 1. The maximum common random variable V between X and Y has the following form: V = 1 if X = 1; V = 2 if X = 2; V = 3 if X = 3 or 4. For any U such that U − X − Y:

I(U; X) − I(U; Y)
=(a) I(U, V; X) − I(V, U; Y)
= I(V; X) − I(V; Y) + I(U; X|V) − I(U; Y|V)
=(b) I(U; X|V) − I(U; Y|V)
=(c) (1/2)[I(U; X|V = 3) − I(U; Y|V = 3)]
=(d) (1/2)[I(U; X|V = 3) − (1 − p)I(U; X|V = 3)]
= (p/2) I(U; X|V = 3),

where (a) is due to Lemma 3; (b) is because V is a deterministic function of X and a deterministic function of Y; (c) is due to the fact that, conditioned on V = 1 or V = 2, X = Y; and (d) is because, conditioned on V = 3, Y is an erased version of X. On the other hand,

I(U; X) = I(U, V; X)
= I(V; X) + I(U; X|V)
= H(V) + I(U; X|V)
= H(V) + (1/2) I(U; X|V = 3)
= 3/2 + (1/2) I(U; X|V = 3).

Thus the achievable (C, R) pair when A ⊥ X is of the form C = 3/2 + R/p. Note that 0 ≤ I(U; X|V = 3) ≤ 1, and thus 0 ≤ R ≤ p/2.

Correlate A with X through Fig. 4.3

We construct a random variable V in the following way to facilitate the proof: V has support set {1, 2, 3} and is a deterministic function of (X, A):

• If A = 1, then V = 1 if X = 1; V = 2 if X = 2; V = 3 if X = 3 or 4.

• If A = 2, then V = 1 if X = 3; V = 2 if X = 4; V = 3 if X = 1 or 2.

Note that, conditioned on A, V is the maximum common random variable between X and Y. We simply set U = V (this is not optimal in general but is good enough to beat the A ⊥ X choice for some R). The communication rate is

I(X; A) + I(V; X|A) − I(V; Y|A) = I(X; A) + H(V|A) − H(V|A)
= I(X; A)
= 1 − H_2(α).

On the other hand, the common randomness generated is

I(X; A, V) = I(X; V|A) + I(X; A)
= 1 − H_2(α) + I(X; V|A)
= 1 − H_2(α) + H(V|A)
= 1 − H_2(α) + H(α, (1 − α)/2, (1 − α)/2)
= 2 − α.
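The two schemes just analyzed are easy to compare numerically (our own sketch, with a hypothetical erasure probability p): the correlated-action construction achieves the pair (R, C) = (1 − H_2(α), 2 − α), while the A ⊥ X choice achieves C = 3/2 + R/p only for R ≤ p/2. Which scheme generates more common randomness at a given rate depends on p and α.

    import numpy as np

    def h2(x):
        x = np.clip(x, 1e-12, 1 - 1e-12)
        return -x*np.log2(x) - (1 - x)*np.log2(1 - x)

    p = 0.4                                   # hypothetical erasure probability
    for alpha in [0.5, 0.4, 0.3, 0.2, 0.1]:
        R = 1 - h2(alpha)                     # rate used by the correlated-action scheme
        C_corr = 2 - alpha                    # common randomness it generates
        # The A-independent-of-X scheme achieves C = 3/2 + R/p only for R <= p/2.
        C_indep = 1.5 + R / p if R <= p / 2 else None
        print(f"alpha={alpha:.1f}  R={R:.3f}  C_corr={C_corr:.3f}  C_indep={C_indep}")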

Appendix D

Proofs of Chapter 5

D.1 Proof of Lemma 6

Fixing a v, the function a = f(s, v) has only four possible forms: a = s, a = 1 − s, a = 0, and a = 1. Thus, we can divide V into four groups:

V_0 = {v : f(s, v) = s},
V_1 = {v : f(s, v) = 1 − s},
V_2 = {v : f(s, v) = 0},
V_3 = {v : f(s, v) = 1}.      (D.1)

First, it is without loss of optimality to set V_3 = ∅. That is because, for each v ∈ V_3, we can change the function to f(s, v) = 0. The rate I(V; S) + H(Y|V) does not change and the cost EA only decreases. Rewrite the objective function in the following way:

I(V; S) + H(Y|V) = H(Y|V) − H(S|V) + H(S)
= H(S ⊕ A ⊕ S_N|V) − H(S|V) + H(S)
= Σ_{v∈V_0} (H_2(p) − H(S|V = v)) p(v) + Σ_{v∈V_1} (H_2(p) − H(S|V = v)) p(v)
  + Σ_{v∈V_2} (H(S ⊕ S_N|V = v) − H(S|V = v)) p(v) + H(S),      (D.2)

where the last step is obtained by plugging in the actual form of a = f(s, v) for each group of v.

Second, it is sufficient to have |V_0| = 1 and |V_1| = 1. To prove this, let v_1, v_2 ∈ V_0. Note that H(S|V = v) is a concave function of p(s|V = v). Thus, if we replace v_1, v_2 by a v_3 with p(v_3) = p(v_1) + p(v_2) and

p(s|V = v_3) = [p(v_1)/(p(v_1) + p(v_2))] p(s|V = v_1) + [p(v_2)/(p(v_1) + p(v_2))] p(s|V = v_2),

we preserve the distribution of S and the cost EA, but we reduce the first term, i.e., Σ_{v∈V_0} (H_2(p) − H(S|V = v)) p(v), in Eq. (D.2). Therefore, we can set V_0 = {0} and V_1 = {1}.

Third, note that for each v ∈ V_2,

H(Y|V = v) − H(S|V = v)
= H(S ⊕ A ⊕ S_N|V = v) − H(S|V = v)
= H(S ⊕ S_N|V = v) − H(S|V = v)
≥ 0.

Last, if P(S = 0|V = 0) ≠ P(S = 1|V = 1), consider a new auxiliary random variable V′ with the following distribution:

• V′ = {0, 1, 2}, P(V′ = 0) = P(V′ = 1) = (P(V = 0) + P(V = 1))/2.

• The function a = f(s, v′) is of the form f(s, 0) = s, f(s, 1) = 1 − s, and f(s, 2) = 0.

• P(S = 0|V′ = 2) = 1/2, and

P(S = 1|V′ = 0) = P(S = 0|V′ = 1) = [P(S = 1|V = 0)P(V = 0) + P(S = 0|V = 1)P(V = 1)] / [P(V = 0) + P(V = 1)].

Comparing (S,V ′) with (S,V ), we can check that the cost EA and the distribution of S are preserved. Meanwhile, the objective function is reduced, which completes the proof.

D.2 Proof of Lemma 7

Similar to the proof of Lemma 6, we divide V into V_0, V_1, V_2, V_3. Using the same argument, we show that V_3 = ∅. Rewrite the objective function H(Y|V) in the following way:

H(Y|V)      (D.3)
= H(S ⊕ A ⊕ S_N|V)
= Σ_{v∈V_0} H_2(p)p(v) + Σ_{v∈V_1} H_2(p)p(v) + Σ_{v∈V_2} H(S ⊕ S_N|V = v)p(v)
= H_2(p) Σ_{v∈V_0 ∪ V_1} p(v) + Σ_{v∈V_2} p(v),

which implies that it is sufficient to consider the case |V_0| = 1, V_1 = ∅, and |V_2| = 1. This completes the proof.

Bibliography

[1] R. Ahlswede and I. Csiszár, "Common Randomness in Information Theory and Cryptography – Part I: Secret Sharing," IEEE Trans. Inf. Theory, vol. 39, no. 4, pp. 1121–1132, July 1993.

[2] R. Ahlswede and I. Csiszár, "Common Randomness in Information Theory and Cryptography – Part II: CR Capacity," IEEE Trans. Inf. Theory, vol. 44, no. 1, pp. 225–240, January 1998.

[3] R. F. Ahlswede and J. Körner, "Source coding with side information and a converse for degraded broadcast channels," IEEE Trans. Inf. Theory, vol. 21, no. 6, pp. 629–637, 1975.

[4] T. Cover and J. Thomas, “Elements of Information Theory”, John Wiley&Sons, 2nd Edition, 2006.

[5] I. Csiszár and P. Narayan, "Common Randomness and Secret Key Generation with a Helper," IEEE Trans. Inf. Theory, vol. 46, no. 2, pp. 344–366, March 2000.

[6] P. Cuff, T. Cover, and H. Permuter, “Coordination capacity,” IEEE Trans. Inf. Theory, vol. 56, no. 9, pp. 4181–4206, September 2010.

[7] A. Dembo, A. Kagan, and L. A. Shepp, “Remarks on the maximum correlation coefficient”, Bernoulli, no. 2, pp. 343–350, April 2001.

[8] A. El Gamal, and Y. H. Kim, “Lectures on Network Information Theory,” 2010, available online at ArXiv: http://arxiv.org/abs/1001.3404.


[9] E. Erkip and T. Cover, "The Efficiency of Investment Information," IEEE Trans. Inf. Theory, vol. 44, no. 3, pp. 1026–1040, May 1998.

[10] P. Gács and J. Körner, "Common information is far less than mutual information," Problems of Control and Information Theory, vol. 2, no. 2, pp. 119–162, 1972.

[11] S. I. Gelfand and M. S. Pinsker, "Coding for Channel with Random Parameters," Probl. Contr. and Inform. Theory, vol. 9, no. 1, pp. 19–31, 1980.

[12] H. Gebelein, "Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichungsrechnung," Z. für angewandte Math. und Mech., vol. 21, pp. 364–379, 1941.

[13] C. Heegard and A. El Gamal, "On the Capacity of Computer Memory with Defects," IEEE Trans. Inform. Theory, vol. 29, no. 5, pp. 731–739, September 1983.

[14] H. O. Hirschfeld, “A connection between correlation and contingency,” Proc. Cambridge Philosophical Soc., vol. 31, pp. 520-524, 1935

[15] Y. H. Kim, A. Sutivong, and T. M. Cover, "State amplification," IEEE Trans. Inform. Theory, vol. 54, no. 5, pp. 1850–1859, May 2008.

[16] H. O. Lancaster, "Some properties of the bivariate normal distribution considered in the form of a contingency table," Biometrika, vol. 44, pp. 289–292, 1957.

[17] P. Grover, A. Wagner, and A. Sahai, "Information Embedding meets Distributed Control," in Proc. IEEE Information Theory Workshop, Cairo, Egypt, January 2010.

[18] A. Rényi, "On measures of dependence," Acta Mathematica Hungarica, vol. 10, no. 3–4, pp. 441–451, 1959.

[19] C. Shannon, "A Mathematical Theory of Communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.

[20] C. Shannon, “Channels with side information at the transmitter,” IBM J. Res. Develop., Vol. 2, pp. 289-293, 1958.

[21] D. Slepian, J. Wolf, “Noiseless coding of correlated information sources”, IEEE Trans. Inf. Theory, vol. 19, no. 4, pp. 471–480.

[22] S. Sigurjonsson, and Y. H. Kim, “On multiple user channels with causal state in- formation at the transmitters,” in Proceedings of IEEE International Symposium on Information Theory, Adelaide, Australia, Sep. 2005

[23] A. Sutivong and T. Cover, "Rate vs. Distortion Trade-off for Channels with State Information," in Proceedings of the 2002 IEEE International Symposium on Information Theory, Lausanne, Switzerland, June 2002.

[24] O. Sumszyk, and Y. Steinberg, “Information embedding with reversible stego- text”, in Proceedings of the 2009 IEEE Symposium on Information Theory, Seoul, Korea, Jun. 2009

[25] N. Tishby, F.C. Pereira, and W. Bialek, “The Information Bottleneck method,” The 37th annual Allerton Conference on Communication, Control, and Comput- ing, Sept. 1999, pp. 368-377

[26] S. Verdu, “On channel capacity per unit cost,” IEEE Trans. Inf. Theory, vol. 36, no. 5, pp. 1019–1030, September 1990.

[27] T. Weissman and H. Permuter, “Source Coding with a Side Information ‘Vending Machine’ ”, IEEE Trans. Inf. Theory, submitted 2009.

[28] H. S. Witsenhausen, “ On sequences of pairs of dependent random variables.”, SIAM J. APPL. Math. vol. 28, no. 1, January 1975.

[29] A. Wyner, and J. Ziv, “A theorem on the entropy of certain binary sequences and applications-I,” IEEE Trans. Inf. Theory, vol. 19, no. 6, pp. 769-772, 1973.

[30] A. Wyner and J. Ziv, “The rate distortion function for source coding with side information at the receiver”, IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1–10, 1976