INFORMATION FLOW IN BIOLOGICAL NETWORKS

Gašper Tkačik

A DISSERTATION PRESENTED TO THE FACULTY OF PRINCETON UNIVERSITY IN CANDIDACY FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

RECOMMENDED FOR ACCEPTANCE BY THE DEPARTMENT OF

September 2007

© Copyright by Gašper Tkačik, 2007. All rights reserved.

Abstract

Biology presents us with several interesting kinds of networks that transmit and process information. Neurons communicate by exchanging action potentials; proteins in a signaling pathway detect chemicals at the cell surface and conduct the information to the nucleus in a cascade of chemical reactions; and in gene regulation, transcription factors are responsible for integrating environmental signals and appropriately regulating their target genes. Understanding the collective behavior of biological networks is not easy. These systems are inherently noisy and thus require a model of the mean dynamics as well as that of the noise; in addition, if we view the regulatory networks as information transmission devices, implemented by the “hardware” of chemical reactions, we need to describe them in a probabilistic, not deterministic (or noiseless), language. Unfortunately, connecting theory with experiment then becomes exponentially hard due to sampling problems as the number of interacting elements grows, and progress depends on finding some simplifying principle. In this work two such principles are presented.

In the first half I discuss a bottom-up approach and analyze the responses of a set of retinal ganglion cells to a naturalistic movie clip, and the activation states of proteins in a signaling cascade of immune system cells. The simplifying principle here is the idea that the distribution of activities over elements of the network is maximum entropy (or most random), but still consistent with some experimentally measured moments (specifically, pairwise correlations). The analogies between maximum entropy and Ising models are illustrated and connected to the previously existing theoretical work on spin-glass properties of neural networks.

In the second part a top-down approach is presented: viewing genetic regulatory networks as being driven to maximize the reliability of the information transmission between their inputs and outputs, I first examine the best performance of genetic regulatory elements achievable given experimentally motivated models of noise in gene regulation; and second, make the hypothesis that, in some systems at least, such optimization is beneficial for the organism and that its predictions are verifiable. Data from early morphogenesis in the fruit fly, Drosophila melanogaster, are used to illustrate these claims.

Acknowledgements

There is something remarkable about my advisors, William Bialek and Curtis Callan, apart from their sheer strength as scientific thinkers. Bill and Curt seem to enjoy life, and let younger colleagues take part in that. They lead by example in showing that being ‘professorial’ and having fun are not incompatible. It sounds like a simple idea, but I found it important. At times, even a strong combination of intellectual curiosity and a long-term view of one’s scientific endeavors (in particular how they might be realized by obtaining a tenured faculty position) fails to fully motivate the pursuit of a career in science. At those moments, it becomes particularly important to be happy and to enjoy the company of the people around you.

I would also like to thank the people that I had the pleasure of interacting with in the past few years: Justin, Sergey, Thomas, Noam, Elad, Greg and Stephanie. Postdoctoral fellows especially were happy to share some of their accumulated wisdom – both scientific and of things more general – with their younger, inexperienced friends.

I hereby acknowledge the support of Princeton University, the Burroughs-Wellcome Fund and the Charlotte E. Procter fellowship.

Some parts of this thesis have appeared as preprints on the arXiv. I would like to separately thank my collaborators on the following projects:

Section 2.6: Ising models for networks of real neurons, arXiv:q-bio.NC/0611072, with E Schneidman, MJ Berry II and W Bialek.
Section 3.2: The role of input noise in gene regulation, arXiv:q-bio.MN/0701002, with T Gregor and W Bialek.
Section 4.6: Information flow and optimization in transcriptional control, arXiv:0705.0313v1, with CG Callan Jr and W Bialek.

Sonji.

Contents

Abstract
Acknowledgements

1 Introduction

2 Building networks from data
  2.1 Computing correlations
  2.2 Networks from correlations
  2.3 Computing interactions
  2.4 Networks from interactions
  2.5 Example I: Biochemical network
    2.5.1 Discovering structure in the data
    2.5.2 Analyzing a single condition
    2.5.3 Combining multiple conditions
    2.5.4 Discussion
  2.6 Example II: Network of retinal ganglion cells
    2.6.1 Extension to stimulus dependent models
  2.7 Summary

3 Genetic regulatory networks
  3.1 Signal transduction in gene regulation
  3.2 Input and output noise in transcriptional regulation
    3.2.1 Introduction
    3.2.2 Global consistency
    3.2.3 Sources of noise
    3.2.4 Signatures of input noise
    3.2.5 Discussion
  3.3 Alternative noise sources
    3.3.1 Effects of non-specific binding sites
    3.3.2 Effects of TF diffusion along the DNA
  3.4 Summary

4 Building networks that transmit information
  4.1 Introduction
  4.2 Maximizing information transmission
    4.2.1 Small noise approximation
    4.2.2 Large noise approximation
    4.2.3 Exact solution
  4.3 A model of signals and noise
  4.4 Results
    4.4.1 Capacity of simple regulatory elements
    4.4.2 Cooperativity, dynamic range and the tuning of solutions
    4.4.3 Non-specific binding and the costs of higher capacity
  4.5 Discussion
  4.6 Example: Information flow in morphogenesis of the fruit fly

5 Conclusion

A Methods
  A.1 The biochemical network
  A.2 Network of retinal ganglion cells
  A.3 Input and output noise in transcriptional regulation
  A.4 Information flow in transcriptional regulation
    A.4.1 Finding optimal channel capacities
    A.4.2 Channel capacity at constrained expense
    A.4.3 Capacity as the number of distinguishable discrete states
    A.4.4 Solutions with alternative input-output relations
    A.4.5 Validity of Langevin approximations
    A.4.6 Fine-tuning of optimal distributions

Bibliography

List of Figures

2.1 Comparison of correlation coefficient and mutual information dependency measures
2.2 Correlation vs information for yeast ESR genes
2.3 Clustering of the yeast ESR genes
2.4 Difference between correlations and interactions
2.5 Graphical illustration of conditional independence
2.6 MAP signaling network in immune system cells
2.7 Raw activation data for MEK protein
2.8 Discretization of the protein activation data
2.9 Effects of discretization of continuous data
2.10 Correlation between activation levels of signaling proteins
2.11 Interaction maps between proteins for different conditions
2.12 Sparsity of the interaction maps
2.13 Information capture by Ising models
2.14 Verification of the model by 3-point correlations
2.15 Verification of the model by histograming the energy
2.16 Interaction map across conditions
2.17 Frequency plots of activation patterns for all 9 conditions
2.18 Success of the pairwise Ising model for neurons
2.19 Thermodynamics of the Ising models for neurons
2.20 Stable states of the Ising models for neurons
2.21 Observed and predicted pattern probabilities in stimulus dependent maxent
2.22 Condition dependent number of spikes and frequency of silence
2.23 Simultaneous reconstruction of spike triggered averages for 10 neurons
2.24 Neuron-stimulus couplings for 10 neurons
2.25 Distributions of stimulus dependent magnetic fields for 10 neurons
2.26 Spike-stimulus channel capacity scaling
2.27 Most informative stimulus codewords

3.1 Schematic diagram of genetic regulatory mechanisms
3.2 Simplified schematic of transcriptional regulation
3.3 Expression noise as a function of the mean expression
3.4 The input/output relation of Bicoid-Hunchback system
3.5 Hunchback noise as the function of its mean
3.6 Scaling of the output noise with the output mean
3.7 A diagram of the position weight matrix
3.8 Schematic diagram of transcription factor translocation on the DNA
3.9 The effect of 1D diffusion along the DNA on the noise

4.1 Schematic diagram of the regulatory element
4.2 Large noise approximation
4.3 Information plane for activator without cooperativity
4.4 Information capacity of activators vs repressors
4.5 Approximations for channel capacity of regulatory elements
4.6 Effects of TF concentration constraints on the channel capacity
4.7 Effects of costs on channel capacity of simple regulatory elements
4.8 Probabilistic formulation of transcriptional regulation
4.9 Hunchback mean expression and variance as functions of Bicoid concentration
4.10 Information capacity scaling as a function of noise variance
4.11 Predicted and observed distributions of Hunchback expression

A.1 Interaction maps with random quantizations
A.2 Bootstrap error estimation for correlation functions
A.3 Reconstruction precision of 40-neuron Monte Carlo
A.4 Energy histograms for networks of neurons
A.5 Reconstruction precision for 20 neurons
A.6 Comparisons between couplings extracted from 20 and 40-neuron nets
A.7 Scaling of entropy and multi-information with network size
A.8 Stable states – details I
A.9 Stable states – details II
A.10 Stable states – details III
A.11 Stable states – details IV
A.12 Imposing smoothness constraint on the solutions
A.13 Finding the optimal cooperativity of a system with metabolic cost
A.14 Information bottleneck channel
A.15 Information bottleneck/binary channel example
A.16 Finding discrete input states that maximize capacity
A.17 Information planes for alternative noise models
A.18 Assessment of the validity of the Langevin approximation
A.19 Robustness of the optimal solutions

Chapter 1

Introduction

Organisms are devices that have evolved to produce particular responses, beneficial for their own survival and reproduction, to the stimuli that they are exposed to in their natural environments. The proverbial encounter of a cheetah with its prey, the antelope, is a good example. While grazing, the antelope is on the lookout for its predator, and has to react quickly to the visual, auditory or olfactory signals that could indicate the cheetah’s presence. There is a fatal cost for a “false negative” and potentially as high a cost for being slow; in comparison, making a “false positive” choice is a mere nuisance, and the worst that can happen is that the animal gets tired trying to escape from phantom cats lurking in the shadows.

The imaginary grassland scene where the antelope is being chased represents an interplay of several conceptual ingredients. First, there is the current state of the world and its rules – the probability of encountering a cheetah or of other events happening given the location or the time of day, physical laws governing the dynamics of the animals’ bodies and so on; second, there are the detection and processing of environmental signals by the antelope’s sensory system and its brain; and finally, there is the policy of action – (how) to flee or whether to continue grazing – and a possible, perhaps deadly, feedback. We have a clear intuition that for the antelope there is a limited set of “correct” ways to behave, and there are theoretical frameworks that embody this intuition, like optimal Bayesian decision making (MacKay, 2003) or game theory (von Neumann and Morgenstern, 1947).

Recent work has incorporated two additional important notions into this overall picture. The first is the idea of learning and adaptation. Unlike textbook problems on inference, the underlying probabilities of various events are not initially known to the animal. The world is an orderly place because of its rules, which operate on many levels and induce correlations in time between events: knowing what has happened until now is informative about what will happen next (Bialek et al., 2001), and the animal will pay for the cost of information gathering and processing if it can extract the predictive part of the information that will guide its future actions (Tishby, 2007). The second idea is that there are physical limits to perception and signaling due to sources of noise that cannot be eliminated given other relevant constraints, and this noise must restrict the reliability of the organism’s decision-making (Bialek, 1987, 2002). Putting everything together, one has a view of a complex – yet not chaotic – world from which the organisms constantly acquire information at some cost and with limited precision, and learn to use the predictive part of this information to perform actions, either in the blink of an eye or perhaps even years later.

While in principle there exists an information theoretic language based on the laws of probability in which the above framework can be formalized (Shannon, 1948; Cover and Thomas, 1991), the practical goal of taking some organism and mapping the stimuli which act on it to its choice of possible behaviors is probably still unattainable. We run into difficulty as soon as we try to characterize quantitatively the relevant dimensions in the space of stimuli and responses and have to stop before even trying to understand the computations that implement the stimulus-response map, or the statistical structure of the inputs.
For example, human “behavior” manifests itself on many levels, from low-level motor actions to complex social phenomena, and trying to label it, find the stimuli that elicit the corresponding responses, or sample enough stimulus-response pairs seems daunting. Even for very simple organisms like the extensively studied roundworm, Caenorhabditis elegans (an animal with 959 cells of which 302 are neurons), finding the relevant “features” in the organism’s behavior, i.e. trying to quantify and separate a stereotyped response from its natural variability or noise, can be a serious research undertaking (Stephens et al., 2007).

There are cases, however, where the behavior of the whole organism is in some way dependent on, and therefore limited by, a much smaller and perhaps manageable subsystem. For example, neurons at the periphery of the visual system that encode the sensory inputs are the single conduit through which all image-related information must flow to the brain, regardless of the complexity of downstream processing (Barlow, 1981). Alternatively, some behaviors of a yeast or bacterial cell can be orchestrated by networks of a few coupled genes that coherently respond to various experimentally controllable conditions, or so-called modules (Ihmels et al., 2002; Slonim et al., 2006). Pinpointing the cases (and subsystems) where such “information bottlenecking”1 occurs can be productive for many reasons. Firstly, the set of possible inputs and outputs of this, much smaller, subsystem might now be restricted enough in size to allow for quantitative description and thorough experimental sampling. Secondly, the subsystem itself will probably be smaller, and as we move along the length scale from cheetahs and antelopes past cells down to single molecules, we gain a new leverage: for example, physical laws now tell us how the neural spike propagates along the axon (Hodgkin and Huxley, 1952) or about the structural and functional interaction between the DNA and example regulatory proteins (Benoff et al., 2002; Kuhlman et al., 2007). In general, we will have the knowledge to model phenomenologically the response to an input (and the accompanying noise) from physical principles.

Apart from having a well-defined parametrization of the functionally relevant features of the inputs and outputs, and being smaller in terms of physical size, there is yet another way in which a subsystem can be “small”. Biological computation is often implemented in a dynamical system composed of (relatively) simple nodes that interact and thus give rise to (possibly) non-trivial collective behaviors of its constituent elements (Hopfield, 1982; Hopfield and Tank, 1986; Hartwell et al., 1999). Smallness in this context refers to the number of effective constituent elements, and simplicity to the fact that the elements internally do not have many states or different behaviors. In this case we often represent the dynamical system graphically as a network (Alon, 2003), where every vertex denotes a constituent element, or a node, and we draw connections between the nodes based on some notion of correlation or interaction. Every node has an associated dynamical variable that we abstractly call activity, and this can mean either the spiking rate of a particular neuron, the expression level of a gene, or the activation/phosphorylation state of a particular signaling protein. We will define the concepts more precisely in the following chapters.

1 There is a formal notion of an information bottleneck as a specific form of data compression which has a correspondence to the way we use the phrase here; see Tishby et al. (2000) for details.

In the first part of the thesis we discuss the problem of inferring the structure of the network from experiments that provide snapshots of the states of nodes. More precisely, we will assume that we have identified some of the interacting elements, but do not know the strengths of their interactions. Our approach will be data-driven, or bottom-up: starting with multiple joint measurements of the activities of a set of nodes, we will try to characterize their interactions. Clearly, we have many useful approaches at our disposal. Consider deterministic models, in which one postulates a set of differential equations describing the behavior of the node activities (and motivates it by some microscopic picture of how the system is assumed to work) and then proceeds to fit the model coefficients to the data; the connectivity map of the network that gives rise to the dynamical equations can even be iteratively updated for a better fit. Such approaches have been successfully applied in the study of circadian clocks and the oscillators responsible for the cell cycle, with the nodes representing the expression levels of various mutually (in)activating genes and protein activation levels (Chen et al., 2000; Leloup and Goldbeter, 2003); other systems have been studied as well in this context, see Tyson et al. (2001) for a review. Similarly, theoretical neuroscience has explored neurons from a dynamical systems perspective (Dayan and Abbott, 2001). Alternatively, genetic circuitry has also been modeled as a logical deterministic network with activities taking on a discrete set of possible values (Sanchez and Thieffry, 2001), or studied from a purely topological perspective (Shen-Orr et al., 2002).

We will soon see, however, that noise – in the sense of observed variability at constant input conditions – is important in the cases of interest to us, and that a probabilistic description is required. Only cases where systems are exposed to stimuli drawn from some specified ensemble will be considered, after the adaptation to the statistics of the ensemble has taken place. In this “steady-state” regime a noisy system constantly fluctuates around a fixed point, and each measurement can be considered as a draw from a stationary distribution that can depend on the stimulus. These assumptions, along with the desire to formulate a probabilistic model that does not incorporate any prior knowledge beyond the measured data, will lead us to maximum-entropy models, strongly analogous to the Ising models of statistical physics. Importantly, these models will be phenomenological, in contrast to microscopic ones, where the mechanistic treatment of an interaction between two elements is followed up to the whole network.2 This distinction between microscopic and phenomenological models of biological networks is one that we will have to constantly keep in mind when interpreting our results.

In the second part of the thesis, we will take a top-down approach to study transcriptional control in genetic regulatory networks, that is, we will try to derive the properties of simple regulatory elements from a first principle. Again, noise will play an important role and will correspondingly motivate us to posit that the information capacity of genetic circuits is maximized, a variation on the idea in neural sensory coding where adaptation of neurons to the input signals is often explained in terms of such maximization.
This hypothesis will generate testable predictions about the distribution of proteins used for signaling and will put bounds on the reliability of regulatory elements, which can be compared to measurements.

We will pick a biochemical network of signaling proteins, retinal ganglion neurons and transcriptional regulation during morphogenesis as examples, all of them being instances of biological systems where information flows through a small and rather restricted network under our (and the experimentalist’s) control, and where the inputs and outputs are relatively easily describable: for signaling proteins one stimulates the network with chemical perturbations; for neurons one projects the images onto the retina and records the outgoing spikes; for transcriptional regulation, one measures the concentration fields of fluorescently-tagged transcription factors using microscopy. If these subsystems are biologically essential for the whole organism, yet noisy in their implementation, we can hope to capture the signatures of an optimization principle at work. And finally – much in the same way as for the antelope’s story, where information theory allows us to make statements about perception, computation and optimal behavior despite being unable to write down the corresponding detailed dynamical model – the same information theory here will help us understand the smaller networks and shift the focus from the prevailing question ‘What is the set of microscopic interactions between constituent elements?’ to a more interesting ‘How can a biological network collectively perform its computation?’

2 In a microscopic model one would postulate, for example, Michaelis-Menten or similar kinetic equations for enzymatic reactions or integrate-and-fire equations for neurons, and try to learn about the collective behavior of the network by coupling the equations for constitutive elements together.

The thesis is organized around three papers referenced on the Acknowledgments page (and in their corresponding sections) that have been reproduced here with as few changes as possible. I have tried to provide enough introductory material to make the reading smooth, at the expense of sometimes (but hopefully not often) repeating what was already said. Because the thesis covers what are considered to be distinct topics, some references for classic and review papers specific to particular chapters are cited there instead of in the Introduction. The data analysis methods and some computations, along with a substantial number of interesting side results referred to in the main text, are presented in the Methods Appendix.

Chapter 2

Building networks from data

In physical systems correlation functions are of interest for two reasons. First, they are “natural” to compute in theory and involve taking derivatives of the log partition function with respect to the coupling constants. Second, they are connected to experimentally observable properties, such as scattering cross sections or susceptibilities, and their behavior often characterizes the macroscopic order in the system. In networks of genes or neurons the experiments usually amount to collecting snapshots of the instantaneous state of the system, and while it is possible to define and compute the correlations between the constituent elements, it is not immediately clear how to carry over our intuitions from statistical mechanics to biological networks.

We try to explore the issue by first presenting information theoretic tools that are required to deal formally with non-parametric probabilistic descriptions of the data; specifically, we will first show how to measure the “correlation” between nodes in a principled way, and will generalize this measure in several interesting directions. Then, the difference between correlations and interactions will be discussed, with special emphasis placed on distinguishing effective interactions of the phenomenological model from the real, physical interactions in the modeled system. Finally, two concrete networks will be studied: a set of interacting proteins in a signaling cascade of human immune response cells, and a group of ganglion neurons of the salamander retina.

2.1 Computing correlations

Let σ denote the activity variables defined for each of the nodes in a network. When we think of a correlation measure between two elements σi and σj, we usually mean either their covariance,

    C_{ij} = \langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle,    (2.1)

or its normalized version, the correlation coefficient,

    R_{ij} = \frac{\langle \sigma_i \sigma_j \rangle - \langle \sigma_i \rangle \langle \sigma_j \rangle}{\sqrt{\left( \langle \sigma_i^2 \rangle - \langle \sigma_i \rangle^2 \right)\left( \langle \sigma_j^2 \rangle - \langle \sigma_j \rangle^2 \right)}},    (2.2)

where σi is a measure of “activity” at node i, and can quite generally be either a discrete or continuous quantity. This measure of dependence is intuitive as it “interpolates” between the case where σi and σj are chosen independently from each other and Rij = 0, and the case in which they are perfectly linearly correlated and |Rij| = 1. R can be taken as a measure of the goodness-of-fit if the model dependence is linear, i.e. σj = A σi + B.

Despite being conceptually appealing and easy to estimate from the data, correlation has at least two problems as a generic measure of dependency. Firstly, it does not capture non-linear relationships, as shown in Fig 2.1b; secondly, when σ take on discrete values that are not ordered (e.g. a set of possible multiple-choice responses on a test), the linear correlation loses its meaning, although the problem itself is well posed (e.g. What is the correlation between two answers on a multiple-choice test across respondents?).
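As a quick numerical illustration of Eqs (2.1)-(2.2) and of the first shortcoming mentioned above, consider the following sketch; the data, function names and parameter values are invented for this illustration and are not taken from the analyses in this thesis:

```python
# Covariance and correlation coefficient, Eqs (2.1)-(2.2), applied to a linear and to a
# symmetric nonlinear dependence; the latter is the failure mode illustrated in Fig 2.1b.
import numpy as np

rng = np.random.default_rng(0)

def covariance(si, sj):
    return np.mean(si * sj) - np.mean(si) * np.mean(sj)                               # Eq (2.1)

def correlation(si, sj):
    return covariance(si, sj) / np.sqrt(covariance(si, si) * covariance(sj, sj))      # Eq (2.2)

si = rng.normal(size=10000)
sj_linear = 0.9 * si + 0.4 * rng.normal(size=10000)     # linear dependence plus noise
sj_nonlinear = si**2 + 0.1 * rng.normal(size=10000)     # strong but symmetric nonlinear dependence

print(correlation(si, sj_linear))      # close to 1
print(correlation(si, sj_nonlinear))   # close to 0, although sigma_j is nearly determined by sigma_i
```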

[Figure 2.1 panels: 2.1a, linear correlation (I = 1.4 bits, C = 0.94); 2.1b, nonlinear correlation (I = 0.7 bits, C ≈ 0); 2.1c, no correlation (I ≈ 0, C ≈ 0). Axes: σi (horizontal) vs σj (vertical).]

Figure 2.1: Correlation coefficient and mutual information as measures of dependency. Left panel: the points drawn from a joint distribution that embodies linear dependence plus noise have both a high mutual information and high linear correlation. Middle panel: in case of nonlinear dependence, the correlation coefficient can be zero although the variables are clearly strongly correlated. Right panel: if the joint probability distribution is a product of factor distributions for both variables, then the correlation coefficient and the mutual information measures are zero.

There is an alternative way of defining dependency, or correlation, between two variables due to Shannon (Shannon, 1948; Cover and Thomas, 1991). Let us suppose that both σi and σj are drawn from a joint distribution p(σi, σj). For argument’s sake, suppose further that we do not know anything about the value of σi. Then the entropy of p(σj),

    S[p(\sigma_j)] = - \int d\sigma_j \, p(\sigma_j) \log_2 p(\sigma_j),    (2.3)

is a useful measure of uncertainty about the value of σj, and, as defined above, is a value measured in bits. This information-theoretic entropy is equivalent to physical entropy up to a multiplicative constant, and is defined up to an additive constant (connected to the finite resolution of σ) for continuous variables, with a straightforward generalization for discrete variables. We have assumed that σi and σj have been drawn from an underlying joint distribution; in contrast to the case above, if we actually know something about σi, our uncertainty about σj might be reduced. The uncertainty in σj that remains if the value of σi is known is again defined by the (conditional) entropy,

    S[p(\sigma_j|\sigma_i)] = - \int d\sigma_j \, p(\sigma_j|\sigma_i) \log_2 p(\sigma_j|\sigma_i).    (2.4)

We can now define the mutual information between elements σi and σj as:

    I(\sigma_i; \sigma_j) = S[p(\sigma_j)] - \left\langle S[p(\sigma_j|\sigma_i)] \right\rangle_{p(\sigma_i)},    (2.5)

where we write I(σi; σj) as a shorthand for I[p(σi, σj)], i.e. the mutual information is a functional of the joint distribution. The mutual information is the average reduction in entropy about one variable if one has knowledge of the other, related, variable. This measure is symmetric in the exchange of the variables, which is manifest if we rewrite Eq (2.5) as

    I(\sigma_i; \sigma_j) = \int d\sigma_i \int d\sigma_j \, p(\sigma_i, \sigma_j) \log_2 \frac{p(\sigma_i, \sigma_j)}{p(\sigma_i)\, p(\sigma_j)}.    (2.6)

Mutual information is always positive and is measured here in bits; it is well-defined for continuous and discrete supports because it is a difference of two entropies. This measure also has a clear interpretation: saying that the variable σi has I bits of mutual information with variable σj means that there are 2^{I(σi;σj)} states of the variable σi that give rise to distinguishable1 states in σj. One bit of mutual information at least conceptually means that there are two such values (or intervals) of σi, which separately map into two corresponding values (or intervals) of σj, without the possibility of confusion due to noise. Note that mutual information in no way specifies the nature of the dependency, as opposed to the correlation coefficient that specifically measures how good a linear model is; mutual information is a statistical measure that states how many bits, on average, one learns about σj by knowing σi, but says nothing about σj = σj(σi), i.e. about the actual value that σj takes if σi is known.

Estimating the mutual information from the data by way of Eq (2.5) is notoriously difficult as one is making systematic errors due to undersampling,2 but there are methods that explicitly use the scaling of entropy with the number of samples to correct for this bias to lowest order (Strong et al., 1998). We have adapted the method to large-scale estimation of many pairwise mutual information relations on a finite number of samples;3 this so-called direct method will be used in the remainder of the thesis (Slonim et al., 2005a).

If we adopt the terminology used by Shannon to discuss information transmission, we speak of the communication channel between two variables, and regard σi as the source (or input), and σj as the output.4 Equation (2.5) can then be read as follows: the maximum value for the mutual information would be the entropy of the output, S[p(σj)], but because the communication channel is noisy, we have to subtract the so-called noise entropy, or ⟨S[p(σj|σi)]⟩. Clearly then the information is bounded from above by the output entropy, and is often normalized by S[p(σj)] to give a value between zero and one.

1 “Distinguishable” is meant as “distinguishable given noise”: a certain value of σi corresponds to a distribution of values for σj, as described by the conditional distribution p(σj|σi); for two different values of σi there will be two distributions of σj, and to be distinguishable, they must not substantially overlap.
2 When naive estimation of the mutual information is performed, one takes N samples of data and bins them into a 2D histogram, from which a frequentist estimate of the joint distribution, p̃(σi, σj), is created and used to compute I in Eq (2.6). Often there will be too few samples to have a good coverage of the histogram domain.
3 The basic idea is that the mutual information will behave as I(N) = I(∞) + α/N + ···, where I(∞) is the correct information value that one would obtain at infinite sample size; I(N) is the naive estimate at N samples, obtained by binning the data and estimating the entropies; and the term inversely proportional to N is the first-order bias. By taking many subsamples of the data at different fractions of the total size N, one is able to estimate α and extrapolate to infinite sample size. The reader is directed to Slonim et al. (2005a) for details on extrapolation, binning, etc.
4 Because of the symmetry of information in its arguments, the designations “input” and “output” are arbitrary and acquire their meaning only when we map the information theoretic framework onto a physical system.
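As an illustration of the estimation strategy sketched in footnotes 2 and 3, a minimal implementation might look as follows; this is only a sketch of the idea, not the method of Slonim et al. (2005a), and the bin number, subsample fractions and other choices below are arbitrary:

```python
# Naive binned estimate of I(x;y) plus the first-order bias correction of footnote 3:
# compute I on subsamples of different sizes N, fit I(N) = I(inf) + alpha/N, and report
# the intercept I(inf).
import numpy as np

def naive_mi(x, y, bins=10):
    """Plug-in estimate of the mutual information (in bits) from a 2D histogram, Eq (2.6)."""
    counts, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = counts / counts.sum()
    px = pxy.sum(axis=1, keepdims=True)
    py = pxy.sum(axis=0, keepdims=True)
    nz = pxy > 0
    return np.sum(pxy[nz] * np.log2(pxy[nz] / (px @ py)[nz]))

def extrapolated_mi(x, y, bins=10, fractions=(1.0, 0.9, 0.8, 0.7, 0.6), reps=20, seed=0):
    """Estimate I(inf) by a linear fit of the naive estimates against 1/N."""
    rng = np.random.default_rng(seed)
    inv_n, i_n = [], []
    for f in fractions:
        n = int(f * len(x))
        estimates = []
        for _ in range(reps):
            idx = rng.choice(len(x), size=n, replace=False)
            estimates.append(naive_mi(x[idx], y[idx], bins=bins))
        inv_n.append(1.0 / n)
        i_n.append(np.mean(estimates))
    alpha, i_inf = np.polyfit(inv_n, i_n, 1)   # slope = alpha, intercept = I(inf)
    return i_inf

# Example with a nonlinear dependence (cf. Fig 2.1b): the naive estimate is biased upward
# at finite sample size, and the extrapolation removes the leading 1/N piece of that bias.
rng = np.random.default_rng(1)
x = rng.normal(size=2000)
y = x**2 + 0.3 * rng.normal(size=2000)
print(naive_mi(x, y), extrapolated_mi(x, y))
```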

The concept of mutual information can be generalized in several ways. First, we can define the multi-information among N variables, ~σ = {σ1, σ2, . . . , σN}, by extending Eq (2.6) as follows:

    I(\vec\sigma) = \int d\vec\sigma \, p(\vec\sigma) \log_2 \frac{p(\vec\sigma)}{\prod_i p_i(\sigma_i)}.    (2.7)

The multi-information quantifies how much “structure” or non-uniformity there is in the distribution, and is a general measure of dependency between N elements. Is there any scale to which this number can be compared, in a way similar to the comparison of the mutual information with the entropy of the output, as we have shown above? If the network consists of N nodes, with each node having Q possible states (in a discrete case), the entropy of the distribution must be smaller than S_uniform = N log2 Q, i.e. the entropy of the uniform distribution on the same support. We can go even a step further: we can imagine a factor, or independent, distribution, p^(1)(~σ) = ∏_i p_i(σi), where p_i(σi) = Σ_{σj, j≠i} p(~σ) – this is a distribution that is a product of single-element distributions for each of the N elements, where every factor is just a marginal of the joint distribution over all other elements. The independent distribution is easy to compute and all its single-point statistics will match those of the full joint distribution. Because it has a product form, its entropy is simply the sum of entropies of individual factor distributions:

    S\left[ p^{(1)}(\vec\sigma) \right] = - \sum_i \int d\sigma_i \, p_i(\sigma_i) \log_2 p_i(\sigma_i).    (2.8)

We will refer to the entropy of the independent distribution p^(1)(~σ) as the independent entropy and denote it as S_ind[p^(1)(~σ)] to remind ourselves that this entropy does not include any effects of the correlations and that the true entropy of the joint distribution must be lower. Furthermore, we see that the multi-information of Eq (2.7) can be rewritten as:

    I(\vec\sigma) = S_{\rm ind}[p^{(1)}(\vec\sigma)] - S[p(\vec\sigma)].    (2.9)

The multi-information is therefore the reduction in entropy due to the correlations between N elements.5 We see from Eq (2.9) that S_ind again provides a convenient normalization for the multi-information; when we introduce the maximum entropy models, we will also find out that there exists a unique decomposition of the multi-information into a sum of connected information terms that describe the successive reductions in entropy due to the presence of 2-body, 3-body, and up to N-body interactions.

Multi-information is a special case of the Kullback-Leibler (KL) divergence of the two distributions (Cover and Thomas, 1991):

    D_{\rm KL}(p \,\|\, q) = \int dx \, p(x) \log_2 \frac{p(x)}{q(x)}.    (2.10)

The KL divergence is (almost) a distance metric on the space of distributions, but is not symmetric. If random variables x were really drawn from distribution p(x), but we assumed they were drawn from q(x) instead, and were to build an encoding for x using this (wrong) assumption, the KL divergence would measure the number of bits needed to encode x in excess of the optimal encoding achievable using the correct model, p(x). The multi-information is related to the KL divergence as:

    I(p(\vec\sigma)) = D_{\rm KL}\left( p(\vec\sigma) \,\|\, p^{(1)}(\vec\sigma) \right).    (2.11)

5 If there are only two variables, the reader can easily verify that multi-information is just the mutual information and that all formulae for the N-body system match the corresponding ones in the case of 2 elements.

Consistent with our intuition that the multi-information captures the dependency beyond the single-point statistics, Eq (2.11) states that the multi-information is the number of bits we need, in addition to the factor distribution model, to specify elements drawn from the full joint distribution p(~σ). A symmetric version of the KL divergence is the Jensen-Shannon (JS) divergence, or:

    r(x) = \frac{1}{2}\left( p(x) + q(x) \right),    (2.12)

    D_{\rm JS}(p, q) = \frac{1}{2}\left( D_{\rm KL}(p \,\|\, r) + D_{\rm KL}(q \,\|\, r) \right).    (2.13)

The JS divergence is always between zero and one and is approximately the inverse of the number of samples that would have to be drawn from the distribution p(x) to say with confidence that they do not come from q(x) (Lin, 1991).
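For discrete distributions represented as probability vectors, Eqs (2.10)-(2.13) translate directly into a few lines of code; the sketch below is illustrative only, and the names and the example joint distribution are arbitrary choices:

```python
# Kullback-Leibler and Jensen-Shannon divergences, Eqs (2.10)-(2.13), in bits, and the
# multi-information of Eq (2.11) for a two-variable example.
import numpy as np

def kl_divergence(p, q):
    """D_KL(p || q); assumes q > 0 wherever p > 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    nz = p > 0
    return np.sum(p[nz] * np.log2(p[nz] / q[nz]))

def js_divergence(p, q):
    """Symmetrized divergence, Eqs (2.12)-(2.13); lies between 0 and 1 bit."""
    r = 0.5 * (np.asarray(p, float) + np.asarray(q, float))
    return 0.5 * (kl_divergence(p, r) + kl_divergence(q, r))

# Multi-information as D_KL between the joint and its factorized (independent) version:
p_joint = np.array([[0.4, 0.1],
                    [0.1, 0.4]])                                  # p(sigma_i, sigma_j)
p_factor = np.outer(p_joint.sum(axis=1), p_joint.sum(axis=0))     # p^(1): product of marginals
print(kl_divergence(p_joint.ravel(), p_factor.ravel()))           # I(sigma_i; sigma_j), about 0.28 bits
print(js_divergence(p_joint.ravel(), p_factor.ravel()))
```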

2.2 Networks from correlations

After having introduced the information theoretic measures of correlation, it is time to ask why it is advantageous to use those instead of easily computable linear correlation coefficients. In this section, I will discuss an interesting example where the analysis has benefited from the properties of the information measure that the linear correlation lacks. In parallel, we will also present one of the simplest, yet powerful and productive ways of understanding the collective behavior of networks of many elements – clustering.

Microarrays have enabled genome-wide surveys of the changes in gene expression (more precisely in mRNA levels) as the organism is subjected to different environments. Yeast, for example, can be grown both at reference (neutral) growth conditions and in a number of environments that are strongly perturbed away from the reference, usually by changing the nutrient media, pH, temperature or by adding various chemicals that stress the organism. Messenger RNA from reference yeast is then extracted and tagged with green fluorescent probes, while perturbed yeast is tagged in red; both are hybridized to the same DNA microarray, onto which segments of coding DNA have been spotted, such that each spot localizes the sequences from a single known gene. The red and green message reverse transcripts compete on the microarray for binding onto their complementary spots, and by reading the relative red/green intensity ratios for each spot (and thus for each one of the ≈ 6000 yeast genes) one can determine whether, for every condition, a certain gene is over- or under-expressed relative to the reference.

A standard data analysis technique takes these N genes exposed to M different conditions, and tries to group genes into clusters that behave “similarly” when the conditions are changed (Eisen et al., 1998). More precisely, a matrix of pairwise similarities is first constructed from the data, usually simply by calculating the matrix of correlation coefficients between the genes (and across the conditions). With this N × N matrix in hand, one employs a standard clustering algorithm6 to partition the N genes into O(√N) clusters; if the clustering is successful, the genes are probably co-regulated and form a module. Although clustering would not provide a description of how the co-regulation works, it does reduce the dimensionality of the problem by cutting its size from N elements down to several clusters, based on each gene’s response to a changed environment and thus, probably, on its function.

6 For example, K-means or hierarchical clustering; see Jain et al. (1999) for a short review.

Figure 2.2: Linear (Pearson) correlation coefficient (PC) vs the mutual information (MI) in bits. Each point represents a pair of yeast Emergency Stress Response (ESR) genes; information and correlation are estimated across 173 conditions, and the measurements are the log light intensity ratios of microarray spots in the perturbed and reference condition. [Scatter plot of MI against PC, showing the empirical relation and the Gaussian theoretical prediction.]

We decided to reexamine both steps of the data analysis with information theoretic tools. The publicly available dataset of Gasch et al. (2000), comprising about N = 900 yeast genes, was chosen; the experimenters characterized their expression patterns in different conditions. Pairwise mutual information values I(σi, σj) were calculated using the direct method and compared to the linear correlation coefficients, as shown in Fig 2.2. If every pairwise distribution p(σi, σj) were a Gaussian, there would be an analytic relation between the mutual information and the correlation of the two variables, shown in red. Although this relation is generally observed, there is significant deviation; moreover, there are pairs of genes for which the correlation coefficient is small, but mutual information is significantly above zero, and for some datasets this effect can be more pronounced (not shown). These pairs are thus (wrongly) declared to be unrelated by the linear measure. In addition to being sensitive to nonlinear correlations, the mutual information is also invariant to invertible reparametrizations of its arguments (because it is a functional of the probability distribution). It therefore does not matter if the information is estimated between the green/red intensity ratio, the log of the intensity ratio, or any affine transformation of it, including, of course, the change in units; this is in stark contrast to the linear correlation measure, and is a very desirable feature for biological data sets where the experimental error model (the likelihood of a measured signal given some physical event taking place on the array) is not well understood.

In the work of Slonim et al. (2005b) we have reformulated the problem of clustering: taking the pairwise similarity matrix obtained through information estimation as an input and partitioning elements into a smaller number of clusters, such that the elements within a cluster are more similar to each other on average than they are to the elements in other clusters. The main idea is to assign each of the σi elements to one of the clusters C by choosing the assignments P(C|i) such that the functional F is maximized:

    F = \langle s \rangle - T\, I(C; i),    (2.14)

    \langle s \rangle = \sum_C P(C)\, s(C) = \sum_C P(C) \sum_{i_1, \ldots, i_r} P(i_1|C) \cdots P(i_r|C)\, I(i_1, \ldots, i_r),    (2.15)

where I(i1, . . . , ir) is a similarity measure between r elements. Here we take r to be 2, and I is therefore pairwise or mutual information, but the clustering formulation allows the generalization along the lines of multi-information in Eq (2.7). The functional F is a tradeoff, enforced by a tunable parameter T, between the desire to increase the similarity ⟨s⟩ between the elements of the same cluster (which is maximized if every element is placed in its own cluster), and to decrease the description length when elements σi are replaced by clusters C (which is minimized if all elements are placed into the same cluster). Mutual information enters the computation as both the Shannon measure of channel capacity or compression in I(C; i), which we want to minimize, and as an r-body element-wise similarity measure, I(i1, . . . , ir). Optimal assignments of elements i to clusters C are found at the stationary points of the functional F; a short sketch of how F is evaluated for a given assignment is given after Fig 2.3 below.

Figure 2.3a shows the raw data, and Fig 2.3b the mutual information matrix I(σi, σj), reordered such that the genes in the same cluster are next to each other. The use of the information theoretic formulation of clustering has helped us uncover interesting new structure in the dataset (Slonim et al., 2005a). Moreover, we demonstrate in the paper that the generic nature of mutual information as the similarity measure enabled us to construct a clustering method that works for data of different origins and statistical properties (gene expression, daily fluctuations in stock market prices, qualitative ratings of movies by fans) without any tuning parameters apart from the generic parameter T. Information-based clustering outperforms the other state-of-the-art tools in its class.

[Figure 2.3 panels: 2.3a, raw data for the ESR module; 2.3b, clustered ESR genes.]

Figure 2.3: Left panel: All yeast Emergency Stress Response (ESR) genes (vertical) in 173 microarray experiments (horizontal), sorted into 20 clusters (white separating lines). Color scale is over- or under-expression of a specific gene relative to the same gene in a reference condition. Right panel: The matrix of all pairwise information relations, sorted such that the genes belonging to the same cluster are consecutively listed. Blocks on the diagonal are thus intra-cluster similarity, and off-diagonal rectangles are inter-cluster similarity. Color scale is in bits.
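The sketch promised above evaluates the functional of Eqs (2.14)-(2.15) for a given soft assignment P(C|i) and a pairwise similarity matrix, with r = 2 and a uniform prior P(i). It is meant only to make the objective concrete; it does not reproduce the optimization procedure of Slonim et al. (2005a,b), and all names and the toy data below are arbitrary:

```python
# Evaluate F = <s> - T * I(C;i) of Eqs (2.14)-(2.15) for r = 2, given a soft assignment
# P(C|i) and a symmetric pairwise similarity matrix sim[i,j].
import numpy as np

def clustering_functional(p_c_given_i, sim, temperature):
    """Return F for a uniform prior P(i) over the elements."""
    n, n_clusters = p_c_given_i.shape
    p_i = np.full(n, 1.0 / n)                                    # uniform P(i)
    p_c = p_i @ p_c_given_i                                      # P(C) = sum_i P(i) P(C|i)
    p_i_given_c = (p_c_given_i * p_i[:, None]) / p_c[None, :]    # Bayes: P(i|C)

    # <s> = sum_C P(C) sum_{i1,i2} P(i1|C) P(i2|C) I(i1,i2)      [Eq (2.15), r = 2]
    s_c = np.einsum('ic,jc,ij->c', p_i_given_c, p_i_given_c, sim)
    avg_s = p_c @ s_c

    # I(C;i) = sum_{i,C} P(i) P(C|i) log2[ P(C|i) / P(C) ]       [compression term]
    ratio = np.where(p_c_given_i > 0, p_c_given_i / p_c[None, :], 1.0)
    i_ci = np.sum(p_i[:, None] * p_c_given_i * np.log2(ratio))

    return avg_s - temperature * i_ci                            # Eq (2.14)

# Toy example: 6 elements forming two obvious groups, hard assignment into 2 clusters.
sim = np.zeros((6, 6))
sim[:3, :3] = 1.0
sim[3:, 3:] = 1.0
assign = np.zeros((6, 2))
assign[:3, 0] = 1.0
assign[3:, 1] = 1.0
print(clustering_functional(assign, sim, temperature=0.1))
```

Maximizing F over P(C|i) at a fixed tradeoff parameter T then balances intra-cluster similarity against compression, as described in the text.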

Instead of discussing the results of the clustering project in detail, we emphasize that clustering is one of the simplest and most scalable methods of understanding the collective behavior of a network. Consider the information matrix of Fig 2.3b as a matrix of weights between the nodes of a graph: the graph has strongly connected components that correspond to clusters (blocks on the diagonal of the information matrix) and these blocks are weakly coupled to other blocks. One might even threshold the information matrix and draw binary links in the graph whenever the similarity measure exceeds the threshold value, and some researchers have indeed taken this approach; cf. Remondini et al. (2005) for network inference by thresholding correlation coefficient values.

Clustering turns out to be an extremely powerful approach for several reasons. Firstly, in gene regulation we know that out of the whole set of genes (around 6000) for yeast, the total number of genes that regulate other genes – so-called transcription factors – is on the order of a few percent (van Nimwegen, 2003). Although this, much smaller, group of genes with regulatory power could conspire combinatorially and still regulate every other gene in a complicated individual fashion, many genes need to be up- or down-regulated together, because they act as enzymes in connected reaction pathways. This coregulation is the basis for the success of the clustering approach: coregulated genes cluster, and cluster members are assumed to be regulated in identical ways by their (one or few) transcription factor(s). Instead of looking separately into regulatory regions of every single gene whose regulation we want to understand, we can look across the members of the cluster and hope that common elements in the regulatory regions will emerge above the random background sections of the genetic code (Kinney et al., 2007; Foat et al., 2006).

Although clustering is clearly productive as a first step in understanding genetic regulatory networks, it is not a generative model of the network. It reorders the nodes so that the structure (hopefully) becomes apparent, but does not give any prescription about how the activity of one gene influences the activity of the others – the only input to the clustering procedure is the mutual information, and we explicitly stated that information measures dependency without revealing anything about underlying functional relationships. Moreover, as we will soon see, understanding that elements σi and σj are correlated, which is the basis of clustering, tells us nothing about whether σi is really directly influencing σj;7 in particular, in gene regulation, the genes are coregulated and are therefore coexpressed, and correlation does not imply causation or direct interaction. Despite being very practical, clustering leaves too many questions unanswered if we want to understand network behavior.

2.3 Computing interactions

Can we disentangle the mesh of correlations and separate the correlations caused by real underlying interactions from the correlations induced indirectly by other interactions, as is illustrated in Fig 2.4?

To start, we should recall a classic problem in statistical physics: we are given a lattice of Ising spins, and some specification of exchange couplings (interactions) – perhaps nearest neighbor only – and the exercise requires us to find the equilibrium correlation function between the spins, i.e. ⟨σi σi+∆⟩. In our case, however, we will be dealing with network “reverse engineering”. The exchange interactions themselves will be unknown, yet we will observe a mesh of correlations. The problem will then be to compute the exchange interactions, and the hope is that the network defined by the interactions will turn out to be simpler (for instance sparser) than the network of correlations.

Let us formulate the problem more precisely. The network consists of N nodes with activities σi, which, we will for now assume, can take on only two values, σi ∈ {−1, 1}.8 Our data consist of patterns D = ~σ1, ~σ2, . . . , ~σT, i.e. there are a total of T simultaneous measurements of the activities at all nodes, while the network is in some stationary state. These samples can be thought of as “instantaneous” snapshots of the system or, in simulation, draws made during a Monte Carlo sampling run. From the samples we can estimate the moments at successively increasing orders: first order moments are the N mean activity values, ⟨σi⟩; second moments are the N(N − 1)/2 correlations, ⟨σiσj⟩; and so on. Because the system is noisy, there will be fluctuations around the stationary state and not all T patterns are going to be equal. We expect some patterns to be more likely than the others, and the full description of the system as it rests near the minimum of an effective potential in an equivalent of a “thermal bath” must be contained in a joint probability distribution, p(σ1, σ2, . . . , σN). Getting a handle on this distribution is therefore our final goal, and as we will soon discover, computing successive approximations to it will give us the desired interactions that cause the observed correlations.

7 Let us leave “direct influence” as an intuition until the next section.
8 This assumption will be relaxed later.

Figure 2.4: A small three-element section from a network of interacting nodes. Suppose that σi modulates the activity of σj and σk through some microscopic mechanism (denoted by thick lines). We can expect to observe strong correlations between σi and σj, and between σi and σk, due to this direct influence. On the contrary, σj and σk are not directly coupled, but can still show significant correlation (dashed line) because of common control by σi.

Except for a very small number of network nodes there is no hope of directly sampling the distribution from the data. Its size grows exponentially in N, and for a modest network of 10 binary nodes we would need to estimate 2^10 ≈ 1000 parameters. To proceed, we clearly need a simplifying principle.

A commonly used procedure is called Bayesian network reconstruction (Friedman, 2004), and it is a method from the more general class of graphical models. One starts by assuming a specific (initial) factorization of the joint probability distribution over all nodes and represents it as a graph G0, as in Fig 2.5. Remembering that the activities are discrete variables, all conditional distributions in the factorization can be represented as probability tables with unknown entries that need to be fit from the data. Such a fitting procedure can be performed in many ways, and one can evaluate the likelihood of the fit L(Gn).9 Of course, we have no prior knowledge of what the correct graph factorization of the initial distribution is; therefore a procedure is devised that wanders in the space of possible graph topologies and tries a likelihood on each, producing a sequence L(G0), . . . , L(Gn), L(Gn+1), . . . .10 The complexity of each graph, e.g. the number of links, is penalized and combined together with the fit likelihood into a scoring function. The goal is to find the factorization of the probability distribution with the best score. Presumably, we will then have discovered a simple graph that fits the data well. There have been successful network reconstructions using this approach (Sachs et al., 2005).

The key simplifying assumption that makes this approach feasible is that the graph of interactions is sparse, i.e. that there are many fewer real than potential interactions. Given such sparsity, the factorized probability distribution will have a far smaller number of unknown parameters than the full joint distribution, and there will be reasonable hope of fitting them from the data. The method allows interactions of arbitrary complexity (as many arrows converging on a single node as possible), but has some drawbacks. Firstly, there is an exploding number of graph topologies over N elements, and no hope of exhaustively trying all of them; whatever algorithm one devises to explore the space of topologies, it can get stuck in local extrema of the scoring function. Secondly, due to computational constraints not all kinds of graphs can be explored – usually one has to exclude loops, and this is a big handicap for biological systems where feedback plays a very important role. In statistical mechanics the “no loops” approximation on the lattice is the Bethe-Peierls approximation, in which one explicitly treats a chosen spin and a shell of nearest neighbors around it, while the rest of the lattice produces an effective molecular field which is determined self-consistently. Finally, because we are looking for a tradeoff between the best likelihood fit and the simplicity of the model, we have to (arbitrarily) decide how to penalize complex topologies. It is not a priori clear that one should simply minimize the number of links and disregard other features of the graph. In particular, we expect that for systems in which collective effects are driven by the presence of weak interactions between lots of pairs, the Bayesian method will perform poorly.

9 Bayesian network reconstruction is an iterative procedure, and at the n-th step we are considering graph Gn, hence the index.
10 This will usually be some sort of gradient descent or simulated annealing procedure.

Figure 2.5: Bayesian factoring of the probability distribution over five nodes. This example graph G implies that the joint probability distribution can be written as follows: p(σ1, . . . , σ5) = p(σ1) p(σ4) p(σ5) p(σ3|σ1) p(σ2|σ1, σ4, σ5).
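To make the parameter counting concrete, the factorization in the caption of Fig 2.5 can be written out for binary variables as in the sketch below; the conditional probability tables are filled with arbitrary values and the code is purely illustrative, not part of the original analysis:

```python
# The factorized joint of Fig 2.5 assembled from small conditional probability tables,
# illustrating how sparsity reduces the number of parameters relative to the full joint.
import itertools
import numpy as np

rng = np.random.default_rng(0)

def random_cpt(n_parents):
    """Random table of P(x = 1 | parents) over binary parents (a scalar if no parents)."""
    return rng.uniform(size=(2,) * n_parents) if n_parents else rng.uniform()

p1, p4, p5 = random_cpt(0), random_cpt(0), random_cpt(0)   # P(sigma_1), P(sigma_4), P(sigma_5)
p3_given_1 = random_cpt(1)                                 # P(sigma_3 = 1 | sigma_1)
p2_given_145 = random_cpt(3)                               # P(sigma_2 = 1 | sigma_1, sigma_4, sigma_5)

def bernoulli(p, x):
    return p if x == 1 else 1.0 - p

def joint(s1, s2, s3, s4, s5):
    """p(s1,...,s5) = p(s1) p(s4) p(s5) p(s3|s1) p(s2|s1,s4,s5), as in Fig 2.5."""
    return (bernoulli(p1, s1) * bernoulli(p4, s4) * bernoulli(p5, s5)
            * bernoulli(p3_given_1[s1], s3)
            * bernoulli(p2_given_145[s1, s4, s5], s2))

# The factorized joint is properly normalized:
print(sum(joint(*s) for s in itertools.product((0, 1), repeat=5)))   # -> 1.0
```

The factorized model is specified by 3 + 2 + 8 = 13 numbers, compared with 2^5 − 1 = 31 for an unconstrained joint distribution over five binary nodes; this is the saving that a sparse graph buys, and it grows rapidly with N.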

Let us try to take a radically different route to the solution. As has been said, with a limited number of samples, T, we can successfully estimate several lowest-order moments of the distribution, for example, the means ⟨σi⟩ and covariances Cij of Eq (2.1), or, in general, a set of mean values of some operators, ⟨Ôµ(~σ)⟩. For any reasonable choice of the operators there is an infinite number of joint distributions over N elements with the same mean operator values. Nevertheless, there is only one distribution that also has maximum entropy, i.e. there is one distribution that is as random as possible but still satisfies those statistics that have been measured in an experiment (Jaynes, 1957). This is the distribution that we would like to find, and the maximum entropy principle embodies the idea that any structure (or constraint) in the distribution has to be the structure induced by the measurement (and not by explicit or hidden assumptions on our part). Formally, then, we are looking for the extremum of the following functional:

    L[p(\vec\sigma)] = S[p(\vec\sigma)] - \sum_\mu g_\mu \langle \hat{O}_\mu(\vec\sigma) \rangle - \Lambda \int d\vec\sigma \, p(\vec\sigma)    (2.16)
                     = - \int d\vec\sigma \, p(\vec\sigma) \log_2 p(\vec\sigma) - \sum_\mu g_\mu \int d\vec\sigma \, p(\vec\sigma)\, \hat{O}_\mu(\vec\sigma) - \Lambda \int d\vec\sigma \, p(\vec\sigma).    (2.17)

The first term is the entropy of the distribution, and there are µ constraints enforced by their Lagrange multipliers gµ:

    \langle \hat{O}_\mu(\vec\sigma) \rangle_{p(\vec\sigma)} = \langle \hat{O}_\mu \rangle_{{\rm expt}\, D},    (2.18)

such that the average values of the operators over the sought-after distribution p(~σ) are equal to the averages over the data patterns, D = ~σ1, . . . , ~σT. It is easy to take the variation in Eq (2.16) and write the explicit form for the maximum entropy solution:

    p(\vec\sigma) = \frac{1}{Z} \exp\left[ \sum_\mu g_\mu \hat{O}_\mu(\vec\sigma) \right].    (2.19)

We call Eq (2.19) the maximum entropy distribution with constraints Ô_µ, and we will encounter it many times in subsequent chapters. The solution has an exponential form and looks like a Boltzmann probability distribution with the factor β = 1/k_B T = 1. Indeed, in statistical mechanics we can view the Boltzmann distribution, p ∝ exp(−βH), as the maximum entropy distribution constrained to reproduce the mean energy, ⟨H⟩; the inverse temperature β is the corresponding Lagrange multiplier. Usually we ask what the mean energy of the system is at a given temperature:

\[
\langle H(\beta)\rangle = \frac{\int d\vec\sigma\, H(\vec\sigma)\, e^{-\beta H(\vec\sigma)}}{\int d\vec\sigma\, e^{-\beta H(\vec\sigma)}}, \tag{2.20}
\]

while in maximum entropy modeling we are interested in the inverse question – what is the temperature that will make the expected energy of the system equal to the observed average?11 The operators that constrain the distribution can be arbitrary, but we can gain further insight by restricting ourselves to moments of increasing order (the variables are still binary for simplicity). If one chooses Ô_µ = σ_µ, then the mean values ⟨σ_i⟩ are constrained, and the maximum entropy distribution is the factor distribution:

\[
p^{(1)}(\vec\sigma) = \frac{1}{Z}\, e^{\sum_\mu g_\mu\sigma_\mu} = \prod_\mu \frac{1}{Z_\mu}\, e^{g_\mu\sigma_\mu}, \tag{2.21}
\]

\[
Z_\mu = 2\cosh(g_\mu). \tag{2.22}
\]
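For this factor model the multipliers follow in closed form from the constrained means: with σ ∈ {−1, +1}, ⟨σ_µ⟩ = tanh(g_µ), so g_µ = atanh(⟨σ_µ⟩). A tiny sketch (with made-up mean values, purely for illustration) makes this explicit:

import numpy as np
mean_sigma = np.array([0.2, -0.5, 0.9])   # hypothetical measured means <sigma_mu>
g = np.arctanh(mean_sigma)                 # g_mu = atanh(<sigma_mu>)
print(np.tanh(g))                          # reproduces the constrained means exactly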

Mean field models look similar, and it is instructive to pursue this analogy a little further. For a simple linear chain model where a spin couples to its two nearest neighbors with exchange interaction J, the full distribution is given by:

\[
p(\{\sigma_i\}) = \frac{1}{Z}\exp\left[\beta\left(h\sum_i \sigma_i + \frac{1}{2} J \sum_i \sigma_i\sigma_{i+1}\right)\right]. \tag{2.23}
\]

11 Many of a physicist's "staple" distributions are also intriguing when viewed as solutions to the maximum entropy problem. For example, the exponential distribution is the maximum entropy distribution with a constrained mean; the Gaussian distribution is the maximum entropy solution for continuous variables with constrained mean and covariance; and clearly, the uniform distribution is the unconstrained maximum entropy distribution (alternatively, the only constraint is normalization). More exotic distributions such as gamma, beta, Laplace etc. can also be seen as solutions to the constrained maximum entropy problem; things seem to be less clear for the Poisson distribution.

In the mean field approximation, we would approximate the Hamiltonian as follows:

\[
H = h\sum_i \sigma_i + \frac{1}{2} J \sum_i \sigma_i\sigma_{i+1} \approx \sum_i \sigma_i\left(h + \frac{1}{2} z J \langle\sigma\rangle\right) = \sum_i \tilde{h}(\langle\sigma\rangle)\,\sigma_i, \tag{2.24}
\]

that is, we neglect correlated fluctuations and write the average of the product of spins as the product of averages (z stands for the number of nearest neighbors, z = 2 here). The new Hamiltonian is now one-body with an effective magnetic field h̃(⟨σ⟩). Clearly, the mean-field Hamiltonian also makes the distribution factorize, such that

\[
p^{\rm (MFA)}(\vec\sigma) = \frac{1}{Z}\, e^{\sum_i \tilde{h}\sigma_i}. \tag{2.25}
\]

One can easily calculate the expected value of σ_i in this model by solving

\[
\langle\sigma\rangle = \tanh\left(\beta h + \frac{1}{2}\beta z J\langle\sigma\rangle\right), \tag{2.26}
\]

which is the well-known mean field equation for the magnetization. Both models, namely the maximum entropy model constrained by mean values and the mean field approximation, yield factorizable approximations to the joint distribution; however, for the maximum entropy problem we are given the mean values ⟨σ_i⟩, and we reproduce them exactly, whereas for the mean-field approximation we are given the (microscopic) interactions and we compute approximate mean values by disregarding fluctuations. In general, the Lagrange multipliers of the factor maximum entropy distribution will not be the same as the effective magnetic fields of the mean field approximation; they would be equal only if the mean field approximation yielded the exact magnetization.

Returning now to the maximum entropy problem, we could continue constraining the maximum entropy distribution with correlation functions of higher and higher orders. If we were to fix both mean values and two-point correlations, the resulting distribution, Eq (2.19), would have an Ising form. Constraining the three-point correlations would induce a new term in the Hamiltonian of the form Σ_{ijk} J_{ijk} σ_i σ_j σ_k. There is clearly a "ladder" in which higher and higher order constraints are imposed on the distribution and, as a result, better and better maximum entropy approximations are constructed. Let us call, then, p^{(k)}(σ⃗) the maximum entropy distribution consistent with correlations of order k and smaller, in line with our notation for the factor distribution, p^{(1)}(σ⃗). In an N-body system the highest order of correlation is N, and p^{(N)}(σ⃗) must therefore be the exact joint distribution – at this order our approximation is the exact solution, with entropy equal to S[p(σ⃗)]. Schneidman et al. (2003) have shown that this sequence of ever better maximum entropy approximations defines a unique decomposition of the multi-information:

\[
I[p(\vec\sigma)] = \sum_{k=2}^{N} I^{(k)}, \tag{2.27}
\]
\[
I^{(k)} = S[p^{(k-1)}(\vec\sigma)] - S[p^{(k)}(\vec\sigma)]. \tag{2.28}
\]

In words, the connected information of order k is the difference between the entropies of the maximum entropy distributions consistent with correlations of order k − 1 and of one order higher. For example, the connected information of the second order is the reduction of the entropy due to pairwise interactions: one creates the best factor (independent) model for the data and the best pairwise (two-body Ising) model for the data, and compares their entropies to see how much of the total structure in the joint distribution has been explained by purely pairwise terms.
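The sketch below (an illustration with hypothetical numbers, not data from this thesis) computes this decomposition for three binary variables whose structure is purely third order: the pairwise model then captures none of the multi-information, and all of it shows up in I^(3).

# Illustrative sketch of the connected-information decomposition, Eqs (2.27)-(2.28).
import numpy as np
from itertools import product, combinations

states = np.array(list(product([-1, 1], repeat=3)), dtype=float)

def entropy(p):
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def maxent_pairwise(target_means, target_pairs, lr=0.1, steps=20000):
    """Fit p(2): match all <sigma_i> and <sigma_i sigma_j> by gradient ascent (exact enumeration)."""
    pairs = list(combinations(range(3), 2))
    ops = np.column_stack([states] + [states[:, i] * states[:, j] for i, j in pairs])
    target = np.concatenate([target_means, target_pairs])
    g = np.zeros(ops.shape[1])
    for _ in range(steps):
        p = np.exp(ops @ g); p /= p.sum()
        g += lr * (target - p @ ops)
    return p

# A "true" distribution with purely third-order structure (hypothetical numbers).
p3 = np.array([0.20, 0.05, 0.05, 0.20, 0.05, 0.20, 0.20, 0.05])
means = p3 @ states
pair_corrs = np.array([p3 @ (states[:, i] * states[:, j]) for i, j in combinations(range(3), 2)])

# Independent model p(1): product of single-spin marginals.
p_up = (1 + means) / 2
p1 = np.array([np.prod([p_up[i] if s > 0 else 1 - p_up[i] for i, s in enumerate(st)]) for st in states])
p2 = maxent_pairwise(means, pair_corrs)

S1, S2, S3 = entropy(p1), entropy(p2), entropy(p3)
print("I(2) =", S1 - S2, " I(3) =", S2 - S3, " multi-info =", S1 - S3)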

2.4 Networks from interactions

Have we made any progress? In theory we could have looked forward to the scenario in which the only limitation came from the finite data size, which would put a bound on how far we could reliably sample in the space of correlations; once we collected the measured correlations, we would postulate the maximum entropy model of Eq (2.19) and solve the equations that determine all couplings, Eq (2.18). We now need only to interpret the result: since we have a generative model of the data, i.e. the probability distribution, we can calculate any new kind of average. Moreover, we can examine the couplings g_µ, conjugate to the constrained operators, and interpret these as interactions that cause and explain the observed correlations. For example, if the data were generated by a 1D Ising nearest-neighbor chain (but we did not know that), the correlation structure would appear complicated: each pair of spins would be correlated and the correlation would have some complicated dependence on the spin-spin separation. Putting the correlation matrix into the maximum entropy calculation and reconstructing the interactions J_ij would, however, reveal a very simple picture – the only non-zero couplings would be between nearest neighbors and they would all be the same. We would have achieved our goal of disentangling the mesh of correlations. In (biologically realistic) situations that lack the perfect translational symmetry of an Ising chain, the mapping from correlations to interactions, C_ij → J_ij, can be very non-trivial: if there is frustration, there could be interactions between pairs where there is no correlation, or vice versa, or the signs of the correlation and the interaction could differ. Nevertheless, there is only one solution for the above mapping, and it is, in principle, computable.

As is done in Bayesian network reconstruction, once we have computed the couplings and therefore the Hamiltonian that explains the data, we can draw a graphical model of the network with a link for each nonzero coupling g_{i1 i2 ... il} connecting those l elements that are conjugate to it in the Hamiltonian.12 These weighted links are undirected, as there is generally no way of determining the "direction" of the interactions from an equilibrium model. The assumptions underlying maximum entropy reconstruction are also quite different from those of its Bayesian relative: whereas in the latter case we assume a sparse network of (arbitrarily complex) interactions, in the former case we assume an arbitrarily dense network of simple (low order, e.g. pairwise or triplet) interactions. To explain all N(N − 1)/2 pairwise correlations one needs the full matrix of N(N − 1)/2 exchange couplings J_ij,13 and therefore no discrete topology on the graph is assumed a priori. There is hence no problem of searching and scoring the space of topologies, no exclusion of graphs that include loops, and reduced dependence on the implementation details of the algorithm. The drawback is the ab initio exclusion of complex irreducible interactions between many nodes. Clearly, the real question to ask is which approximation regime is more suitable for biological systems, if a general answer exists at all.

In practice, unfortunately, maximum entropy network reconstruction is made difficult by two problems. One is technical – solving the coupling equations, Eq (2.18), is very hard. In essence, one needs to solve

\[
\frac{\partial \log Z(\{g_\nu\})}{\partial g_\mu} = \langle \hat{O}_\mu\rangle_{{\rm expt}\,D}, \tag{2.29}
\]

where Z is the partition function of the maximum entropy distribution in Eq (2.19). This set of equations is both nonlinear in the couplings g and requires the evaluation of the partition function, Z({g_ν}), or effectively a complete solution of the statistical mechanics problem. The other problem concerns the identification of the nodes that are observed in the experiment. First, usually one will be able to take measurements of only a small subset of the nodes comprising the network, and we will need to discuss how the hidden nodes influence models of the visible nodes. Second, even if all nodes were identified, there is an issue of "coarse-graining." Is a node with two states really an elementary, physical object that only has two states (a protein with two phosphorylation states), or is it in itself a complex with many states, for which a two-state model might (or might not) be a valid approximation? We will address both problems to some extent in the following sections, which present two applications of the maximum entropy principle to network reconstruction.

12 Therefore, for instance, the graphical decomposition of the probability distribution plotted in Fig 2.5 would correspond to the Hamiltonian H = Σ_i h_i σ_i + J_{13} σ_1 σ_3 + J_{145} σ_1 σ_4 σ_5 in the maximum entropy picture.
13 For higher orders, there is similarly no restriction on the structure of, for example, three-point interactions, J_{ijk}.
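For larger systems the model averages on the left-hand side of Eq (2.29) are typically estimated by Monte Carlo sampling rather than by summing over all 2^N states. The sketch below (an illustration, not the implementation used in this thesis; the couplings and system size are hypothetical) shows a Metropolis sampler for a pairwise model that returns estimates of ⟨σ_i⟩ and ⟨σ_i σ_j⟩ without ever computing the partition function Z.

# Illustrative sketch: Monte Carlo estimate of model averages for a pairwise model.
import numpy as np

def metropolis_averages(h, J, n_sweeps=20000, burn_in=2000, seed=0):
    """Sample sigma in {-1,+1}^N from p ~ exp(sum_i h_i s_i + 0.5 sum_ij J_ij s_i s_j).
    J is assumed symmetric with zero diagonal."""
    rng = np.random.default_rng(seed)
    N = len(h)
    s = rng.choice([-1.0, 1.0], size=N)
    mean_s = np.zeros(N)
    corr = np.zeros((N, N))
    kept = 0
    for sweep in range(n_sweeps):
        for i in range(N):
            dE = 2.0 * s[i] * (h[i] + J[i] @ s)      # "energy" change of flipping spin i
            if dE < 0 or rng.random() < np.exp(-dE):
                s[i] = -s[i]
        if sweep >= burn_in:
            mean_s += s
            corr += np.outer(s, s)
            kept += 1
    return mean_s / kept, corr / kept

# Toy usage with hypothetical couplings for 5 nodes.
N = 5
h = np.zeros(N)
J = np.zeros((N, N))
J[0, 1] = J[1, 0] = 1.0                              # one strong positive bond
m, C = metropolis_averages(h, J)
print(np.round(C[0, 1], 2))                          # <sigma_0 sigma_1> should be clearly positive

An estimator of this kind is the building block of the iterative learning scheme used later in this chapter for systems too large to enumerate.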

2.5 Ising models for biochemical networks

The expression level of a specific gene in a regulatory network, or the activation state of a specific protein in a signaling pathway, depends both on the states of other interacting genes or proteins and on external stimuli. If the system is subjected to a wide variety of such stimuli, the patterns of correlated activity can provide us with insight into the network structure. Recent advances in molecular biology techniques have enabled researchers to collect measurements of protein activation or gene expression levels that have not been averaged across populations of cells. Consequently, each cell selected from a pool exposed to the same condition provides an independent, simultaneous measurement of the relevant activation levels. In principle, therefore, one could attempt to construct a joint distribution of these levels at every external condition, as opposed to approaches that require the extraction of lysates from many cells and thus only allow access to the mean of the distribution. In practice, however, the data is usually badly undersampled and we are unable to explore the distribution without making prior simplifying assumptions about its form.

Figure 2.6: A diagram of the MAP signaling network in human CD4 T-cells, reproduced and simplified from Sachs et al. (2005); the node labels in the figure include PKC, PLCγ, PI3K, PIP2, PIP3, Akt, PKA, Raf, Mek, Erk, MAP3K, JNK and p38. The phosphorylation levels of the white nodes were observed; red and green numbers indicate points of intervention (i.e. the change of external conditions C in which the network operates). These chemical interventions change the state of the whole network by locking the activity of the nodes on which they act into an activated (green) or deactivated (red) state. Chemicals 0, 1 and 2 represent naturally occurring stimulatory agents; 0 and 1 are present in all C, while 2 is present in C = 2. The arrows represent experimentally verified chemical interactions; a number of known interactions through intermediaries are not plotted.

Here we present a maximum-entropy-based approach to biochemical network reconstruction, following the steps outlined in the previous sections. We assume that, given a set of N network nodes, their interactions can be well described as occurring only between pairs or perhaps triplets, and not as combinatorial interactions involving quadruplets or larger groups. Finding a distribution consistent with the data that incorporates these simplifications is a well-known problem in machine learning that has a unique solution, for which convergent algorithms exist (for small enough N) (Hinton and Sejnowski, 1986; Dudik et al., 2004). It has also been shown to have an appealing physical interpretation as an N-body system in thermal equilibrium whose Hamiltonian is written out in terms of one-, two-, three- etc. body potentials (Schneidman et al., 2003). These potentials, parametrizing interactions at increasing orders of complexity, are our final goal, and to compute them we must be able to estimate the corresponding one-, two-, three- etc. body marginals from the data. Schneidman et al. (2006) have successfully deployed the procedure at pairwise order to explain the observed correlations between the spiking of neurons in the vertebrate retina. Can protein-protein or genetic regulatory interactions be simplified to the same extent? Alternatively, does a failure of the pairwise model indicate where more complex interactions are dominating?

We tackle these questions on the set of 11 interacting proteins and phospholipids (jointly referred to as biomolecules here) in a signaling network of human primary immune system T cells. We use data from Sachs et al. (2005), where approximately 600 single-cell measurements of the activity level of the biomolecules have been made for each of the 9 available conditions using fluorescent flow cytometry [Methods A.1.1]. The network has been studied in detail, and Fig 2.6 shows the conventionally accepted interactions, placing the observed proteins into their biological context.

A typical experiment to which maximum-entropy network reconstruction can be applied will yield a large number of simultaneous observations of N real-valued activation levels for each external stimulus. We first outline how to quantize each of the N series into Q discrete levels in a conceptually meaningful way that retains as much information as possible.14 We illustrate the maximum entropy reconstruction by focusing first on each of the nine experimental conditions separately, and attempt to address the questions presented above. We then show how to formulate the maximum entropy problem such that the network reconstruction takes advantage of all experimental conditions simultaneously; lastly, we examine whether the results justify the conceptual picture of a network as a sparse graph.

2.5.1 Discovering structure in the data

We begin by examining the activation levels of single biomolecules across all conditions. When the raw activity values are histogrammed, they frequently exhibit distinct peaks that correlate well with external stimuli (see Fig 2.7 for an example). This suggests that perhaps the scatter around the mean value at every condition is smaller than the spacing between mean values, and hence the mean values can be proxies for what we would intuitively call discrete activity states. Moreover, if there are strong correlations between such "states" of all proteins in the system, then knowing that protein A is in state 1 should tell us something about the state of protein B.

We would like to formalize this notion. Suppose that we split the range of values taken on by the activity of protein i, x_i, into Q consecutive intervals, such that each value of x_i maps to a discrete value σ_i ∈ {0, ..., Q − 1}. Every measurement is now denoted by an N-letter word with an alphabet of size Q, and the whole dataset is a sequence of random draws from a probability distribution p(σ_1, ..., σ_N | C), where the external conditions C are chosen by the experimenters. Saying that the discretized activity levels for different proteins should be as informative of one another as possible is equivalent to demanding that the multi-information of the distribution [Eq (2.9)] marginalized over C, I[Σ_C p(σ_1, ..., σ_N | C) p(C)], is maximized for some assignment x_i → σ_i.15 The quantization method that finds such an assignment will be called maximum-multi-information quantization.

14 Discretization can be regarded as a form of data compression; the original continuous data have some correlation structure, and as quantization maps the data into the discrete domain, we would like that structure to remain preserved. Note that we use the terms "quantization" and "discretization" interchangeably.
15 Note that we need p(C). This prior over experimental conditions C is chosen by the experimentalists, who collect 600 samples at each C.

Figure 2.7: Histogram of values of the activity level of protein 2 (MEK) across all 9 experimental conditions (horizontal axis: log10 of the MEK activity level; vertical axis: counts). Three discrete activation levels (labeled 0, 1 and 2) can be discerned and they correlate well with the conditions (legend groups: C1, C2, C3, C5, C8; C4, C6; C9). Each count in the histogram corresponds to a single cell measurement, with the raw value being the light intensity proportional to the MEK activity and detected by the flow cytometer (FACS).


Figure 2.8: Left: original raw data for the activity level of 11 proteins (vertical axis) across different conditions (horizontal axis, 9 conditions with 600 samples each, arranged sequentially such that samples 1−600 represent condition 1 etc.). Middle: the raw data discretized into 2 levels such that the multi-information between proteins is maximized. Right: the data discretized into 3 levels such that the average mutual information is maximized [Methods A.1.2]. In experiments that involve chemical interventions [Methods A.1.1], the "red" activity level for proteins means that all samples there are forced to be deactivated (lowest state), and the "green" level means that all samples are activated (highest state). Due to the resolution of the printing process not all sample points can be seen.

We can find such an assignment for Q = 2, because we (barely) have enough samples to calculate the entropy S[p(σ⃗)] of the full distribution, including small sample corrections [Methods A.1.2]. For Q = 3 we are completely undersampled and unable to maximize the multi-information, but we can find an assignment that maximizes the average pairwise information ⟨I^(2)⟩ = 2⟨S[p(σ_i)]⟩_i − ⟨S[p(σ_i, σ_j)]⟩_ij. Figure 2.8 shows the data discretized into two and three levels; note the high correlation between the discrete states and the external stimulus.16

Choosing a good quantization mapping is important because restricted sample size and computing power limit us to a small number of discrete levels. We could have used the traditional quantization approach (Slonim et al., 2005a) that assigns a separate discrete level to each of the Q uniformly-partitioned quantiles of the data (i.e. each bin is equi-populated); the resulting distribution for each discretized variable is then uniform and the independent entropy, S_ind[p(σ⃗)], of the joint distribution is maximized. We would, however, not have taken into account the fact that the experimental setup enforces equal priors p(C) on conditions17 and that it is therefore very likely that we see a very active or inhibited protein in only one out of all nine conditions; as a result we would recover all pairwise dependencies only when the number of quantization levels was approximately equal to the number of conditions. This is indeed what we see in Fig 2.9a, which compares the mutual information capture of the quantization methods described above on the whole dataset. In addition, the fact that we can only handle a small number of quantization levels forces us to trade dynamic range for fine-structure details of the distribution [Fig 2.9b], especially because certain interventions move some activation levels far away from their unperturbed values. We can, however, restrict ourselves to a single condition, quantize only the corresponding subset of the data, and try to infer the interactions from the local "fluctuations" around the steady state. In this case, applying either the traditional quantization or the multi-information maximization quantization does not result in an appreciable difference in the recovery of mutual information, indicating that the distribution at a fixed condition is much less structured than the distribution of data pooled across conditions.

Why is there any need to discretize the data at all? First, we might have prior reasons for believing that there is a limited number of states that each node can have; for instance, an enzyme might be unphosphorylated (and thus inactive) or phosphorylated (and thus active).18 In each condition, a fraction of proteins will be in either of the two states, and discretization takes this (and only this) gradation into account by discounting all remaining variability as noise. Furthermore, the discretized picture can capture complex behaviors of the system, e.g. the presence of multiple peaks in the joint distribution over activities, and there is evidence (at least in the 2D cross-sections of the joint distribution that we can sample, e.g. Fig 2.9b) that there are such structures in the data. Conversely, we know that a maximum entropy distribution over N continuous variables with constraints on the first and second moments is a Gaussian, which is fully specified by the mean and covariance matrix. Moreover, the constraining covariance matrix is also (the inverse of) the matrix that appears in the exponential form of the maximum entropy solution [Eq (2.19)], and the mapping from correlations to interactions is trivial: C_ij → J_ij ≡ (C^{-1})_ij. The multi-dimensional Gaussian is easy to understand and has a single peak, but unfortunately it is not rich enough to describe the data.19

16 Irrespective of the way we discretize (i.e. either independently for each condition, or jointly for all conditions), there is significant information about the condition in the activity patterns of the nodes, I(σ⃗; C); because of undersampling we can only estimate this number approximately, to be between 1.5 and 2 bits (for reference, log2 |C| = log2 9 ≈ 3.2 bits). Interestingly, both the pattern of fluctuations in each condition and the pattern of changes in the mean activities across conditions, reflecting two different quantization schemes, are similarly informative about the condition.
17 These priors simply reflect the number of samples measured at each condition and have no relation to the set of conditions a typical cell experiences during its lifetime; this is why the distribution of all activation levels pooled over different stimuli is not the natural distribution of expression levels. The same reasoning holds for gene expression arrays: the experimentally induced conditions to which the yeast is exposed are not a properly weighted ensemble of the conditions that the yeast sees during its life – they might reveal strong correlations useful for clustering input, but cannot be used to build a "natural" probability distribution of expression levels.
18 Equivalently, one could have phosphorylation or some other chemical modification at multiple locations on the protein, which would induce more than two levels of activity. There is ample evidence for such behavior in MAP signaling cascades (Kolch, 2000).
19 A popular continuous model often assumed in machine learning that can account for multi-peaked structure in the data is the mixture of Gaussians. An interesting project for future work is the exploration of an alternative model, where one uses maximum entropy models of the real-valued activation levels that constrain the 1-D marginal histograms, to capture their non-Gaussian distributions, and the two-point correlations ⟨x_i x_j⟩.
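As a concrete illustration of the idea (a sketch under simplifying assumptions, not the procedure of Methods A.1.2), the code below picks one binary threshold per variable by greedy coordinate updates so as to maximize the average pairwise mutual information between the discretized variables – the criterion used above when the full multi-information is out of reach. The function names and the toy data are hypothetical.

# Illustrative sketch: greedy binary quantization maximizing average pairwise information.
import numpy as np

def pairwise_mi(a, b):
    """Mutual information (bits) between two binary arrays."""
    mi = 0.0
    for x in (0, 1):
        for y in (0, 1):
            pxy = np.mean((a == x) & (b == y))
            px, py = np.mean(a == x), np.mean(b == y)
            if pxy > 0:
                mi += pxy * np.log2(pxy / (px * py))
    return mi

def maximize_avg_mi(data, n_candidates=20, sweeps=5):
    """data: (T, N) real-valued activities; returns one binary threshold per column."""
    T, N = data.shape
    grids = [np.quantile(data[:, i], np.linspace(0.05, 0.95, n_candidates)) for i in range(N)]
    thr = np.array([np.median(data[:, i]) for i in range(N)])
    for _ in range(sweeps):
        for i in range(N):
            others = [(data[:, j] > thr[j]).astype(int) for j in range(N) if j != i]
            scores = []
            for t in grids[i]:
                si = (data[:, i] > t).astype(int)
                scores.append(np.mean([pairwise_mi(si, sj) for sj in others]))
            thr[i] = grids[i][int(np.argmax(scores))]
    return thr

# Toy usage with two correlated variables plus one noise variable.
rng = np.random.default_rng(1)
x = rng.normal(size=(600, 1))
data = np.hstack([x, x + 0.3 * rng.normal(size=(600, 1)), rng.normal(size=(600, 1))])
print(maximize_avg_mi(data))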

2.9a: Information capture by various discretization methods (average pairwise information in bits vs. number of quantization levels; curves: Max S0, Max pairwise-I, Max multi-I).
2.9b: Tradeoff between detail and dynamic range (activity level of protein 11 vs. activity level of protein 9; points from experiments 1, 6, 8 highlighted against all experiments).

Figure 2.9: Left panel: mutual information capture using various discretization methods as a function of the number of discrete states Q; data from all conditions is being discretized simultaneously. The blue line puts an equal number of samples into each of the Q bins (and thus maximizes S0, the independent entropy of the distribution), while the red line greedily chooses binning boundaries so that the average mutual information between all pairs, ⟨I(σ_i; σ_j)⟩_ij, is maximized. As mentioned in the text, the blue quantization needs Q ≈ 9 levels to fully capture all pairwise relations. For Q = 2 we are able to maximize the total multi-information, I[p(σ⃗)], and the result is shown as the black dot. The black dot has slightly smaller average pairwise information, but bigger total multi-information, than either the blue or the red line; this is the binary quantization used in the subsequent analysis. Right panel: scatter plot of the activity levels of proteins 9 and 11 across all (black) and a certain subset (red) of conditions (see legend). If binary quantization is performed, then optimally quantizing over the whole dataset will set the bin boundaries at the thin black lines and the "low" and "high" states of both proteins will appear positively correlated. However, this will ignore the red substructure. If, instead, only the red data subset is optimally quantized (and the bin boundaries are at the thin red lines), then the data will appear anti-correlated and we will have revealed the previously hidden substructure that, in the whole-data quantization, was all lumped together in the "low/low" state. Note also that continuous maximum entropy models constrained by pairwise correlations (Gaussian distributions) cannot be used to model the multi-peaked structure seen in this example cross-section.

Figure 2.10: Correlation coefficient between the log activities of the 11 protein activation levels across experimental conditions. Almost all pairs appear statistically significantly correlated (the largest error bar on the correlation coefficient is ≈ 2·10^{-2}).

Having reduced the data to the binary representation in the most informative way possible, we now turn our attention to pairwise correlations and show in Fig 2.10 the matrix of correlation coefficients between protein activation levels. The majority of pairs show strong correlations, and even the weaker correlations are statistically significant. Based on the correlations alone the network would thus seem to be densely connected – but how does the connectivity map look in terms of the exchange interactions J_ij of the maximum entropy distribution, Eq (2.19) and Eq (2.30)? We will approach the question systematically: first, we will examine the data at each condition separately; then we will find a way to model multiple conditions at the same time.
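A sketch of this first step (illustrative only; the lognormal toy data stands in for the real measurements) computes the correlation-coefficient matrix of the log activities together with bootstrap error bars of the kind quoted in Fig 2.10:

# Illustrative sketch: correlation matrix of log activities with bootstrap error bars.
import numpy as np

def corr_with_errors(x, n_boot=100, seed=4):
    """x: (T, N) positive activity levels; returns (corr matrix, bootstrap std of each entry)."""
    rng = np.random.default_rng(seed)
    logx = np.log10(x)
    c = np.corrcoef(logx, rowvar=False)
    boots = np.empty((n_boot,) + c.shape)
    for b in range(n_boot):
        idx = rng.integers(0, x.shape[0], size=x.shape[0])   # resample cells with replacement
        boots[b] = np.corrcoef(logx[idx], rowvar=False)
    return c, boots.std(axis=0)

# Hypothetical stand-in for 600 cells x 11 proteins.
x = np.random.default_rng(5).lognormal(mean=1.0, sigma=0.5, size=(600, 11))
C, dC = corr_with_errors(x)
print(C.shape, float(dC.max()))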

2.5.2 Analyzing a single condition

The data collected for conditions 1 and 2 describe the activation levels of the 11 biomolecules when the cells are exposed to their natural stimulatory signals. If we focus on each of the two conditions separately, we will be dealing with draws from two stationary distributions: applying the max-multi-info quantization discussed in the previous section separately to each condition will produce binary words that represent fluctuations around the steady state in that condition. Because the nodes are functionally connected, the fluctuations are not independent, and must reflect local couplings between nodes near the given steady state. Can we learn something from the correlated fluctuations in the activities? Having quantized the data into two levels and calculated the correlations and mean values, we write down the form of the maximum entropy distribution consistent with these operators,20 Eq (2.19):

\[
p(\tilde\sigma_1, \ldots, \tilde\sigma_N) = \frac{1}{Z}\exp\left[\sum_i h_i\tilde\sigma_i + \frac{1}{2}\sum_{ij} J_{ij}\tilde\sigma_i\tilde\sigma_j\right]. \tag{2.30}
\]

We proceed to calculate the interaction map J_ij and the magnetic fields h_i that explain the measured observables [Eq 2.29, Methods A.1.3]. Figure 2.11 shows the interaction maps J_ij and the magnetic fields for each condition's data, quantized and analyzed separately. Interestingly, conditions 1 and 2 exhibit a similar pattern of interactions, with those of condition 1 being a subset of those of condition 2.

20 Let us denote by a tilde over the activity variable, σ̃, the Ising model convention of naming the two states −1 and +1; the corresponding states of σ are 0 and 1.


Figure 2.11: Interactions (color map) and magnetic fields (blue line) for all 9 external conditions, proceeding top down, left to right, computed (both quantization and maximum entropy reconstruction) separately for each condition. All interactions J_ij are drawn on the same scale, with red color indicating positive and green color indicating negative couplings. Note that since the data is requantized in each condition, and requantization amounts to a change in the single-body marginals (averages), which are constrained by the magnetic fields, the magnetic fields are in this case a "side-effect" of the data analysis procedure. Conditions 1 and 2 represent cells exposed to the naturally occurring stimulatory chemical signals; the other conditions represent environments where "intervention" chemicals – which are supposed to lock the activity states of certain nodes to either "on" or "off" – have been added to the stimulatory chemicals of condition 1.

Moreover, they also agree with the conventional map of interactions in Fig 2.6, except for the interaction between 10 and 11 (p38, JNK) in condition 2. A possible explanation for this interaction is cross-talk in the MAPKK pathway upstream of p38 and JNK: unobserved biomolecules that couple pairs of observed proteins would induce effective interactions between them. In general, the interaction matrices are sparse, and most of the small coupling constants can be set to zero with minimal change to the distribution, as shown in Figs 2.12a and 2.12b.21

2.12a: Condition 1. 2.12b: Condition 2.

Figure 2.12: Sparsity of the interaction maps for condition 1 (left) and condition 2 (right). The blue line plots the rank-ordered magnitudes of the couplings (both magnetic fields and exchange interactions) g_µ. The smallest n couplings of the original Ising model inferred from the data are set to zero and the resulting "pruned" Ising model is compared to the original; for this comparison we plot the Jensen-Shannon divergence as a function of n (red dots). For both conditions we see a sharp bend: the first 40-50 couplings can be zeroed at low D_JS, while the remaining 11 magnetic fields and ∼10 pairwise couplings are significant.
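The pruning test itself is easy to reproduce on a toy model. The sketch below (illustrative, with randomly drawn couplings rather than the inferred ones, and pruning only the exchange couplings for simplicity) zeroes the couplings from smallest to largest magnitude and reports the Jensen-Shannon divergence between the pruned and original distributions.

# Illustrative sketch: coupling pruning measured by the Jensen-Shannon divergence.
import numpy as np
from itertools import product, combinations

def ising_probs(h, J):
    N = len(h)
    states = np.array(list(product([-1, 1], repeat=N)), dtype=float)
    logw = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
    p = np.exp(logw - logw.max())
    return p / p.sum()

def js_divergence(p, q):
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a[a > 0] * np.log2(a[a > 0] / b[a > 0]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

rng = np.random.default_rng(2)
N = 5
h = 0.3 * rng.normal(size=N)
J = np.zeros((N, N))
for i, j in combinations(range(N), 2):
    J[i, j] = J[j, i] = 0.5 * rng.normal()

p_full = ising_probs(h, J)
# Zero out exchange couplings from smallest to largest magnitude and track D_JS.
order = sorted(combinations(range(N), 2), key=lambda ij: abs(J[ij]))
Jp = J.copy()
for n, (i, j) in enumerate(order, start=1):
    Jp[i, j] = Jp[j, i] = 0.0
    print(n, round(js_divergence(p_full, ising_probs(h, Jp)), 4))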

Note again that we are looking only at fluctuations around a naturally stimulated steady state. These fluctuations are much smaller than those induced by the intervening chemicals, which is presumably why we detect only a subset of the full set of interactions.

How much of the complexity of the true distribution is captured by the maximum entropy approximation? To answer this question we look at the fraction of the multi-information of the real distribution that is captured by the pairwise model. As Fig 2.13 demonstrates, in the case of condition 2 it recovers almost all of the 2.8 bits of total information; for condition 1, however, the fraction is around 70 percent out of the total of 1.5 bits.22 A further test of the pairwise model involves comparing the predictions for the connected three-point correlations ⟨(σ_i − σ̄_i)(σ_j − σ̄_j)(σ_k − σ̄_k)⟩ with the values estimated from the data, as shown in Fig 2.14. As expected, the match between predictions and measurement is

21 One starts with the smallest couplings and proceeds towards bigger ones, setting them to zero and calculating the Jensen-Shannon distance between such "pruned" distributions and the original one. For conditions 1 and 2, if all exchange interactions except for the "skeleton" around the diagonal are set to 0, the Jensen-Shannon distance will be around 0.015, i.e. one would need on the order of 70 samples to distinguish the full maximum entropy distribution from the pruned one.
22 There might be larger systematic error bars on experiments 1, 6 and 9 because the distribution seems considerably more uniform than for the other experiments and we are low on samples.

Figure 2.13: Recovery of information for the 9 experiments. How much of the total multi-information in the distribution of activation levels does the pairwise model capture? For each of the 9 conditions on the horizontal axis, the data is discretized using the max-multi-info method into 2 levels and a pairwise (or three-body) Ising model is constructed. The bar height is the total multi-information of the distribution [Eq (2.7)], in bits. The blue (green) segment represents the information of the second (third) order, I^{(2,3)}[p(σ⃗)], of Eq (2.28). The error bars are entropy estimation errors from the direct estimation, obtained by 100 repeated reestimations (Slonim et al., 2005a).

good in condition 2 (not shown), while for condition 1 we see a single predicted three-point correlation deviating strongly from its measured value. The corresponding biomolecules are 3, 4 and 5, namely PLCγ, PIP2 and PIP3, and they are suspected to form a feedback loop [Fig 2.6]. To ascertain that it is not only the observed correlation, but actually a true triplet interaction between the molecules, that generates the discrepancy, we build a new maximum-entropy model consistent with the three-point marginals. The corresponding Hamiltonian has the following form:

\[
H = -\sum_i h_i\tilde\sigma_i - \frac{1}{2}\sum_{ij} J_{ij}\tilde\sigma_i\tilde\sigma_j - \frac{1}{6}\sum_{ijk} J_{ijk}\tilde\sigma_i\tilde\sigma_j\tilde\sigma_k. \tag{2.31}
\]

In the generalized Hamiltonian of Eq (2.31) the largest three-point interaction term is J_{345}. Moreover, in order to convincingly show that it really is J_{345} that fixes the offending three-point correlation (as opposed to all the other triplet degrees of freedom in Eq (2.31)), we construct yet another maximum entropy model: a pairwise Ising system that constrains exactly one three-point marginal, p(σ_3, σ_4, σ_5), and has a single three-point coupling, J_{345}. The agreement between prediction and observations is then restored up to third order in the correlations, at the cost of one additional underlying interaction. Experimentally it is also known that PLCγ hydrolyses its substrate PIP2 to produce PIP3; furthermore it is suspected that PIP3 can recruit PLCγ (Goodridge and Harnett, 2005; Kolch, 2000).

We believe that the described procedure generalizes. The theoretical foundation (Schneidman et al., 2003) provides a way of decomposing the total information of a given distribution into a sum of positive terms, each of which indicates the extent to which maximum-entropy models incorporating successively higher order marginals recover the total complexity. A failure to account for the total information with a simple model is diagnostic of complexity unaccounted for in the model; to pinpoint the problem, one compares the prediction and measurement of the next-order correlations; hopefully, the failure is localized and not distributed through the network. If this is the case, and fixing the failure requires the introduction of a single new interaction, we might believe that we have learned something new about the system.
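The diagnostic quantity in this comparison is straightforward to estimate from the quantized samples. The sketch below (illustrative; the random ±1 data is a hypothetical stand-in for the quantized dataset) computes the connected three-point correlations plotted in Fig 2.14.

# Illustrative sketch: connected three-point correlations from binary samples.
import numpy as np
from itertools import combinations

def connected_triplets(samples):
    """samples: (T, N) array of +/-1; returns dict {(i,j,k): <(s_i-<s_i>)(s_j-<s_j>)(s_k-<s_k>)>}."""
    d = samples - samples.mean(axis=0)     # subtract the means
    return {(i, j, k): np.mean(d[:, i] * d[:, j] * d[:, k])
            for i, j, k in combinations(range(samples.shape[1]), 3)}

# Toy usage on random +/-1 data.
rng = np.random.default_rng(3)
samples = rng.choice([-1, 1], size=(600, 5))
trip = connected_triplets(samples)
print(max(trip, key=lambda t: abs(trip[t])), round(max(abs(v) for v in trip.values()), 3))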

Figure 2.14: Comparison between the three-point connected correlations measured from the data (horizontal axis) and predicted by the pairwise Ising model (vertical axis) in experiment 1. Due to small sample effects, there are error bars in the correlation estimates; the distribution in red is the distribution of the three-point connected correlations in a shuffled dataset. We see that the three-point correlations are small with the exception of several elements, and all but one agree with the model prediction. The mismatch is the three-body correlation between biomolecules 3, 4 and 5 (see main text).

No coupling:                      H_0 = − Σ_i h_i σ_i − (1/2) Σ_ij J_ij σ_i σ_j
Coupled magnetic fields:          H_1 = H_0 − (1/2) Σ_ik h_i^(k) σ_i y_k
Coupled fields and interactions:  H_2 = H_1 − (1/6) Σ_ijk J_ij^(k) σ_i σ_j y_k

Table 2.1: Different models of incorporating conditional dependence on the external stimuli. The Ising model of Eq (2.30) is complemented by binary variables y_k that are "on" in condition k and "off" otherwise, and therefore mimic the presence or absence of the intervening chemicals. Dependence on conditions is achieved by coupling the 11 internal nodes σ_i to the condition nodes y_k; the coupling constants, when computed using the maximum entropy network reconstruction, will then parametrize the condition-dependent magnetic fields h_i^(k) in H_1 and, additionally, the condition-dependent exchange interactions J_ij^(k) in H_2.

2.5.3 Combining multiple conditions

To extend the analysis to the conditions of chemical intervention, we need to formulate the maximum entropy problem such that it constitutes a proper description of p(σ_1, ..., σ_N | C) for the various conditions C. It is clear that we cannot proceed without further assumptions about how a change in condition affects the system. Conceptually, we can view the inhibition and activation chemicals that are added to the cell culture to induce the change in network behavior on the same footing as the 11 measured proteins – the fact that these chemicals are "external" to the cell, while the fluorescently-tagged ones are natural and "internal", is of no consequence for the method. We therefore extend the support of the probability distribution {σ_1, ..., σ_N} with external variables C = {y_1, ..., y_K} that specify the condition, such that the concatenated dataset defines a probability distribution p(σ_1, ..., σ_N, y_1, ..., y_K). For each measured sample in condition C_i we know what values to assign to the variables y_k: y_k = 1 if intervention chemical k was present during C_i, and −1 otherwise [Methods A.1.1]. The introduction of the new variables allows us to make our intuition about the influence of the condition C on parts of the network precise: we couple those terms in the Ising model of Eq (2.30) that are assumed to vary upon a change in C to the variables y. The immediate cases of interest are described in Table 2.1 and below.
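A minimal sketch of this construction (with a hypothetical condition-to-chemical mapping, not the actual experimental layout of Methods A.1.1) concatenates the quantized activities with the chemical indicator variables y_k:

# Illustrative sketch: augmenting the activity data with condition variables y_k.
import numpy as np

def augment_with_conditions(sigma, condition_ids, chem_in_condition, K):
    """sigma: (T, N) in {-1,+1}; condition_ids: length-T condition labels;
    chem_in_condition: dict mapping condition label -> list of intervention chemicals present."""
    T = sigma.shape[0]
    y = -np.ones((T, K))
    for t, c in enumerate(condition_ids):
        y[t, chem_in_condition[c]] = 1.0     # y_k = +1 when chemical k is present, else -1
    return np.hstack([sigma, y])              # joint patterns (sigma_1..sigma_N, y_1..y_K)

# Toy usage: 4 samples, 3 proteins, 2 hypothetical intervention chemicals.
sigma = np.array([[1, -1, 1], [1, 1, 1], [-1, -1, 1], [-1, 1, -1]], dtype=float)
cond = np.array([0, 0, 1, 1])
chem = {0: [0], 1: [1]}                       # hypothetical: condition 0 uses chemical 0, condition 1 uses chemical 1
print(augment_with_conditions(sigma, cond, chem, K=2))

Constraining the marginals p(σ_i, y_k) of this concatenated dataset then implements the condition-dependent fields of H_1 in Table 2.1.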

(Panel titles of Figure 2.15: Experiment 1, D_JS = 0.0074; Experiment 2, D_JS = 0.0031; Experiment 3, D_JS = 0.0056; Experiment 4, D_JS = 0.0055; Experiment 5, D_JS = 0.0045; Experiment 6, D_JS = 0.0053; Experiment 7, D_JS = 0.0053; Experiment 8, D_JS = 0.0055; Experiment 9, D_JS = 0.0068.)

Figure 2.15: Once the maximum entropy model has been solved from the correlations and the Hamiltonian is known, we can calculate the energy histograms over the real data samples (red) and compare them to the Boltzmann density-of-states prediction (blue), shown for each condition separately. The black dashed line (right vertical axis) shows the base-10 logarithm of the number of possible distinct patterns in a given energy bin. Condition 1 has the worst agreement with the pairwise prediction, specifically in the low-energy states.

To construct a distribution that corresponds to a Hamiltonian with no y-couplings, we constrain only the pairwise marginals of the 11 observed biomolecules, quantized together to maximize the multi-information. This is equivalent to pooling all data together and building a single model to account for all conditions, an approach that should perform rather disastrously.23 At the other extreme there is the Hamiltonian H_2 of Table 2.1 that couples both the J and h terms to the y variables and corresponds to a maximum-entropy distribution that constrains all three-point marginals of the form p(σ_i, σ_j, y_k) – this is the model with the maximum number of free parameters, where changing a condition can completely change both the interactions J and the magnetic fields h. The middle ground is the model that assumes that the interaction map J is independent of the condition, and all inter-condition variation is subsumed into h_i^(k) y_k σ_i terms that set the strength of the influence of external chemical k on the protein activation level σ_i. When the chemical k is present, y_k is always 1, and the magnetic fields of the condition-independent Hamiltonian H_0 are transformed by h_i → h_i + h_i^(k). The equivalent maximum entropy problem constrains marginals of the form p(σ_i, σ_j) and p(σ_i, y_k).24 The assumptions of this model are consistent with what is thought to occur biologically, that is, that the intervening chemicals change the activation state of the signaling proteins, but do not affect the nature of the biochemical interactions between them.

Figure 2.16a shows the results for the model H_1 of Table 2.1, in which the external variables couple to the magnetic fields. The data has been quantized into binary levels to maximize the multi-information and all 9 conditions have been used. Because we chose a discrete representation in which certain activation states correlate strongly with the external conditions, only a small number (∼10) of patterns have considerable weight in each condition, and we can plot their measured against their predicted frequency for each condition separately [Fig 2.17]. Despite only having 600 samples for each plot, the agreement is reasonable, with an average Jensen-Shannon divergence of 0.036 (not corrected for sample size). The interaction matrix is still relatively sparse and captures more of the expected interactions, especially between JNK, p38 and other proteins. There are missing links between PKA, PKC and RAF and MEK. The histograms of MEK [Fig 2.7] and to some extent PKA exhibit three activation levels that cannot be reflected in the binary quantization, and we would hope to recover these interactions using a finer quantization.

The condition-dependent magnetic fields h_i^(k) are, for each intervention condition except C4, in agreement with expectations – the field that couples to the perturbed chemical has the largest value and the correct sign (the first 4 interventions are inhibitions while the other 3 are activations). However, each intervention also perturbs other activation levels to some extent and, moreover, a model in which each y_k couples only to the σ_i that it is supposed to influence produces much higher divergence values (data not shown). Note that the intervention in condition 4 specifically seems to affect activation levels other than that of PKC.

23 Which it does. The Ising model description of the activation levels pooled over conditions is good; however, the conditional distributions, p(σ⃗|C), are very different from condition to condition, with ⟨D_JS⟩_C ≈ 0.35, and one cannot expect one distribution to be a good model for all C.
24 Note that in constraining the two-point marginals p(σ_i, y_k) we implicitly constrain the one-point marginals p(σ_i), yielding h_i, corresponding to the magnetic fields in H_0, and p(y_k), yielding magnetic fields for the y variables. The values of h_i are reflective of the quantization (e.g. in the case of equi-populated bins h_i = 0, as there is no inherent 'bias' for σ_i being in a −1 or +1 state). The values of the magnetic fields for the y_k variables recount the priors p(C) over conditions, and as such only carry information about the experimental setup and not the biological system itself.

2.16a: The interaction map (coupling constants) for the network of 11 signaling proteins. 2.16b: Condition-dependent magnetic fields (condition vs. protein). 2.16c: Thresholded interaction map for the 11 signaling proteins.

Figure 2.16: Exchange interactions (left) and condition-dependent magnetic fields (right) using the Hamiltonian H_1 from Table 2.1. The interaction map is fixed from condition to condition, but the magnetic fields change; the inferred condition-dependent magnetic fields mostly match the expectation that the biggest value in condition C couples to the protein that has been influenced by the chemical intervention in condition C (the interventions target protein 7 in conditions 3 and 7, protein 9 in conditions 4 and 8, protein 4 in condition 5, protein 2 in condition 6, and protein 8 in condition 9) [Fig 2.6, Methods A.1.1]. Comparison between the known map of protein-protein interactions and the interaction map J_ij thresholded at 0.25 to define links (bottom): brown solid line – known map and reconstructed interactions overlap; brown dashed line – link is predicted by the maximum entropy model; black line – link is not predicted by the maximum entropy model.

(Panel titles of Figure 2.17: Exp. 1, D = 0.072; Exp. 2, D = 0.038; Exp. 3, D = 0.029; Exp. 4, D = 0.031; Exp. 5, D = 0.019; Exp. 6, D = 0.057; Exp. 7, D = 0.026; Exp. 8, D = 0.022; Exp. 9, D = 0.052.)

Figure 2.17: Observed and expected frequencies of activity states of the 11 proteins over the 9 conditions (top-to-bottom, left-to-right). With 11 proteins, there are 2^11 = 2048 possible activation states of the network. Because the whole dataset is quantized into binary symbols at the same time using the max-multi-info scheme, each condition is dominated by a small number of frequent binary patterns (while most of the others have zero or near-zero frequency), allowing us to get estimates of their frequencies from only 600 data points per condition. Observed frequency on the horizontal axis, and the Ising prediction using H_1 from Table 2.1 on the vertical axis; equality with counting error shown in blue.

2.5.4 Discussion

Several lessons can be learned from this first application of the maximum entropy principle to a real biological dataset. On the technical side, we were lucky to have dealt with fewer than 20 binary variables and thus had exact algorithms for finding the interactions. On the other hand, the sample size was severely limiting. There were two clear ways of improving the fidelity of our models (either by increasing the number of discretization levels Q [Methods A.1.3] or by increasing the order of correlations captured), but we could not take either while remaining convinced that small-sample systematics would not corrupt the results. Only Q = 2 models are therefore presented here. On the modeling side, we devised a method for quantizing the data that retains as much information in the distribution as possible, a huge improvement over naive quantization schemes with equi-populated bins [Fig 2.9a]. We saw that having a limited dynamic range in terms of discrete levels can induce unwanted effects by missing finer structure in the data [Fig 2.9b]. Nonetheless, we could verify that our results do not qualitatively depend on the quantization by choosing random quantization schemes and performing a reconstruction on each [Fig A.1a]. Finally, we showed how to make the model parameters dependent on external conditions in a controllable way.

The data are examined in two ways. In the analysis of a single condition, one observes the fluctuations around the steady state and tries to infer the network structure from the correlated fluctuations, independently from condition to condition. We learned that pairwise models here capture most of the information, and where they fail, they fail so as to suggest how the model has to be modified to correct for the failure, in particular with the addition of a three-point interaction in Fig 2.14. Because of the small data size it was impossible to directly sample the distribution, but there are projections from the pattern space into a space of lower dimension, e.g. the energy in Fig 2.15, that we can use to compare data with predictions reasonably well. The interaction maps across the 9 conditions look very similar: they are sparse and there exists a "skeleton" structure of common interactions across conditions which corresponds nicely to what is known from previous work (more precisely, the single-condition interaction maps of conditions 1 and 2 are a subset of the microscopic interactions in Fig 2.6). We understand that if an interaction is seen in one condition, it could disappear in another (e.g. when that activation level is at saturation in one of the two conditions); having the interaction change sign (for example in Fig 2.11, conditions 1 and 2, proteins 10 and 11) is harder to understand and could point to a general difficulty with network reconstruction, namely the presence of hidden nodes. The interaction that changes sign could be "renormalized" between the two conditions by being coupled to a node that we do not observe and that takes a different value in the two conditions. Such a possibility nicely illustrates the point that our interactions are effective and phenomenological, and not necessarily aligned with the microscopic reactions of phosphorylation and dephosphorylation. There might be cross-talk between the nodes, or common intermediary chemicals in the pathway (see the Fig 2.6 pathways above p38 and JNK, for instance), or hidden nodes; or perhaps one chemical has many different phosphorylation states.
In all these cases we can expect the reconstructed exchange interactions to deviate from the microscopic picture while still being a good model for the data, and therefore comparison with the known and verified arrows in the interaction diagram of Fig 2.6 cannot be the gold standard of validity. The agreement is nevertheless good, presumably because the network is sparse and seems mostly non-frustrated.

The various conditions induce very different activation patterns, and the chemical interventions are not small perturbations. The separation between the mean values of the activation levels can be similar to or even larger than their spread around the mean [Fig 2.7]; although we miss some of the structure by quantizing into only Q = 2 levels, this is the best that we can do. Then, according to the prescription of Table 2.1, we can build a maximum entropy model that incorporates the external, perturbing chemicals on the same footing as the internal chemicals, and assume that the presence of an inhibitory or excitatory drug will change the activation level of its target protein, but will leave the interaction structure intact. Under such assumptions the maximum entropy model yields one interaction map in addition to the condition-dependent magnetic fields [Fig 2.16a], and the model accounts reasonably well for the data [Fig 2.17] (see the divergence values at the various conditions). Because the chemical perturbations are very strong and the dynamic range of the quantization is limited, there is only a small number of distinct patterns with nonzero probability in the data. The interaction map has more structure now, which is close to, but not the same as, the "conventional wisdom" for the interactions in the MAP system. As a task for the future, it would be interesting to see how the network reconstruction performs on a simulated system, where one assumes the reaction equations and a noise model for a certain number of microscopic processes and reconstructs the interaction maps from the simulated data.

Our second example will deal with the structure of the responses of retinal ganglion cells when they are shown a naturalistic movie clip. The problem of data quantization will fortunately be absent, because there will exist a natural quantization into 2 bins (whether the neuron spikes or does not), and we will have two orders of magnitude more data to analyze. We will nevertheless face the technical challenges related to computing maximum entropy solutions on more than 20 nodes, and conceptual difficulties, because we will have good reasons to believe that the set of observed nodes is much smaller than the set of truly interacting nodes.

2.6 Ising models for networks of real neurons25

Physicists have long explored analogies between the statistical mechanics of Ising models and the functional dynamics of neural networks (Hopfield, 1982; Amit, 1999). Recently it has been suggested that this analogy can be turned into a precise mapping (Schneidman et al., 2006): in small windows of time, a single neuron i either does (σ_i = +1) or does not (σ_i = −1) generate an action potential or "spike" (Rieke et al., 1997); if we measure the mean probability of spiking for each cell (⟨σ_i⟩) and the correlations between pairs of cells (C_ij = ⟨σ_iσ_j⟩ − ⟨σ_i⟩⟨σ_j⟩), then the maximum entropy model consistent with these data is exactly the Ising model

\[
P(\{\sigma_i\}) = \frac{1}{Z}\exp\left[\sum_{i=1}^{N} h_i\sigma_i + \frac{1}{2}\sum_{i\neq j}^{N} J_{ij}\sigma_i\sigma_j\right], \tag{2.32}
\]

where the magnetic fields {h_i} and the exchange couplings {J_ij} have to be set to reproduce the measured values of {⟨σ_i⟩} and {C_ij}. We recall that maximum entropy models are the least structured models consistent with known expectation values (Jaynes, 1957; Schneidman et al., 2003); thus the Ising model is the minimal model forced upon us by measurements of mean spike probabilities and pairwise correlations. The surprising result of Schneidman et al. (2006) is that the Ising model provides a very accurate description of the combinatorial patterns of spiking and silence in retinal ganglion cells as they respond to natural movies, despite the fact that the model explicitly discards all higher order interactions among multiple cells. This detailed comparison of theory and experiment was done for groups of N ∼ 10 neurons, which are small enough that the full distribution P({σ_i}) can be sampled experimentally. Here we extend these results to N = 40, and then argue that the observed network is typical of an ensemble out of which we can construct larger networks. Remarkably, these larger networks seem to be poised very close to a critical point, and exhibit other collective behaviors which should become visible in the next generation of experiments.

To be concrete, we consider the salamander retina responding to naturalistic movie clips, as in the experiments of Schneidman et al. (2006); Puchalla et al. (2005). Under these conditions, pairs of cells within ∼200 µm of each other have correlations drawn from a homogeneous distribution; the correlations decline at larger distances.26 This correlated patch contains N ∼ 200 neurons, of which we record from N = 40;27 experiments typically run for ∼1 hr.28

The central problem is to find the magnetic fields and exchange interactions that reproduce the observed pairwise correlations. It is convenient to think of this problem more generally: we have a set of operators Ô_µ({σ_i}) on the state of the system, and we consider

25 This section appeared on the arXiv as Tkačik et al. (2006).
26 The correlations between pairs are drawn from a single distribution that depends neither on the location nor on the separation between the neurons, as long as they are within ∼ 200 µm. These are approximate statements; for details see Puchalla et al. (2005).
27 Alternative recording methods can capture more cells at low density (Mathieson et al., 2004), or fewer cells at higher density (Segev et al., 2004).
28 Some experimental details (Schneidman et al., 2006): the visual stimulus consists of a 26.2 s movie that was projected onto the retina 145 times in succession; using ∆τ = 20 ms quantization this yields 1310 samples per movie repeat, for a total of 189950 samples. The effective number of independent samples is smaller because of correlations across time; using bootstrap error analysis we estimate N_samp ∼ 7·10^4 [Methods A.2.1].

Up to N ∼ 20 cells we can solve exactly, but this approach does not scale to N = 40 and beyond. For larger systems, this "inverse Ising problem," or Boltzmann machine learning as it is known in computer science (Hinton and Sejnowski, 1986), is a hard computational problem rarely encountered in physics, where we usually compute properties of the system given a known model of the interactions. Given a set of coupling constants g, we can estimate the expectation values ⟨Ô_µ⟩_g by Monte Carlo simulation. Increasing the coupling g_µ will increase the expectation value ⟨Ô_µ⟩, so a plausible algorithm for learning g is to increase each g_µ in proportion to the deviation of ⟨Ô_µ⟩ (as estimated by Monte Carlo) from its target value (as estimated from experiment). This is not a true gradient ascent, since changing g_µ also affects the operators ⟨Ô_{ν≠µ}⟩, but such an iteration scheme does have the correct fixed points; heuristic improvements include a slowing of the learning rate with time and the addition of some 'inertia', so that we update g_µ according to

\[
\Delta g_\mu(t+1) = -\eta(t)\left[ \langle \hat{O}_\mu \rangle_{\mathbf{g}(t)} - \langle \hat{O}_\mu \rangle_{\rm expt} \right] + \alpha\, \Delta g_\mu(t), \qquad (2.34)
\]

where η(t) is the time-dependent learning rate and α measures the strength of the inertial term.29

Figure 2.18 shows the success of the learning algorithm by comparing the measured pairwise correlations to those computed from the inferred Ising model for 40 neurons [Methods A.2.2]. To verify that the pairwise Hamiltonian captures essential features of the data, we predict and then check statistics that are sensitive to higher order structure: the probability P(K) of patterns with K simultaneous spikes, connected triplet correlations, and the distribution of energies [Methods A.2.3]. The model overestimates the significant 3-point correlations by about 7% and generates small deviations in P(K); most notably it underestimates the no-spike pattern, P_expt(K = 0) = 0.550 vs. P_Ising(K = 0) = 0.502. These deviations are small, however, and it seems fair to conclude that the pairwise Ising model captures the structure of the N = 40 neuron system very well. Smaller groups of neurons for which exact pairwise models are computable also show excellent agreement with the data30 (Schneidman et al., 2006).

It is surprising that pairwise models work well both on N = 40 neurons and on smaller subsets of these: not observing σ_χ will induce a triplet interaction among neurons {σ_α, σ_β, σ_γ} for any triplet in which there were pairwise couplings between σ_χ and all triplet members. Moreover, comparison of the parameters in g(40) with their corresponding averages from different subnets g(20) leaves the exchange interactions almost unchanged, while the magnetic fields change substantially [Methods A.2.5].
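As a sketch of this iteration (my own minimal implementation, not the thesis code; the Metropolis sampler, step counts and function names are illustrative assumptions), one can estimate the model moments by Monte Carlo and update the fields and couplings with a decaying learning rate and a momentum term, in the spirit of Eq (2.34):

    import numpy as np

    rng = np.random.default_rng(0)

    def mc_moments(h, J, n_steps=50000, n_burn=5000):
        """Metropolis estimates of <sigma_i> and <sigma_i sigma_j> for the pairwise model."""
        N = len(h)
        sigma = rng.choice([-1, 1], size=N)
        m = np.zeros(N)
        c = np.zeros((N, N))
        n_kept = 0
        for t in range(n_steps):
            i = rng.integers(N)
            # energy change for flipping spin i, with E = -(h.sigma + (1/2) sigma.J.sigma)
            dE = 2.0 * sigma[i] * (h[i] + J[i] @ sigma - J[i, i] * sigma[i])
            if dE <= 0 or rng.random() < np.exp(-dE):
                sigma[i] *= -1
            if t >= n_burn:
                m += sigma
                c += np.outer(sigma, sigma)
                n_kept += 1
        return m / n_kept, c / n_kept

    def fit_pairwise(m_expt, c_expt, n_iter=200, eta0=0.5, alpha=0.9):
        """Moment matching in the spirit of Eq (2.34): move each coupling against the
        deviation of its model moment from the measured one, with a 1/t learning-rate
        decay and an inertial (momentum) term."""
        N = len(m_expt)
        h = np.arctanh(np.clip(m_expt, -0.99, 0.99))   # independent-neuron initialization
        J = np.zeros((N, N))
        dh = np.zeros(N)
        dJ = np.zeros((N, N))
        for t in range(1, n_iter + 1):
            m_mod, c_mod = mc_moments(h, J)
            eta = eta0 / t
            dh = -eta * (m_mod - m_expt) + alpha * dh
            dJ = -eta * (c_mod - c_expt) + alpha * dJ
            np.fill_diagonal(dJ, 0.0)
            h += dh
            J += dJ
        return h, J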

29 The learning rate η(t) was decreased as O(1/t) or slower according to a custom schedule; α = 0.9 for a network of N = 120 neurons and 0 otherwise. An initial approximate solution for g was obtained by contrastive divergence (CD) Monte Carlo (Hinton, 2002) for 40 neurons, for which we have the complete set of patterns needed by CD. The Hamiltonian was rewritten such that J_ij constrained (σ_i − ⟨σ_i⟩_expt)(σ_j − ⟨σ_j⟩_expt), and we found that this removed biases in the reconstructed covariances.
30 Exact pairwise models at N = 20 exhibit similar but smaller systematic deviations from the data, suggesting that the deviations are not due to convergence problems [Methods A.2.4].


Figure 2.18: (a) Precision of the Ising model learned via Eq (2.34): measured covariance elements are binned on the x-axis and plotted against the corresponding reconstructed covariances on the y-axis; vertical error bars denote the deviation within the bin and horizontal error bars denote the bootstrap errors on the covariance estimates [Methods A.2.1]. (b) Zoom-in for small C_ij, with the scale bar representing the distribution of covariances from shuffled data. Not shown are the reconstructions of the means, ⟨σ_i⟩, which are accurate to better than 1%. (c) Distribution of coupling constants J_ij. (d) Measured vs predicted connected three-point correlations for 40 neurons (red) and the exact solution for a 20 neuron subset (black). (e) Probability of observing K simultaneous spikes, compared to the failure of the independent model (black line). Dashed lines show error estimates.

To explain both phenomena, we examine the flow of the couplings under decimation. Specifically, we include three-body interactions, isolate the terms involving spin σ_n, sum over σ_n, expand in J_in, J_ijn up to O(σ^4), and then identify renormalized couplings [Methods A.2.6]:

\[
h_i \rightarrow h_i + \omega \tilde{J}_{in} + \sum_j \beta_{ij} + O(\gamma, \delta), \qquad (2.35)
\]
\[
J_{ij} \rightarrow J_{ij} + \beta_{ij} + O(\gamma, \delta), \qquad (2.36)
\]
\[
J_{ijk} \rightarrow J_{ijk} + O(\gamma, \delta), \qquad (2.37)
\]

where

\[
\tilde{J}_{in} = J_{in} - \sum_j J_{ijn}, \qquad \beta_{ij} = \tilde{J}_{in}\tilde{J}_{jn}\,(1-\omega^2) + \omega J_{ijn}, \qquad \omega = \tanh\Big(h_n - \sum_i J_{in} + \tfrac{1}{2}\sum_{ij} J_{ijn}\Big).
\]

The terms γ, δ ∝ (1 − ω²) originate from terms with 3 and 4 factors of σ, respectively. The key point is that neurons spike very infrequently (on average in ∼ 2.4% of the bins), so ⟨σ_i⟩ ≈ −1; in this case ω is approximately the hyperbolic tangent of the mean field at site n and is close to −1. If the pairwise Ising model is a good model at size N, and the couplings are small enough to permit the expansion, then at size (N − 1) the corrections to the pairwise terms, as well as J_ijk, are suppressed by 1 − ω². This could explain the dominance of pairwise interactions: it is not that higher order terms are intrinsically small, but the fact that spiking is rare means that they do not have much chance to contribute. Thus, the pairwise approximation is more like a Mayer cluster or virial expansion than like simple perturbation theory.

We test these ideas by selecting 100 random subgroups of 10 neurons out of 20; for each, we compute the exact Ising model from the data, as well as applying Eqs (2.35–2.37) 10 times in succession to decimate the network from 20 cells down to the chosen 10. The resulting three-body interactions J_ijk have a mean and standard deviation ten times smaller than the pairwise J_ij. If we ignore these terms, the average Jensen–Shannon divergence (Lin, 1991) between this probability distribution and the best pairwise model for the N = 10 subgroups is D_JS = (9.3 ± 5.4) × 10^-4 bits, which is smaller than the average divergence between either model and the experimental data, and means that on the order of 10^3 samples would be required to distinguish reliably between the two models. Thus, sparsity of spikes keeps the complexity in check.

Given a model with couplings g, we can explore the statistical mechanics of models with g → g/T. In particular, this exercise might reveal whether the actual operating point (T = 1) is in any way privileged. Tracking the specific heat vs T also gives us a way of estimating the entropy at T = 1, which measures the capacity of the neurons to convey information about the visual world; we recall that S(T = 1) = ∫_0^1 C(T)/T dT, and the heat capacity can be estimated by Monte Carlo from the variance of the energy, C(T) = ⟨(δE)²⟩/T² [Methods A.2.7].
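A minimal way to carry out this exercise numerically (an illustrative sketch under my own conventions, not the thesis code) is to run a Metropolis chain at each temperature, record the energy fluctuations, and integrate C(T)/T on a grid:

    import numpy as np

    rng = np.random.default_rng(0)

    def energy_samples(h, J, T, n_sweeps=3000, n_burn=500):
        """Metropolis samples of E(sigma) = -(h.sigma + (1/2) sigma.J.sigma) at temperature T."""
        N = len(h)
        sigma = rng.choice([-1, 1], size=N)
        energies = []
        for sweep in range(n_sweeps):
            for i in range(N):
                dE = 2.0 * sigma[i] * (h[i] + J[i] @ sigma - J[i, i] * sigma[i])
                if dE <= 0 or rng.random() < np.exp(-dE / T):
                    sigma[i] *= -1
            if sweep >= n_burn:
                energies.append(-(h @ sigma + 0.5 * sigma @ J @ sigma))
        return np.array(energies)

    def entropy_at_unit_temperature(h, J, temps=np.linspace(0.05, 1.0, 20)):
        """S(T=1) = int_0^1 C(T)/T dT with C(T) = <(dE)^2>/T^2, integrated by the
        trapezoid rule; in practice the integral is started at a small T_min > 0."""
        C = np.array([energy_samples(h, J, T).var() / T**2 for T in temps])
        return np.trapz(C / temps, temps), C

Repeating this for models of different sizes gives heat capacity curves of the kind shown in Fig 2.19.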

Figure 2.19: C(T) for systems of different sizes. Ising models were constructed for 400 subnetworks of size 5, 180 of size 10, 90 of size 15 and 20, 1 full network of size 40 (all from data), and 3 synthetic networks of size 120; vertical error bars are standard deviations across these examples. The mean of the heat capacity curve and the 1 sigma envelope for Ising models of randomized networks are shown in blue dashed lines.

Figure 2.19 shows the dependence of the heat capacity on temperature at various system sizes. We note that the peak of the heat capacity moves towards the operating point with increasing size. The behavior of the heat capacity C(T) is diagnostic for the underlying density of states, and offers us the chance to ask if the networks we observe in the retina are typical of some statistical ensemble. One could generate such an ensemble by randomly choosing the matrix elements J_ij from the distribution that characterizes the real system, but models generated in this way have wildly different values of ⟨σ_i⟩. An alternative is to consider that these expectation values, as well as the pairwise correlations C_ij, are drawn independently out of a distribution, and then to construct an Ising model consistent with these randomly assigned expectation values. Figure 2.19 shows C(T) for networks of 20 neurons constructed in this way,31 and we see that, within error bars, the behavior of these randomly chosen systems resembles that of real 20 neuron groups in the retina.

Armed with the results at N = 20, we generated several synthetic networks of 120 neurons by randomly choosing once more out of the distribution of ⟨σ_i⟩ and C_ij observed experimentally.32 The heat capacity C_120(T) now has a dramatic peak at T* = 1.07 ± 0.02, very close to the operating point at T = 1. If we integrate to find the entropy, we find that the independent entropy of the individual spins, S_0(120) = 17.8 ± 0.2 bits, has been reduced to S(120) = 10.7 ± 0.2 bits. Even at N = 120 the entropy deficit or multi-information I(N) = S_0(N) − S(N) continues to grow in proportion to the number of pairs (∼ N^2) [Fig A.7], continuing the pattern found in smaller networks (Schneidman et al., 2006). Looking in detail at the model, the distribution of J_ij is approximately Gaussian with mean J = −0.016 ± 0.004 and standard deviation σ_J = 0.61 ± 0.04; 53% of triangles are frustrated (46% at N = 40), indicating the possibility of many stable states, as in spin glasses (Mezard et al., 1987). We examine these next.

At N = 40 we find 4 local energy minima (G_2, ..., G_5) in the observed sample that are stable against single spin flips, in addition to the silent state G_1 (σ_i = −1 for all i) [Methods A.2.8]. Using zero-temperature Monte Carlo, each configuration observed in the experimental data is assigned to its corresponding stable state. Although this assignment makes no reference to the visual stimulus, the collective states G_α are reproducible across multiple presentations of the same movie [Fig 2.20a], even when the microscopic state {σ_i} varies substantially [Fig 2.20b]. At N = 120, we find a much richer structure:33 the Gibbs state now is a superposition of thousands of G_α, with a nearly Zipf-like distribution [Fig 2.20c]. The entropy of this distribution is 3.4 ± 0.3 bits, about a third of the total entropy. Thus, a substantial fraction of the network's capacity to convey visual information would be carried by the collective state, that is, by the identity of the basin of attraction, rather than by the detailed microscopic states [Methods A.2.9].

To summarize, the Ising model with pairwise interactions continues to provide an accurate description of neural activity in the retina up to N = 40.
Although correlations among pairs of cells are weak, the behavior of these large groups of cells is strongly collective [Methods A.2.10], and this is even clearer in larger networks that were constructed to be typical of the ensemble out of which the observed network has been drawn. In particular, these networks seem to be operating close to a critical point. Such tuning might serve to maximize the system's susceptibility to sensory inputs, as suggested in other systems (Duke and Bray, 1999; Eguiluz et al., 2000; Camalet et al., 2000); by definition, operating at a peak of the specific heat maximizes the dynamic range of log probabilities for the different microscopic states, allowing the system to represent sensory events that occur with a wide range of likelihoods.34 The observed correlations are not fixed by the anatomy of the retina or by the visual input alone, but reflect adaptation to the statistics of these inputs (Smirnakis et al., 1997); it should be possible to test experimentally whether these adaptation processes preserve the tuning to a critical point as the input statistics are changed. Finally, the transition from N = 40 to N = 120 opens up a much richer structure in the configuration space, suggesting that the representation of the visual world by the relevant groups of N ∼ 200 cells may be completely dominated by collective states that are invisible to experiments on smaller systems.
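The zero-temperature quench used above to map each observed pattern onto its basin of attraction is simple to state in code; the following is a minimal sketch of my own (with a deterministic sweep order as a simplifying assumption), not the analysis code itself:

    import numpy as np

    def quench_to_stable_state(sigma, h, J):
        """Greedily flip any spin whose flip lowers E = -(h.sigma + (1/2) sigma.J.sigma)
        until the pattern is stable against all single-spin flips; the resulting
        stable pattern labels the basin of attraction of the starting configuration."""
        sigma = np.array(sigma, dtype=int).copy()
        changed = True
        while changed:
            changed = False
            for i in range(len(sigma)):
                dE = 2.0 * sigma[i] * (h[i] + J[i] @ sigma - J[i, i] * sigma[i])
                if dE < 0:              # flipping spin i lowers the energy
                    sigma[i] *= -1
                    changed = True
        return tuple(sigma)             # hashable label, e.g. for counting basin visits

Counting how often each stable state is visited, across movie repeats or in a long Monte Carlo run, yields histograms of the kind shown in Fig 2.20.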

31 Not all combinations of means and correlations are possible for Ising variables. After each draw from the distribution of ⟨σ_i⟩ and C_ij, we check that all 2 × 2 marginal distributions are in [0, 1], and repeat if needed. Once the whole synthetic covariance matrix is generated, we check (e.g. using Kolmogorov–Smirnov) that the distribution of its elements is consistent with the measured distribution.
32 Learning the couplings g was slow, but eventually converged: C_ij converged to within 10% for the largest quartile of elements by absolute value, and within 15% for the largest half, without obvious systematic biases.
33 One run of 2·10^7 independent samples (τ_G ≈ 5·10^2 flips) is collected; for each sample ZTMC is used to determine the appropriate basin; we track the 5·10^4 lowest energy stable states and keep detailed statistics for the 10^3 lowest [Methods A.2.9].
34 In contrast, simple notions of efficient coding require all symbols to be used with equal probability. But since states of the visual world occur with wildly varying probabilities, this (and related notions of efficiency) requires codes that are extended over time. If the brain is interested directly in how surprised it should be by the current state of its inputs, then it might be more important to maximize the dynamic range for instantaneously representing this (negative log) likelihood.

[Figure 2.20, panels (a)–(c); the rank-ordered frequencies in panel (c) follow approximately P(n) ∝ n^−1.02.]

Figure 2.20: (a) Probability that the 40 neuron system is found in a configuration within the basin of each nontrivial ground state G_α, as a function of time during the stimulus movie; P(G_α|t) = 0.4 means that the retina is in that basin on 40% of the 145 repetitions of the movie. (b) All unique patterns assigned to G_5 at t = 10.88 − 10.92 s. (c) Zipf plot of the rank ordered frequencies with which the lowest lying 5·10^4 stable states are found in the simulated 120 neuron system.


2.6.1 Extension to stimulus dependent models

In the previous section we explored the distribution of spike pattern responses that the retina emits when it is shown a naturalistic movie. Our primary goal was to characterize the output binary "vocabulary" without any regard for the inputs, s, that this vocabulary encodes. By studying solely the words of the output, we observed how the number of accessible patterns, 2^S(N), where S(N) is the entropy of the network of N neurons, increases with network size [Fig A.7]; we furthermore pointed out that information could possibly be encoded in collective states corresponding to the local energy minima of the Ising model, instead of in the precise microscopic patterns of spiking and silence. Last but not least, we justified our analysis of the correlations in the retinal output by pointing out that the brain itself does not have independent access to the stimuli: it only "sees" the spikes streaming in along the optic nerve, and therefore all the information that it can extract about the world must be present in p(σ⃗). Here we try to incorporate the dependence on the input into the maximum entropy framework, building on the ideas of conditional dependence in the biochemical network, with the aim of extracting an explicit encoding distribution, p(σ⃗|s).

A lot of work has been done on characterizing the responses of a single neuron to time-dependent stimuli (Rieke et al., 1997; Dayan and Abbott, 2001). One usually looks for a description of the neural processing in the form of a spatio-temporal receptive field R(r, t), such that the probability of spiking at time t is related to the stimulus s(r, t) as follows:

\[
p(\mathrm{spike}\,|\,s) = F\left( \int d\mathbf{r} \int dt'\, R(\mathbf{r}, t')\, s(\mathbf{r}, t - t') \right), \qquad (2.38)
\]

where F is some nonlinear function, for instance a sigmoid or half-wave rectification. The receptive fields of neurons in the retina (as opposed to higher processing centers such as V1) factorize, such that R(r, t) = R_r(r) R_t(t); the first factor is the spatial receptive field, and the second factor the temporal receptive field.35 While this simple connection between the stimulus and the response performs reasonably well for a single neuron, it has been shown in the work of Schneidman et al. (2006) and Shlens et al. (2006), and in the analysis of Section 2.6, that understanding the correlation structure between the neurons is crucial for capturing the properties of the population code. It is with this in mind that we now proceed to create maximum entropy models of the joint distribution of (spikes, stimulus), which simultaneously address both the issue of input dependence and the neuron-to-neuron interactions. The problem will be simplified here because the stimulus used in the experiment is uniform in space, so the spatial receptive field is unimportant. We will try to compute the temporal receptive fields, or spike-triggered averages (STA), jointly with the neuron interactions. In the same way that the condition C was represented in the case of biochemical networks by a set of condition variables y_k, C ≡ {y_1, y_2, ..., y_K}, we will use a set of binary variables, in addition to the N variables describing spiking, to encode the stimulus, s. The response at time t is a convolution of the kernel with the past stimulation, as in Eq (2.38), and therefore the encoding of the stimulus that determines the response at time t must contain some of the stimulus history. We detail the procedure below.
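Before turning to that procedure, it may help to see the single-neuron description of Eq (2.38) in discrete time; the sketch below is an illustrative reimplementation (a sigmoidal nonlinearity and my own function name are assumptions, not the experimental analysis code) that convolves a temporal kernel with the recent stimulus history to get a spiking probability per bin:

    import numpy as np

    def ln_spike_probability(stimulus, kernel, offset=0.0):
        """Discrete-time version of Eq (2.38) for a spatially uniform stimulus:
        p_spike(t) = F( sum_k R_t(k) * s(t - k) ), with a sigmoidal F.
        `kernel` holds the temporal receptive field, most recent lag first."""
        T, K = len(stimulus), len(kernel)
        p = np.zeros(T)
        for t in range(K, T):
            drive = kernel @ stimulus[t - K:t][::-1]   # the K most recent samples, newest first
            p[t] = 1.0 / (1.0 + np.exp(-(drive + offset)))
        return p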

35 The temporal receptive field is, in the case of a Gaussian white noise stimulus ensemble, proportional to the so-called spike-triggered average, or STA, i.e. the time course of the stimulus preceding a spike.

Stimulus encoding

The stimulus, i.e. the instantaneous light intensity projected onto the retina, and the spike trains are discretized into bins of size ∆t. For the stimulus, the input light intensity is averaged over each window of duration ∆t and that average is discretized into Q_S levels. For the biochemical networks we devised a special quantization scheme that retained dependencies between different protein activation levels; here, in contrast, we deal with a single random time series, and we discretize it such that the bin boundaries cut the domain into intervals containing the same number of data points (Slonim et al., 2005a). Thus, the real valued raw light intensities are mapped to

\[
s(t) \rightarrow s^t, \qquad s^t \in \{0, \ldots, Q_S - 1\}. \qquad (2.39)
\]
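This equal-population ("quantile") discretization is easy to implement; a minimal sketch (my own helper, with hypothetical names) is:

    import numpy as np

    def quantile_discretize(x, n_levels):
        """Map a real-valued series to integer levels 0..n_levels-1, choosing the
        bin edges at empirical quantiles so each level holds ~equal numbers of samples."""
        edges = np.quantile(x, np.linspace(0, 1, n_levels + 1)[1:-1])
        return np.searchsorted(edges, x, side='right')

    # e.g. s_t = quantile_discretize(binned_intensity, n_levels=4)   # Q_S = 4 levels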

In the case of the spike train, σ_i^t is equal to 1 if there was at least one spike of neuron i in the time bin [t, t + ∆t], and −1 otherwise. We want a probabilistic model of the population behavior,

\[
p(\{\sigma_i\}, \vec{s}) = \frac{1}{Z} \exp\left( \sum_i h_i^0 \sigma_i + \frac{1}{2} \sum_{i,j} J_{ij} \sigma_i \sigma_j + \sum_i \sum_{\alpha=-\alpha_0}^{-K+1-\alpha_0} \sum_{\beta=1}^{\log_2 Q_S} h_i^{\alpha,\beta}\, \tilde{s}^{\alpha,\beta}\, \sigma_i + \cdots \right), \qquad (2.40)
\]

where s̃^t is the binary encoding of s^t, and the index β runs over all log_2 Q_S bits of this encoding.36 The last term of Eq (2.40) describes the coupling of each spin σ_i to the past history of the stimulus, extending back in time for K time bins, starting α_0 time bins in the past relative to the spike/silence in the current bin. In particular, α_0 could be 0, which would couple instantaneous spiking to the stimulus at the same moment, but we can generally leave α_0 as a free parameter. The ··· terms describe couplings among the stimulus variables and are uninteresting for this discussion.37

Here we can observe a nice analogy with the position weight matrices used to describe the transcription-factor/DNA interaction energy (Berg and von Hippel, 1987); see also Section 3.3. A position weight matrix (PWM) is a linear filter that takes a short piece of DNA sequence and returns a corresponding scalar, or binding energy, that is a sum of independent energy contributions from every letter at each position in the sequence. In the situation we are considering here, the "sequence" is the time-ordered sequence of stimulus code words, s̃^{α,β}, and one computes the inner product between it and the stimulus-dependent magnetic fields, h_i^{α,β}, to yield a scalar quantity. The probability of spiking is then given by the Boltzmann distribution, just as in the case of DNA binding, where either a sharp threshold (a step function) or a smooth one (Fermi distribution) is applied to the energy of the DNA site to determine whether the site is bound or unbound [Eq (3.18)]. Let us suppose that we have discretized the stimulus into a sequence

\[
\{\ldots, s_{-K+1}, s_{-K+2}, \ldots, s_{-1}, s_{0}, \ldots\},
\]

where each s_i can take on Q_S different, discrete values. We are claiming that each neuron, at time t = 0, looks at the past history of K such samples – essentially a K-letter, log_2(Q_S)-bit word – and creates a dot product between the "weight matrix" and the sequence:

\[
h_{\rm eff}(\vec{s}) = \sum_{\alpha=0}^{-K+1} M_{\alpha, s_\alpha} = \sum_{\alpha=0}^{-K+1} \sum_\beta h^{\alpha,\beta}\, \tilde{s}^{\alpha,\beta}, \qquad (2.41)
\]

where M is a properly arranged matrix, of dimension K × log_2 Q_S, of the stimulus-dependent magnetic fields from the Hamiltonian in Eq (2.40). Such an energy matrix M must exist, because it is simply a different way to rewrite a linear function of the stimulus from Eq (2.40). To illustrate, consider a trivial network of a single neuron exposed to a given stimulus. Its probability of firing will be, according to Eq (2.40),

\[
p(\sigma\,|\,\vec{s}) = \frac{1}{Z'} \exp\left( h^0 \sigma + h_{\rm eff}(\vec{s})\, \sigma \right). \qquad (2.42)
\]

In this simplest model the average probability of firing is a sigmoidal function, ⟨σ⟩ = tanh(h^0 + h_eff(s⃗)). In fact, our general N-body setup has in this case been reduced to the known single-neuron problem of linear filter inference: the linear filter is embodied in h_eff(s⃗), and the firing rate is a simplified version of Eq (2.38). It is unclear what happens when the network is extended to two or more neurons: how much of the spiking of these neurons is explained by their own sensitivity to the stimulus, and how much by their mutual coupling? If all practical problems could be addressed, this approach could combine both the inherent couplings between the neurons and the dependence on the stimulus history for each neuron. If the inherent couplings really are a property of the network and not of the stimulus, they should remain constant for the same set of neurons regardless of the stimulus, or at least for different time slices of the same recording. Because it incorporates the stimulus, this formulation naturally yields a testable prediction for the spike trains of the neural population, given the stimulus. Finally, the relative sizes of h^0, J, and the distribution of the stimulus-dependent magnetic fields h_i^α describe the importance weights carried by three very different functional processes (inherent spiking, coupling with other neurons, stimulus sensitivity) that together determine the spiking probability of each neuron.

36 This is done just to reduce the stimulus encoding problem to the two-state Ising model.
37 Since the stimulus will be a random Gaussian-intensity, spatially uniform flicker with a fixed frequency, these terms will all be close to zero, with the possible exception of small couplings between neighboring time bins, as time-averaging in ∆t windows mixes some signal from the previous bin into the next one.
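In code, the reduction of Eqs (2.41)–(2.42) is just a table lookup followed by a hyperbolic tangent; the sketch below is illustrative (the array layout of M, with one row per time lag and one column per quantized intensity level, is an assumption of mine):

    import numpy as np

    def effective_field(M, stim_history):
        """h_eff(s) of Eq (2.41): sum over the K past bins of the matrix entry
        selected by the quantized stimulus level in that bin (index 0 = most recent)."""
        return sum(M[alpha, level] for alpha, level in enumerate(stim_history))

    def mean_spike_probability(h0, M, stim_history):
        """Single-neuron reduction of Eq (2.42): <sigma> = tanh(h0 + h_eff),
        so the probability of a spike (sigma = +1) is (1 + <sigma>) / 2."""
        return 0.5 * (1.0 + np.tanh(h0 + effective_field(M, stim_history)))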

Application

The data consist of simultaneous recordings from 20 neurons, exposed to a spatially homogeneous light-intensity flicker at 120 Hz, chosen with a Gaussian prior on intensities such that the contrast variance is about 30 percent. I would like to acknowledge and thank Greg Schwartz of Michael Berry's lab at Princeton University for sharing the data (Schwartz, 2006). Both the neural response and the stimulus time series are discretized into ∆t = 25 ms time bins, each of which spans approximately 3 different and independent light intensity draws from the stimulus time series (120 Hz corresponds to 8.3 ms intervals of constant intensity). We focus on 10 neurons only, and reserve 10 bits per time bin for the description of the stimulus: the first 2 bits represent the intensity at t_0 − ∆t, the second two represent the intensity at t_0 − 2∆t, and so on, stretching 5 time bins into the past. The resulting data set is a binary matrix of 20 time series (10 spike series and 10 stimulus series), with a total of 84236 samples.

The maximum entropy pairwise ansatz produces the distribution p({σ}, s⃗). Since the stimulus is random, the pairwise model of the marginal distribution p(s⃗) is unimportant; however, both the encoding distribution, p({σ}|s⃗), and the spike marginal, p({σ}), can be extracted from the joint and they fit the data well [Fig 2.21].
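The bookkeeping that turns the recording into joint binary patterns might look like the sketch below (an illustrative helper of my own; the bit layout and names are assumptions that simply mirror the description above):

    import numpy as np

    def encode_joint_patterns(spikes, stim_levels, n_bits=2, n_lags=5, lag0=1):
        """Build rows of (spike spins, stimulus bits): the N spike variables at time t,
        followed by n_bits per lag encoding the quantized intensity level at
        t - lag0, t - (lag0 + 1), ..., n_lags bins into the past.
        Spins are +/-1; each level (0..2**n_bits - 1) is written out in binary."""
        T, N = spikes.shape
        rows = []
        for t in range(lag0 + n_lags - 1, T):
            bits = []
            for lag in range(lag0, lag0 + n_lags):
                level = stim_levels[t - lag]
                bits += [1 if (level >> b) & 1 else -1 for b in range(n_bits)]
            rows.append(np.concatenate([spikes[t], bits]))
        return np.array(rows)

    # e.g. patterns = encode_joint_patterns(spike_matrix, s_t)   # shape (samples, 10 + 10)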

Figure 2.21: Maximum entropy models of the 2^10 spike patterns for 10 neurons. Predicted frequency of the 10-neuron pattern on the vertical axis, observed frequency on the horizontal axis. Red – model conditioned on the stimulus; black – stimulus independent model. D_JS(red, data) = 0.0076, D_JS(black, data) = 0.0078; S = 2.45, S_0 = 2.82, S^(2) = 2.48 (all data).


Figure 2.22: Top: the probability of silence for each of the 1024 stimuli estimated from the data, plotted against the same probability given by the condition-dependent Ising model. Bottom: measured and predicted average number of spikes for each possible stimulus. We use a joint model of spiking and stimulus for 10 neurons, with 10 bits reserved for stimulus encoding.

There is not enough data to sample the conditional responses and make frequency–frequency conditional plots similar to Fig 2.21, but some projections of the distributions are shown in Fig 2.22. We can compare the conditionally dependent probability of silence of the joint spike–stimulus maximum entropy model with the prediction for conditionally independent neurons, i.e. the maximum entropy model in which the ⟨σ_iσ_j⟩ are not constrained to reproduce the measured correlations and thus J_ij = 0. The latter model systematically underestimates the probability of silence by about 17 percent on average, while the model with coupling overestimates it by about 11 percent – including the coupling therefore helps, but there is still room for improvement. In general we observe agreement between the maximum entropy prediction and experiment despite significant scatter, which in part must also be due to our crude representation of the stimulus.

How well does the model describe the responses of single neurons? We can compute the predicted spike triggered average, w_i, for each neuron i:

\[
w_i = \sum_{\vec{s}} p(\vec{s}\,|\,\sigma_i = 1)\, \vec{s}, \qquad (2.43)
\]
\[
p(\vec{s}\,|\,\sigma_i) = p(\sigma_i, \vec{s}) / p(\sigma_i), \qquad (2.44)
\]
\[
p(\sigma_i, \vec{s}) = \sum_{\sigma_j,\, j \neq i} p(\{\sigma\}, \vec{s}). \qquad (2.45)
\]
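As an illustration of this marginalization (a brute-force sketch of my own; the bit ordering of the joint table, spike bits in the high positions and stimulus bits in the low positions, is an assumed convention), one can compute the expected stimulus code conditioned on a spike directly from a tabulated joint distribution:

    import numpy as np

    def model_sta_bits(p_joint, i, n_neurons=10, n_stim_bits=10):
        """Predicted STA of Eq (2.43) for neuron i, in the encoded stimulus space:
        the expected stimulus bit-vector under p(s | sigma_i = 1), obtained by
        marginalizing a joint table p({sigma}, s) indexed by 20-bit integers."""
        n_total = n_neurons + n_stim_bits
        patterns = np.arange(2 ** n_total, dtype=np.int64)
        spike_i = (patterns >> (n_total - 1 - i)) & 1          # bit of neuron i
        stim_bits = np.stack(
            [((patterns >> b) & 1).astype(np.uint8) for b in range(n_stim_bits)], axis=1)
        mask = spike_i == 1
        p_cond = p_joint[mask] / p_joint[mask].sum()            # p(. | sigma_i = 1)
        return p_cond @ stim_bits[mask]                          # expected stimulus bits

Mapping the bit pairs back to the mean raw intensity of each quantized level then gives curves comparable to the reverse-correlation STAs of Fig 2.23.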

The figures comparing the spike triggered averages constructed from the Ising model with the spike triggered averages computed by traditional reverse-correlation methods from the data for each neuron independently are shown in Fig 2.23; given the crudeness of the stimulus representation, the match is quite remarkable. A calculation similar to Eq (2.43) allows one to condition the STA on alternative triggers (such as a pair of neurons firing at the same time). How should we visualize the corresponding terms in the Hamiltonian? For each neuron, the dependence on the stimulus is reflected by a sum of the form

\[
H_i = -\cdots - \sum_{\alpha,\beta} h_i^{\alpha,\beta}\, \sigma_i\, \tilde{s}^{\alpha,\beta},
\]

where α is the time index that extends K bins back into the past, and, at each time instant, there are log_2(Q_S) bits, indexed by β, available for the description of the light intensity s(t). In the case at hand we have 2 bits per time bin, so that we can encode 4 grayscale levels.38 The energy matrix form of the stimulus coupling in the Hamiltonian would allow the neuron, in each time bin, to be arbitrarily ("combinatorially") sensitive to the intensity: a hypothetical neuron might "prefer" light level 3 and "dislike" levels 1, 2 and 4. Although this is not forbidden by our model, we know that the temporal kernel is linear at each point in time and is essentially a scalar multiplier for the stimulus intensity in each time bin. A relevant question is therefore whether our energy matrix is also linear, and not "combinatorial," at each bin. If so, then one coefficient (the linear slope) is sufficient to describe the response in that bin. Figure 2.24 shows that this seems to be a good approximation.

If such an approximation can be made, and the energy contribution conditioned on the stimulus intensity in a given time bin is proportional to that intensity, there exists a spike-triggered-average equivalent plot in energy space, which is displayed in Fig 2.25. By inspecting the magnitudes of the various terms in the Hamiltonian we see that the inherent magnetic field and the stimulus-dependent magnetic field range over the same order of magnitude (1 in natural units), while the couplings are somewhat smaller (but since the neurons are mostly silent, the effective contribution from the other neurons is close to h_i^{coupling} ≈ −Σ_j J_ij, which is also of order 1); this confirms our intuition that it is a bad approximation to leave out the neuron-to-neuron couplings when considering the population code, even if the stimulus dependence is separately accounted for in the Hamiltonian.39
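A simple way to quantify the linearity seen in Fig 2.24 (a sketch under my own conventions; `level_means` would hold the mean raw intensity of each quantized level, e.g. the quartile means given in the footnote below) is to fit a straight line through the entries of each row of the energy matrix:

    import numpy as np

    def per_bin_slopes(M, level_means):
        """For each time lag (row of the K x Q_S energy matrix M), fit
        energy ~ slope * intensity + offset across the Q_S quantized levels.
        Returns the slopes chi_i(t) and the residual of each linear fit."""
        level_means = np.asarray(level_means, dtype=float)
        A = np.vstack([level_means, np.ones_like(level_means)]).T
        slopes, residuals = [], []
        for row in M:
            coef, res, *_ = np.linalg.lstsq(A, row, rcond=None)
            slopes.append(coef[0])
            residuals.append(res[0] if len(res) else 0.0)
        return np.array(slopes), np.array(residuals)

Small residuals mean that a single slope per bin suffices, which is what licenses the energy-space STA of Fig 2.25.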

38 In terms of the raw data, all intensities below 155 (mean 141, in arbitrary units) are assigned level 1 = (0, 0), everything between 155 and 175 (mean 166) is assigned level 2 = (0, 1), everything between 175 and 186 (mean 180) is level 3 = (1, 0), and everything above 186 (mean 212) is level 4 = (1, 1). These divisions capture quartiles of the data and have been used to map the discrete values back into the raw intensity space.


Figure 2.23: Spike triggered averages for 10 neurons. Time is shown on the horizontal axis for a spike occurring at t = 0, and each unit corresponds to ∆t = 25 ms. The vertical axis is in arbitrary intensity units. The black line shows the single-neuron reverse correlation estimate of the STA from white noise stimuli. The red line shows the Ising model prediction for the STA w [Eq (2.43)], based on marginalizing the Ising model distribution of spiking and stimulus for 10 neurons, p(σ⃗, s⃗), over all but one neuron. The first time bin preceding the spike is always equal to the stimulus average, since we are computing a model that is coupled only to the stimulus preceding spikes by more than 1 bin.

Figure 2.24: Linearity of the neuron-stimulus coupling, for 10 neurons, listed top-to-bottom, left-to-right. For each neuron we plot two panels. Left panels show the linear relation between the energy contribution (vertical axis) and the stimulus intensity (horizontal axis), for each of the K = 5 bins in time separately (each of the 5 lines is determined by 4 points, corresponding to the discretization of light intensity into 4 bins). Right panels show the energy contributions to the stimulus-dependent magnetic fields, M_{αs}, plotted in matrix form. Five bins in time (α) are on the horizontal axis, with the left-hand bin being closest to the spike, and the intensity increases downwards (4 levels for s). The statement that a neuron is linear in each time bin corresponds to the observation that, in each column of the energy matrix, the colors change from red to green or from green to red along the vertical direction, linearly and without alternating.


Figure 2.25: For each of the 10 neurons, and assuming that the energy contribution in each time bin is linearly proportional to the light intensity, this figure shows the energy-space equivalent of the STA. For each stimulus I(t) (raw intensity units), the energy contribution for neuron i is given by h_i = Σ_t I(t) χ_i(t), where the χ_i(t) are, for each t and each neuron i, the slopes of the response functions of Fig 2.24. The χ_i(t) are only relevant up to an additive constant and are plotted as functions of time in red; qualitatively, their shape is similar to the STA. In other words, the χ are the kernels with which one convolves the stimulus, I(t), to obtain the time-dependent energy contribution. The histograms show the exact (calculated from the weight matrices and stimuli) distributions of stimulus-dependent magnetic fields that each neuron experiences.

Spike entropy                                                        2.45 bits
Stimulus entropy                                                     ≈ 10 bits
Spike multi-information                                              0.37 bits
Fraction of I captured by the Ising model, I^(2), on spike trains    0.92
Channel capacity with p(σ⃗|s⃗) and the optimal choice of p(s⃗)          0.80 bits

Table 2.2: Information-theoretic quantities for a system of 10 neurons collectively driven by a spatially homogeneous flicker. The spike entropy is estimated from the data; the stimulus is random and is therefore uniform in its encoding variables.

To make the statements about the relative influences of the stimulus energy contributions and the inherent inter-neuron couplings precise, we would need to tighten the approximations made in encoding the stimulus; one possibility currently being explored is to present the retina with repeats of the same movie segment and use the time index within the segment as a proxy for the stimulus. In this case the magnetic fields acting on the neurons literally are time dependent, and can capture the behavior of the neurons without assuming which features they are sensitive to (and how far back into the past one needs to look).40

Although the maximum entropy model is formulated for the joint (spikes, stimulus) distribution, the random stimulus prior, p(s⃗) = Σ_σ p({σ}, s⃗), does not tell us anything new; therefore it is meaningful to extract only the encoder distribution p({σ}|s⃗). We can make limited progress if we look for the distribution on stimulus space, p(s⃗), that maximizes the channel capacity of the encoder system, bearing in mind that the encoder distribution was deduced in an experiment where the retina was adapted to a uniform, full-field flicker stimulus and not to its natural ensemble. In this specific case, we use the Blahut–Arimoto algorithm (Blahut, 1972) to compute the optimal distribution over the 1024 possible stimulus patterns; the results are shown in Table 2.2. The most frequently used stimulus patterns in an optimal code are shown in Fig 2.27, and the maximum information transmission that can be sustained by a system of N neurons is plotted in Fig 2.26. From Table 2.2 we see that the maximum information transmission would be about a third of the output entropy, a relatively low coding efficiency. Furthermore, if the stimulus distribution is not optimal but uniform – close to what is actually shown in the experiment – the information transmission is only about a third of the capacity, or about 9 bits per second for 10 neurons. It would be extremely interesting to know whether, by increasing the number of neurons N, one can increase the fraction of the entropy that is used to convey information, due to the hypothesized error correcting nature of the code discussed in the preceding section; or to see how adaptation of the encoding kernels affects the capacity of the population code.
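For reference, the Blahut–Arimoto iteration itself is short; the sketch below is a generic implementation (not the thesis code; array names are mine) that takes the conditional table p(response|stimulus) and alternates between the posterior q(stimulus|response) and the input distribution until the capacity estimate converges:

    import numpy as np

    def blahut_arimoto(p_y_given_x, tol=1e-8, max_iter=1000):
        """Capacity (in bits) of a discrete channel; p_y_given_x[x, y] = p(y|x).
        Returns the capacity and the optimizing input distribution p(x)."""
        n_x = p_y_given_x.shape[0]
        p_x = np.full(n_x, 1.0 / n_x)
        eps = 1e-300                                   # guards against log(0)
        for _ in range(max_iter):
            q = p_x[:, None] * p_y_given_x             # q(x|y), up to normalization
            q /= q.sum(axis=0, keepdims=True) + eps
            log_c = np.sum(p_y_given_x * np.log(q + eps), axis=1)
            new_p_x = np.exp(log_c - log_c.max())
            new_p_x /= new_p_x.sum()
            if np.max(np.abs(new_p_x - p_x)) < tol:
                p_x = new_p_x
                break
            p_x = new_p_x
        p_y = p_x @ p_y_given_x
        capacity = np.sum(p_x[:, None] * p_y_given_x *
                          np.log2((p_y_given_x + eps) / (p_y[None, :] + eps)))
        return capacity, p_x

With 1024 stimulus patterns and 2^10 response patterns, as quoted above, the tables are small enough to handle directly.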

39 Alternatively, this means that the correlation effects are not driven purely by the exposure to a common stimulus.
40 At this time we are just becoming able to carry out the computations required to construct models with time-dependent fields.

Figure 2.26: Optimal channel capacity between the simultaneous spiking of 10 neurons and the stimulus, as a function of the number of neurons in the network. For each possible subgroup of n neurons out of 10, we find the pairwise Ising model, calculate the encoding distribution p(σ⃗|s⃗), and then compute the optimal input alphabet p(s⃗) and the corresponding channel capacity. Error bars are obtained by exhaustively choosing every possible group of n neurons out of 10 and computing the deviation of the channel capacities.

[Figure 2.27 panels, ordered by probability: #1 p = 0.33; #2 p = 0.31; #3 p = 0.086; #4 p = 0.077; #5 p = 0.06; #6 p = 0.037; #7 p = 0.035; #8 p = 0.012; #9 p = 0.011; #10 p = 0.0077; #11 p = 0.0064.]

Figure 2.27: First panel: the rank ordered optimal probability distribution over the 1024 stimuli that maximizes the channel capacity of the system of 10 neurons. The first 10 codewords account for 0.96, and the first 20 for 0.99, of the total weight. Remaining panels: the codewords most frequently used in the capacity achieving distribution, sorted by their probability. The horizontal axis is time in ∆t = 25 ms increments, the vertical axis the stimulus intensity in arbitrary units.

2.7 Summary

In this chapter we attempted to build a picture of the collective behavior of N interacting elements, starting from a number of joint experimental measurements of their activities. We showed how to systematically construct probability distributions that describe the patterns of activity of the nodes when the network is in a stationary state, and further demonstrated how these maximum entropy distributions can be interpreted graphically as networks of interactions. The maximum entropy models constrained by pairwise marginals were then extended to include the dependence of the activities on a (parametrized) set of external conditions (intervening chemicals or light intensity).

Two very different systems were examined with this maximum entropy toolkit. In each we faced a fresh set of problems, and in the end we observed two diametrically opposite network structures. The signaling network of interacting proteins was small, with experimentally observed activities of 11 nodes. We were severely undersampled and therefore had to devise ways of encoding the experimental data as efficiently as possible before computing the corresponding Ising model, in itself an easy task for the case at hand. We found sparse, mostly unfrustrated and mutually similar interaction maps across all external conditions,41 and argued that pairwise models either work well or fail cleanly in a localized way that indicates where the model must be supplemented with a higher-order interaction. Interestingly, interaction maps created from correlated fluctuations in a single experiment turned out to be quite similar to the interaction map describing the correlations between the activity changes under conditions of chemical intervention.42 We were careful not to interpret the inferred interactions as microscopic chemical reactions between signaling proteins, because we have incomplete access to all nodes and internal states in the network, so our reconstructed interactions reflect "averaging over" the states of the invisible parts of the network; this should also be a caveat when strong interventions are used to experimentally confirm or disprove microscopic interactions in vivo.43 It actually seems that, for our analysis at least, using small perturbations would be much more revealing, as it would only slightly change the activity patterns and – given the restriction on the dynamic range that we face because of the quantization and sampling issues – would allow us to observe simultaneously the change in local fluctuations as well as the shift in mean activation due to the interventions. In addition, if the interventions were small, we could expect their effects to add, and one could test this additivity assumption by verifying that the effects of two simultaneously intervening chemicals are captured by adding the corresponding stimulus-dependent magnetic fields.

A completely different set of issues arose when we analyzed the network of neurons. We could use ≈ 2·10^5 samples of naturally discretized data, and previous work had confirmed that pairwise maximum entropy models work remarkably well on groups of up to 15 neurons. When we attempted to push the size of the analyzed network to the whole dataset of 40 neurons and beyond, we were faced with technical problems of computing the maximum entropy distributions and estimating the information theoretic quantities, and finally had to resort to large Monte Carlo simulations.

41 This is also probably the reason why the assumptions of Bayesian network reconstruction are valid for this dataset.
42 A guess for why this is so might be that the energy landscape of the system is simple: although the intervening chemicals shift the mean activity values, the "forces" that determine the dynamics around the stable state look the same no matter which state one is in; this could also explain why the joint activity/condition distribution, in which the conditions couple only to the magnetic fields, is a good model for the data.
43 The popular approach of using strong interventions probably works here also because the underlying network is sparse.

Nevertheless, we were able to reconstruct the interaction map for 40 neurons. In contrast to the signaling network, the neurons are densely connected, which gives rise to weak pairwise correlations but ever stronger collective effects as the network grows larger. The pairwise model is very good but not perfect; however, the failure seems to be distributed throughout the network,44 contrary to the signaling protein example, where a single higher-order interaction was required for a better fit. By constructing a simulated network of 120 neurons, which we believe behaves thermodynamically in a similar way to the smaller measured networks, we pushed the system into a regime where correlations start to dominate, C_ij ≈ 1/N,45 and the entropy of the system significantly decreases from its independent value. The system could use the correlation structure in the spike trains to perform error correction if, as we suggest, the information is encoded in the identities of local energy attractors and not in the exact microscopic patterns of spiking. Such an encoding would be rich, with a language-like distribution of "word" frequencies spanning a wide range of probabilities, perhaps to allow for the encoding of very rare and very frequent stimuli; we suggest that the system could actively tune itself to this wide dynamic range. In this system, as in the biochemical network, we cannot observe all nodes; moreover, here the retina is indeed driven by the movie, and the interactions represent the effect of both the underlying connectivity and the excitation by the stimulus. We concluded the neural case by trying to account explicitly for the stimulus dependence in a case where the stimulus can be easily encoded, with some success. In particular, it seems possible to study, for example, two neurons by simultaneously determining their receptive fields and their pairwise coupling. Despite being currently just a working idea for future exploration, this possibility is intriguing, as it simultaneously incorporates both the stimulus and non-trivial population coding, and consequently enables us to compute all the relevant information theoretic quantities for the neuronal population.

44 For example, most of the predicted three-point correlations deviate systematically, although by a small amount, from their measured values.
45 For a simple system in which all spins couple with the same exchange interaction and are exposed to the same external field, it is easy to show that the susceptibility (per spin) is χ_1 = 1 − ⟨σ⟩^2 + (N − 1)C, where C is the connected pairwise correlation and N the number of neurons; in the thermodynamic limit χ_1 should be intensive, and therefore the contribution of the last term would have to be negligible. In the scaling examined here (which is not the thermodynamic limit), C is held fixed as N is increased, and we watch for new collective behavior as C ∼ 1/N.

Chapter 3

Genetic regulatory networks

The second half of this thesis represents an attempt to derive and predict some of the properties of (genetic regulatory) networks with a top-down approach, starting with a theoretical principle and working out its observable consequences. Accordingly, the principle of maximization of information capacity will be examined in Chapter 4. To proceed, however, one must first understand the noise in the biological system of interest, and we take up this task here, in Chapter 3. By its end we should be able to define (and parametrize) a class of biologically relevant models for noise in transcriptional regulation, which will subsequently be used for the computation of channel capacities. In this chapter we focus specifically on genetic networks, mainly because of the amount of research that has already been invested into characterizing their signal transmission properties. As we will demonstrate shortly using data from fruit fly development, there are precision methods available today that reveal the detailed properties of the noise in gene regulation and, despite the extensive existing literature on the topic, can yield surprising new results.

3.1 Signal transduction in gene regulation

The central dogma of molecular biology puts into words the conceptual scheme by which hereditary information is stored, read out and expressed into protein on a cellular level (Alberts et al., 2002). DNA, the information carrying molecule whose integrity is actively maintained by the cell, encodes this information in a linearly ordered sequence of base pairs. There are four possible types of bases attached to the sugar-phosphate DNA backbone, namely adenine, cytosine, thymine and guanine, and they constitute a four-letter genetic alphabet, {A, C, T, G}. When needed, the information is transcribed by the transcription apparatus, which is essentially a complex of proteins that can attach to the DNA at particular places, called promoter sequences. After binding, this machinery can slide along the length of the gene coding region on the DNA and produce a messenger RNA (mRNA) transcript, carrying an inverted copy of the DNA original.1 The message has only a limited lifetime that is oftentimes actively controlled by the cell; in eukaryotes it is exported from the nucleus into the cytoplasm for further processing. In the translation step that ensues, special RNA/protein complexes called ribosomes bind to the starter regions of the message and, for each of the 4^3 = 64 possible triplets of base pairs of the message (or codons) that they encounter as they "read" the message sequence, they attach one of the 20 amino acids to the growing protein chain undergoing assembly (or terminate the process).

1 Letter A on the DNA original maps into a T equivalent in the message and vice versa; the same complementarity holds for the C and G pair.

After the synthesis is complete, the ribosome falls off the message and the finished polypeptide starts to fold into the active 3D conformation that will allow it to perform its structural or enzymatic role.

While this picture might be intuitively simple, it focuses almost exclusively on the process of producing proteins from the DNA, rather than on regulation, i.e. the molecular mechanisms that determine when, and how much of, each protein the cell will make. The genes are clearly not being constantly transcribed and translated into protein.2 We find a clear demonstration of the importance of regulation when we recall that the cells of higher organisms, despite sharing the same DNA, differentiate into morphologically and functionally distinct types during development, and later give rise to various organs. It is because the cells have expressed different sets of genes in a regulated way during their developmental history that this commitment to a specific cell fate occurs. Alternatively, both complex and simpler organisms, including bacteria, respond effectively to certain signals because they dynamically measure their environment and act by "switching" the relevant genes on or off. The example of the yeast Environmental Stress Response module from Chapter 2 provides a striking and genome-wide illustration of such a coherent response evoked by genetic regulation.

There are many ways in which the final amount of a certain protein might be regulated, and Fig 3.1 shows several kinds of regulatory mechanisms available to the cell. At a very early stage of gene expression the cell might make the relevant protein-coding segment of the DNA physically inaccessible to the transcriptional apparatus by packing it into highly condensed chromatin structures (in eukaryotic cells only). If unpacked, the affinity of the apparatus for the DNA can be drastically modified by the presence or absence of transcription factors, proteins that bind specific short sequences of DNA in order to facilitate or prevent transcription. It is this transcriptional regulation that we think of when we discuss turning genes "off" or "on," and it will be the focus of our further exploration. Nevertheless, the regulatory repertoire has by no means been exhausted. After the message is produced, it will be processed (splicing will remove certain nonsense stretches but sometimes also select among functional variants of the protein) and modified (chemically tagged for transport, physical localization, destruction, etc.). Finally, the protein itself can be regulated, either by having its lifetime controlled (e.g. through the active degradation mechanism called ubiquitination) or by having its activity tuned through various chemical modifications (e.g. phosphorylation by activating enzymes).

The detailed study of transcriptional regulation started with the work of Jacob and Monod (1961), for which they were awarded the Nobel prize in 1965. The basic organization of prokaryotic DNA gradually emerged: genes (or, in general, coding regions that end up being transcribed and translated into proteins) comprise the majority of the DNA; in front of each gene or set of coregulated genes there is a noncoding piece of DNA that contains operators and promoters.
The promoter is the target binding site of the transcriptional apparatus (called the RNA polymerase, or RNAP, in prokaryotes); operators contain target binding sites for transcription factors (TFs). There are millions of short sequences of around 10 to 20 base pairs to which a transcription factor could presumably bind in a bacterial genome, yet it will most often be found on its operator, because of the extremely favorable sequence-dependent interaction energy that this particular TF has with its operator.

2 This unregulated and constant gene expression, known as constitutive expression, does happen for a few essential housekeeping genes.


Figure 3.1: Information in the cell flows from the DNA to messenger RNA to proteins. Regulatory processes intervene at various stages and control the final amount of the protein expressed. Transcription factors bind operators on the DNA and influence the rate of transcription by recruiting the RNA polymerase complex, usually through favorable energetic interactions; packing of the DNA onto histone spools can stop transcription by making genes inaccessible. Alternative mechanisms involve mechanical deformation of the DNA, as in protein-induced DNA looping, or covalent modifications of the DNA (methylation). On the messenger level, silencing and other controlled degradation mechanisms modulate the quantity of mRNA present; expression can also be varied by controlling the nuclear export of mRNA or by localizing the message in other ways; splicing can produce alternative protein versions from the same gene. Finally, proteins can be degraded in a highly controlled way, and can have their activity modified through covalent modification by other signaling enzymes.

The occupancy of the operator is determined by the amount of transcription factor, and it in turn determines the occupancy of the nearby promoter region by the RNA polymerase complex. The interaction between the RNAP and the TF usually takes the form of a direct physical contact, which can either enhance the equilibrium occupancy of the promoter or reduce it (for example, the TF can physically obscure the RNAP binding site). The situation in eukaryotes is biologically more complicated (usually involving what resembles a combinatorial code in the enhancer regions, which are the equivalent of the operators in bacteria), but the basic ideas remain similar.

In a seminal series of papers starting in 1987, Berg and von Hippel outlined the basic physics involved in the first step of transcriptional regulation, namely the interaction of the TF with the DNA (Berg and von Hippel, 1987). More specifically, they identified three problems that need to be addressed satisfactorily by physical models of transcriptional regulation. In current terminology, they can be briefly summarized as follows:

• The specificity problem. The question here is how the transcription factor recognizes its operator site on the DNA. What is the physical mechanism by which a TF gains the binding energy difference between the specific operator site and the genomic background, such that the equilibrium probability of being on the operator is high, despite an overwhelming number of non-specific, or distractor, sites? A good model should also take into account that the genomic background has some distribution of energies, rather than a single non-specific value (a toy estimate along these lines is sketched after this list). See Bintu et al. (2005) for equilibrium statistical mechanics calculations for various regulatory scenarios, Djordjevic et al. (2003) for a discussion of non-specific binding in models of transcriptional regulation and binding site discovery, and Bakk and Metzler (2004) for estimates of the fraction of non-specifically bound CI/Cro proteins in E. coli inferred from data. Kinney et al. (2007) have shown how to infer models of TF–DNA interaction from high-throughput data without making most of the usual assumptions.

• The noise problem. When the mean equilibrium occupancy of the operator is reached, the instantaneous occupancy nevertheless fluctuates in time as the TF binds and unbinds from the specific site. The mean occupancy is read out by the transcriptional machinery and it ultimately sets the mean level of gene expression. Similarly, the fluctuations in occupancy get transmitted to the output and are one of the contributors to the noise in gene expression. It turns out that there is a tradeoff – at least in simple models – between the specificity of the TF-DNA binding and the noise magnitude: highly specific TFs have large binding energies and stay on the binding sites for a long time before unbinding, thus triggering an increase in output noise, because these slow fluctuations are not effectively averaged away by downstream protein production steps. The noise problem deals with equilibrium fluctuations in the expression of the gene under transcriptional control, and their biological impact.

• The search problem. The search problem examines how a TF finds the binding site embedded in the DNA background within a time window consistent with available measurements.3 If the cytoplasmic, or 3D, diffusion of TF is too slow and 1D diffusion along the arc-length of the DNA is the main method by which TFs translocate on the DNA, and if the binding energy landscape of the DNA is rough, then the total search time could be completely dominated by those periods when the TF stays ineffectively stuck on relatively strong, yet non-functional sites on the DNA (traps). To a large extent the negative impact of traps is affected by the mode of diffusion (either 1D along the DNA or 3D in solution) and by the energy landscape of the binding sites on the DNA (especially by the variance in the distribution of the non-specific site energies). The resolution of the search problem is to show that there exists an optimal combination of 1D and 3D diffusion strategies which can explain the observed short search times. See Halford and Marko (2004); Slutsky and Mirny (2004) for recent theoretical discussions, or Wang et al. (2006) for measurements.
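As a toy illustration of the specificity problem (first bullet above), the sketch below compares the equilibrium Boltzmann weight of a single strong operator with that of a Gaussian background of nonspecific sites. All numbers are invented for illustration; this is not a model of any particular transcription factor.

```python
import numpy as np

# Toy numbers, not measurements: one operator competing against a random genomic
# background of nonspecific sites with Gaussian binding energies (in units of k_B T).
rng = np.random.default_rng(0)
n_background = 4_000_000                    # rough number of candidate sites in a bacterium
E_operator = -15.0                          # assumed operator energy relative to the background mean
E_background = rng.normal(0.0, 2.0, n_background)   # assumed spread of nonspecific energies

# If a single TF molecule is bound somewhere on the DNA, the equilibrium probability
# of finding it on the operator is its Boltzmann weight relative to all other sites.
w_operator = np.exp(-E_operator)
w_background = np.exp(-E_background).sum()
print("P(TF found on operator) ~", w_operator / (w_operator + w_background))
```

Even with a 15 k_BT gap, the sheer number of background sites keeps the operator occupancy modest in this toy example, which is the essence of the specificity problem.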

As stated, our final goal is to compute the information capacity of simple regulatory elements, for which the characterization of noise is needed [Eq (2.5)]. We are therefore concerned with the second problem above, that is the noise problem, and the bulk of this chapter is devoted to the noise analysis in transcription factor binding, transcription and translation. While we do not discuss the other two problems listed above in detail, the mechanisms that have been put forward as their potential solutions in the literature can influence the noise behavior as well; this chapter therefore ends with estimates of how the noise calculations have to be modified by including the non-specific binding sites and the 1D sliding along the DNA.

3 These search times can be on the order of a minute or less; if the diffusion-limited on-rate, $k_+ = 4\pi D a$, is postulated, where D is the TF diffusion constant and a is the linear dimension of the target site, then for certain transcription factors like lac such an on-rate would be too slow to account for experimental observations.

3.2 Input and output noise in transcriptional regulation4

3.2.1 Introduction

A number of recent experiments have focused attention on noise in gene expression (Elowitz et al., 2002; Ozbudak et al., 2002; Blake et al., 2003; Raser and O’Shea, 2004; Rosenfeld et al., 2005; Pedraza and van Oudenaarden, 2005; Golding et al., 2005; Newman et al., 2006; Bar-Even et al., 2006). The study of noise in biological systems more generally has a long history, with two very different streams of thought. On the one hand, observations of noise in behavior at the cellular or even organismal level give us a window into mechanisms at a much more microscopic level. The classic example of using noise to draw inferences about biological mechanism is perhaps the Luria–Delbrück experiment (Luria and Delbrück, 1943), which demonstrated the random character of mutations, but one can also point to early work on the nature of chemical transmission at synapses (Fatt and Katz, 1950, 1952) and on the dynamics of ion channel proteins (Lecar and Nossal, 1971a,b; Stevens, 1972; Conti et al., 1975). On the other hand, noise limits the reliability of biological function, and it is important to identify these limits. Examples include tracking the reliability of visual perception at low light levels down to the ability of the visual system to count single photons (Hecht et al., 1942; Barlow, 1981), the implications of channel noise for the reliability of neural coding (Verveen and Derksen, 1965, 1968; Schneidman et al., 1998), and the approach of bacterial chemotactic performance to the limits set by the random arrival of individual molecules at the cell surface (Berg and Purcell, 1977).

After demonstrating that one can observe noise in gene expression, most investigators have concentrated on the mechanistic implications of this noise. Working backward from the observation of protein concentrations, one can try to find the components of noise that derive from the translation of messenger RNA into protein, or the components that arise from noise in the transcription and degradation of the mRNA itself. At least in some organisms, a single mRNA transcript can give rise to many protein molecules, and this “burst” both amplifies the fluctuations in mRNA copy number and changes their statistics, so that even if the number of mRNA copies obeys the Poisson distribution the number of protein molecules will not (Paulsson, 2004). This discussion parallels the understanding that Poisson arrival of photons at the retina generates non–Poisson statistics of action potentials in retinal ganglion cells because each photon triggers a burst of spikes (Barlow et al., 1971). Recent large scale surveys of noise in eukaryotic transcription have suggested that the noise in most protein levels can be understood in terms of this picture, so that the fractional variance in the number of proteins $g_i$ expressed from gene i is given by

\[
\eta_i^2 \equiv \frac{\langle(\delta g_i)^2\rangle}{\langle g_i\rangle^2} = \frac{b}{\langle g_i\rangle}, \qquad (3.1)
\]
where $b \sim 10^3$ is the burst size, and is approximately constant for all genes (Bar-Even et al., 2006).

The mechanistic focus on noise in transcription vs translation perhaps misses the functional role of gene expression as part of a regulatory network. Almost all genes are subject to transcriptional regulation, and hence the expression level of a particular protein can be viewed as the cell’s response to the concentration of the relevant transcription factors. Seen in this way, transcription and translation are at the “output” side of the response, and the

4This section appeared on the arXiv as Tkačik et al. (2007a).

Figure 3.2: A simple model for transcriptional regulation. Transcription factor is present at an average concentration c, diffusing freely with diffusion constant D; it can bind to the binding site of linear dimension a, and the fractional occupancy of this site is n ∈ [0, 1]. Binding occurs with a second order rate constant $k_+$, and unbinding occurs with a first order rate constant $k_-$. When the site is bound, mRNA are transcribed at a rate $R_e$ and degraded with rate $\tau_e^{-1}$, resulting in a number of transcripts e. Proteins are translated from each mRNA molecule with rate $R_g$ and degraded with rate $\tau_g^{-1}$, resulting in a copy number g.
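The kinetic scheme of Fig 3.2 is simple enough to simulate directly. The sketch below is a minimal Gillespie (stochastic simulation) version of its six reactions — binding, unbinding, mRNA synthesis and decay, protein synthesis and decay — with invented rate constants in arbitrary units; it is offered only as a way to play with the model, and is not the Langevin analysis of the Methods section.

```python
import numpy as np

rng = np.random.default_rng(1)
k_on, k_off = 0.05, 0.10        # effective binding rate k_+ c and unbinding rate k_-
R_e, tau_e = 1.0, 2.0           # transcription rate when bound, mRNA lifetime
R_g, tau_g = 5.0, 50.0          # translation rate per mRNA, protein lifetime

def simulate(t_max=5000.0):
    t, n, e, g = 0.0, 0, 0, 0                   # time, site occupancy, mRNA, protein
    dwell, protein = [], []
    while t < t_max:
        rates = np.array([k_on * (1 - n), k_off * n, R_e * n,
                          e / tau_e, R_g * e, g / tau_g])
        total = rates.sum()
        dt = rng.exponential(1.0 / total)
        dwell.append(dt)                        # current protein count holds for dt
        protein.append(g)
        t += dt
        r = rng.choice(6, p=rates / total)      # which reaction fires
        if   r == 0: n = 1
        elif r == 1: n = 0
        elif r == 2: e += 1
        elif r == 3: e -= 1
        elif r == 4: g += 1
        else:        g -= 1
    return np.array(dwell), np.array(protein, dtype=float)

dwell, protein = simulate()
w, p = dwell[len(dwell)//2:], protein[len(protein)//2:]    # crude burn-in removal
mean = np.average(p, weights=w)
var = np.average((p - mean) ** 2, weights=w)
print(f"mean protein ~ {mean:.1f}, fractional variance ~ {var / mean**2:.3f}")
```

Time-weighted averages are used because the protein count holds for the dwell time between reaction events.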

binding of transcription factors to their targets along the genome is at the “input” side [Fig 3.2]. Noise can arise at both the input and output, and while fluctuations in transcription factor concentration could be viewed as an extrinsic source of noise (Elowitz et al., 2002; Swain et al., 2002), there will be fluctuations in target site occupancy even at fixed transcription factor concentration (Bialek and Setayeshgar, 2005; Walczak et al., 2005; van Zon et al., 2006). There is a physical limit to how much the impact of these input fluctuations can be reduced, essentially because any physical device that responds to changes in concentration is limited by shot noise in the diffusive arrival of the relevant molecules at their target sites (Berg and Purcell, 1977; Bialek and Setayeshgar, 2005, 2006).

In this chapter we revisit the relative contributions of input and output noise. Input noise has a clear signature, namely that its impact on the output protein concentration peaks at an intermediate value of the input transcription factor concentration. The analogous signature was essential, for example, in identifying the noise from random opening and closing of individual ion channels in neurons (Sigworth, 1977, 1980). Perhaps surprisingly, we show that this signature is easily obscured in conventional ways of plotting the data on noise in gene expression. Recent experiments on the regulation of Hunchback expression by Bicoid in the early Drosophila embryo (Gregor, 2005; Gregor et al., 2006a) are consistent with the predicted signature of input noise, and (although there are caveats) a quantitative analysis of these data supports a dominant contribution of diffusive shot noise. We discuss what experiments would be required to test this conclusion more generally. We begin, however, by asking whether any simple global model such as Eq (3.1) can be consistent with the embedding of gene expression in a network of regulatory interactions.

3.2.2 Global consistency

Consider a gene i which is regulated by several transcription factors. In steady state, the mean number of its protein products in the cell will be a function of the copy numbers of all the relevant transcription factors:

\[
\langle g_i\rangle = f_i(g_1, g_2, \cdots, g_K). \qquad (3.2)
\]

If the copy numbers of the transcription factors fluctuate, this noise will propagate through the input/output relation f (Pedraza and van Oudenaarden, 2005; Hooshangi et al., 2005), so that
\[
\langle(\delta g_i)^2\rangle = \sum_{\mu=1}^{K}\sum_{\nu=1}^{K}\frac{\partial f_i}{\partial g_\mu}\,\frac{\partial f_i}{\partial g_\nu}\,\langle\delta g_\mu\,\delta g_\nu\rangle + \langle(\delta g_i)^2\rangle_{\rm int}, \qquad (3.3)
\]
where we include the intrinsic noise $\langle(\delta g_i)^2\rangle_{\rm int}$ that occurs at fixed transcription factor levels. If the noise in gene expression is dominated by the processes of transcription and translation, and if the transcription factors are not regulating each other, then the correlations between fluctuations in the copy numbers of different proteins will be very small, so we expect that
\[
\langle\delta g_\mu\,\delta g_\nu\rangle = \delta_{\mu\nu}\,\langle(\delta g_\mu)^2\rangle. \qquad (3.4)
\]
This allows us to simplify the propagation of noise in Eq (3.3) to give

\[
\langle(\delta g_i)^2\rangle = \sum_{\mu=1}^{K}\left(\frac{\partial f_i}{\partial g_\mu}\right)^2\langle(\delta g_\mu)^2\rangle + \langle(\delta g_i)^2\rangle_{\rm int}. \qquad (3.5)
\]

If, as in Eq (3.1), we express the noise in protein copy number as a fractional noise η, then this becomes
\[
\eta_i^2 = \sum_{\mu=1}^{K}\left(\frac{\partial\log f_i}{\partial\log g_\mu}\right)^2\eta_\mu^2 + \eta_{i,\rm int}^2. \qquad (3.6)
\]
In particular, this means that there is a minimum level of noise,

\[
\eta_i^2 \ge \sum_{\mu=1}^{K}\left(\frac{\partial\log f_i}{\partial\log g_\mu}\right)^2\eta_\mu^2. \qquad (3.7)
\]

But if the fractional variance in protein copy number has a simple, global relation to the mean copy number, as in Eq (3.1) (Bar-Even et al., 2006), then this simplifies still further:

\[
\frac{b}{\langle g_i\rangle} \ge \sum_{\mu=1}^{K}\left(\frac{\partial\log f_i}{\partial\log g_\mu}\right)^2\frac{b}{\langle g_\mu\rangle} \qquad (3.8)
\]
\[
\Rightarrow\quad 1 \ge \sum_{\mu=1}^{K}\left(\frac{\partial\log f_i}{\partial\log g_\mu}\right)^2\frac{\langle g_i\rangle}{\langle g_\mu\rangle}. \qquad (3.9)
\]

Since the proteins labeled by the indices µ represent transcription factors, usually present at low concentrations, and the protein i is a regulated gene—such as a structural or metabolic protein—but not a transcription factor itself, one expects that $\langle g_i\rangle/\langle g_\mu\rangle \gg 1$. But then we have
\[
\sum_{\mu=1}^{K}\left(\frac{\partial\log f_i}{\partial\log g_\mu}\right)^2 \ll 1. \qquad (3.10)
\]
Since this inequality constrains the sum of squares of terms, each must be much smaller than one. This means that when we make a small change in the concentration of any transcription factor, the response of the regulated gene must be much less than proportional. In this sense, the assumption of a simple global description for the level of noise in gene expression, Eq (3.1), leads us to the conclusion that transcriptional “regulation” can’t really be very effective, and this must be wrong. Notice that this problem is independent of the burst size b, and hence doesn’t depend on whether the noise is dominated by transcription or translation.

Our conclusion from the inequality in Eq (3.10) is that we should re–examine the original hypothesis about noise [Eq (3.1)]. An alternative is that this hypothesis is correct, but that there are subtle correlations among all the protein copy number fluctuations of all the different transcription factors. If we want the global output model to be correct, these correlations would have to take on a very special form—different transcription factors regulating a single gene would have to be correlated in a way that matches their impact on the expression of that gene—which seems implausible but would be very interesting if it were true.
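To see how quickly the inequality chain of Eqs (3.6)–(3.10) fails for a realistically steep regulation function, one can evaluate the logarithmic gain of a hypothetical single-input Hill element; all numbers below are invented for illustration.

```python
import numpy as np

# Hypothetical single-input element: <g_i> = f(g_TF) = g_max * g_TF^h / (g_TF^h + K^h).
g_max, K, h = 1.0e4, 100.0, 2.0     # invented copy numbers and Hill coefficient

def logarithmic_gain(g_tf):
    """d log f / d log g_TF for the Hill function above."""
    f = g_max * g_tf**h / (g_tf**h + K**h)
    return h * (1.0 - f / g_max)

g_tf = K                            # TF copy number at the half-maximal point
g_i = g_max * g_tf**h / (g_tf**h + K**h)

# The single-input version of the sum in Eq (3.9):
rhs = logarithmic_gain(g_tf) ** 2 * (g_i / g_tf)
print(f"Eq (3.9) would require 1 >= {rhs:.1f} -> violated for any effective regulation")
```

Because the regulated gene is far more abundant than its transcription factor, the right-hand side is much larger than one whenever the gain is of order one, which is the inconsistency discussed above.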

3.2.3 Sources of noise

Figure 3.2 makes clear that the concentration of a protein can fluctuate for many reasons. The processes of synthesis and degradation of the protein molecules themselves are discrete and stochastic, as are the synthesis and degradation of mRNA molecules; together these constitute the “output noise” which has been widely discussed. But if we are considering a gene whose transcription is regulated, we need a microscopic model for this process. For the case of a transcriptional activator, there are binding sites for the transcription factors upstream of the regulated gene, and when these sites are occupied transcription proceeds at some rate, but when the site is empty transcription is inhibited. Because there are only a small number of relevant binding sites (in the simplest case, just one), the occupancy of these sites must fluctuate, and this random switching is an additional source of noise. In addition, the binding of transcription factors to their target sites along the genome depends on the concentration in the immediate neighborhood of these sites, and this fluctuates as molecules diffuse into and out of the neighborhood.

All of the different processes described above and schematized in Fig 3.2 can be analyzed analytically using Langevin methods, and the predictions of this analysis can be tested against detailed stochastic simulations. The details of the analysis are given in the Methods section [Methods A.3]. Notice that variations in cell size, protein sorting in cell division, fluctuations in RNA polymerase and ribosome concentrations, and all other extrinsic contributions to the noise are neglected.

When the dust settles, the variance in protein copy number $\sigma_g^2$ can be written as a sum of three terms, which correspond to the output, switching, and diffusion noise. To set the scale, we express the copy number as a fraction of its maximum possible mean value, $g_0$, which is reached at high concentrations of the transcriptional activator. In these units, we find
\[
\left(\frac{\sigma_g}{g_0}\right)^2 = \frac{1+R_g\tau_e}{g_0}\,\bar g + \frac{\bar g\,(1-\bar g)^2}{k_-\tau_g} + \frac{\bar g^2(1-\bar g)^2}{\pi D a c\,\tau_g}, \qquad (3.11)
\]
where $\bar g = \langle g\rangle/g_0$ is the protein copy number expressed as a fraction of its maximal value, c is the concentration of the transcription factor, and other parameters are as explained in Fig 3.2. The first term in Eq (3.11) is the output noise and has a Poisson–like behavior, with variance proportional to the mean, but the proportionality constant differs from 1 by $R_g\tau_e$, i.e. the burst size or the number of proteins produced per mRNA (Paulsson, 2004). This is just the simple model of Eq (3.1), with $b = 1 + R_g\tau_e$.

The second term in Eq (3.11) originates from binomial “switching” as the transcription factor binding site occupation fluctuates, and is most closely analogous to the noise from random opening and closing of ion channels. This term will be small for unbinding rates $k_-$ that are fast compared to the protein lifetime, but might be large for factors that take a long time to equilibrate or that form energetically stable complexes on their promoters. The third term in Eq (3.11) arises because the diffusive flux of transcription factor molecules to the binding site fluctuates at low input concentration c; in effect the receptor site “counts” the number of molecules arriving in its vicinity during a time window $\tau_g$, and this number is of the order $\sim Dac\tau_g$. This argument is conceptually the same as that for the limits to chemoattractant detection in chemotaxis, as discussed by Berg and Purcell (1977). It can be shown that this is a theoretical noise floor that cannot be circumvented by using sophisticated “binding site machinery” as long as this machinery is contained within a region of linear size a (Bialek and Setayeshgar, 2005, 2006). For example, cooperative binding to the promoter or promoters with multiple internal states will modify the binomial switching term, but will leave the diffusion noise unaffected if we express it as an effective noise in transcription factor concentration $\sigma_c$ such that

\[
\sigma_g = \frac{\partial g}{\partial c}\,\sigma_c. \qquad (3.12)
\]


Figure 3.3: Expression noise as a function of the mean. The standard deviation of the protein concentration $\sigma_g/g_0$ is plotted against the mean protein concentration $\bar g = \langle g\rangle/g_0$, from Eq (3.14) with h = 5. In all cases the output noise term has a strength α = 0.01, and the different curves are indexed by the ratio of input noise to output noise β/α = 0, 10, 20, 30. In the absence of input noise, the noise level is a monotonic function of the mean, but input noise contributes a peak near the point of half maximal expression $\bar g = 0.5$. In the inset, we show the same results plotted as a fractional noise variance $\eta_g^2$ vs the mean [Eq (3.15)], on a logarithmic scale, and we see that the prominent peak has become just an inflection. For most of the dynamic range of means, the contribution of input noise is to increase the fractional variance without substantial changes in the slope of the double–log plot, so that we can confuse input noise with a larger level of output noise, especially if we remember that real data will be scattered due to measurement errors.

Although cooperativity does not change the effective concentration noise due to diffusion, it does reduce the relative significance of the switching noise (Bialek and Setayeshgar, 2006). Since we will discuss a system which is strongly cooperative, in much of what follows we neglect the switching noise term and focus on the output noise and diffusion noise. Then the generalization to multisite, cooperative regulation is straightforward [Methods A.3.2]. We expect that cooperative effects among h transcription factors generate a sigmoidal dependence of expression on the transcription factor concentration, so that

\[
\bar g = \frac{c^h}{c^h + K_d^h}, \qquad (3.13)
\]
where h is called the Hill coefficient, and $K_d$ is the concentration required for half maximal activation. We can invert this relationship to write the concentration c, which is relevant for the diffusive noise, as a function of the mean fractional expression level $\bar g$. Substituting back into Eq (3.11), and neglecting the switching noise, we obtain

\[
\left(\frac{\sigma_g}{g_0}\right)^2 = \alpha\,\bar g + \beta\,\bar g^{\,2-1/h}(1-\bar g)^{2+1/h}, \qquad (3.14)
\]
where α and β are combinations of parameters that measure the strength of the output and diffusion noise, respectively. If we express the variance in fractional terms, this becomes
\[
\eta_g^2 = \alpha\,\frac{1}{\bar g} + \beta\,\bar g^{-1/h}(1-\bar g)^{2+1/h}. \qquad (3.15)
\]

The global output noise model of Eq (3.1) corresponds to β = 0 (no input noise) and $b = \alpha g_0$. Figure 3.3 shows the predicted noise levels for different ratios of input to output noise (β/α). For very highly cooperative, essentially switch–like systems, we can take the limit h → ∞ to obtain

\[
\left(\frac{\sigma_g}{g_0}\right)^2 = \alpha\,\bar g + \beta\,\bar g^2(1-\bar g)^2, \qquad (3.16)
\]
\[
\eta_g^2 = \alpha\,\frac{1}{\bar g} + \beta\,(1-\bar g)^2. \qquad (3.17)
\]

In particular, if we explore only expression levels well below the maximum ($\bar g \ll 1$), then the diffusion noise just adds a constant β to the fractional variance. Thus, diffusion noise in a highly cooperative system could be confused with a global or even extrinsic noise source.
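Equations (3.14)–(3.15) are easy to tabulate directly. The sketch below reproduces the qualitative behavior of Fig 3.3 (a peak in $\sigma_g/g_0$ near half-maximal expression whenever β > 0), using the illustrative parameter values quoted in the figure caption.

```python
import numpy as np

def sigma_over_g0(gbar, alpha, beta, h):
    """sigma_g / g_0 from Eq (3.14)."""
    return np.sqrt(alpha * gbar + beta * gbar**(2 - 1/h) * (1 - gbar)**(2 + 1/h))

def eta_sq(gbar, alpha, beta, h):
    """Fractional variance, Eq (3.15)."""
    return alpha / gbar + beta * gbar**(-1/h) * (1 - gbar)**(2 + 1/h)

gbar = np.linspace(0.01, 0.99, 99)
alpha, h = 0.01, 5.0
for ratio in (0, 10, 20, 30):                   # beta/alpha, as in Fig 3.3
    beta = ratio * alpha
    sigma = sigma_over_g0(gbar, alpha, beta, h)
    peak = gbar[np.argmax(sigma)]
    print(f"beta/alpha = {ratio:2d}: sigma_g/g0 largest at gbar = {peak:.2f}, "
          f"eta^2 at gbar = 0.5 is {eta_sq(0.5, alpha, beta, h):.4f}")
```

For β = 0 the maximum sits at the edge of the grid (no interior peak), while any finite input noise moves it to an intermediate expression level.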

3.2.4 Signatures of input noise

Input noise arises from fluctuations in the occupancy of the transcription factor binding sites. Thus, if we go to very high transcription factor concentrations, where all sites are fully occupied, or to very low concentrations, where the sites are never occupied, the fluctuations must vanish. These limits correspond, in the case of a transcriptional activator, to maximal and minimal expression levels, respectively. Thus, the key signature of input noise is that it must be largest at some intermediate expression level, as shown in Fig 3.3. The claim that many genes have expression noise levels which fit the global output noise model of Eq (3.1) would seem to contradict the prediction of a peak in the noise as a function of the mean. But if we plot the predictions of the model with input noise as a fractional variance vs mean, the prominent peak disappears (inset to Fig 3.3). In fact, over a large dynamic range, the input noise seems just to increase the magnitude of the fractional variance while not making a substantial change in the slope of $\log(\eta_g^2)$ vs $\log(\langle g\rangle)$. Confronted with real data on a system with significant input noise, we could thus fit much of those data with the global output noise model but with a larger value of b. There is, of course, a difference between input and output noise, even when plotted as $\log(\eta_g^2)$ vs $\log(\langle g\rangle)$, namely a rapid drop in noise level as we approach maximal expression. But this effect is confined to a narrow range, essentially a factor of two in mean expression level. As we discuss below, there are a variety of reasons why this might not have been seen in the data of Bar-Even et al. (2006).

Recent experiments on the precision of gene expression in the early Drosophila embryo provide us with an opportunity to search for the signatures of input noise (Gregor, 2005; Gregor et al., 2006a). The embryo contains a spatial gradient of the protein Bicoid (Bcd), translated from maternal mRNA, and this protein is a transcription factor which activates, among other genes, hunchback. Looking along the anterior–posterior axis of the embryo one thus has an array of nuclei that experience a graded range of transcription factor concentrations. Using antibody staining and image processing methods, it thus is possible to collect thousands of points on a scatter plot of input (Bicoid concentration) vs. output (Hunchback protein concentration); since even in a single embryo there are many nuclei that have the same Bcd concentration, one can examine both the mean Hunchback (Hb) response and its variance; data from Gregor et al. (2006a) are shown in Fig 3.4.
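Constructing the mean and variance of the output at fixed input, as in Figs 3.4 and 3.5, amounts to binning nuclei by their measured Bcd level. The sketch below shows one way to do this; the function name and the synthetic stand-in arrays are ours and are not part of the published analysis.

```python
import numpy as np

def binned_noise(bcd, hb, n_bins=50):
    """Bin nuclei by input (Bcd) level and return the mean and standard deviation of
    the output (Hb) in each bin. `bcd` and `hb` are per-nucleus arrays in [0, 1]."""
    edges = np.linspace(bcd.min(), bcd.max(), n_bins + 1)
    idx = np.digitize(bcd, edges[1:-1])
    means = np.array([hb[idx == i].mean() for i in range(n_bins)])
    stds  = np.array([hb[idx == i].std(ddof=1) for i in range(n_bins)])
    return means, stds

# Synthetic stand-in data, only to show the calling convention.
rng = np.random.default_rng(2)
bcd = rng.uniform(0.0, 1.0, 5000)
hb_mean = bcd**5 / (bcd**5 + 0.5**5)                 # Hill response with h = 5
hb = hb_mean + 0.1 * np.sqrt(hb_mean * (1 - hb_mean)) * rng.standard_normal(bcd.size)
means, stds = binned_noise(bcd, hb)
```

The resulting (mean, standard deviation) pairs are what gets plotted and fit in the analysis below.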

Figure 3.4: The input–output relation for Bicoid regulation of Hunchback expression, redrawn from Gregor et al. (2006a). Dashed curves show mean expression levels in different embryos, the thick black line is the mean across all embryos, and points with error bars show the mean and standard deviation of Hb expression at a given Bcd concentration in one embryo.

The mean response of Hb to Bcd is fit reasonably well by Eq (3.13) with a Hill coefficient h = 5 (Gregor et al., 2006a), and in Fig 3.5 we replot the noise in this response as a function of the mean. The peak of expression noise near half maximal expression—the signature of input noise—is clearly visible. More quantitatively, we find that the data are well fit by Eq (3.14) with the contribution from output noise (α ≈ 1/380) much smaller than that from input noise (β ≈ 1/2). We also consider the same model with h → ∞, and this fully switch–like model, although formally still within error bars, systematically deviates from the data. Finally we consider a model in which diffusion noise is absent, but we include the switching noise from Eq (3.11), which generalizes to the case of cooperative binding (see Methods). Interestingly, this model has the same number of parameters as the diffusion noise model, but does a significantly poorer job of fitting the data. While the fit can be improved further by adding a small background to the noise, we emphasize that Eq (3.14) correctly captures the non–trivial shape of the noise curve with only two parameters. Because input noise falls to zero at maximal expression, the sole remaining noise at that point is the output noise, and this uniquely determines the parameter α. The strength of the input noise (β) then is determined by the height of the noise peak, and there is no further room for adjustment. The shape of the peak is predicted by the theory with no additional parameters, and the different curves in Fig 3.5 demonstrate that the data can distinguish among various functional forms for the peak.
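A two-parameter fit of this kind can be reproduced with a standard least-squares routine once binned means and standard deviations are in hand. The snippet below is a sketch using scipy's curve_fit on synthetic stand-in data generated from the quoted best-fit values; it illustrates the procedure, not the actual fit performed on the experimental data.

```python
import numpy as np
from scipy.optimize import curve_fit

def noise_model(gbar, alpha, beta, h=5.0):
    """sigma_g/g_0 from Eq (3.14) at fixed Hill coefficient h."""
    return np.sqrt(alpha * gbar + beta * gbar**(2 - 1/h) * (1 - gbar)**(2 + 1/h))

# Hypothetical binned data standing in for the (mean, std) points of Fig 3.5,
# generated from the model with the values quoted in the text (alpha ~ 1/380,
# beta ~ 1/2) plus a little scatter.
rng = np.random.default_rng(3)
means = np.linspace(0.05, 0.95, 19)
stds = noise_model(means, 1/380, 0.5) + 0.005 * rng.standard_normal(means.size)

popt, _ = curve_fit(noise_model, means, stds, p0=(0.01, 0.5), bounds=(0, np.inf))
print(f"recovered alpha ~ {popt[0]:.4f} (1/380 ~ {1/380:.4f}), beta ~ {popt[1]:.3f}")
```

Because β is fixed by the height of the noise peak and α by the residual noise at maximal expression, the fit is tightly constrained, as described above.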

Figure 3.5: Standard deviation of Hunchback expression as a function of the mean (points with error bars), replotted from Gregor et al. (2006a). The black line is a fit of combined output and diffusion noise contributions, from Eq (3.14) with h = 5, and the dashed red line is with h → ∞, from Eq (3.16). In contrast, the dashed blue line is the best fit of combined output and switching noise contributions. Although both diffusion and switching noise produce a peak at intermediate expression levels, the shapes of the peaks are distinguishable, and the data favor the diffusion noise model.

Are the parameters α and β that fit the Bcd/Hb data biologically reasonable? The fact that diffusive noise dominates at intermediate levels of expression (β ≫ α) is the statement that the Hunchback expression level provides a readout of Bcd concentration with a reliability that is close to the physical limit set by diffusional shot noise, as was argued in Gregor et al. (2006a) based on the magnitude of the noise level and estimates of the relevant microscopic parameters that determine β. The dominance of diffusive noise over switching noise presumably is related to the high cooperativity of the Bcd/Hb input/output relation (Bialek and Setayeshgar, 2006).

The parameter α measures the strength of the output noise and thus depends on the absolute number of Hb molecules and on the number of proteins produced per mRNA transcript. If this burst size is in the range $R_g\tau_e \sim 1-10$, then our fit predicts that the maximum expression level of Hb corresponds to $g_0 = 700-4000$ molecules in the nucleus. Given the volume of the nuclei at this stage of development ($\sim 140\ \mu{\rm m}^3$; see Gregor et al. (2006a,b)), this is a concentration of 8–48 nM. Although we don’t have independent measurements of the absolute Hunchback concentration, this is reasonable for transcription factors, which typically act in the nanomolar range (Ptashne, 1992; Pedone et al., 1996; Ma et al., 1996; Burz et al., 1998; Winston et al., 1999; Zhao et al., 2002), and can be compared with the maximal nuclear concentration of Bcd, which is 55 ± 3 nM (Gregor et al., 2006a). Larger burst sizes would predict larger maximal expression levels, or conversely measurements of absolute expression levels might give suggestions about the burst size for translation in the early Drosophila embryo.
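The copy-number-to-concentration conversion used above is a one-line calculation; the sketch below simply repeats it for the nuclear volume and the copy-number range quoted in the text.

```python
AVOGADRO = 6.022e23                 # molecules per mole
V_liters = 140 * 1e-15              # nuclear volume ~140 um^3; 1 um^3 = 1e-15 liters

for g0 in (700, 4000):              # range of maximal Hb copy numbers from the text
    nanomolar = g0 / (AVOGADRO * V_liters) * 1e9
    print(f"g0 = {g0:4d} molecules  ->  {nanomolar:.0f} nM")
```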

3.2.5 Discussion

In the process of transcriptional regulation, the (output) expression level of regulated genes acts as a sensor for the (input) concentration of transcription factors. The performance of this sensor, and hence the regulatory power of the system, is limited by noise. While changes in the parameters of the transcriptional and translational apparatus can change the level of output noise, the input noise is determined by the physical properties of the transcription factor and its interactions with the target sites along the genome. Ultimately, there is a lower bound on this input noise level set by the shot noise in random arrival of the transcription factors at their targets, in much the same way that any imaging process ultimately is limited by the random arrival of photons.

Input and output noise seem to be so different that it is hard to imagine that they could be confused experimentally. Some of the difficulty, however, can be illustrated by plotting the results from the Bcd/Hb experiments of Gregor et al. (2006a) in the form which has become conventional in the study of gene expression noise, as a fractional variance vs mean expression level [Fig 3.6]. The signature of input noise, so clear in Fig 3.5, now is confined to a narrow range (roughly a factor of two) near maximal expression. In contrast, over more than a decade of expression levels the noise level is a good fit to $\eta_g^2 \propto \langle g\rangle^{-\gamma}$, with γ = 1.04 being very similar to the prediction of the global noise model (γ = 1) in Eq (3.1). The departures from power–law behavior are easily obscured by global noise sources, experimental error, or by technical limitations that lead to the exclusion of data at the very highest expression levels, as in Bar-Even et al. (2006).

Figure 3.6: Logarithmic plot of fractional variance vs the mean expression level for Hunchback, replotted from Gregor et al. (2006a). Each black point represents the noise level measured across nuclei that experience the same Bcd concentration within one embryo, and results are collected from nine embryos. The solid line shows a fit to $\eta_g^2 \propto \langle g\rangle^{-\gamma}$ in the region below half maximal mean expression; we find a good fit, with γ = 1.04, despite the fact that these data show a clear signature of input noise when plotted as in Fig 3.5. The dashed line indicates the global noise floor suggested in Bar-Even et al. (2006), and red points show the raw data with this variance added. Although the input noise still appears as a drop in fractional noise level near maximal mean expression, this now is quite subtle and easily obscured by experimental errors.

The lesson from this analysis of the Bicoid/Hunchback data is that the signatures of input noise are surprisingly subtle. In this system, however, the behavior near half maximal expression is exactly the most relevant question biologically, since this is where the “decision” is made to draw a boundary, as a first step in spatial patterning. In other systems, the details of noise in this region of expression levels might be less relevant for the organism, but it is only in this region that different sources of noise are qualitatively distinguishable, as is clear from Fig 3.6. Thus, unless we have independent experiments to measure some of the parameters of the system, we need experimental access to the full range of expression levels and hence, implicitly, to the full dynamic range of transcription factor concentrations, if we want to disentangle input and output noise.

The early Drosophila embryo is an attractive model system precisely because the organism itself generates a broad range of transcription factor concentrations, and conveniently arranges these different samples along the major axes of the embryo. A caveat is that since we don’t directly control the transcription factor concentration, we have to measure it. In particular, in order to measure the variance of the output (Hunchback, in the present discussion) we have to find many nuclei that all have the same input transcription factor (Bicoid) concentration. Because the mean output is a steep function of the input, errors in the measurement of transcription factor concentration can simulate the effects of input noise, as discussed in Gregor et al. (2006a). Thus, a complete analysis of input and output noise requires not only access to a wide range of transcription factor concentrations, but also rather precise measurements of these concentrations.

Why are the different sources of noise so easily confused? If noise is dominated by randomness in a single step of the translation process, then the number of protein molecules will obey the Poisson distribution, and the variance in copy number will be equal to the mean. But if we can’t actually turn measurements of protein level into molecule counts, then all we can say is that the variance will be proportional to the mean. If the dominant noise source is a single step in transcription, then the number of mRNA transcripts will obey the Poisson distribution, and the variance of protein copy numbers still will be proportional to the mean, but the proportionality constant will be enhanced by the burst size. The same reasoning, however, can be pushed further back: if, far from maximal expression, the dominant source of noise is the infrequent binding of a transcriptional activator (or dissociation of a repressor) to its target site, then the variance in protein copy number still will be proportional to the mean. Thus, the proportionality of variance to mean implies that there is some single rare event that dominates the noise, and by itself doesn’t distinguish the nature of this event.

If noise is dominated by regulatory events, then the number of mRNA transcripts should be drawn from a distribution broader than Poisson. In effect the idea of bursting, which amplifies protein relative to mRNA number variance, applies here too, amplifying the variance of transcript number above the expectations from the Poisson distribution.
Transcriptional bursting has in fact been observed directly (Golding et al., 2005), although it is not clear whether this arises from fluctuations in transcription factor binding or from other sources. Previous arguments have made it plausible that input noise is significant in comparison to the observed variance of gene expression (Bialek and Setayeshgar, 2005), and we have shown here that models which assign all of the noise to common factors on the output side are inconsistent with the embedding of gene expression in a regulatory network. The signatures of input noise seem clear, but can be surprisingly subtle to distinguish in real data. We have argued that the Bicoid/Hunchback system provides an example in which input noise is dominant, and further that the detailed form of the variance vs mean supports a dominant role for diffusion rather than switching noise. Although there are caveats, this is consistent with the idea that, as with other critical biological processes (Barlow, 1981; Berg and Purcell, 1977; Bialek, 1987, 2002), the regulation of gene expression can operate with a precision limited by fundamental physical principles.

3.3 Alternative noise sources

3.3.1 Effects of non-specific binding sites

We start by briefly introducing the statistical mechanical framework in which the TF–DNA interaction is usually discussed.5 Different sites on the DNA – where by a site we mean a consecutive sequence of L base pairs starting at a particular location in the genome – to which a transcription factor can bind are viewed as distinct energy states of the TF–DNA system. The interaction energy is a complicated function of the relative position of the protein with respect to the DNA and carries contributions from several sources: the main contribution is usually taken to be the direct, or sequence-dependent, energy, which arises from specific hydrogen contacts between the amino acid residues of the TF and the DNA bases; to get a favorable energetic contribution, the TF and DNA therefore have to be in a proper spatial alignment. There are also indirect contributions to the binding energy: the non-specific attractive electrostatic interaction, the mechanical energy of deformation if the TF bends the DNA and so on, which here simply redefine the zero of the sequence dependent energy. Once the energy $E_\nu$ of a site ν is known, the probability of that site being bound, or the site occupancy, is easy enough to calculate:
\[
n_\nu = \frac{1}{1 + e^{E_\nu - \mu}}, \qquad (3.18)
\]
where all energies have been expressed in units of $k_BT$; µ is the chemical potential of the free transcription factors in the cytoplasm, and is proportional to the logarithm of their concentration c.6 By inspecting the equation for the mean occupancy [Eq (A.12)] and comparing it to Eq (3.18), we recognize that $e^{E_\nu - \mu} = K_d^\nu/c = k_-^\nu/(k_+^\nu c)$, where $k_+^\nu$ and $k_-^\nu$ are the on- and off-rates for TF binding and unbinding at site ν and $K_d^\nu$ is its equilibrium binding constant. Here the dynamical and equilibrium pictures connect.

What remains to be determined is the dependence of the energy of the site, $E_\mu$, on the site sequence, $\vec\sigma^\mu = \{\sigma(1), \sigma(2), \ldots, \sigma(L)\}^\mu$, where $\sigma^\mu(i)$ is one of the four letters of the genetic alphabet. We can always write this energy as a series:

\[
E(\vec\sigma) = \sum_i \epsilon_{i\sigma(i)} + \frac{1}{2}\sum_{ij} J_{ij\sigma(i)\sigma(j)} + \cdots. \qquad (3.19)
\]

Here, the lowest order approximation is written as an energy matrix $\epsilon$ (of dimension L × 4), which parametrizes the linear contribution to the binding energy: a base $\sigma^\mu(i)$ at position i of the site µ adds an amount to the energy that is independent of the bases at other positions. Note that this “expansion” is not a systematic expansion in some small parameter; in principle the dominant contribution could come from any order in the series, although in practice this does not seem to be the case (Robinson et al., 1998). Berg and von Hippel (1987) presented a simple argument by which the entries of $\epsilon$ can be computed, if we know a set of sites $\vec\sigma^\mu$ to which the transcription factor binds strongly, under some assumptions.7

5Equilibrium calculations would by convention precede the noise calculations, but as modeling (and inferring from the data) of the interaction energy between DNA and protein is a whole separate field, and such models were not necessary for simple computations of the input and output noise, the issue was postponed until now.

6The free concentration is the total concentration minus the number of transcription factors bound over all sites. We will return to this point shortly.

Figure 3.7: A depiction of the transcription factor binding to its specific site on the DNA. The energy of the interaction is described by the position weight matrix, with red (black) elements denoting favorable (unfavorable) interactions. In the example presented here, the site on the DNA exactly corresponds to the consensus sequence of the TF (red entries in the matrix), and the binding energy is the sum of the corresponding red elements.

In the limit where the occupancies are not saturated, i.e. the concentration is small compared to the relevant equilibrium binding constants $K_d^\mu$, the matrix $\epsilon$ is given by:

\[
\epsilon_{ib} = -\log\frac{N_i^b}{\max_b N_i^b}, \qquad (3.20)
\]
\[
N_i^b = N_0 + \sum_\mu \delta_{\sigma^\mu(i),\,b}, \qquad (3.21)
\]

where $N_i^b$ is the count of the number of times the base b = {A, C, G, T} appears at position i in the set of known binding sites $\vec\sigma^\mu$, and $N_0$ is the so-called “pseudo-count” regularization parameter (usually 1). Kinney et al. (2007) later relaxed some of the assumptions required by the Berg and von Hippel construction, at the expense of increasing the amount of data needed to do proper inference of $\epsilon$; in addition they argue that if the Berg–von Hippel assumptions do not hold, the energy matrix and the matrix computed from counts in Eq (3.20), also called the position weight matrix or PWM, are no longer simply related. High-throughput assays like chromatin immunoprecipitation (ChIP) or protein binding microarrays (PBM) can provide datasets big enough for such analyses.
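The counting construction of Eqs (3.20)–(3.21) is short enough to write out explicitly; the binding sites below are made up, and the sketch is only meant to make the pseudo-count bookkeeping and the linear "energy of a sequence" score of Eq (3.19) concrete.

```python
import numpy as np

BASES = "ACGT"

def energy_matrix(sites, pseudocount=1.0):
    """Position weight / energy matrix of Eqs (3.20)-(3.21):
    eps[i, b] = -log( N_i^b / max_b N_i^b ), with N_i^b = N0 + base counts."""
    L = len(sites[0])
    counts = np.full((L, 4), pseudocount)
    for site in sites:
        for i, base in enumerate(site):
            counts[i, BASES.index(base)] += 1
    return -np.log(counts / counts.max(axis=1, keepdims=True))

def binding_energy(seq, eps):
    """Energy of a candidate site under the linear (first-order) model of Eq (3.19)."""
    return sum(eps[i, BASES.index(b)] for i, b in enumerate(seq))

# Made-up example sites (not a real TF's binding sites).
sites = ["ACGTAC", "ACGTAA", "ACGAAC", "TCGTAC"]
eps = energy_matrix(sites)
print(binding_energy("ACGTAC", eps), binding_energy("TTTTTT", eps))
```

The consensus sequence scores zero by construction, and mismatches add positive (unfavorable) energy.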

Mean response

Let the system be composed of N binding sites (with density ρ = N/V, where the volume of the cell nucleus is V) indexed by µ, with kinetic parameters $k_+^\mu$ and $k_-^\mu$ and occupancies $m_\mu$. The site equilibrium dissociation constants are denoted by $K_\mu = k_-^\mu/k_+^\mu$. The functional binding site embedded in this background has a dissociation constant $K_d$ and we give it a reference binding energy of 0, with the nonspecific site µ having a relative energy of $E_\mu$ in units where $k_BT = 1$ with respect to this reference, so that $K_\mu = K_d\exp(E_\mu)$.

7Assumptions are: the specific sites have approximately the same affinity; the genetic background, or the set of nonspecific sites, is random; positions within the binding site are independent; the selection that acts on the sites in order to preserve their function is a linear function of the energy.

Then the relation between the total and free concentrations must be:

\[
c_t = c_f + \frac{1}{V}\sum_\mu \frac{c_f}{c_f + K_\mu} \qquad (3.22)
\]
\[
\approx c_f + \rho\int dE\, p(E)\,\frac{c_f}{c_f + K_d\exp(E)}, \qquad (3.23)
\]
where we have replaced the sum over the sites with the integral over the distribution p(E) of binding energies relative to the reference (functional) binding site. If the system is in the weak binding regime with $c_f \ll \langle K_\mu\rangle$, then the fractional occupancy under the integral can be approximated with linear response, and therefore:
\[
c_t \approx c_f\left[1 + \frac{\rho}{K_d}\int dE\, e^{-E}\,p(E)\right]. \qquad (3.24)
\]
If the nucleus were bathed in an extremely high concentration of TF, all non-specific sites would be occupied and therefore we have to get $c_t = c_f + \rho$ in the saturated limit of Eq (3.23). We will assume that the distribution of the binding energies, p(E), is a Gaussian with mean $\bar E$ and variance $\sigma^2$. For simple physical models of DNA–TF interaction, for instance the energy matrix model [first term of Eq (3.19)], where each position in the binding site contributes independently to the binding energy, the energies on a random genome will indeed be normally distributed. To evaluate the integral in Eq (3.24) we remember that for Gaussian distributions $\langle\exp(ikx)\rangle = \exp\!\left(ik\langle x\rangle - \frac{1}{2}k^2\sigma^2\right)$ and therefore (with x = iE and k = β = 1):
\[
c_t = c_f\left[1 + \frac{\rho}{K_d}\exp\left(-\bar E + \frac{1}{2}\sigma^2\right)\right]. \qquad (3.25)
\]

We are interested in examining how the specificity of the functional binding site, $K_d$, influences the relation between the total and free concentration. At first sight it appears that the relation is linear, and if the affinity of the nonspecific binding sites, $K_{NS} = K_d e^{\bar E}$, and σ are held fixed, $K_d$ cancels out of the expression Eq (3.25), and the spread of the distribution effectively just renormalizes the mean nonspecific affinity. Alternatively, if σ = 0, we get the simplest model that makes all the sites except for the functional site have the same affinity, $K_{NS}$. We can, however, also expect $\sigma^2$ to scale with $\bar E$: this is evident if we consider the simplest model that, for each position in the binding site, assigns energy 0 if that position matches the specific functional site, and assigns energy ∆ otherwise. In such a model the expected discrimination energy and the variance under the assumed random genetic background, for a binding site of length L, will be
\[
\bar E = \frac{3}{4}\Delta L, \qquad (3.26)
\]
\[
\sigma_E^2 = \frac{3}{16}\Delta^2 L. \qquad (3.27)
\]
Since $\bar E$ is related to $K_d$ at constant $K_{NS}$, and we assume $\sigma_E^2 \propto \bar E$, we get the dependence on $K_d$ of the relation between the total and free concentrations in Eq (3.25):
\[
c_t = c_f\left[1 + \frac{\rho}{K_{NS}}\left(\frac{K_{NS}}{K_d}\right)^{\eta}\right], \qquad (3.28)
\]
where η > 0 is a parameter that depends on the proportionality constant between the mean energy and its variance. The important message here is that even though the distribution of nonspecific site energies has a finite width, when $K_d$ is increased and the specificity of the functional site is thus decreased, the number of TFs stuck to the nonspecific sites, or $c_t - c_f$, will not start to increase in a super-linear fashion for $K_d$ greater than some critical $K_d^*$. On the contrary, a linear relationship between the total and free concentrations is expected to hold, even when the weak-binding limit is no longer valid, although possibly with a large proportionality constant; e.g. there are (perhaps typical) cases when 90% of the TF molecules can be nonspecifically bound to the DNA, cf. Bakk and Metzler (2004).
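The relation between total and free concentration, Eq (3.23), can also be checked numerically for a Gaussian background without the weak-binding approximation. The sketch below inverts it with a root finder; the parameter values (ρ, K_d, the energy distribution) are invented for illustration only.

```python
import numpy as np
from scipy.optimize import brentq

# Invented, order-of-magnitude parameters in arbitrary concentration units.
rho = 1.0e3                 # density of nonspecific sites, N/V
K_d = 1.0                   # dissociation constant of the functional (reference) site
E_bar, sigma = 8.0, 2.0     # mean and spread of the nonspecific energies, in k_B T

E = np.linspace(E_bar - 6 * sigma, E_bar + 6 * sigma, 2001)
pE = np.exp(-0.5 * ((E - E_bar) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
dE = E[1] - E[0]

def total_from_free(cf):
    """Right-hand side of Eq (3.23): total concentration implied by a free concentration."""
    occupancy = cf / (cf + K_d * np.exp(E))
    return cf + rho * np.sum(pE * occupancy) * dE

for c_t in (0.1, 1.0, 10.0):
    c_f = brentq(lambda cf: total_from_free(cf) - c_t, 1e-12, c_t)
    print(f"c_t = {c_t:5.1f}  ->  c_f = {c_f:.4f}  (fraction bound ~ {1 - c_f / c_t:.2f})")
```

For these toy numbers the ratio c_t/c_f stays essentially constant over a wide range, which is the near-linear relation argued for above.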

Fluctuations

The fluctuations in the occupancy of the binding sites are coupled to each other by diffusion in a limited geometry, such that there are no fluxes through the volume boundaries and the total number of particles (those diffusing in free solution and those bound on the binding sites) is held fixed. The equations for the system are as follows:

\[
\frac{dm_\mu}{dt} = k_+^\mu\, c(x_\mu)\,(1 - m_\mu) - k_-^\mu\, m_\mu + \xi_\mu, \qquad (3.29)
\]
\[
\frac{\partial c(x,t)}{\partial t} = D\nabla^2 c(x,t) - \sum_\mu \delta(x - x_\mu)\,\frac{dm_\mu}{dt} + \eta(x,t). \qquad (3.30)
\]

Here, Eq (3.29) describes the binding and unbinding to the site at location xµ, and Eq (3.30) is a diffusion equation for the free concentration with the additional terms that take into account the possibility that the molecule gets absorbed to or released from a binding site, cf. Eq (A.7). We linearize and Fourier-transform:

\[
(-i\omega + 1/\tau_\mu)\,\delta m_\mu = k_+^\mu(1 - \bar m_\mu)\,\delta c(x_\mu) + \xi_\mu, \qquad (3.31)
\]

\[
\left(-i\omega + k^2 D\right)\delta c_k = i\omega\sum_\mu \delta m_\mu\, e^{ikx_\mu} + \eta(k,\omega). \qquad (3.32)
\]

The terms ξµ and η(x, t) represent Langevin noise, and they are normalized as follows:

\[
\langle\xi_\mu(t)\,\xi_\mu(t')\rangle = \left(k_-^\mu m_\mu + k_+^\mu c\,(1 - m_\mu)\right)\delta(t - t'), \qquad (3.33)
\]
\[
\langle\eta(k,\omega)\,\eta^*(k',\omega')\rangle = 4Dk^2\bar c\,\delta(k - k')\,\delta(\omega - \omega'). \qquad (3.34)
\]

Expressing $\delta m_\mu$ from Eq (3.31) and inserting it into Eq (3.32), we get a coupled system of equations for the modes of concentration fluctuations:
\[
\left(-i\omega + k^2 D\right)\delta c_k = \eta(k,\omega) + \sum_\mu e^{ikx_\mu}\,\frac{i\omega}{-i\omega + 1/\tau_\mu}\left[k_+^\mu(1-\bar m_\mu)\,\frac{1}{V}\sum_{k'}\delta c_{k'}\,e^{-ik'x_\mu} + \xi_\mu\right], \qquad (3.35)
\]
where we have written $\delta c(x_\mu) = \frac{1}{V}\sum_{k'}\delta c_{k'}\,e^{-ik'x_\mu}$. To get the power spectrum at low frequency, $S_c(\omega\to 0) = \lim_{\omega\to 0}\sum_k\langle\delta c_k(\omega)\,\delta c_k^*(\omega)\rangle$, we have to treat separately two cases, namely k = 0 and all others. In the case of a non-zero mode, the only surviving term after the limit is taken is the noise η term, from which it follows that:
\[
\delta c_k = \frac{\eta(k,\omega)}{Dk^2}, \qquad k\neq 0. \qquad (3.36)
\]

On the other hand, the conservation of total particle number gives us for the zero mode, in the small ω limit:
\[
\langle\delta c_0\,\delta c_0^*\rangle = \frac{1}{\left(1 + \frac{1}{V}\sum_\mu k_+^\mu\tau_\mu(1 - \bar m_\mu)\right)^2}\Big\{\langle\eta\eta^*\rangle_{k=0} + \sum_\mu\tau_\mu^2\langle\xi_\mu\xi_\mu^*\rangle \qquad (3.37)
\]
\[
+\ \frac{1}{V^2}\sum_{\mu,\nu,\,k'\neq 0,\,k''\neq 0} k_+^\mu(1-\bar m_\mu)\,k_+^\nu(1-\bar m_\nu)\ \times \qquad (3.38)
\]
\[
\times\ \tau_\mu\tau_\nu\,\frac{4\bar c\,\delta(k'-k'')}{Dk'^2}\,e^{-ix_\mu k' + ix_\nu k''}\Big\}. \qquad (3.39)
\]

The contribution from the diffusion noise η is 0 at zero mode. The summation does not extend over k = 0, because this term has been absorbed into the denominator. Collapsing the double sum into a single sum using the delta function, the last term can be written separately as a contribution when µ = ν and otherwise, as follows:
\[
\langle\delta c_0\,\delta c_0^*\rangle = \frac{1}{\left(1 + \frac{1}{V\bar c}\sum_\mu \bar m_\mu(1-\bar m_\mu)\right)^2}\Big\{2\sum_\mu\tau_\mu\,\bar m_\mu(1-\bar m_\mu)\ + \qquad (3.40)
\]
\[
+\ \frac{2}{\pi D a\bar c}\sum_\mu \bar m_\mu^2(1-\bar m_\mu)^2\ + \qquad (3.41)
\]
\[
+\ \frac{4}{\bar c}\sum_{\mu\neq\nu}\bar m_\mu(1-\bar m_\mu)\,\bar m_\nu(1-\bar m_\nu)\,\frac{1}{V}\sum_{k\neq 0}\frac{e^{ik(x_\mu - x_\nu)}}{Dk^2}\Big\}. \qquad (3.42)
\]

In the above expression, a is the size of the binding site, i.e. the corresponding k-cutoff in the sum over momenta is π/a. Assuming that the position of the binding site is not correlated with its affinity, we can rewrite the full noise spectrum at zero frequency in terms of site averages:

\[
\langle|\delta c|^2\rangle = \frac{1}{V^2}\sum_k\langle|\delta c_k|^2\rangle \qquad (3.43)
\]
\[
= \frac{\bar c}{\pi D a} + \frac{1}{\left(1 + \frac{N}{V\bar c}\langle\bar m(1-\bar m)\rangle_m\right)^2}\Big[\frac{2N}{V^2}\langle\tau\,\bar m(1-\bar m)\rangle_m\ + \qquad (3.44)
\]
\[
+\ \frac{\bar c}{\pi D a}\,\frac{2N}{(V\bar c)^2}\langle\bar m^2(1-\bar m)^2\rangle_m + \frac{\bar c}{\pi D b}\,\frac{N^2}{(V\bar c)^2}\langle\bar m(1-\bar m)\rangle_m^2\Big]. \qquad (3.45)
\]

Angular brackets with subscript m, i.e. $\langle\cdots\rangle_m$, stand for an average over all sites. The length scale b depends on the geometry of the binding sites, where it is assumed that the position of the site is uncorrelated with its affinity:
\[
\frac{1}{4\pi b} = \frac{1}{V}\sum_k\left\langle\frac{e^{ikx}}{k^2}\right\rangle_x = \frac{1}{4\pi}\left\langle\frac{1}{|x|}\right\rangle_m. \qquad (3.46)
\]
We can rewrite the result in terms of fractional fluctuations in the input concentration at the position of the specific site:

\[
\left(\frac{\sigma_c}{\bar c}\right)^2 = \frac{1}{\pi D a\bar c\,\tau}\left[1 + \frac{1}{N}\,\frac{\Gamma_1^2}{(1+\Gamma_0)^2} + \frac{a}{b}\,\frac{\Gamma_0^2}{(1+\Gamma_0)^2} + \frac{1}{N}\,\frac{\Gamma_2}{(1+\Gamma_0)^2}\right], \qquad (3.47)
\]
\[
\Gamma_0 = \frac{N}{V\bar c}\,\langle\bar m(1-\bar m)\rangle, \qquad (3.48)
\]
\[
\Gamma_1 = \frac{N}{V\bar c}\,\sqrt{\langle\bar m^2(1-\bar m)^2\rangle}, \qquad (3.49)
\]
\[
\Gamma_2 = \frac{N}{V\bar c}\,2\left\langle\sqrt{\frac{\tau_m}{\tau}}\,\bar m(1-\bar m)\right\rangle. \qquad (3.50)
\]
We recognize the diffusion input noise contribution as the leading term in the parenthesis of Eq (3.47); the other terms are contributed by the nonspecific sites and vanish if N = 0. Moreover, we can put reasonable bounds on the new terms: the fractions of Γ-averages in Eq (3.47) are at most of order 1, and their magnitude is therefore upper-bounded by their prefactors. These, however, are clearly small: 1/N is the inverse of the number of nonspecific sites, a/b is the ratio between the receptor size a and a typical distance between the sites b, and $\tau_m/\tau \ll 1$, because the nonspecific site occupancies presumably equilibrate on a much shorter timescale than the integration time (minutes or more). The conclusion here is that the nonspecific sites will not significantly increase the diffusion contribution to the input noise.
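The smallness of the nonspecific-site corrections in Eq (3.47) can be checked by plugging in rough numbers; everything below is an order-of-magnitude guess, meant only to show that the correction terms are bounded by their small prefactors.

```python
import numpy as np

# Order-of-magnitude guesses (not measurements) for the quantities in Eq (3.47).
N = 4.0e6                  # number of nonspecific sites
V, c_bar = 1.0, 100.0      # nuclear volume (um^3) and free TF concentration (molecules/um^3)
a, b = 3e-3, 0.3           # receptor size and typical inter-site distance (um)
m_avg = 0.1                # representative value for the site averages <m(1-m)> etc.
tau_ratio = 1e-3           # tau_m / tau: nonspecific sites equilibrate fast

Gamma0 = N / (V * c_bar) * m_avg
Gamma1 = N / (V * c_bar) * m_avg                    # stands in for sqrt(<m^2(1-m)^2>)
Gamma2 = N / (V * c_bar) * 2 * np.sqrt(tau_ratio) * m_avg

corrections = [
    (1 / N) * Gamma1**2 / (1 + Gamma0) ** 2,
    (a / b) * Gamma0**2 / (1 + Gamma0) ** 2,
    (1 / N) * Gamma2 / (1 + Gamma0) ** 2,
]
print("corrections relative to the pure diffusion term:",
      ["%.1e" % x for x in corrections])
```

All three corrections come out far below one for these guesses, consistent with the conclusion stated above.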

3.3.2 Effects of TF diffusion along the DNA

We have already computed the noise power spectrum in the fractional occupancy n of the specific transcription factor binding site located at position $x_0$ on the DNA due to the binomial and diffusion flux fluctuations [Eq (A.34)]:
\[
S_n(\omega\to 0) = \frac{2\bar n(1-\bar n)^2}{k_-} + \frac{2\bar n^2(1-\bar n)^2}{2\pi D a\bar c}. \qquad (3.51)
\]
The result is derived in the low frequency (long integration time) limit, assuming an average bulk concentration $\bar c$ of free transcription factor molecules in the cell, which bind and unbind to the specific site in a one-step process, as described by Eqs (A.7, A.8). In particular, the diffusion constant D that enters into these equations is the bulk, or 3D, diffusion constant for the stochastic motion of TF molecules in the cytoplasm. The noise in Eq (3.51) has two contributions: the first one arises from the fact that the binding and unbinding of the site is a binary process, which has a binomial variance – this term would exist even if the local concentration at $x_0$ were perfectly fixed; the second contribution is a consequence of concentration fluctuations at $x_0$ around the mean $\bar c$.

Here, we analyze the case where the transcription factor can attach non-specifically to the DNA, slide for a certain length along its contour (while being nonspecifically bound to it and performing a 1D random walk), and either bind to the specific site, or dissociate from the DNA back into bulk solution. The situation is illustrated in Fig 3.8.

Figure 3.8: The transcription factors can either be free in solution at concentration c, or they can enter a region on the DNA where they diffuse by sliding. The effective 1D concentration is denoted by ξ and is position and time dependent. The specific binding site on the DNA is at location $x_0$; $k_+$ and $k_-$ are the on- and off-rates for transitions from/to the 1D “solution” ξ. The effective radius of the DNA molecule is R and the “sliding length”, or average distance along the contour covered in the 1D random walk before dissociation, is denoted by b.

We describe the system by the following set of equations:
\[
\frac{dn}{dt} = k_+\,\xi(x_0,t)\,(1-n) - k_-\,n, \qquad (3.52)
\]
\[
\frac{\partial\xi(x,t)}{\partial t} = D_1\frac{\partial^2\xi(x,t)}{\partial x^2} - \delta(x - x_0)\,\frac{dn}{dt}\ + \qquad (3.53)
\]
\[
+\ \kappa_+\int dy\,dz\; c(x,t)\,\delta(y)\,\delta(z) - \kappa_-\,\xi(x,t), \qquad (3.54)
\]
\[
\frac{\partial c(x,t)}{\partial t} = D_3\nabla^2 c(x,t) - \kappa_+\,c(x,t)\,\delta(y)\,\delta(z) + \kappa_-\,\xi(x,t)\,\delta(y)\,\delta(z). \qquad (3.55)
\]

We have assumed that the DNA is stretched along the x-axis and that it is an infinitely thin molecule – as we will see soon, we will need to regularize the Dirac delta functions (see Fig 3.8 for an explanation of the length symbols R, b, etc). ξ is a function of only one variable, x, while c is a function of all three spatial coordinates. $\kappa_-$ is the rate at which the TF dissociates from the non-specific binding mode into the bulk; $\kappa_+$ is the corresponding on-rate per unit length. We need to linearize and Fourier-transform the equations, which is straightforward, apart from Eq (3.55), where we have a product $c(x,y,z,t)\,\delta(y)\,\delta(z)$ that turns into a convolution of Fourier transforms. This effectively couples the equation for wave-mode k with other wave-modes, making the problem difficult to solve analytically. However, we can proceed as follows. First, the delta functions are really just approximations for finite-size regions and we are really trying to take the Fourier transform of

\[
c(x,y,z,t)\,H_R(y)\,H_R(z).
\]

Here $H_R(y)$ is a Heaviside function that is 1 if its argument is in the interval [−R, R], and zero outside. We write:
\[
c(x,y,z,t)\,H_R(y)\,H_R(z) = \iiint\frac{dk_x}{2\pi}\frac{dk_y}{2\pi}\frac{dk_z}{2\pi}\,c(k_x,k_y,k_z,t)\,e^{-ik_x x - ik_y y - ik_z z}\;\times\;\frac{1}{(2\pi)^2}\iint dk'_y\,dk'_z\,H_R(k'_y)\,H_R(k'_z)\,e^{-ik'_y y - ik'_z z}
\]
\[
= \frac{1}{(2\pi)^5}\int dk'_y\,dk'_z\,d^3k\;c(k_x,\,k_y - k'_y,\,k_z - k'_z,\,t)\,e^{-i\mathbf k\cdot\mathbf x}\,H_R(k'_y)\,H_R(k'_z)
\]
\[
\approx \int\frac{d^3k}{(2\pi)^5}\left(\frac{2\pi}{R}\right)^2 e^{-i\mathbf k\cdot\mathbf x}\,c(k,t). \qquad (3.56)
\]

The approximation we make is that since $H_R(y)$ goes to 0 for $|y| > R$, its Fourier transform $H_R(k'_y)$ has to go to zero for $|k'_y| > \pi/R$. In the region where it is not 0, we assume that the integrand can be evaluated at $k'_y = k'_z = 0$ in the first approximation, integrated over the allowed range for primed momenta. The Fourier transform of the whole term is then simply $\approx \frac{1}{R^2}\,c_k(t)$. Using this result we can write down the complete linearized and transformed set of Eqs (3.54, 3.55):
\[
-i\omega\,\delta c_k = -k^2 D_3\,\delta c_k - \frac{\kappa_+}{R^2}\,\delta c_k + \kappa_-\,\delta\xi, \qquad (3.57)
\]
\[
-i\omega\,\delta\xi = -k_x^2 D_1\,\delta\xi + i\omega\,\delta n\, e^{ik_x x_0} - \kappa_-\,\delta\xi + \kappa_+\int\frac{dk_y\,dk_z}{(2\pi)^2}\,\delta c_k. \qquad (3.58)
\]

Here, δξ is a function of $k_x$ only. Let us evaluate the last term of the second equation, by expressing $\delta c_k$ from the first equation and integrating:
\[
\delta c_k = \kappa_-\,\delta\xi\;\frac{1}{-i\omega + k^2 D_3 + \frac{\kappa_+}{R^2}}, \qquad (3.59)
\]
\[
\int\frac{dk_y\,dk_z}{(2\pi)^2}\,\delta c_k = \kappa_-\,\delta\xi\int_0^\infty\frac{2\pi k_r\,dk_r}{(2\pi)^2}\,\frac{1}{-i\omega + (k_x^2 + k_r^2)D_3 + \kappa_+/R^2} \qquad (3.60)
\]
\[
= \frac{\kappa_-\,\delta\xi}{4\pi D_3}\,\ln\left\{1 + \left[\left(\frac{k_x}{\Lambda}\right)^2 + \frac{\kappa_+}{D_3(R\Lambda)^2}\right]^{-1}\right\}. \qquad (3.61)
\]

In the second step, we transformed the integration over $k_y$ and $k_z$ into a radial integration over $k_r$, where $k_r^2 = k_y^2 + k_z^2$. In addition, we discarded the $i\omega$ term, because we are calculating the noise in the small-ω limit. In the third step, we had to introduce an ultraviolet cutoff at $\Lambda = \pi/R$. This result can be plugged back into the equation for $\delta\xi(x_0) = \int\frac{dk_x}{2\pi}\,\delta\xi(k_x)\,e^{-ik_x x_0}$:
\[
\delta\xi(x_0) = i\omega\,\delta n\int\frac{dk}{2\pi}\;\frac{1}{k^2 D_1 + \kappa_-\left(1 - \frac{\kappa_+}{4\pi D_3}\ln\{\cdots\}\right)}. \qquad (3.62)
\]

Before we insert the result into the Fourier transform of Eq (3.52), let us briefly recapitulate how one obtains the noise power spectrum, starting with the linearized equation for the occupancy responding to the thermal fluctuation δF, as in Bialek and Setayeshgar (2005):

\[
-i\omega\,\delta n = -k_-\,\delta n - k_+\bar\xi\,\delta n + k_+(1-\bar n)\,\delta\xi(x_0) - \beta k_-\bar n\,\delta F, \qquad (3.63)
\]
\[
\delta n = \frac{-\beta k_-\bar n\,\delta F}{-i\omega + k_- + k_+\bar\xi - k_+(1-\bar n)\,i\omega\,I/(\pi\Lambda D_1)}, \qquad (3.64)
\]
where I is a rewritten form of the integral in Eq (3.62):

\[
I(\alpha,\beta) = \int_0^\infty\frac{dt}{t^2 + \beta\left(1 - \alpha\ln\left\{1 + (t^2 + 4\alpha/\pi)^{-1}\right\}\right)}, \qquad (3.65)
\]
\[
\alpha = \frac{\kappa_+}{4\pi D_3}, \qquad (3.66)
\]
\[
\beta = \frac{\kappa_-}{\Lambda^2 D_1}. \qquad (3.67)
\]
Observe that in the calculation with 3D diffusion only, Eq (3.64) would have exactly the same form, with different dimensional parameters (rates, diffusion constants) standing next to the integral I. Therefore, we can use the same expression for the noise power spectrum without rederiving it here, by simply replacing the pure 3D integral with the combined 1D/3D integral, cf. Eq (3.51):

\[
S_n(\omega\to 0) = \frac{2\bar n(1-\bar n)}{k_+\bar\xi + k_-} + \frac{2\bar n^2(1-\bar n)^2}{\pi\Lambda D_1\bar\xi}\,I(\alpha,\beta). \qquad (3.68)
\]
We can use the equilibrium conditions to eliminate the average 1D concentration $\bar\xi$ and estimate the rates from dimensionality arguments (see the text below for the explanation):

\[
\bar n = \frac{k_+\bar\xi}{k_+\bar\xi + k_-}, \qquad (3.69)
\]
\[
\kappa_+\bar c = \kappa_-\bar\xi, \qquad (3.70)
\]

\[
\kappa_+ \approx 4\pi D_3, \qquad (3.71)
\]
\[
b^2 \approx D_1\,\kappa_-^{-1}, \qquad (3.72)
\]

where b is the typical sliding length along the DNA before the TF dissociates into the solution. This finally allows us to rewrite the result in a more intuitive form:

\[
S_n(\omega\to 0) = \frac{2\bar n(1-\bar n)^2}{k_-} + \frac{\bar n^2(1-\bar n)^2}{D_3 R\,\bar c}\,\beta I(\alpha,\beta). \qquad (3.73)
\]

The result looks similar to the pure 3D diffusion case: the noise due to concentration fluctuations has its length scale a (the receptor size in the pure 3D case) replaced with R (the effective DNA cross-section in the mixed 1D/3D case), and the noise contribution term gets multiplied by the “structure factor” $\beta I(\alpha,\beta)$.

What is the meaning of the parameters α and β? The parameter $\alpha = \frac{\kappa_+}{4\pi D_3}$ is close to 1 for diffusion-limited approach to the DNA. Imagine that the area that the TF attempts to hit in order to stick non-specifically to the DNA is a cylindrical segment of DNA with radius R and length b. The length scale b is the average 1D diffusion length, $b^2 = D_1 t_{\rm diff} \approx D_1\kappa_-^{-1}$. If we were treating the cylindrical DNA segment as a sphere of radius b, then the Smoluchowski limit $\tilde\kappa_+ = 4\pi D_3 b$ would apply. In the first approximation, the on-rate $\kappa_+$ (note that $\kappa_+$ is the rate per unit length of the DNA, which effectively is b) would then be $4\pi D_3$. Since the DNA segment is a cylinder and not a sphere, this reasoning is not exact, and an effective length scale $\approx (R^2 b)^{1/3}$ would probably be a better approximation. Regardless of the exact geometrical factors, however, it turns out that α has a smaller effect on the result than β. What is β? Written out in terms of b defined above, β is:

\[
\beta = \frac{\kappa_- R^2}{\pi^2 D_1} = \left( \frac{R}{\pi b} \right)^2.
\qquad (3.74)
\]
We see that β is approximately the square of the ratio between the cross-section of the 1D cylinder (the "target" that 3D diffusion has to hit) and the average "sliding length" along the DNA. R must be of the order of several nanometers, while b is, at a DNA stacking length of a = 0.3 nm per base pair and a 100 bp average diffusion length (Halford and Marko, 2004; Slutsky and Mirny, 2004), around b = 10–100 nm. It is therefore not unreasonable to assume that the factor β could be as low as β ∼ 10^{-3}–10^{-2}; the corresponding decrease in noise variance relative to pure 3D diffusion, β I(α, β), is shown in Fig 3.9.⁸

Figure 3.9: The relative decrease in noise variance, β I(α, β), compared to the pure 3D diffusion model, as a function of the parameters α and β. Three values for α are shown, spanning two orders of magnitude; β covers the relevant range if the typical 1D diffusion length is as expected from search time optimality arguments (order a hundred base pairs).
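To get a sense of the magnitude of this suppression, the integral I(α, β) of Eq (3.65) is easy to evaluate numerically. The following Python sketch (our own illustrative code, not part of the original analysis; the function name and parameter grid are arbitrary choices meant to mimic the range of Fig 3.9) tabulates the factor β I(α, β).

```python
# Minimal numerical sketch of the "structure factor" beta * I(alpha, beta) of Eq (3.65).
# Illustrative code only; the parameter grid roughly follows Fig 3.9.
import numpy as np
from scipy.integrate import quad

def I_of_alpha_beta(alpha, beta):
    """I(alpha, beta) = int_0^inf dt / [ t^2 + beta*(1 - alpha*ln(1 + (t^2 + 4*alpha/pi)^-1)) ]."""
    def integrand(t):
        return 1.0 / (t**2 + beta * (1.0 - alpha * np.log(1.0 + 1.0 / (t**2 + 4.0 * alpha / np.pi))))
    value, _ = quad(integrand, 0.0, np.inf, limit=500)
    return value

for alpha in (0.1, 1.0, 10.0):                    # spanning two orders of magnitude, as in Fig 3.9
    for beta in (1e-4, 1e-3, 1e-2):
        print(f"alpha={alpha:5.1f}  beta={beta:7.1e}  beta*I={beta * I_of_alpha_beta(alpha, beta):.3f}")
```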

⁸An alternative diffusion noise calculation is possible if we only had 1D diffusion, and no coupling to the 3D cytoplasmic bath in the model. In that case, interestingly, we cannot take the limit ω → 0 in the power spectrum, but the noise variance is finite; it is suppressed relative to pure 3D diffusion by a factor ∼ (R/b)√(k−τ), which could be of order one for biologically relevant parameters.

3.4 Summary

In this chapter we have analyzed the gene expression noise in an activator, one of the basic building blocks of genetic regulatory networks. Starting with a simplified mechanistic picture of transcriptional regulation and using the Langevin approximation, we were able to decompose the total noise in protein levels into input and output contributions. The input part arises from the fluctuations in transcription factor binding site occupancy and the diffusive flux of TF molecules towards it, and the output part from the stochastic production of mRNA and protein. Comparison with precise measurements of the noise in the fruit fly Bicoid-Hunchback system supports the claim that the input noise can contribute significantly to the total noise. A further argument that this conclusion is valid more generally is provided by simple theoretical considerations. These show that global noise models, where output noise is the sole major contributor to the total noise, are inconsistent with the embedding of such elements into a regulatory network. Recent experiments that claim consistency with the global noise models might have wrongly assigned part of the input noise to the output, and such misattribution could easily occur if the experiment does not probe the full range of input TF concentrations.

We have further shown that the presence of non-specific binding modifies the relation between free and total transcription factor concentrations significantly, but probably still linearly, and that such sites do not appreciably increase the noise on the input side. In contrast, if the transcription factor executes the binding site search in an optimal combination of 1D and 3D diffusion, the diffusive contribution to the input noise can change considerably. In the process of analyzing the fruit fly data we have also created a family of biologically plausible noise models, defined by a small number of parameters and differing in the relative contributions of the various noise sources.

In the following and last chapter of the thesis we will examine regulatory elements similar to the one dissected here, by systematically exploring the space of the elements' noise models and computing their corresponding information capacities.

Chapter 4

Building networks that transmit information

4.1 Introduction

Networks of interacting genes coordinate complex cellular processes, such as responding to stress, adapting the metabolism to a varying diet, maintaining the circadian cycle or producing an intricate spatial arrangement of differentiated cells during development. The success of such regulatory modules is at least partially characterized by their ability to produce reliable, stereotyped responses to repeated stimuli or changes in the environment, and to perform the genetic computations reproducibly, either on a day-by-day or a generation timescale. In doing so the regulatory elements are confronted by noise arising from the physical processes that implement such genetic computations, and this noise ultimately traces its origins back to the fact that the state variables of the system are concentrations of chemicals and "computations" are really reactions between chemical species, usually present at low copy numbers.

It is useful to picture the regulatory module as a device that, given some input, computes an output, which in our case will be a set of expression levels of regulated genes. Sometimes the inputs to the module are easily identified, such as when they are the actual chemicals that a system detects and responds to, for example chemoattractant molecules, hormones or transcription factors. There are cases, however, when it is beneficial to think about the inputs on a more abstract level: in early patterning we talk of positional information and think of the regulatory module as trying to produce a different gene expression footprint at each spatial location; alternatively, circadian clocks generate distinguishable gene expression profiles corresponding to various phases of the day. Regardless of whether we regard the input as a physical concentration of some transcription factor or perhaps a position within the embryo, and whether the computation is complicated or as simple as an inversion produced by a repressor, we want to quantify its reliability in the presence of noise, and ask what the biological system can do to maximize it.

If we make many observations of a genetic regulatory element in its natural conditions, we are collecting a sample drawn from a distribution p(I, O), where I describes the state of the input and O the state of the output. Saying that the system is able to produce a reliable response O across the spectrum of naturally occurring input conditions p(I) amounts to saying that the dependency – either linear or strongly non-linear – between the input and the output is high, i.e. far from random. Shannon has shown how to associate a unique measure, the mutual information I of Eq (2.5), with the notion of dependency between two quantities drawn from a joint distribution:

\[
I(I, O) = \iint dI\, dO\; p(I, O)\, \log_2 \frac{p(I, O)}{p(I)\,p(O)}.
\qquad (4.1)
\]

The resulting quantity is measured in bits and is essentially the logarithm of the number of input states that produce distinguishable outputs given the noise. A device that has one bit of capacity can be thought of as an "on-off" switch, two bits correspond to four distinguishable regulatory settings, and so on. Although the input is usually a continuous quantity, such as a nutrient concentration or the phase of the day, the noise present in the regulatory element corrupts the computation and does not allow the arbitrary resolution of a real-valued input to propagate to the output; instead, the mutual information tells us how precisely different inputs are distinguishable to the organism.

Experimental or theoretical characterization of the joint distribution, p(I, O), for a regulatory module can be very difficult if the inputs and outputs live in a high-dimensional space. We can proceed, nevertheless, by remembering that the building blocks of complex modules are much simpler, and finally must reduce to the point where a single gene is controlled by transcription factors that bind to its promoter region and tune the level of its expression. While taking a simple element out of its network will not be illuminating about how the network as a whole behaves in general – especially if there are feedback loops – there may be cases where the information flow is "bottlenecked" through a single gene, and its reliability will therefore limit that of the network. In addition, the analysis of a simple regulatory element will provide directions for taking on more complicated systems.

Our aim in this chapter is therefore to try to understand the reliability of a simple genetic regulatory element, that is, of a single activator or repressor transcription factor controlling the expression level of its downstream gene. We will identify the concentration c of the transcription factor as the input, I = {c}, and the expression level of the downstream gene g as the output, O = {g}. The regulatory element itself will be parametrized by an input/output kernel, p(g|c), i.e. the distribution (as opposed to a "deterministic" function g = g(c) in the case of a noiseless system) of possible outputs given that the input is fixed to some particular level c. For each such kernel, we will then compute the maximum amount of information, I(c; g), that can be transmitted through it, and examine how it depends on the properties of the kernel.
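As a concrete illustration of Eq (4.1), the short sketch below (our own code with illustrative names, not part of the thesis) evaluates the mutual information, in bits, for inputs and outputs that have been discretized into bins.

```python
# Mutual information of Eq (4.1) for a discretized joint distribution p_joint[i, o].
# Illustrative helper only.
import numpy as np

def mutual_information_bits(p_joint):
    p_joint = p_joint / p_joint.sum()             # enforce normalization
    p_i = p_joint.sum(axis=1, keepdims=True)      # marginal over outputs, p(I)
    p_o = p_joint.sum(axis=0, keepdims=True)      # marginal over inputs, p(O)
    mask = p_joint > 0                            # 0*log(0) = 0 convention
    return np.sum(p_joint[mask] * np.log2(p_joint[mask] / (p_i @ p_o)[mask]))

# Example: a noiseless binary "switch" carries exactly one bit.
print(mutual_information_bits(np.array([[0.5, 0.0],
                                        [0.0, 0.5]])))   # -> 1.0
```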

4.2 Maximizing information transmission

Our central idea is the realization that the input/output kernel of a simple regulatory element, p(g|c), is determined by the biophysics of transcription factor-DNA interaction, transcription and translation, whereas the distribution of inputs, p(c), that the cell uses during its "typical" lifetime is free for the cell to change. The cell's transcription factor expression footprint is its representation of the environment and internal state, and the form of this representation can be the target of adaptation or evolutionary processes. Together, the input/output kernel and the distribution of inputs define the joint distribution, p(c, g) = p(g|c) p(c), and consequently the mutual information of Eq (4.1) between the input and the output, I(c; g). Maximizing the information between the inputs and outputs, which corresponds to our notions of reliability in representation and computation, will therefore imply a specific matching between the given input/output kernel and the distribution of inputs, p(c), that is being optimized. If one believes that a specific regulatory element has been tuned for maximal information transmission, then the optimal solution for the inputs, p*(c), and the resulting optimal distribution of outputs, p*(g) = ∫ dc p(g|c) p*(c), become experimentally verifiable predictions. If, on the other hand, the system is not really maximizing information transmission, then the largest capacity achievable with a given kernel and its optimal input distribution, I[p(g|c), p*(c)], can still be regarded as a (hopefully revealing) upper bound on the true information capacity of the system.

Figure 4.1: A schematic diagram of a simple regulatory element. Each input is mapped to a mean output according to the input/output relation (thick sigmoidal black line). Because the system is noisy, the output fluctuates about the mean. This noise is plotted in gray as a function of the input and shown in addition as error bars on the mean input/output relation. The inset shows the probability distribution of outputs at half saturation, p(g|c = Kd) (red dotted lines); in this simple example we assume that the distribution is Gaussian and therefore fully characterized by its mean and variance.

During the past decades the measurements of regulatory elements have focused on recovering the mean response of a gene under the control of a transcription factor that had its activity modulated by experimentally adjustable levels of inducer or inhibitor molecules. Typically, a sigmoidal response is observed with a single regulator, as in Fig 4.1, and more complicated regulatory "surfaces" are possible when there are two or more simultaneous inputs to the system (Setty et al., 2003). In our notation, these experiments measure the conditional average over the distribution of outputs, ḡ(c) = ∫ dg g p(g|c). Efforts to characterize the noise in gene expression were renewed by the theoretical work of Swain et al. (2002) that has shown how to separate the intrinsic and extrinsic components of the noise, i.e. the noise due to the stochasticity of the observed regulatory process in a single cell, and the noise contribution that arises because typical experiments make many single-cell measurements and the internal chemical environments of these cells differ across the population. Consequently, work exploring the noise in gene expression, σg²(c) = ∫ dg (g − ḡ)² p(g|c), has begun to accumulate, on both the experimental and the biophysical modeling side.

4.2.1 Small noise approximation

We start by showing how the optimal distributions can be computed analytically if the input/output kernel is Gaussian and the noise is small, and proceed by presenting the exact numerical solution later. Let us assume then that the first and second moments of the conditional distribution are given, and write the input/output kernel as a set of Gaussian distributions G(g; ḡ(c), σg(c)), or explicitly:

\[
p(g|c) = \frac{1}{\sqrt{2\pi\sigma_g^2(c)}}
\exp\!\left\{ -\frac{[g - \bar{g}(c)]^2}{2\sigma_g^2(c)} \right\},
\qquad (4.2)
\]

where both the mean response, ḡ(c), and the noise, σg(c), depend on the input, as illustrated in Fig 4.1. We rewrite the mutual information between the input and the output of Eq (4.1) in the following way [Eq (2.5)]:

\[
I(c; g) = \int dc\, p(c) \int dg\, p(g|c) \log_2 p(g|c)
- \int dc\, p(c) \int dg\, p(g|c) \log_2 p(g).
\qquad (4.3)
\]

The first term can be evaluated exactly for Gaussian distributions, p(g|c) = G(g; ḡ(c), σg(c)). The integral over g is just the calculation of the (negative of the) entropy of the Gaussian, and the first term therefore evaluates to −⟨S[G(g; ḡ, σg)]⟩_{p(c)} = −(1/2)⟨log₂ 2πe σg²(c)⟩_{p(c)}. In the second term of Eq (4.3) the integral over g can be viewed as calculating ⟨log₂ p(g)⟩ under the distribution p(g|c). For an arbitrary continuous function f(g) we can expand the integrals with the Gaussian measure around the mean:

\[
\langle f(g) \rangle_{G(g;\bar{g},\sigma_g)}
= \int dg\, G(g)\, f(\bar{g})
+ \int dg\, G(g)\, \left. \frac{\partial f}{\partial g} \right|_{\bar{g}} (g - \bar{g})
+ \frac{1}{2} \int dg\, G(g)\, \left. \frac{\partial^2 f}{\partial g^2} \right|_{\bar{g}} (g - \bar{g})^2 + \cdots
\qquad (4.4)
\]

The first term of the expansion simply evaluates to f(ḡ). The series expansion would end at the first term if we were to take the small noise limit, lim_{σg→0} G(g; ḡ, σg) = δ(g − ḡ). The second term of the expansion is zero because of symmetry, and the third term evaluates to (1/2) σg² f''(ḡ). We apply the expansion of Eq (4.4) and compute the second term in the expression for the mutual information, Eq (4.3), with f(g) = log₂ p(g). Taking only the zeroth order of the expansion, we get

\[
I(c; g) = -\int dc\, p(c) \left[ \log_2\!\left( \sqrt{2\pi e}\,\sigma_g(c) \right) + \log_2 p(\bar{g}(c)) \right];
\qquad (4.5)
\]

we can rewrite the probability distributions in terms of ḡ, using p(c) dc = p(ḡ) dḡ. The optimal solution is obtained by taking a variational derivative with respect to p(ḡ) and enforcing the normalization through a Lagrange multiplier; the solution is

\[
p^*(\bar{g}) = \frac{1}{Z} \cdot \frac{1}{\sigma_g(\bar{g})}.
\qquad (4.6)
\]

By inserting the optimal solution, Eq (4.6), into the expression for the mutual information, Eq (4.3), we get the explicit result for the capacity:

\[
I_{opt}(c; g) = \log_2 \frac{Z}{\sqrt{2\pi e}},
\qquad (4.7)
\]

where Z is the normalization of the optimal solution in Eq (4.6):

\[
Z = \int_0^1 \frac{d\bar{g}}{\sigma_g(\bar{g})}.
\qquad (4.8)
\]

If we were to include the second-order term of Eq (4.4), the approximate solution for the optimal distribution of expression levels would become:

\[
p^*(\bar{g}) = \frac{1}{Z} \cdot \frac{1}{\sigma_g(\bar{g})}\,
e^{\frac{1}{2}\sigma_g^2\,(\log_2\sigma_g)''}.
\qquad (4.9)
\]

The optimization with respect to the distribution of inputs, p(c), has led us to the result for the optimal distribution of mean outputs, Eq (4.6). We had to assume that the input/output kernel is Gaussian and that the noise is small, and we refer to this result as the small-noise approximation (SNA) for channel capacity. Note that in this approximation only the knowledge of the noise in the output as a function of the mean output, σg(ḡ), matters for the capacity computation, and the direct dependence on the input c is irrelevant. Note also that for big enough noise the normalization constant Z will be small compared to √(2πe), and the small-noise capacity approximation of Eq (4.7) will break down by predicting negative information values.
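The small-noise result is straightforward to evaluate numerically once σg(ḡ) is known; the sketch below (our own illustrative code; the helper name and the example noise function are assumptions, not taken from the thesis) implements Eqs (4.7)-(4.8).

```python
# Small-noise approximation, Eqs (4.7)-(4.8): Z = int_0^1 dgbar / sigma_g(gbar),
# I_opt = log2( Z / sqrt(2*pi*e) ).  Illustrative sketch only.
import numpy as np

def sna_capacity_bits(sigma_of_gbar, n_grid=10_000):
    gbar = np.linspace(1e-4, 1.0 - 1e-4, n_grid)        # stay away from the exact endpoints
    dg = gbar[1] - gbar[0]
    Z = np.sum(1.0 / sigma_of_gbar(gbar)) * dg          # Eq (4.8), simple Riemann sum
    return np.log2(Z / np.sqrt(2.0 * np.pi * np.e))     # Eq (4.7)

# Example: output noise only, sigma_g^2 = alpha * gbar (cf. Eq (4.18) with beta = gamma = 0).
alpha = 1e-4
print(sna_capacity_bits(lambda g: np.sqrt(alpha * g)))
```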

4.2.2 Large noise approximation

Simple regulatory elements usually have a monotonic, saturating input/output relation, as shown in Fig 4.1, and (at least) a shot noise component whose variance scales with the mean. If the noise strength is increased, the information transmission capacity must drop and, even with the optimally tuned input distribution, eventually yield only a bit or less of capacity. Intuitively, the best such a noisy system can do is to utilize only the lowest and highest achievable input concentrations, and ignore the continuous range in between. Thus, the mean responses will be as different as possible, and the noise at low expression will also be low because it scales with the mean. More formally, if only {cmin, cmax} are used as inputs, then the result is either p(g|cmin) or p(g|cmax); the optimization of channel capacity reduces to finding p(cmin), with p(cmax) = 1 − p(cmin) subsequently given by the normalization condition. This problem can be solved either by assuming that each of the two possible input concentrations produces its respective Gaussian output distribution, and maximizing the information over p(cmin); or by simplifying even further and assuming that each of the two possible inputs, "min" and "max", maps into two possible outputs, "on" and "off", and that a "min" input might be misread as an "on" output and vice versa, with probabilities given by the overlaps of the output distributions, as shown schematically in Fig 4.2.

In the latter case we can use the analytic formula for the capacity of the binary asymmetric channel. If η is the probability of detecting an "off" output if the "max" input was sent, ξ is the probability of receiving an "off" output if the "min" input was sent, and H(·) is the binary entropy function,

\[
H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p),
\qquad (4.10)
\]

then the capacity of such an asymmetric channel is (Silverman, 1955):

\[
I(c; g) = \frac{-\eta H(\xi) + \xi H(\eta)}{\eta - \xi}
+ \log_2\!\left( 1 + 2^{\frac{H(\xi) - H(\eta)}{\eta - \xi}} \right).
\qquad (4.11)
\]

Figure 4.2: An illustration of the large noise approximation. We consider distributions of the output at minimal (cmin) and full (cmax) induction as trying to convey a single binary decision, and construct the corresponding encoding table (inset): P(g = 0 | cmin) = 0.95, P(g = 1 | cmin) = 0.05, P(g = 0 | cmax) = 0.1, P(g = 1 | cmax) = 0.9. The capacity of such an asymmetric binary channel is degraded from the theoretical maximum of 1 bit, because the distributions overlap (blue and red). For unclipped Gaussians the optimal threshold is at the intersection of the two alternative pdfs.

Because this approximation reduces the continuous distribution of outputs to only two choices, “on” or “off”, it can underestimate the true channel capacity and is therefore a lower bound.
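Silverman's formula is simple enough to check directly; the sketch below (our code, with the example values of the Fig 4.2 encoding table plugged in for illustration) evaluates the lower bound of Eq (4.11) in bits.

```python
# Large-noise (binary asymmetric channel) bound, Eqs (4.10)-(4.11).
# eta = P('off' output | c_max sent), xi = P('off' output | c_min sent).  Illustrative sketch.
import numpy as np

def binary_entropy(p):
    return -p * np.log2(p) - (1.0 - p) * np.log2(1.0 - p)

def asymmetric_channel_capacity_bits(eta, xi):
    H_eta, H_xi = binary_entropy(eta), binary_entropy(xi)
    return (-eta * H_xi + xi * H_eta) / (eta - xi) + np.log2(1.0 + 2.0 ** ((H_xi - H_eta) / (eta - xi)))

# Example using the encoding table of Fig 4.2: eta = 0.1, xi = 0.95.
print(asymmetric_channel_capacity_bits(0.1, 0.95))   # ~0.62 bits, below the 1-bit maximum
```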

4.2.3 Exact solution

The information between the input and the output in Eq (4.3) can be maximized numerically for any input/output kernel, P(g|c), if the variables c and g are discretized, making the solution space that needs to be searched, p(ci), finite. One possibility is to use a gradient descent-based method and make sure that the solution procedure always stays within the domain boundaries Σ_i p(ci) = 1, p(cj) ≥ 0 for every j. Alternatively, a procedure known as the Blahut-Arimoto algorithm has been derived solely for the purpose of finding optimal channel capacities (Blahut, 1972). Both methods yield consistent solutions, but we prefer to use the second one because of its faster convergence and the convenient inclusion of constraints on the cost of coding [Methods A.4.1].

One should be careful in interpreting the results of such an optimization and worry about the artifacts introduced by the discretization of the input and output domains. After discretization, the formal optimal solution is no longer required to be smooth and could, in fact, be composed of a sum of Dirac-delta function spikes. On the other hand, the real, physical concentration c cannot be tuned with arbitrary precision in the cell; it is a result of noisy gene expression, and even if this noise source were removed, the local concentration at the binding site would still be subject to fluctuations caused by randomness in the diffusive flux [Section 3.2]. The Blahut-Arimoto algorithm is completely agnostic as to which (physical) concentrations belong to which bins after the concentration has been discretized, and so it could assign a lot of probability weight to bin ci and zero weight to the neighboring bin ci+1 that might represent a concentration change of less than σc (i.e. the scale of local concentration fluctuations) from ci. It is clear that such a solution is physically unrealizable. One way to address this problem is to include a term in the functional that represents a smoothness constraint on the scale of σc(c̄). The other way is to let the procedure find the spiky solution, but interpret it not as a real, "physical" concentration, but rather as the distribution of concentrations that the cell attempts to generate, c*. In this case, however, the limited resolution σc(c̄) must be referred to the output as an effective noise in gene expression, σg = σc|∂g/∂c|. The optimal solution p(c*) is therefore the distribution of the levels that the cell would use if it had infinitely precise control over choosing various c*, but the physical concentrations are obtained by convolving this optimal result p(c*) with a Gaussian of width σc(c*). Both of these approaches are presented in Methods A.4.1; as we point out next in the discussion of signals and noise, we use the second method of referring all noise to the output explicitly in the noise model. The third alternative would be to take the finite resolution of the input into account when the input axis is discretized, by matching the sizes of the bins to the input precision.
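A bare-bones version of the Blahut-Arimoto iteration is sketched below (our own implementation for illustration only; the fixed iteration count, the absence of any smoothness or cost constraint, and all names are assumptions, and the routine described in Methods A.4.1 may differ in its details).

```python
# Blahut-Arimoto iteration for the capacity of a discretized kernel K[i, j] = p(g_j | c_i).
# Illustrative sketch; uses a fixed number of iterations instead of a convergence test.
import numpy as np

def blahut_arimoto_capacity(K, n_iter=500):
    n_c = K.shape[0]
    p_c = np.full(n_c, 1.0 / n_c)                          # start from a uniform input distribution
    for _ in range(n_iter):
        p_g = p_c @ K                                       # current output marginal
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(K > 0, np.log2(K / p_g), 0.0)
        D = np.sum(K * logratio, axis=1)                    # D_i = KL( p(g|c_i) || p(g) ) in bits
        w = p_c * 2.0 ** D                                  # multiplicative Blahut-Arimoto update
        p_c = w / w.sum()
    p_g = p_c @ K
    with np.errstate(divide="ignore", invalid="ignore"):
        logratio = np.where(K > 0, np.log2(K / p_g), 0.0)
    return np.sum(p_c * np.sum(K * logratio, axis=1)), p_c  # (capacity in bits, optimal p(c))

# Example: a noiseless binary kernel converges to 1 bit with p(c) = (1/2, 1/2).
print(blahut_arimoto_capacity(np.array([[1.0, 0.0], [0.0, 1.0]])))
```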

4.3 A model of signals and noise

If enough data were available, one could directly sample P(g|c) and proceed by calculating the optimal solutions as described previously. Here we start, in contrast, by assuming a Gaussian model [Eq (4.2)] in which the mean, ḡ(c), and the output noise, σg(c), are functions of the transcription factor concentration, c. Our goal for this section is to build an effective microscopic model of transcriptional regulation and gene expression, and therefore define both functions with a small number of biologically interpretable parameters;¹ later we plan to vary those and thus systematically observe the changes in information capacity.

¹The discussion here briefly recapitulates the results of Section 3.2 and focuses on those features of the noise model that will be needed subsequently.

In the simplest picture, the interaction of the TF with the promoter site consists of binding with a (second order) rate constant k+ and unbinding at a rate k−. The equilibrium occupancy of the site is [Eq (3.13)]:

\[
n = \frac{c^h}{c^h + K_d^h},
\qquad (4.12)
\]

where the Hill coefficient, h, captures the effects of cooperative binding, and Kd = k−/k+ is the equilibrium constant of binding. The mean expression level g is then [Eq (A.14)]:

\[
g(c) = g_0\,\bar{g} = g_0 \times
\begin{cases}
n & \text{activator} \\
1 - n & \text{repressor,}
\end{cases}
\qquad (4.13)
\]

where ḡ has been normalized to vary between 0 and 1, and g0 is the maximum expression level. In what follows we will assume the activator case, where ḡ = n, and present the result for the repressor at the end.

The fluctuations in occupancy have a (binomial) variance σn² = n(1 − n) [Eq (A.33)] and a correlation time τc = 1/(k+c^h + k−) [Eq (A.31)]. If the expression level of the target gene is effectively determined by the average of the promoter site occupancy over some window of time τint, then the contribution to the variance in the expression level due to the "on-off" promoter switching will be [Eq (A.46)]:

\[
\left( \frac{\sigma_g}{g_0} \right)^2
= \sigma_n^2\,\frac{\tau_c}{\tau_{int}}
= \frac{n(1 - n)}{(k_+ c^h + k_-)\,\tau_{int}}
= \frac{n(1 - n)^2}{k_-\,\tau_{int}},
\qquad (4.14)
\]

where in the last step we use the fact that k+c^h (1 − n) = k−n. At low TF concentrations the arrival times of single transcription factor molecules to the binding site are random events. As we argue in Chapter 3, recent measurements (Gregor et al., 2006a) seem to be consistent with the hypothesis that this variability in diffusive flux

contributes an additional noise term (Bialek and Setayeshgar, 2005; Tkačik et al., 2006), similar to the Berg-Purcell limit to chemoattractant detection in chemotaxis. The noise in the expression level due to fluctuations in the binding site occupancy, or the total input noise, is therefore a sum of this diffusive component [Eq (A.46)] and the switching component of Eq (4.14):

\[
\left( \frac{\sigma_g}{g_0} \right)^2_{input}
= \frac{n(1 - n)^2}{k_-\,\tau_{int}}
+ \frac{h^2 (1 - n)^2 n^2}{\pi D a c\, \tau_{int}},
\qquad (4.15)
\]

where D is the diffusion constant for the TF and a is the receptor site size, a ∼ 3 nm for a typical binding site on the DNA.

To compute the information capacity in the small noise limit using the simple model developed so far, we need the constant Z from Eq (4.8), which is defined as an integral over expression levels. As both input noise terms are proportional to (1 − ḡ)², the integral must take the form

\[
Z \propto \int_0^1 \frac{d\bar{g}}{(1 - \bar{g})\, F(\bar{g})},
\qquad (4.16)
\]

where F(ḡ) is a function that approaches a constant as ḡ → 1. Strangely, we see that this integral diverges near full induction (ḡ = 1), which means that the information capacity also diverges.

Naively we expect that modulations in transcription factor concentration are not especially effective at transmitting regulatory information once the relevant binding sites are close to complete occupancy. More quantitatively, the sensitivity of the site occupancy to changes in TF concentration, ∂n/∂c, vanishes as n → 1, and hence small changes in TF concentration will have vanishingly small effects. Our intuition breaks down, however, because in thinking only about the mean occupancy we forget that even very small changes in occupancy could be effective if the noise level is sufficiently small. As we approach complete saturation, the variance in occupancy decreases, and the correlation time of the fluctuations becomes shorter and shorter; together these effects cause the standard deviation, as seen through an averaging time τint, to decrease faster than ∂n/∂c, and this mismatch is the origin of the divergence in the information capacity.

Of course the information capacity of a physical system can't really be infinite; there must be an extra source of noise (or reduced sensitivity) that becomes limiting as n → 1. The noise in Eq (4.15) captures only the input noise, i.e. the noise in the protein level caused by the fluctuations in the occupancy of the binding site. In contrast, the output noise arises even when the occupancy of the binding site is fixed (for example, at full induction), and originates in the stochasticity of transcription and translation. The simplest model postulates that when the activator binding site is occupied with fractional occupancy n, mRNA molecules are synthesized in a Poisson process at a rate Re that generates Re τe n mRNA molecules on average during the lifetime of a single mRNA molecule, τe. Every message is a template for the production of proteins, which is another Poisson process with rate Rg. If the integration time is larger than the lifetime of single mRNA molecules, τint ≫ τe, the mean number of proteins produced is g = Rg τint Re τe n = g0 n, and the variance associated with both Poisson processes is [Eq (A.36)]

\[
\left( \frac{\sigma_g}{g_0} \right)^2_{output} = \frac{1 + R_g \tau_e}{g_0}\, n,
\qquad (4.17)
\]

where b = Rg τe is the burst size, or the number of proteins synthesized per mRNA.

Parameter   Value                  Description
α           (1 + b)/g0             Output noise strength
β           h²/(π D a Kd τint)     Diffusion input noise strength
γ           (k− τint)⁻¹            Switching input noise strength
h                                  Cooperativity (Hill coefficient)

Table 4.1: Gaussian noise model parameters. Note that if the burst size b ≫ 1, then the output noise is determined by the average number of mRNA molecules, α ∼ ⟨mRNA⟩⁻¹. Note further that if the on-rate is diffusion limited, i.e. k+ = 4πDa, then both input noise magnitudes, β and γ, are proportional and decrease with increasing k−, or alternatively, with increasing Kd.

We can finally put the results together by adding the input noise Eq (4.15) and the output noise Eq (4.17), and expressing both in terms of the normalized expression level ḡ(c):

\[
\left( \frac{\sigma_g}{g_0} \right)^2_{act}
= \alpha\,\bar{g} + \beta\,(1 - \bar{g})^{2 + \frac{1}{h}}\,\bar{g}^{\,2 - \frac{1}{h}} + \gamma\,\bar{g}(1 - \bar{g})^2,
\qquad (4.18)
\]
\[
\left( \frac{\sigma_g}{g_0} \right)^2_{rep}
= \alpha\,\bar{g} + \beta\,(1 - \bar{g})^{2 - \frac{1}{h}}\,\bar{g}^{\,2 + \frac{1}{h}} + \gamma\,\bar{g}^2(1 - \bar{g}),
\qquad (4.19)
\]

with the relevant parameters {α, β, γ, h} explained in Table 4.1. Note that the repressor and activator cases differ only in the shape of the input noise contributions (especially for low cooperativity h). Note further that the output noise increases monotonically with the mean expression ḡ, while the input noise peaks at intermediate levels of expression [Section 3.2.4]. To make the examination of the parameter space in the next section feasible, we set γ = 0; models with switching noise instead of diffusive noise produce qualitatively similar results [Figs A.17a, A.17b].
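To connect this noise model to the capacity calculations of Section 4.2 one needs the discretized kernel p(g|c); the sketch below (our own code, with illustrative grids and parameter values that are not taken from the thesis) builds a Gaussian kernel from the Hill mean response of Eq (4.12) and the activator noise of Eq (4.18). The resulting matrix can be fed directly to a Blahut-Arimoto routine like the one sketched in Section 4.2.3.

```python
# Gaussian input/output kernel p(g|c) from Eqs (4.2), (4.12) and (4.18); illustrative sketch.
import numpy as np

def activator_kernel(c_over_Kd, g_grid, alpha, beta, gamma=0.0, h=1.0):
    """Rows are input bins (c in units of Kd), columns are output bins (g in [0, 1])."""
    gbar = c_over_Kd**h / (c_over_Kd**h + 1.0)                       # Eq (4.12), c measured in units of Kd
    var = (alpha * gbar
           + beta * (1.0 - gbar)**(2.0 + 1.0 / h) * gbar**(2.0 - 1.0 / h)
           + gamma * gbar * (1.0 - gbar)**2)                         # Eq (4.18)
    sigma = np.sqrt(var)
    kernel = np.exp(-0.5 * ((g_grid[None, :] - gbar[:, None]) / sigma[:, None])**2)
    return kernel / kernel.sum(axis=1, keepdims=True)                # normalize over the output grid

# Example: a 200 x 200 kernel on a logarithmic input grid, ready for a capacity calculation.
c = np.logspace(-2, 2, 200)                  # c / Kd
g = np.linspace(0.0, 1.0, 200)
K = activator_kernel(c, g, alpha=1e-3, beta=1e-3)
```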

4.4 Results

4.4.1 Capacity of simple regulatory elements

Having at our disposal both a simple model of signals and noise and a numerical way of finding the optimal solutions given an arbitrary input-output kernel [Methods A.4.1], we are now ready to examine the channel capacity as a function of the noise parameters from Table 4.1. Our first result [Fig 4.3] concerns the simplest case of an activator with no cooperativity, h = 1; for this case, the noise in Eq (4.18) simplifies to:

\[
\left( \frac{\sigma_g}{g_0} \right)^2 = \alpha\,\bar{g} + \beta\,(1 - \bar{g})^3\,\bar{g}.
\qquad (4.20)
\]

Here we have assumed that there are two relevant sources of noise, i.e. the output noise (which we parametrize by α and plot on the horizontal axis) and the input diffusion noise (parametrized by β, vertical axis). Each point of the information plane in Fig 4.3a therefore represents a system characterized by a Gaussian noise model, Eq (4.2), with the variance given above.
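Within the small-noise approximation the information plane is cheap to tabulate; the self-contained sketch below (our own code; the particular α and β values are arbitrary illustrations, not the grid used for Fig 4.3) evaluates Eqs (4.7)-(4.8) for the h = 1 noise of Eq (4.20).

```python
# Small-noise-approximation capacity for a few (alpha, beta) pairs with the noise of Eq (4.20).
# Illustrative sketch; parameter values are arbitrary.
import numpy as np

def sna_capacity_eq420(alpha, beta, n_grid=20_000):
    g = np.linspace(1e-5, 1.0 - 1e-5, n_grid)
    sigma = np.sqrt(alpha * g + beta * (1.0 - g)**3 * g)    # Eq (4.20), h = 1, gamma = 0
    Z = np.sum(1.0 / sigma) * (g[1] - g[0])                 # Eq (4.8), simple Riemann sum
    return np.log2(Z / np.sqrt(2.0 * np.pi * np.e))         # Eq (4.7)

for alpha in (1e-6, 1e-4, 1e-2):
    row = "  ".join(f"{sna_capacity_eq420(alpha, beta):5.2f}" for beta in (1e-6, 1e-4, 1e-2))
    print(f"alpha={alpha:7.1e}:  {row}  bits (beta = 1e-6, 1e-4, 1e-2)")
```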

As expected, the capacity increases most rapidly when the origin of the information plane is approached approximately along its diagonal, whereas along each of the edges one of the two noise sources effectively disappears, leaving the system dominated by either output or input noise alone. We pick two illustrative examples, the blue and the red systems of Figs 4.3b and 4.3c, that have realistic noise parameters. The blue system has, apart from the decreased cooperativity (h = 1 instead of h = 5), the characteristics of the bicoid-hunchback regulatory element in Drosophila melanogaster [Section 3.2]; the red system has the same (dominant output noise) characteristics as those recently measured for about 40 yeast genes by Bar-Even et al. (2006). We would like to emphasize that both the small-noise approximation for the capacity, which is easily computable from the measured noise at various induction levels, and the exact solution predict that these realistic systems are capable of transmitting more than 1 bit of regulatory information and that they, indeed, could transmit up to about 2 bits.

Figure 4.3: Information capacity (color code, in bits) as a function of input and output noise, using the activator input-output relation with Gaussian noise given by Eq (4.20) and no cooperativity (h = 1). Panel A shows the exact capacity calculation (thick line) and the small noise approximation (dashed line). Panel B displays the details of the blue dot in the information plane: the noise in the output is shown as a function of the input, with a peak being characteristic of the dominant input noise contribution; also shown are the exact solution (thick black line) and the small-noise approximation (dashed black line) to the optimal distribution of output expression levels. Panel C similarly displays the details of the system denoted by the red dot in the information plane; here the output noise is dominant and both the approximate and exact solutions for the optimal distribution of outputs show a trend monotonically decreasing with the mean output.

A closer look at the overall agreement between the small-noise approximation (dashed lines in Fig 4.3a) and the exact solution (thick lines) shows that the small-noise approximation underestimates the true capacity, consistent with our remark that for large noise the approximation will incorrectly produce negative results; at the 2-bit information contour the approximation is about ∼ 15% off, but it improves as the capacity is increased.

Indeed, in the high noise regime we are making yet another approximation, the validity of which we now need to examine. In our discussion about the models of signals and noise we assumed that we can talk about the fractional occupancy of the binding site and the continuous concentrations of mRNA, transcription factors and protein, instead of counting these species in discrete units, and that the noise can effectively be treated as Gaussian. Both of these assumptions are the cornerstones of the Langevin approximation for calculating the noise variance [Methods A.4.5]. If the parameters α and β actually arise due to the underlying microscopic mechanisms described in the section on signals and noise and schematized in Fig 3.2, we expect that at least for some large-noise regions of the information plane the discreteness in the number of mRNA molecules will become important and the Langevin approximation will fail. In such cases a (much more time-consuming) exact calculation of the input-output relations using the Master equation is possible for some noise models; in Fig A.18 we show that in the region where log α > −2 the channel capacities calculated with Gaussian kernels can be overestimated by ∼ 10% or more; there the Langevin calculation gives the correct second moment, but misses the true shape of the distribution. We can nevertheless conclude that the Langevin approximation provides a good analytic framework for the analysis of information capacity in the biologically relevant region of parameter space.

Figure 4.4: Difference in the information transmission capacity between the repressors and activators (color code in bits). Left panel shows Irep(h = 1) − Iact(h = 1), with the noise model that includes output (α) and input diffusion noise (β) contributions (see Fig 4.3 for absolute values of Iact(h = 1)). Right panel shows Irep − Iact for the noise model that includes output noise (α) and input switching noise (γ) contributions. This latter difference does not depend on cooperativity (see Fig A.17b for the corresponding absolute information values).

Is there any difference between activators and repressors in their capacity to convey information about the input? We concluded Section 4.3 on the noise models with separate expressions for the activator noise, Eq (4.18), and the repressor noise, Eq (4.19); focusing now on the repressor case, we recompute the information plane in the same manner as we did for the activator in Fig 4.3a, and display the difference between the capacities of the repressor and the activator with the same noise parameters in Fig 4.4. As expected, the biggest difference occurs above the main diagonal, where the input noise dominates over the output noise. In this region the capacity of the repressor can be bigger by as much as a third than that of the corresponding activator. Note that as h → ∞, the activator and repressor noise expressions become indistinguishable and the difference in capacity vanishes for the noise models with output and input diffusion noise contributions [Eqs (4.18, 4.19)]. The difference between the two modes of regulation is still present, but less striking, in an alternative model with output and input switching noise contributions.

Figure 4.5: Comparison of exact channel capacities and various approximate solutions. For both panels (no cooperativity, h = 1, on the left; strong cooperativity, h = 3, on the right) we take a cross-section through the information plane of Fig 4.3 along the main diagonal, where the values of the noise strength parameters α and β are equal; the curves show the exact optimal solution (red), the capacity obtained with an input distribution uniform in log(c), the small noise approximation, and (left panel only) the big noise approximation, plotted as capacity in bits versus log(α) = log(β). By moving along the diagonal of the information plane one changes both the input and the output noise by the same multiplicative factor s, and since, in the small-noise approximation, ISNA ∝ log₂ Z with Z = ∫ σg(ḡ)⁻¹ dḡ, that factor results in an additive change in capacity of log₂ s. We can use the large noise approximation lower bound on the capacity for the case h = 1, in the parameter region where capacities fall below 1 bit.

The behavior of the regulatory element can be conveniently visualized in Fig 4.5 by plotting a cut through the information plane along its main diagonal. Moving along this cut corresponds to scaling the total noise of the system up or down by a multiplicative factor, and allows us to observe the overall agreement between the exact solution and the small- and large-noise approximations. In addition, we point out the following interesting features of Fig 4.5 that will be examined more closely in subsequent sections.

Firstly, there is only a small regime where the capacity is below one bit and the large noise approximation can be applied. With higher cooperativity this regime disappears, suggesting that a biological implementation of a reliable binary channel could be relatively straightforward if our noise model is appropriate. In addition, there are distributions not specifically optimized for the input/output kernel that nevertheless achieve considerable capacities: we illustrate this idea by using input distributions that are uniform in log(c/Kd) in Fig 4.5 (thick black line) and interpret it as an indication that the maximum in capacity cannot be very sharp with respect to small perturbations of the optimal solution, p(c*). We revisit this idea more systematically in the next section.

Secondly, it can be seen from Fig 4.5 that at small noise the cooperativity has a minor effect on the channel capacity, which is unexpected at first because the functional form of the noise explicitly depends on cooperativity, Eq (4.18), and additionally, the shape of the mean response ḡ(c) strongly depends on h. We recall, however, that the mutual information I(c; g) is invariant to any invertible reparametrization of either g or c. In particular, changing the cooperativity or the value of the equilibrium binding constant, Kd, in theory only results in an invertible change in the input variable c, and therefore the change in the steepness or midpoint of the mean response must not have any effect on I(c; g). This argument breaks down in the high noise regime, because the reparametrization invariance would work only if the input concentration could extend over the whole positive interval, from zero to infinity. The substantial difference between the capacities of cooperative and non-cooperative systems in Fig 4.5 at low capacity stems from the fact that in reality the cell (and our computation) is limited to a finite range of concentrations, c ∈ [cmin, cmax], instead of the whole positive half-axis, c ∈ [0, ∞). We explore the issue of limited input dynamic range further in the next section.

Finally, we draw attention to the simple linear scaling of the channel capacity with the logarithm of the total noise strength, as explained in the caption of Fig 4.5, when the small noise approximation is valid. In general, increasing the number of input and output molecules by a factor of four will decrease the relative input and output noise by a factor of √4 = 2, and therefore, in the small noise approximation, increase the capacity by log₂ 2 = 1 bit. If there is no cost that needs to be paid by the cell to make more transcription factor and output protein molecules, then scaling the noise along the horizontal axis of Fig 4.5 is directly related to the scaling of the total number of signaling molecules used by the regulatory element. If there are metabolic or time costs to making more molecules, our optimization needs to be modified appropriately, and we present the relevant computation in the section on the costs of coding.

4.4.2 Cooperativity, dynamic range and the tuning of solutions

So far we have assumed that the computed optimal input distributions are biologically realizable. For instance, the range of the allowed input concentrations was not constrained explicitly to a narrow band of several-fold change around Kd; neither was any special attention paid to the absolute value of Kd, because we argued that it would only modify the units on the concentration axis; nor did we examine the possible consequences of requirements on the smoothness of the optimal distributions, or discuss their "fine-tuning." Relying on simplifications of this sort, or ignoring constraints to which a biological system must inevitably be subjected, amounts to making assumptions that do not necessarily hold in nature, and the goal of this section is to study how the information transmission is affected if such idealizations are relaxed.

We start by considering the impact on channel capacity of changing the allowed total dynamic range to which the input concentration is restricted. Figure 4.6 displays, in the left plot, the capacity as a function of the dynamic range (where we talk about a "10-fold range" if c ∈ [Kd/5, 5Kd]), the output noise and the cooperativity. The main feature of the plot is the difference between the low and high cooperativity cases at each noise level; regardless of cooperativity, the total information at infinite dynamic range would saturate at approximately the same level (which depends on the output noise magnitude). However, highly cooperative systems manage to reach a high fraction (80% or more) of their saturated information transmission levels even at reasonable dynamic ranges of about 10- to 20-fold, whereas low cooperativity systems require a much bigger dynamic range for the same capacity. The decrease in capacity with decreasing dynamic range is a direct consequence of the nonlinear relationship between the concentration and the occupancy, Eq (4.12), and for low cooperativity systems in particular it means not being able to fully shut down or fully induce the promoter. This contrasts sharply with the theoretical noise model with an infinite concentration range, in which, at zero input concentration, the noise is always zero, σg(c = 0) = 0, and therefore this "zero input" constitutes a "letter" of the input alphabet that can be perfectly discriminated from all other inputs. Although we do not discuss it explicitly in our models of signals and noise, leaky expression (i.e. non-zero output when the concentration of activators is zero) will have a similar capacity-degrading effect to that of reducing the available input dynamic range.


Figure 4.6: Effects of imposing realistic constraints on the space of allowed input distributions. Left panel shows the change in capacity if the dynamic range of the input around Kd is changed. The regulatory element is a repressor with either no cooperativity (dash-dot line) or high cooperativity, h = 3 (thick line), and with output noise α only. We plot three high-low cooperativity pairs for three different choices of the output noise magnitude (high noise in light gray, log α ≈ −2.5; medium noise in dark gray, log α ≈ −5; low noise in black, log α ≈ −7.5). Right panel shows the sensitivity of the channel capacity to perturbations in the optimal input distribution (grayscale indicates the number of bits of capacity lost per unit Jensen-Shannon divergence between the optimal and suboptimal input distribution – see main text and Fig A.19).

We conclude this section by discussing how precisely tuned the resulting optimal distributions have to be to take full advantage of the regulatory element's capacity. For each point in the information plane of Fig 4.3a the optimal input distribution p*(c) is perturbed many times to create an ensemble of suboptimal inputs pi(c) [Methods A.4.6]. For each pi(c) we compute, first, its distance away from the optimal solution by means of the Jensen-Shannon divergence, di = DJS(pi, p*); next, we use the pi(c) to compute the suboptimal channel capacity Ii. A scatter plot of many such pairs (di, Ii), obtained with various perturbations pi(c) for each system of the information plane, characterizes the sensitivity of the optimal solution for that system; the main feature of such a plot [Fig A.19] is the linear (negative) slope that describes how many bits of channel capacity are lost for each unit of Jensen-Shannon distance away from the optimal solution. These slopes are shown in grayscale in the right plot of Fig 4.6 for the whole information plane. We note that for systems with high capacity the linear relationship between the divergence di and the capacity Ii provides a better fit than for systems with small capacity. Most importantly, the figure not only shows that high capacity solutions are more sensitive to deviations from the optimal solution, but also that achieving 1 bit of capacity does not require much tuning – if we take the linear slopes seriously and try to extrapolate as DJS → 1, to ask how much channel capacity remains if one were to use very "un-tuned" solutions, we see that this value is about a bit for most of the information plane.
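For completeness, the Jensen-Shannon divergence used as the distance di is straightforward to compute for discretized distributions; the sketch below (our own code with toy distributions) uses base-2 logarithms, so that DJS lies between 0 and 1 bit.

```python
# Jensen-Shannon divergence between two discrete distributions on the same grid, in bits.
# Illustrative sketch only.
import numpy as np

def kl_bits(p, q):
    mask = p > 0
    return np.sum(p[mask] * np.log2(p[mask] / q[mask]))

def js_divergence_bits(p, q):
    m = 0.5 * (p + q)
    return 0.5 * kl_bits(p, m) + 0.5 * kl_bits(q, m)

# Identical distributions give 0; distributions with disjoint support give the maximum of 1 bit.
p = np.array([0.5, 0.5, 0.0, 0.0])
q = np.array([0.0, 0.0, 0.5, 0.5])
print(js_divergence_bits(p, p), js_divergence_bits(p, q))   # -> 0.0  1.0
```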

4.4.3 Non-specific binding and the costs of higher capacity

Real regulatory elements must balance the pressure to convey information reliably with the cost of maintaining the cell's internal state, represented by the expression levels of transcription factors. The fidelity of the representation is increased (and the fractional fluctuation in their number is decreased) by having more molecules "encode" a given state. On the other hand, making or degrading more transcription factors puts a metabolic burden on the cell, and frequent transitions between various regulatory states could involve too large time lags as, for example, the regulation machinery attempts to keep up with a changed environmental condition by accumulating or degrading the corresponding TF molecules. In addition, the output genes themselves that get switched on or off by transcription factors, and therefore "read out" the internal state, must not be too noisy, otherwise the advantage of maintaining precise transcription factor levels is lost.

Suppose that there is a cost to the cell for each molecule of output gene product that it needs to produce, and that this incremental cost per molecule is independent of the number of molecules already present. Then, on the output side, the cost must be proportional to ⟨g⟩ = ∫ dg g p(g). We remember that in the optimal distribution calculations g is expressed relative to the maximal expression, such that its mean is between zero and one. To get an absolute cost in terms of the number of molecules, this normalized ḡ therefore needs to be multiplied by the inverse of the output noise strength, α⁻¹, as the latter scales with g0 [Table 4.1]. The contribution of the output cost is thus ∝ α⁻¹ ḡ.

On the input side, the situation is similar: the cost must be proportional to Kd⟨c̃⟩ = Kd ∫ dc̃ c̃ p(c̃), where our optimal solutions are expressed, as usual, in dimensionless concentration units, c̃ = c/Kd. In either of the two input noise models (i.e. diffusion or switching input noise), with the diffusion constant held fixed, Kd ∝ β⁻¹ or Kd ∝ γ⁻¹.

Before continuing we need to make a careful distinction between the total concentration of the input transcription factors, ct, and the free concentration cf, diffusing in solution in the nucleus. We imagine the true binding site embedded in a pool of non-specific binding sites – perhaps all other short fragments of DNA – and there being an ongoing competition between one functional site (with strong affinity) and a large number of weaker non-specific sites. If these non-specific sites are present at concentration ρ in the cell, and have affinities drawn from some distribution p(K), the relationship between the free and the total concentration of the input is [Eq (3.23)]:

\[
c_t = c_f + \rho \int dK\, p(K)\, \frac{c_f}{c_f + K}.
\qquad (4.21)
\]

Importantly, the concentration that enters all information capacity calculations is the free concentration cf, because it directly determines both the promoter occupancy in Eq (4.12) as well as the diffusive noise; on the other hand, the cell can influence the free concentration only by producing more or less of the transcription factor, i.e. by varying (and paying for) the total concentration. Since our costs are determined only up to a proportionality constant, the important question is whether or not the relation between ct and cf in Eq (4.21) is close to linear.
In Section 3.3.1 we have shown that a linear relation between the total and free concentrations is a reasonable approximation, and that no appreciable new noise is generated by the presence of the non-specific binding sites. The bottom line is that the effect of the non-specific sites is restricted to changing the proportionality factor between the average free concentration of the input molecules and their metabolic cost, but nothing else.

Collecting all our thoughts on the costs of coding, we can write down the "cost functional" as the sum of the input and output cost contributions:

\[
\langle C[p(c)] \rangle
= \frac{v_1}{\beta} \int dc\, p(c)\, c
+ \frac{v_2}{\alpha} \int dc\, p(c) \int dg\, p(g|c)\, g,
\qquad (4.22)
\]

where v1 and v2 are proportional to the unknown costs per molecule of input or output, respectively, and α and β are the noise parameters of Table 4.1. This ansatz captures the intuition that while decreasing the noise strengths will increase the information transmission, it will also increase the cost. Instead of maximizing the information without regard to the cost, the new problem to extremize is:

\[
\mathcal{L}[p(c)] = I[p(c)] - \Phi\,\langle C[p(c)] \rangle - \Lambda \int dc\, p(c),
\qquad (4.23)
\]

and the Lagrange multiplier Φ has to be chosen so that the cost of the resulting optimal solution, ⟨C[p*(c)]⟩, equals some predefined cost C0 that the cell is prepared to pay.
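One convenient way to solve this constrained problem numerically is to fold the cost penalty into the Blahut-Arimoto update. The sketch below (our own code and toy example; it is not the implementation described in Methods A.4.1, and the per-input costs are arbitrary) penalizes each input bin by its expected cost, so that sweeping the multiplier Φ traces out information versus mean cost, as in Fig 4.7b.

```python
# Cost-penalized Blahut-Arimoto update for the functional of Eq (4.23); illustrative sketch.
import numpy as np

def constrained_information_bits(K, cost_per_input, Phi, n_iter=2000):
    """K[i, j] = p(g_j | c_i); returns (information in bits, mean cost, optimized p(c))."""
    p_c = np.full(K.shape[0], 1.0 / K.shape[0])
    for _ in range(n_iter):
        p_g = p_c @ K
        with np.errstate(divide="ignore", invalid="ignore"):
            logratio = np.where(K > 0, np.log2(K / p_g), 0.0)
        D = np.sum(K * logratio, axis=1)                     # KL( p(g|c_i) || p(g) ) in bits
        w = p_c * 2.0 ** (D - Phi * cost_per_input)          # update penalized by the expected cost
        p_c = w / w.sum()
    p_g = p_c @ K
    with np.errstate(divide="ignore", invalid="ignore"):
        logratio = np.where(K > 0, np.log2(K / p_g), 0.0)
    info = np.sum(p_c * np.sum(K * logratio, axis=1))
    return info, float(np.sum(p_c * cost_per_input)), p_c

# Toy example: a noisy three-level element where higher inputs cost more.
K = np.array([[0.8, 0.2, 0.0],
              [0.1, 0.8, 0.1],
              [0.0, 0.2, 0.8]])
for Phi in (0.0, 0.5, 2.0):
    I, C, _ = constrained_information_bits(K, cost_per_input=np.array([0.0, 1.0, 2.0]), Phi=Phi)
    print(f"Phi={Phi:3.1f}  I={I:.3f} bits  <cost>={C:.2f}")
```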

Figure 4.7: The effects of metabolic or time costs on the achievable capacity of simple regulatory elements. Contours in panel A show the information plane for the non-cooperative activator from Fig 4.3, with the imposed constraint that the average total (input + output) cost is fixed to some C0; as the cost is increased, the optimal solution (green dot) moves along the arrows on a dark green line (the contours change correspondingly, not shown). Light green line shows an activator with cooperativity h = 3; dark and light red lines show repressors without and with cooperativity (h = 3). For each flow line in the information plane, panel B shows the cost vs. capacity plot, while panel C shows the difference between the constrained and unconstrained capacities along the trajectories.

We now wish to recreate the information plane of Fig 4.3, while constraining the total cost of each solution to C0. To be concrete and pick the values for the cost and proportionality constants in Eq (4.22), we use the estimates from the Drosophila noise measurements in Gregor et al. (2006a) and the analysis of Section 3.2, which assign to the system denoted by a blue dot in Fig 4.3a the values of ∼ 800 Bicoid molecules of input at Kd, and a maximal induction of g0 ∼ 4000 Hunchback molecules if the burst size b were 10. Figure 4.7a is the information plane for an activator with no cooperativity, as in Fig 4.3, but with the cost limited to an average total of C0 ∼ 7000 molecules of input and output per nucleus. There is now one optimal solution, denoted by a green dot; if one tries to choose a system with lower input or output noise, the cost constraint forces the input distribution, p(c), and the output distribution, p(g), to have very low probabilities at high induction, consequently limiting the capacity. Clearly, a different system will be optimal if another total allowed cost C0 is selected.

The dark green line on the information plane in Fig 4.7a corresponds to the flow of the optimal solution for an activator with no cooperativity as the allowed cost is increased, and the corresponding cost-capacity curve is shown in Fig 4.7b. The light green line is the trajectory of the optimal solution in the information plane for the activator system with cooperativity h = 3, and the dark and light red trajectories are shown for the repressor with h = 1 and h = 3, respectively. We note first that the behavior of the cost function is quite different for the activator (where low input implies low output and therefore low cost; and conversely, high input means high output and also high cost) and the repressor (where input and output are mutually exclusively high or low and the cost is intermediate in both cases). Secondly, we observe that the optimal capacity as a function of cost is similar for activators and repressors [Fig 4.7b], in sharp contrast to the comparison of Fig 4.4, where repressors would enable higher capacities. Thirdly, we note in the same figure that increasing the cooperativity at fixed noise strength β brings a substantial increase, of almost a bit over the whole cost range, in the channel capacity, in agreement with our observations about the interaction between capacity and dynamic range [Fig 4.6]. The last and perhaps most significant conclusion is that even with input distributions matched to maximize the transmission at a fixed cost, the capacity still only scales roughly linearly with the logarithm of the number of available signaling molecules, and this fact must ultimately be limiting for a single regulatory element.

4.5 Discussion

We have tried to analyze a simple regulatory element as an information processing device. As a result we find that one cannot discuss an element in isolation from the statistics of the input that it is exposed to. Yet in cells the inputs are often transcription factor concentrations that "encode" the state of various genetic switches, from those responsible for cellular identity to those that control the rates of metabolism and cell division, and the cell exerts control over these concentrations. While it could use different distributions to represent various regulatory settings, we argue that the cell should use the one distribution that allows it to make the most of its genetic circuitry – the distribution that maximizes the dependency, or mutual information, between inputs and outputs. Mutual information can then be seen both as a measure of how well the cell is doing by using its encoding scheme, and as the best it could have done using the optimal scheme, which we can compute; a comparison between the optimal and measured distributions gives us a sense of how close the organism is to the achievable bound. Moreover, mutual information has absolute units, i.e. bits, that have a clear interpretation in terms of the number of discrete distinguishable states that the regulatory element can resolve [Methods A.4.3, Fig A.16b]. This last fact helps clarify the ongoing debates about what is the proper noise measure for genetic circuits, and in what context a certain noise is either "big" or "small" (as it is really a function of the inputs). Information does not replace the standard noise-over-the-mean measure – noise calculations or measurements are still necessary to compute the element's capacity – but it does give it a functional interpretation.

We have considered a class of simple parametrizations of signals and noise that can be used to fit measurements for several model systems, such as bicoid-hunchback in the fruit fly, lac in Escherichia coli and a number of yeast genes. We find that the capacities of these realistic elements are generally larger than 1 bit, and can be as high as 2 bits. By simple inspection of the optimal output distributions in Figs 4.3b or 4.3c it is difficult to tell anything about the capacity: the distribution might look bimodal yet carry more than one bit, or might even be a monotonic function without any obvious structure, indicating that the information is encoded in the graded response of the element. When the noise is sufficiently high, on the other hand, the optimal strategy is that of achieving one bit of capacity and only utilizing the maximum and minimum achievable levels of transcription factors for signaling. The set of distributions that achieve capacities close to the optimal one is large, suggesting that perhaps one-bit switches are not difficult to implement biologically.

Finally, we discussed how additional biophysical constraints can modify the optimal capacity. By assuming a linear cost model for signaling molecules and a limited input dynamic range, the capacity and the cost couple in an interesting way and the maximization principle allows new questions to be asked. For example, increasing the cooperativity reduces the cost, as we have shown; on the other hand, it increases the sensitivity to fluctuations in the input, because the input noise strength β is proportional to h² [Table 4.1]. In a given system we could therefore predict the optimal effective cooperativity, if we knew the real "cost per molecule" [Methods A.4.2].
Further work is needed to tease out the consequences of cost (if any) from experimental data. The principle of information maximization clearly is not the only possible lens through which regulatory networks are to be viewed. One can think of examples where only a single bit needs to be conveyed, but it has to be done reliably in a fluctuating environment, perhaps by being robust to the changes in outside temperature. It seems that both concepts, that of maximal information transmission and that of robustness to fluctuations in certain auxiliary variables that also influence the noise, could be included in the same framework, but the issue needs further work. Alternatively, consider signaling systems where there are constraints on the dynamics, something that our analysis has ignored by only looking at steady-state behavior; for example, the chemotactic system of Escherichia coli has to perfectly adapt in order for the bacterium to be able to climb attractant gradients. All these examples, however, assume some knowledge about the behavior over and above the ability to define the element, its inputs and its outputs: they imply what is essential for the proper functioning of that specific module, and this knowledge can be viewed (and perhaps later formally included) as introducing additional constraints to which the basic information transmission is subjected.

We emphasize that the kind of analysis carried out here is not restricted to a single regulatory element. As was pointed out in the introductory Section 4.1, the inputs I and the outputs O of the regulatory module can be multi-dimensional, and the module could implement complex internal logic with multiple feedback loops. It seems that especially in such cases, when our intuition about the noise – now a function of multiple variables – starts breaking down, the information formalism could prove to be helpful. Although the solution space that needs to be searched in the optimization problem grows exponentially with the number of inputs, there are biologically relevant situations that nevertheless appear tractable: for example, when there are multiple readouts of the same input, or combinatorial regulation of a single output by a pair of inputs; in addition, knowing that the capacities of a single input-output chain are on the order of a few bits also means that only a small number of distinct input levels for each input need to be considered. Some cases therefore appear amenable to biophysical modeling approaches, and in the next section we will apply the theory presented here to the regulation of Hunchback expression by the Bicoid TF in Drosophila melanogaster.

4.6 Information flow and optimization in transcriptional regulation – Morphogenesis in the early Drosophila embryo²

Cells control the expression of genes in part through transcription factors, proteins which bind to particular sites along the genome and thereby enhance or inhibit the transcription of nearby genes [Fig 4.8]. We can think of this transcriptional control process as an input/output device in which the input is the concentration of transcription factor and the output is the concentration of the gene product. Although this qualitative picture has been with us for roughly forty years (Jacob and Monod, 1961), only recently have there been quantitative measurements of in vivo input/output relations and of the noise in output level when the input is fixed (Elowitz et al., 2002; Ozbudak et al., 2002; Blake et al., 2003; Setty et al., 2003; Raser and O’Shea, 2004; Rosenfeld et al., 2005; Pedraza and van Oudenaarden, 2005; Golding et al., 2005; Kuhlman et al., 2007; Gregor et al., 2006a). Because these input/output relations have a limited dynamic range, noise limits the “power” of the cell to control gene expression levels. In this section, we quantify these limits and derive the strategies that cells should use to take maximum advantage of the available power. We show that, to make optimal use of their regulatory capacity, cells must achieve the proper quantitative matching among the input/output relation, the noise level, and the distribution of transcription factor concentrations used during the life of the cell. We test these predictions against recent experiments on the Bicoid and Hunchback morphogens in the early Drosophila embryo (Gregor et al., 2006a), and find that the observed distributions have a nontrivial structure which is in good agreement with theory, with no adjustable parameters. This suggests that, in this system at least, cells make nearly optimal use of the available regulatory capacity and transmit substantially more than the simple on/off bit that might suffice to delineate a spatial expression boundary.

Figure 4.8: Transcriptional regulation of gene expression. The occupancy of the binding site by transcription factors sets the activity of the promoter and hence the amount of protein produced. The physics of TF–DNA interaction, transcription and translation processes determine the conditional distribution of expression levels g at fixed TF concentration c, P(g|c), shown here as a heat map with red (blue) corresponding to high (low) probability. The mean input/output relation is shown as a thick white line, and the dashed lines indicate ± one standard deviation of the noise around this mean. Two sample input distributions PTF(c) (lower left) are passed through P(g|c) to yield two corresponding distributions of outputs, Pexp(g) (lower right).

²This section appeared on the arXiv as Tkačik et al. (2007b).

Gene expression levels (g) change in response to changes in transcription factor (TF) concentration (c). These changes often are summarized by an input/output relation ḡ(c) in which the mean expression level is plotted as a function of TF concentration [Fig 4.8]. The average relationship is a smooth function but, because of noise, this does not mean that arbitrarily small changes in input transcription factor concentration are meaningful for the cell. The noise in expression levels could even be so large that reliable distinctions can only be made between (for example) “gene on” at high TF concentration and “gene off” at low TF concentration. To explore this issue, we need to quantify the number of reliably distinguishable regulatory settings of the transcription apparatus, a task to which Shannon’s mutual information (Shannon, 1948; Cover and Thomas, 1991) is ideally suited. While there are many ways to associate a scalar measure of correlation or control with a joint distribution of input and output signals, Shannon proved that mutual information is the only such quantity that satisfies certain plausible general requirements, independent of the details of the underlying distributions. Mutual information has been successfully used to analyze noise and coding in neural systems (Rieke et al., 1997), and it is natural to think that it may be useful for organizing our understanding of gene regulation; see also Ziv et al. (2006). Roughly speaking, the mutual information I(c; g) between TF concentration and expression level counts the (logarithm of the) number of distinguishable expression levels achieved by varying c. If we measure the information in bits, then

\[
I(c; g) = \int dc\, P_{TF}(c) \int dg\, P(g|c)\, \log_2\!\left[\frac{P(g|c)}{P_{exp}(g)}\right], \qquad (4.24)
\]

where PTF(c) is the distribution of TF concentrations the cell generates in the course of its life, P(g|c) is the distribution of expression levels at fixed c, and Pexp(g) is the resulting distribution of expression levels,

\[
P_{exp}(g) = \int dc\, P(g|c)\, P_{TF}(c). \qquad (4.25)
\]
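As a concrete illustration of Eqs (4.24)–(4.25) — a minimal numerical sketch in Python/NumPy, not the code used for the results in this chapter — the mutual information can be evaluated on a discretized (c, g) grid; the grid, normalization convention and function name below are choices made for this example only:

import numpy as np

def mutual_information_bits(p_tf, P_g_given_c, dc, dg):
    """Evaluate Eqs (4.24)-(4.25) on a discrete (c, g) grid.
    p_tf        : (Nc,) values of P_TF(c) on a grid of c values
    P_g_given_c : (Nc, Ng) values of P(g|c) on a grid of g values
    dc, dg      : grid spacings; densities are assumed normalized on the grid
    """
    p_exp = (p_tf[:, None] * P_g_given_c).sum(axis=0) * dc           # Eq (4.25)
    w = p_tf[:, None] * P_g_given_c                                  # joint density
    ratio = np.divide(P_g_given_c, p_exp[None, :],
                      out=np.ones_like(P_g_given_c), where=(w > 0))
    return float((w * np.log2(ratio)).sum() * dc * dg)               # Eq (4.24)

Any conditional distribution P(g|c), including the Gaussian model of Eq (4.26) below, can be tabulated on such a grid and passed to this routine.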

The distribution, P(g|c), of expression levels at fixed transcription factor concentration describes the physics of the regulatory element itself, from the protein/DNA interaction to the rates of protein synthesis and degradation; this distribution describes both the mean input/output relation and the noise fluctuations around the mean output. The information transmission, or regulatory power, of the system is not determined by P(g|c) alone, however, but also depends on the distribution, PTF(c), of transcription factor “inputs” that the cell uses, as can be seen from Eq (4.24). By adjusting this distribution to match the properties of the regulatory element, the cell can maximize its regulatory power.

Matching the distribution of inputs to the (stochastic) input/output relation of the system is a central concept in information theory (Cover and Thomas, 1991), and has been applied to the problems of coding in the nervous system. For sensory systems, the distribution of inputs is determined by the natural environment, and the neural circuitry can adapt, learn or evolve (on different time scales) to adjust its input/output relation. It has been suggested that maximizing information transmission is a principle which can predict the form of this adaptation (Barlow, 1961; Laughlin, 1981; Atick and Redlich, 1990; Brenner et al., 2000). In transcriptional regulation, by contrast, it seems more appropriate to regard the input/output relation as fixed and ask how the cell might optimize its regulatory power by adjusting the distribution of TF inputs.

It is difficult to make analytic progress in the general calculation of mutual information, but there is a simple and plausible approximation. The expression level at a fixed TF concentration c has a mean value ḡ(c), which we can plot as an input/output relation [Fig 4.8]. Let us assume that the fluctuations around this mean are Gaussian with a variance σ_g²(c) which will itself depend on the TF concentration. Formally this means that

\[
P(g|c) = \frac{1}{\sqrt{2\pi\sigma_g^2(c)}} \exp\left\{ -\frac{[g - \bar{g}(c)]^2}{2\sigma_g^2(c)} \right\}. \qquad (4.26)
\]

Further let us assume that the noise level is small. Then we can expand all of the relevant integrals as a power series in the magnitude of σ_g [Eq (4.4)]:

\[
I(c; g) = -\int d\bar{g}\, \hat{P}_{exp}(\bar{g}) \log_2 \hat{P}_{exp}(\bar{g})
- \frac{1}{2} \int d\bar{g}\, \hat{P}_{exp}(\bar{g}) \log_2\!\left[ 2\pi e\, \sigma_g^2(\bar{g}) \right] + \cdots, \qquad (4.27)
\]

where ··· are terms that vanish as the noise level decreases and P̂_exp(ḡ) is the probability distribution for the average levels of expression. We can think of this as the distribution that the cell is “trying” to generate, and would generate in the absence of noise:

\[
\hat{P}_{exp}(\bar{g}) \equiv \int dc\, P_{TF}(c)\, \delta[\bar{g} - \bar{g}(c)] \qquad (4.28)
\]
\[
\phantom{\hat{P}_{exp}(\bar{g})} = P_{TF}\big(c = c_*(\bar{g})\big) \left( \frac{d\bar{g}}{dc} \right)^{-1}_{c = c_*(\bar{g})}, \qquad (4.29)
\]

where c_*(ḡ) is the TF concentration at which the mean expression level is ḡ; similarly, by σ_g(ḡ) we mean σ_g(c) evaluated at c = c_*(ḡ).

We now can ask how the cell should adjust these distributions to maximize the information being transmitted. In the low-noise approximation summarized by Eq (4.27), maximizing I(c; g) poses a variational problem for P̂_exp(ḡ) whose solution has a simple form:

\[
\hat{P}^*_{exp}(\bar{g}) = \frac{1}{Z} \cdot \frac{1}{\sigma_g(\bar{g})}, \qquad (4.30)
\]
\[
Z = \int d\bar{g}\, \frac{1}{\sigma_g(\bar{g})}. \qquad (4.31)
\]

This result captures the intuition that effective regulation requires preferential use of signals that have high reliability or low variance—P̂*_exp(ḡ) is large where σ_g is small. The actual information transmitted for this optimal distribution can be found by substituting P̂*_exp(ḡ) into Eq (4.27), with the result I_opt(c; g) = log₂(Z/√(2πe)).

Although we initially formulated our problem as one of optimizing the distribution of inputs, the low noise approximation yields a result [Eq (4.30)] which connects the optimal distribution of output expression levels to the variances of the same quantities, sampled across the life of a cell as it responds to natural variations in its environment. To the extent that the small noise approximation is applicable, data on the variance vs mean expression thus suffice to calculate the maximum information capacity; details of the input/output relation, such as its degree of cooperativity, do not matter except insofar as they leave their signature on the noise.

Recent experiments provide the data for an application of these ideas. Elowitz and coworkers have measured gene expression noise in a synthetic system, placing fluorescent proteins under the control of a lac–repressible promoter in E. coli (Elowitz et al., 2002). Varying the concentration of an inducer, they determined the intrinsic variance of expression levels across a bacterial population as a function of mean expression level. Their results can be summarized as σ_g²(ḡ) = aḡ + bḡ², where the expression level g is normalized to have a maximum mean value of 1, and the constants are a = 5–7 × 10⁻⁴ and b = 3–10 × 10⁻³. Across most of the dynamic range (ḡ ≫ 0.03), the small noise approximation should be valid and, as discussed above, knowledge of σ_g(ḡ) alone suffices to compute the optimal information transmission. We find I_opt(c; g) ∼ 3.5 bits: rather than being limited to on/off switching, these transcriptional control systems could in principle specify 2^{I_opt} ∼ 10–12 distinguishable levels of gene expression (see Section A.4.3)! It is not clear whether this capacity, measured in an engineered system, is available to or used by E. coli in its natural environment. The calculation does demonstrate, however, that optimal information transmission values derived from real data are more than one bit, but perhaps small enough to provide significant constraints on regulatory function.

When the noise is not small, no simple analytic approaches are available. On the other hand, so long as P(g|c) is known explicitly, our problem is equivalent to one well–studied in communication theory, and efficient numerical algorithms are available for finding the input distribution PTF(c) that optimizes the information I(c; g) defined in Eq (4.24) [Methods A.4.1] (Blahut, 1972). In general we must extract P(g|c) from experiment and, to deal with finite data, we will assume that it has the Gaussian form of Eq (4.26). P(g|c) then is completely determined by measuring just two functions of c: the mean input/output relation ḡ(c) and the output variance σ_g²(c). The central point is that, in the general case, solving the information optimization problem requires only empirical data on the input/output relation and noise.
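The small-noise estimate just quoted can be reproduced in a few lines. The following sketch (Python/NumPy, illustrative only) uses mid-range values of the noise parameters, a = 6 × 10⁻⁴ and b = 6 × 10⁻³, so the printed number is indicative rather than a re-derivation of the measurement-based 3.5 bits:

import numpy as np

# Small-noise capacity, Eqs (4.30)-(4.31): I_opt = log2( Z / sqrt(2*pi*e) ),
# with Z = int dgbar / sigma_g(gbar) and sigma_g^2(gbar) = a*gbar + b*gbar^2.
a, b = 6e-4, 6e-3                       # mid-range of the quoted parameter ranges
gbar = np.linspace(1e-4, 1.0, 200000)   # expression normalized to a maximum of 1
sigma = np.sqrt(a * gbar + b * gbar**2)
Z = np.trapz(1.0 / sigma, gbar)
I_opt = np.log2(Z / np.sqrt(2 * np.pi * np.e))
print(f"I_opt ~ {I_opt:.2f} bits, i.e. about {2**I_opt:.0f} distinguishable levels")

With these parameter values the script gives roughly 3.5 bits, on the order of ten distinguishable expression levels, consistent with the numbers quoted above.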
The initial events of pattern formation in the embryo of the fruit fly Drosophila provide a promising testing ground for the optimization principle proposed here. These events depend on the establishment of spatial gradients in the concentration of various morphogen molecules, most of which are transcription factors (Wolpert, 1969; Lawrence, 1992). To be specific, consider the response of the hunchback (Hb) gene to the maternally established gradient of the transcription factor Bicoid (Bcd) (Driever and Nusslein-Volhard, 1988b,a, 1989; Struhl et al., 1989). A recent experiment reports the Bcd and Hb concentrations in thousands of individual nuclei of the Drosophila embryo, using fluorescent antibody staining (Gregor et al., 2006a); the results can be summarized by the mean input/output relation and noise level shown in Fig 4.9. These data can be understood in some detail on the basis of a simple physical model as in Section 3.2 (Tkačik et al., 2007a), but here we use the experimental observations directly to make phenomenological predictions about the maximum available regulatory power and the optimal distribution of expression levels.

Given the measurements of the mean input/output relation ḡ(c) and noise σ_g(c) shown in Fig 4.9, we can calculate the maximum mutual information between Bcd and Hb concentrations by following the steps outlined above; we find I_opt(c; g) = 1.7 bits. To place this result in context, we imagine a system that has the same mean input/output relation, but the noise variance is scaled by a factor F, and ask how the optimal information transmission depends on F. This is not just a mathematical trick: for most physical sources of noise, the relative variance is inversely proportional to the number of molecules, and so scaling the expression noise variance down by a factor of ten is equivalent to assuming that all relevant molecules are present in ten times as many copies.
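In the general (not small-noise) case, “following the steps outlined above” means running the Blahut (1972) iteration on a discretized version of the measured channel P(g|c). The sketch below is an illustrative Python/NumPy implementation of that standard algorithm — not the original code of Methods A.4.1 — and the choice of grid, as well as the Gaussian construction of P(g|c) from ḡ(c) and σ_g(c), are assumptions of this sketch:

import numpy as np

def blahut_arimoto(P_g_given_c, tol=1e-9, max_iter=20000):
    """Capacity of a discrete channel by the Blahut (1972) iteration.
    P_g_given_c : (Nc, Ng) conditional distribution; each row sums to 1.
    Returns (capacity in bits, optimizing input distribution p(c))."""
    Nc = P_g_given_c.shape[0]
    p = np.full(Nc, 1.0 / Nc)                      # start from a uniform input
    for _ in range(max_iter):
        q = p @ P_g_given_c                        # current output distribution
        logratio = np.where(P_g_given_c > 0,
                            np.log2(P_g_given_c / np.maximum(q[None, :], 1e-300)), 0.0)
        D = (P_g_given_c * logratio).sum(axis=1)   # per-input divergence, in bits
        p_new = p * np.exp2(D)
        p_new /= p_new.sum()
        if np.max(np.abs(p_new - p)) < tol:
            p = p_new
            break
        p = p_new
    q = p @ P_g_given_c
    logratio = np.where(P_g_given_c > 0,
                        np.log2(P_g_given_c / np.maximum(q[None, :], 1e-300)), 0.0)
    I = float(np.sum(p * (P_g_given_c * logratio).sum(axis=1)))
    return I, p

# Usage sketch: tabulate P(g|c) on a (c, g) grid from the Gaussian model of
# Eq (4.26), using the measured gbar(c) and sigma_g(c), normalize each row
# over g, and call blahut_arimoto(P).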


Figure 4.9: The Bcd/Hb input/output relationship in the Drosophila melanogaster syncytium at early nuclear cycle 14 (Gregor et al., 2006a). (a) Each point marks the Hb (g) and Bcd (c) concentration in a single nucleus, as inferred from immunofluorescent staining; data are from ∼ 11 · 10³ individual nuclei across 9 embryos. Hb expression levels g are normalized so that the maximum and minimum mean expression levels are 1 and 0, respectively; small errors in the estimate of background fluorescence result in some apparent expression values being slightly negative. Bcd concentrations c are normalized by Kd, the concentration of Bcd at which the mean Hb expression level is half maximal. For details of normalization across embryos, see Gregor et al. (2006a). The solid red line is a sigmoidal fit to the mean g at each value of c, and error bars are ± one s.e.m. (b) Noise in Hb as a function of Bcd concentration; error bars are ± one s.d. across embryos, and the curve is a fit from Tkačik et al. (2007a), cf. Section 3.2.

We see in Fig 4.10 that there is a large regime in which the regulatory power is well approximated by the small noise approximation. In the opposite extreme, at large noise levels, we expect that there are (at best!) only two distinguishable states of high and low expression, so that our problem approaches the asymmetric binary channel (Silverman, 1955). The exact result interpolates smoothly between these two limiting cases, with the real system (F = 1) lying closer to the small noise limit, but deviating from it significantly.

In the embryo, maximizing information flow from transcription factor to target gene has a very special meaning. Cells acquire “positional information,” and thus can take actions which are appropriate to their position in the embryo, by responding to the local concentration of morphogen molecules (Wolpert, 1969). In the original discussions, “information” was used colloquially. But in the simplest picture of Drosophila development (Lawrence, 1992; Rivera-Pomar and Jäckle, 1996), information in the technical sense really does flow from physical position along the anterior–posterior axis to the concentration of the primary maternal gradients (such as Bcd) to the expression level of the gap genes (such as Hb). Maximizing the mutual information between Bcd and Hb thus maximizes the positional information that can be carried by the Hb expression level.

More generally, rather than thinking of each gap gene as having its own spatial profile, we can think of the expression levels of all the gap genes together as a code for the position of each cell.

Figure 4.10: Optimal information transmission for the Bcd/Hb system as a function of the noise variance rescaling factor F. 1/F is approximately equal to the factor by which the number of input and output signaling molecules has to be increased for the corresponding gain in capacity. Dashed and dotted curves show the solutions in the small-noise and large-noise approximations, respectively. The real system, F = 1, lies in an intermediate region where neither the small-noise nor the large-noise approximation is valid. The measured information I_data(c; g) is shown in red (error bar is s.d. over 9 embryos).

In the same way that the four bases (two bits) of DNA must code in triplets in order to represent arbitrary sequences of 20 amino acids, we can ask how many gap genes would be required to encode a unique position in the N_rows ∼ 100 rows of nuclei along the anterior–posterior axis. If the regulation of Hb by Bcd is typical of what happens at this level of the developmental cascade, then each letter of the code is limited to less than two bits (I_opt = 1.7 bits) of precision; since log₂(N_rows)/I_opt = 3.9, the code would need to have at least four letters (the arithmetic is spelled out below). It is interesting, then, to note that there are four known gap genes—hunchback, krüppel, giant and knirps (Rivera-Pomar and Jäckle, 1996)—which provide the initial readout of the maternal anterior–posterior gradients.

Instead of plotting Hunchback expression levels vs either position or Bcd concentration, we can ask about the distribution of expression levels seen across all nuclei, Pexp(g), as shown in Fig 4.11. The distribution is bimodal, so that large numbers of nuclei have near zero or near maximal Hb, consistent with the idea that there is an expression boundary—cells in the anterior of the embryo have Hb “on” and cells in the posterior have Hb “off.” But intermediate levels of Hunchback expression also occur with nonzero probability, and the overall distribution is quite smooth. We can compare this experimentally measured distribution with the distribution predicted if the system maximizes information flow, and we see from Fig 4.11 that the agreement is quite good. The optimal distribution reproduces the bimodality of the real system, hinting in the direction of a simple on/off switch, but also correctly predicts that the system makes use of intermediate expression levels. From the data we can also compute directly the mutual information between Bcd and Hb levels, and we find I_data(c; g) = 1.5 ± 0.15 bits, or ∼ 90% (0.88 ± 0.09) of the theoretical maximum.

The agreement between the predicted and observed distributions of Hunchback expression levels is encouraging. We note, however, some caveats. Bicoid has multiple targets and many of these genes have multiple inputs (Ochoa-Espinosa et al., 2005), so to fully optimize information flow we need to think about a more complex problem than the single input, single output system considered here. Measurement of the distribution of expression levels requires a fair sampling of all the nuclei in the embryo, and this was not the intent of the experiments of Gregor et al. (2006a). Similarly, the theoretical predictions depend somewhat on the behavior of the input/output relation and noise at low expression levels, which are difficult to characterize experimentally, as well as on the (possible) deviations from Gaussian noise. A complete test of our theoretical predictions will thus require a new generation of experiments.
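Explicitly, the counting just described amounts to (taking N_rows ≈ 100 as a round figure, as in the text):

\[
\frac{\log_2 N_{\rm rows}}{I_{\rm opt}} \approx \frac{\log_2 100}{1.7\ {\rm bits}} \approx \frac{6.6}{1.7} \approx 3.9
\quad\Longrightarrow\quad \text{at least four gap genes},
\qquad\qquad
\frac{I_{\rm data}}{I_{\rm opt}} = \frac{1.5}{1.7} \approx 0.88 .
\]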


Figure 4.11: The measured (black) and optimal (red) distributions of Hunchback expression levels. The measured distribution is estimated from the data of Gregor et al. (2006a), by making a histogram of the g values for each data point in Fig 4.9. The optimal solution corresponds to the capacity of I_opt(c; g) = 1.7 bits. The same plot is shown on a logarithmic scale in the inset.

In summary, the functionality of a transcriptional regulatory element is determined by a combination of its input/output relation, the noise level, and the dynamic range of transcription factor concentrations used by the cell. In parallel to discussions of neural coding (Laughlin, 1981; Brenner et al., 2000), we have suggested that organisms can make maximal use of the available regulatory power by achieving consistency among these three different ingredients; in particular, if we view the input/output relation and noise level as fixed, then the distribution of transcription factor concentrations or expression levels is predicted by the optimization principle. Although many aspects of transcriptional regulation are well studied, especially in unicellular organisms, these distributions of protein concentrations have not been investigated systematically. In embryonic development, by contrast, the distributions of expression levels can literally be read out from the spatial gradients in morphogen concentration. We have focused on the simplest possible picture, in which a single input transcription factor regulates a single target gene, but nonetheless find encouraging agreement between the predictions of our optimization principle and the observed distribution of the Hunchback morphogen in Drosophila. We emphasize that our prediction is not the result of a model with many parameters; instead we have a theoretical principle for what the system ought to do so as to maximize its performance, and no free parameters.

Chapter 5

Conclusion

In this work we discussed two approaches to understanding biological networks, or dynamical stochastic systems that process and transmit information in cells or organisms. The first approach, i.e. building maximum entropy distributions with correlation constraints, is data-driven and allows us to treat the network as a set of interacting nodes for which we can compute the map of phenomenological interactions in a principled way. By applying this method to the measurements of neural responses in retinal ganglion cells and activation patterns in a protein signaling cascade, we have learned a number of new things summarized below.

Firstly, despite an a priori possibility of finding very complex interactions, where k elements jointly cooperate and influence the network state, the two data sets are very satisfactorily explained by assuming that all interactions only happen between pairs of elements; this is a huge simplification in model complexity. Secondly, the network structure of the two experiments is very different: in neurons we observe a dense mesh of all pairs interacting, which causes weak but significant correlation between every pair, in addition to strong collective effects once the network is big enough; in signaling proteins we observe a skeleton of a few strong interactions that explain the whole correlation structure. Thirdly, the pairwise interaction approximation is probably valid in neurons because one of the two possible states at a node (spiking vs non-spiking) is very rare; in contrast, in proteins we see at least one instance where a higher order (triplet) interaction has to be added to account for the data. Finally, in neurons we hypothesize that the role of the collective states, once the network approaches the “critical” size of a few hundred components, is to provide error-correction capability to the population code.

In the second part of the thesis we introduced a new way of looking at the building blocks of genetic regulatory networks, namely transcriptional regulatory elements. These noisy genetic components can be understood as information transmission devices in the sense defined by Shannon, and as such they can be tuned to achieve a well-defined maximum of the “regulatory power” when their properties are matched to the statistical properties of the incoming signals. In the case of genetic regulatory elements, the incoming signals are transcription factor concentrations, because this “chemical language” is the way in which the cell stores a representation of its state and the state of its environment. If such a maximum of regulatory power, or information capacity, has been selected for during the course of evolution, our theory generates testable predictions about the distributions of TF concentrations in the cell; analysis of the data in the case of fruit fly development seems to support the hypothesis that information transmission is maximized. Even in the absence of such detectable optimization, however, information theory for genetic networks offers three substantial benefits when formulating questions about the network’s functionality. Firstly, it defines a scalar measure for the power of the regulatory element in conjunction with its natural ensemble of inputs, that is, a universal yard (or bit) stick for various regulatory elements.
Secondly, it defines, in the same units, the limits to what the same regulatory element could achieve if it were perfectly matched to the input ensemble, and therefore sets a theoretical upper bound on the information transmission. Finally, it provides an interpretation of the information capacity of a regulatory element in terms of biologically relevant concepts: for example, the number of distinguishable regulatory settings for a genetic switch, or the precision of spatial partitioning of the embryo for patterning systems during development.

There are a number of issues not addressed here in detail. In the phenomenological picture of interactions we do not discuss time dynamics and therefore the interactions between nodes are undirected. It seems possible to extend the maximum entropy modeling to encapsulate (at least a “one-timestep”) dynamics, but this has not been done yet. Moreover, when the maximum entropy formulation is extended to cover the dependence on the stimuli, the problems of how to encode and include stimulus features in more general and complex cases are not worked out. When information transmission in the genetic network is discussed, a single regulatory element is extracted from the network, thus avoiding the problems posed by feedback loops (or topology in general) and issues pertaining to global information flow optimization versus local, single-element, optimization. All these complications probably arise to some extent even in the fruit fly example considered here. Ideas about incorporating constraints other than maximal information transmission, such as the metabolic cost or robustness to parameter variations, are not worked out completely.

On the upside, it is easy to see the outlines of a more comprehensive theoretical program here. We are thinking in particular about approaching the first steps of some complex genetic program, for instance embryonic patterning, with the information transmission framework: information flows from position to morphogen gradients (via the physical process of gradient establishment) and from there to the gap genes (via the morphogen readout mechanisms). Given the upcoming generation of quantitative experiments exploring early development, it seems conceivable that both approaches to understanding networks could be applied in parallel to great benefit. A successful theory of biological information processing in such a system requires three components: a model – almost by necessity phenomenological due to the complicated nature of elementary biophysical interactions – of the measured data; predictive power that validates the model on unobserved data, e.g. on knockout and other mutant perturbations; and most importantly, an answer to how the system achieves what it is supposed to do, that is, assign unique identities to nuclei that start out being the same. Phenomenological analysis of the data would uncover the structure of interactions between the genes; a stochastic dynamical model can then be postulated or inferred from the data (if they are abundant enough) and verified. When the agreement with measurements is satisfactory, such a model, analyzed with information theoretic tools, would result in a prediction for its information capacity – a quantity for which we can develop a functional and intuitive understanding because it counts the number of distinguishable nuclear identities that could be conferred by the regulatory network performing at its best.

Appendix A

Methods

A.1 The biochemical network

1. Interpreting data. The dataset of Sachs et al. (2005) consists of the simultaneous measurements of the activity level of N = 11 biomolecules (proteins and phospholipids) in the MAP cascade of human CD4+ T cells: RAF, MEK, PLCγ, PIP2, PIP3, ERK, AKT, PKA, PKC, p38 and JNK, which we number in this order. The cells were treated with fluorescent antibodies recognizing specific phosphorylated forms of the biomolecules, and multicolor flow cytometry was performed to collect ≈ 700 single-cell samples at every condition in steady state. There are 9 conditions in total: the first 2 represent treatment with stimulatory agents (S1, S2) that activate the cells through their surface receptors; the remaining conditions utilize various additional combinations of non-natural intervention (inhibitory or activating) chemicals that interfere at (presumably known) points in the pathway: condition 3 (C3 = S1 + I1) inhibits AKT, C4 = S1 + I2 inhibits PKC, C5 = S1 + I3 inhibits PIP2, C6 = S1 + I4 inhibits MEK, C7 = S1 + I5 activates AKT, C8 = S1 + I6 activates PKC and C9 = S1 + I7 activates PKA. To be consistent with the experimenters’ procedure of preprocessing the data when conditions were combined, we ignored raw values deviating more than 3σ from the mean and randomly drew a sample of exactly 600 measurements per condition from the remaining data before the analysis.
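For concreteness, the preprocessing just described can be sketched as follows (Python/NumPy, illustrative only — the original analysis was not necessarily implemented this way, and the per-variable 3σ criterion is one reading of the text):

import numpy as np

def preprocess_condition(raw, n_keep=600, seed=0):
    """raw: (n_samples, 11) array of flow-cytometry readouts for one condition.
    Drop samples with any value more than 3 sigma from that variable's mean,
    then draw exactly n_keep samples at random from what remains."""
    rng = np.random.default_rng(seed)
    mu, sd = raw.mean(axis=0), raw.std(axis=0)
    ok = np.all(np.abs(raw - mu) <= 3.0 * sd, axis=1)
    kept = raw[ok]
    idx = rng.choice(len(kept), size=n_keep, replace=False)
    return kept[idx]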

2. Quantizing data. Quantization that maximizes the multi-information is a hard computational problem: simulated annealing Metropolis Monte Carlo has been used to choose the best way to partition each of the 11 data series into 2 bins such that the multi-information among all of the 11 data series is maximal. The Monte Carlo moves consisted of increasing or decreasing the quantization boundary by a decile of data points in each data series independently. A custom annealing schedule was used that kept the ratio of rejected moves approximately constant. 100 quantizations were done, all of which converged to within the order of the estimation error in entropy and gave essentially the same results on the whole dataset; the best run was chosen for further analysis. Following the analysis by Sachs et al. (2005), during quantization in the conditions that involve intervention (C3–C9), the levels of perturbed biomolecules were set to high for activated species or to low for inhibited species.

Quantization that maximizes mutual information (which is the method to choose when one is too undersampled for the max-multi-info quantization, for instance for quantization into three bins) was performed by first quantizing the data into 10 equipopulated bins and successively coalescing neighboring bins in a greedy approach that minimizes the loss of average mutual information at each step, until all the data have been quantized into 2 or 3 levels (see the Supplementary Information of Sachs et al. (2005)); a sketch of this greedy procedure is given below. Small sample correction was used for all estimates, following the direct method (Slonim et al., 2005a).
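A minimal sketch of the greedy coalescing step (Python/NumPy, illustrative only; the decile initialization and the plug-in mutual information estimator are simplifications, and no small-sample correction is applied here):

import numpy as np

def mutual_info_bits(x, y, nx, ny):
    """Plug-in mutual information (bits) between two discrete series."""
    joint = np.zeros((nx, ny))
    for a, b in zip(x, y):
        joint[a, b] += 1.0
    joint /= joint.sum()
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ py)[nz])))

def greedy_coalesce(data, target_levels=2, init_bins=10):
    """data: (n_samples, N) real-valued; returns an integer-quantized copy.
    Start from equipopulated decile bins, then for each series repeatedly merge
    the pair of neighboring bins whose merger costs the least average mutual
    information with the other series."""
    n, N = data.shape
    disc = np.zeros((n, N), dtype=int)
    for i in range(N):
        edges = np.quantile(data[:, i], np.linspace(0, 1, init_bins + 1)[1:-1])
        disc[:, i] = np.searchsorted(edges, data[:, i])
    nbins = [init_bins] * N
    for i in range(N):
        while nbins[i] > target_levels:
            best = None
            for b in range(nbins[i] - 1):
                trial = disc[:, i].copy()
                trial[trial > b] -= 1          # merge bins b and b+1
                avg_mi = np.mean([mutual_info_bits(trial, disc[:, j],
                                                   nbins[i] - 1, nbins[j])
                                  for j in range(N) if j != i])
                if best is None or avg_mi > best[0]:
                    best = (avg_mi, trial)
            disc[:, i] = best[1]
            nbins[i] -= 1
    return disc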

A.1a: Condition 1. A.1b: Condition 2.

Figure A.1: Average exchange interactions for conditions 1 and 2, calculated separately for each condition. 100 quantizations with random quantization boundaries are done for each condition, and the maximum entropy model is computed for every such quantization. The magnetic fields vary wildly from quantization to quantization because they constrain the single-element means (and those depend on the quantization choice); the average magnetic field over random quantizations is zero within error bars for all proteins. On the other hand, the exchange interactions add coherently to yield the average interaction maps plotted above.

Note that the main features of the interaction maps can be recovered even if complicated quantization schemes are not used. We could, for example, make quantizations in which the boundary between the states “off” and “on” is chosen randomly and separately for each protein at decile boundaries, and repeat the maximum entropy reconstruction many times with these quantizations. The average of the interaction maps over random quantizations for conditions 1 and 2 is shown in Figs A.1a and A.1b, with clear similarities to Fig 2.11. By using the max-multi-info quantization one simply hopes to extract as much of the significant correlation from the data as possible, given the limited dynamic range of the discrete alphabet.

3. Constructing maximum entropy distributions. In the introduction to the chapter we only discussed how to compute maximum entropy models in which the activities could take on two distinct states. Let us generalize this to Q states here.¹ In other words, we are looking for p(σ₁, . . . , σ_N) such that S[p(σ⃗)] is maximized, subject to the constraints:

\[
p(\sigma_i = q_k, \sigma_j = q_l) = m^{kl}_{ij}, \qquad (A.1)
\]

where p(σ_i, σ_j) are the two-point marginals of the wanted distribution, q_k, q_l ∈ {0, . . . , Q − 1}, and m^{kl}_{ij} are the N(N − 1)/2 measured marginal tables of dimension Q × Q. This is a constrained variational problem with an analytic solution:

\[
p(\sigma_1, \cdots, \sigma_N) = \frac{1}{Z} \exp\left\{ \sum_i h_i(\sigma_i) + \sum_{i<j} J_{ij}(\sigma_i, \sigma_j) \right\} \qquad (A.2)
\]

¹We find that the dataset of Sachs et al. (2005) is too small to estimate the quality of maximum entropy reconstructions at Q = 3 (with max-mutual-info quantization; verification by measuring the three-point correlation functions and entropy estimations similar to Fig 2.13).

Our task is then to solve the constraint equations Eq (A.1), with the probability distribution given by Eq (A.2), and with unknowns h, J. In this case, where the quantization level Q is larger than 2, the coupling J is not a single number, but a matrix for each pair (i, j), and can in general be regarded as a pairwise “potential” that depends on the levels of both interacting elements.

It is easy to see why, in the binary case where σ_i ∈ {−1, 1}, the general ansatz of Eq (A.2) can be rewritten into the Ising form [Eq (2.30)]. Each pairwise constraint is a 2 × 2 marginal probability table m with one normalization constraint and consequently 3 free parameters. Suppose then that these three independent parameters are m01, m10 and m11. Constraining the mean of the first spin, ⟨σ_i⟩, by the magnetic field h_i is equivalent to constraining the linear combination 1 · (m10 + m11) + 0 · (m00 + m01), and similarly for spin j. Constraining ⟨σ_i σ_j⟩ by the exchange coupling J_ij is equal to constraining 1 · m11 + 0 · (m00 + m01 + m10). The Lagrange multipliers of the Ising model and the Lagrange multipliers that constrain the two-by-two marginal table in Eq (A.1) are consequently trivially related.

Note that the parametrization of Eq (A.2) contains more parameters than are needed to uniquely specify the distribution, because there are consistency constraints between elements of all marginal probability tables. To remove this ambiguity at Q = 3, we can select the following gauge for each i and each pair (i, j): h_i(1) = 0, Σ_{q_i} V_{ij}(q_i, q_j) = Σ_{q_j} V_{ij}(q_i, q_j) = 0.

Maximum entropy distributions of binary variables were computed using a custom Matlab code that solves the set of Eqs (A.1) with the distribution parametrized by Eq (A.2); the solution is accelerated by analytic evaluation of derivatives. For larger problems (i.e. Q = 3) and the cases where conditions were combined in one reconstruction, we use the L1-regularized maximum entropy algorithm by Dudik et al. (2004). Regularization parameters of this procedure do not have a large impact on the solution and are determined by computing the variance in constrained marginals in 20 random draws of half of the data at each condition.
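For orientation, a minimal sketch of such a constraint-matching fit in the binary (Ising) case is given below. It is written in Python/NumPy with exact enumeration of states rather than the custom Matlab or L1-regularized implementations described above, and is only practical for small N (roughly N ≲ 15); the learning rate and iteration count are arbitrary choices of this sketch:

import numpy as np
from itertools import product

def fit_ising(means, corrs, n_iter=5000, eta=0.1):
    """Fit h_i and J_ij of p(s) ~ exp( sum_i h_i s_i + 0.5 sum_ij J_ij s_i s_j ),
    with s_i = +/-1, so that the model reproduces the measured <s_i> (means)
    and <s_i s_j> (corrs), by iteratively nudging the couplings toward the
    constraints (exact enumeration of all 2^N states)."""
    N = len(means)
    h = np.zeros(N)
    J = np.zeros((N, N))
    states = np.array(list(product([-1, 1], repeat=N)), dtype=float)  # (2^N, N)
    for _ in range(n_iter):
        logw = states @ h + 0.5 * np.einsum('si,ij,sj->s', states, J, states)
        p = np.exp(logw - logw.max())
        p /= p.sum()
        m_model = p @ states                          # model <s_i>
        c_model = states.T @ (states * p[:, None])    # model <s_i s_j>
        h += eta * (means - m_model)
        dJ = corrs - c_model
        np.fill_diagonal(dJ, 0.0)                     # diagonal is fixed at 1
        J += eta * dJ
    return h, J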

A.2 Network of retinal ganglion cells

1. Bootstrap errors. Because the 40-neuron sample consists of responses to movie repeats, bootstrap errors are estimated by repeatedly taking a random half of the repeats² and estimating deviations in covariance and means. Figure A.2 shows the process of error estimation. Out of 145 repeats, the first 25 are discarded to remove the systematic variation (adaptation of neurons apparent as a drift in the mean firing rate). Then, bootstrap replicas are generated by selecting, with replacement, 1, 2, · · · , 64 random repeats out of the remaining 120, using 200 resamplings, and calculating ⟨σ_i⟩ and ⟨σ_i σ_j⟩ over those subsamples. The deviation in the means and covariances is taken to be the sample error in the corresponding statistics, and is extrapolated to the full dataset. The upper right panel in Fig A.2 shows that this extrapolation procedure is reliable, with the expected scaling (which is noticeably less exact if bootstrapping is done without replacement). Effectively, this means we treat the movie repeats as independent draws from some underlying probability distribution that generates the whole 1310-sample-long spike train. Having so extrapolated the errors in means and correlations to the whole dataset or, alternatively, to one repeat, we compute the predictions for the errors in the mean and correlation.

For example, let the estimated mean of the spin σ_i be σ̂_i. Because the spins are binary, the variance associated with the mean estimation is (Std σ_i)² = ⟨σ_i σ_i⟩ − ⟨σ_i⟩² = 1 − ⟨σ_i⟩² per draw, and is decreased to Std σ̂_i = Std σ_i / √N_eff if N_eff independent samples are measured. By comparing the sample error in a statistic (inter-repeat deviation, on the left-hand side of the expression above) with the expected total deviation of the statistic (computable from the mean, on the right-hand side of the expression above) we can figure out the effective sample size, N_eff. For the true dataset with preserved repeat structure, the effective number of samples per repeat is N_eff ≈ 3500. If time bins are randomly permuted in the dataset, so that the repeat structure is destroyed, and the bootstrap estimation is performed again, then N_eff^rand ≈ 1300. The number of total samples in a repeat is 1310.

The interpretation of these results is as follows: the random reshuffling of the time bins generates a homogeneous data set in which the partition into “repeats” is arbitrary. Therefore intra-repeat and inter-repeat variance in the estimates must be consistent, and indeed they are (see the black line fit in the lower panel in Fig A.2). In contrast, for the real data set, the inter-repeat variance is too small given the statistics to which the Ising model was fit. This is due to the fact that, within each repeat, the response is driven by the stimulus and is not stationary, which makes the intra-repeat variance big compared to the inter-repeat variance; the smallness of the latter basically says that the system gives a reliable response to the same repeated stimulus. Because the inter-repeat variance is so small, it appears as if one needed approximately 2.7 times more samples within each repeat to make the observed variances equal. Turning this around, for a given intra-repeat variance, we really have just about 54 independent effective repeats of the data, instead of the nominal 145.
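A minimal sketch of the repeat-level bootstrap (Python/NumPy, illustrative only; the array layout and the ±1 spin convention are assumptions of this sketch, not a description of the original analysis code):

import numpy as np

def bootstrap_errors(spikes, n_boot=200, seed=1):
    """spikes: (n_repeats, n_bins, n_cells) array of binarized (+/-1) responses.
    Resample repeats with replacement and return the standard deviation of the
    means and of the pairwise products across bootstrap replicas."""
    rng = np.random.default_rng(seed)
    R, _, n_cells = spikes.shape
    means, prods = [], []
    for _ in range(n_boot):
        pick = rng.integers(0, R, size=R)         # repeats, with replacement
        s = spikes[pick].reshape(-1, n_cells)
        means.append(s.mean(axis=0))
        prods.append((s.T @ s) / s.shape[0])
    return np.std(means, axis=0), np.std(prods, axis=0)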
2. Reconstruction precision. For 40 neurons we solve for {h_i, J_ij} using Contrastive Divergence Monte Carlo (one-step learning (Hinton, 2002)) followed by gradient descent learning, Eq (2.34), with α = 0. For 120 neurons we use Eq (2.34) with the momentum α = 0.9. The set of constrained operators consisted in this case of means and covariances C_ij instead of means and products ⟨σ_i σ_j⟩; the latter change is made because otherwise a small error in the reconstruction of a mean systematically influences a large number of C_ij, with consequences for the thermodynamic quantities of interest. See Fig A.3 for the reconstruction precision. Monte Carlo simulation was implemented in C/C++, and the learning algorithm in Matlab; Matlab drove parallel instances of Monte Carlo simulations on the cluster. Order 10⁶ independent samples (at least 2N, but sometimes more, spin flips were made before a sample was drawn from the MCMC) were taken for the computation of operator averages at each step of the learning iteration.

²Note that this is not the same as taking any random half of the data.


Figure A.2: Upper left: non-stationarity in the statistics in the first ∼ 20 repeats of the movie. Upper right: bootstrap error extrapolation for all means and covariances of 40 neurons, with the bootstrap sample size (N_r repeats) on the x-axis (200 resamplings for each bootstrap sample size). The extrapolation is reliable and the estimated errors decrease as 1/√N_r, as expected. Lower panel: scatterplot of the total deviation in each correlation, √(1 − ⟨σ_i σ_j⟩²), against the per-repeat sampling error in the same correlation. Red dots show the scatterplot when the repeat structure of the data is preserved; black dots show the time-axis reshuffled, so that the time structure is destroyed. The fits are used to calculate N_eff, the effective number of samples in a given repeat.

3. Energy histograms. If we take the Hamiltonian of the pairwise model that is computed from the correlations in the data, we can evaluate the energy of all patterns observed in the data set; in addition, we can calculate (or simulate with Monte Carlo for bigger cases) what the expected distribution of the energies would be. Figs A.4a and A.4c show energy histograms for 40 and 20 neurons, respectively. In addition to the probability of observing K simultaneous spikes [Fig 2.18], this is yet another projection of the 40- (20-) dimensional sample space down to one scalar dimension. We see the expected disagreement in the ground-state frequency (related to the P(K = 0) disagreement of the firing curve), but also a more marked discrepancy in the tail of the 40-neuron MC energy histogram.

4. Comparison with n = 20 neuron reconstructions. To decide whether the systematic deviations in the case of three-point correlations and the firing curve are a consequence of imperfect MC reconstruction or a real effect, we perform the same analysis on a pairwise maxent model of the first 20 out of 40 neurons, for which we have a convergent, deterministic algorithm. Figure A.5a shows, as expected, that the reconstruction has converged to within sampling limits, and that MC simulations for computing back the observables given the fully converged coupling constants work without bias.


Figure A.3: Reconstruction precision of Monte Carlo (MC) on the full data set of 40 neurons. Upper left: measured covariance vs reconstructed covariance. The reconstruction is averaged over 5 MC runs of 190k samples each. Black lines represent error estimates obtained by taking a random selection of a half of the samples from the real data, and estimating covariances 5 times. Lower left: the same plot for mean firing rates. Upper right: a plot of the precision of the reconstructed covariances, \(-\log_{10}\!\big[(C^{g}_{ij} - C^{\rm expt}_{ij})/C^{\rm expt}_{ij}\big]\), where C^expt_ij is obtained from the data and C^g_ij is the average of 5 MC runs, in red; in black, the bootstrap error of the covariance element estimates plotted in the same manner for 5 half-size subsamples of the data. Lower right: the same plot for mean firing rates.

Figure A.5b shows that the firing curve deviates with the same systematics as in the 40-neuron case. Here, the probabilities of silence in the data and in the Ising model are P_D(0) = 0.621 vs P_M(0) = 0.599, a deviation of about one-half of the size of that for 40 neurons. In addition, the deviations between the data and the predictions for K simultaneous spikes are in the same direction as in the 40-neuron case. Furthermore, the triplet correlations show the same deviation as in the 40-neuron case, in which the three-point correlations are over-estimated by the pairwise model [Fig 2.18]. I have again checked that this is not a consequence of a faulty MC implementation, as we have the ability to explicitly sum the partition function and compute the three-point correlations exactly: MC and explicit calculations agree with no systematics (not plotted). The consistency of results between 20 and 40 neurons indicates that although the 40-neuron reconstruction does not perfectly converge to the data, this fact is not responsible for the disagreements in the three-point correlations and the firing curve; on the contrary, it is probably an indication that while the second-order Ising model is a good model that captures most of the features of the data, it does not capture all of them.

5. Reconstruction of 40-neuron coupling constants from 20-neuron subnetworks

A.4a: Energy, 40 neurons. A.4b: Energy, 40 neurons, cumulative. A.4c: Energy, 20 neurons. A.4d: Energy, 20 neurons, cumulative.

Figure A.4: Energy histograms for 20 and 40 neurons. The MC-reconstructed couplings are used to calculate the energy of each sample in the real dataset and the histogram of such energies is plotted in red. 5 MC simulations of 190k samples each are used to generate the corresponding blue curve, and the dashed blue lines describe the error estimates. The line is broken in several places because the density of states is low at low energies (e.g. ground state is far away from the bulk). A.2 Network of retinal ganglion cells 114


A.5a: Reconstruction precision for the first 20 neurons. A.5b: Firing curve for the first 20 neurons.

Figure A.5: Reconstruction precision (a) and the probability P(K) of observing K simultaneous spikes (b) for the first 20 neurons of the dataset. Computed using the L1-regularized maximum-entropy algorithm of Dudik et al. (2004), a convergent and deterministic learning procedure for {h_i, J_ij}. Although the observables are fit to within better than the bootstrap error estimates, there is nevertheless a systematic deviation both in the firing curve and in the triplet correlations [Fig 2.18], showing that in the 40-neuron case the incomplete convergence of the Monte Carlo reconstruction is not the cause of the disagreement between the model and the high-order statistical features in the data.

Figure A.6: Comparison between the 40-neuron couplings computed in a MC reconstruction (horizontal axis: Monte-Carlo 40-neuron couplings) and the corresponding average couplings reconstructed in 88 20-neuron subnetworks (vertical axis: average over 88 exact 20-neuron reconstructions). Magnetic fields are shown in black and exchange couplings in red. Each 20-neuron coupling appears in several 20-neuron subnetworks, and the figure shows the average of that coupling across all network instances where it appears; the error bars in pairwise couplings across reconstructions are usually on the order of 20 percent for the bigger couplings.

Here we explore the agreement between coupling constants reconstructed by doing maximum entropy on 88 subnetworks of size 20, using exact calculation, and coupling constants obtained by the full 40-neuron MC. Comparison of couplings reconstructed exactly from 20 neurons with the MC 40-neuron reconstructions shows interesting features: A) magnetic fields follow a linear trend and are by a factor of ≈ 2.2 bigger for the 20-neuron case; B) pairwise couplings exhibit the same slope, but are offset by approximately a constant relative to the 40-neuron case, see Fig A.6. The increase in the 20-neuron magnetic fields is likely to be accounted for by the mean field of the unobserved 20 neurons; the change in the pairwise coupling between two arbitrary neurons N1 and N2 is a consequence of an (unobserved) N3 that couples to both N1 and N2. In this case, it could also be expected that the three-point couplings (induced by unobserved elements) are smaller in the 40-neuron case and that the disagreement in three-point correlations is somewhat smaller for the bigger system [Fig 2.18].

6. Decimation flow of coupling constants. We would like to study the behavior of the mapping {h, J}_N → {h, J}_{N−1}, i.e. the flow of coupling constants when we stop keeping track of the last, or N-th, neuron (e.g. suppose this is the neuron we cannot observe). The idea is to start out with a probability distribution defined on N neurons (we decide to keep the three-point interaction term):

\[
p^{(N)}(\{\sigma_1, \ldots, \sigma_N\}) = \frac{1}{Z} \exp\left( \sum_i h_i \sigma_i + \frac{1}{2} \sum_{ij} J_{ij} \sigma_i \sigma_j + \frac{1}{6} \sum_{ijk} J_{ijk} \sigma_i \sigma_j \sigma_k \right)
\]

We are looking for the probability distribution p^(N−1) of the same form, but defined over N − 1 neurons, and the relation between the two is marginalization:

\[
p^{(N-1)}(\sigma) = \sum_{\sigma_N} p^{(N)}(\sigma).
\]

In general, the marginalization cannot hold exactly while the forms of both distributions are fixed to an Ising form with two- or three-point interactions. However, our experiments on real data show that a two-point Ising model is a good description for any size, from 2 neurons to 40, of the observed spike train statistics. We might be hopeful, therefore, that the interactions of level three and higher will be negligible. Since spiking is rare, let us switch first into the sparse representation σ_i → 2σ_i − 1, with the new σ_i ∈ {0, 1}. Then, the marginalization reads:

\[
p^{(N-1)}(\sigma) = \frac{1}{Z'} \exp\left( -H^{(N-1)}(\sigma) \right) \times
\sum_{\sigma_N} \exp\Big\{ h_N (2\sigma_N - 1) + \sum_i J_{iN} (2\sigma_N - 1)(2\sigma_i - 1)
+ \frac{1}{2} \sum_{ij} J_{ijN} (2\sigma_i - 1)(2\sigma_j - 1)(2\sigma_N - 1) \Big\}
\]

This equation is now exactly summed over σ_N; the terms resulting from the summation are expanded (up to fourth order) in σ_i. If we define the renormalized \(\tilde{J}_{iN} = J_{iN} - \sum_j J_{ijN}\) (and use the renormalized two-point J from now on without a tilde), and introduce the factors

\[
\theta^{(N)} = \exp\Big( 2 h_N - 2 \sum_i J_{iN} + \sum_{ij} J_{ijN} \Big), \qquad
\omega^{(N)} = \frac{\theta^{(N)} - 1}{\theta^{(N)} + 1},
\]

such an expansion reads:

\[
p^{(N-1)}(\sigma) = \frac{1}{Z'} \exp\left( -H^{(N-1)}(\sigma) \right) (\theta + 1) \times
\Big\{ 1 + 2\omega \sum_i J_{iN} \sigma_i + 2\omega \sum_{ij} J_{ijN} \sigma_i \sigma_j + 2 \sum_{ij} J_{iN} J_{jN} \sigma_i \sigma_j
+ 4 \sum_{ijk} J_{iN} J_{jkN} \sigma_i \sigma_j \sigma_k + 2 \sum_{ijkl} J_{ijN} J_{klN} \sigma_i \sigma_j \sigma_k \sigma_l
+ \frac{8\omega}{6} \sum_{ijk} J_{iN} J_{jN} J_{kN} \sigma_i \sigma_j \sigma_k + 4\omega \sum_{ijkl} J_{iN} J_{jN} J_{klN} \sigma_i \sigma_j \sigma_k \sigma_l
+ \frac{2^4}{24} \sum_{ijkl} J_{iN} J_{jN} J_{kN} J_{lN} \sigma_i \sigma_j \sigma_k \sigma_l \Big\}
\]

Since the distribution for N − 1 neurons also has to take the Ising form, we require that the expansion in curly braces above be the expansion of a term of the form (sums over repeated indices are understood; notice that in the expression above, as well as below, the sums also run over equal indices i = j = k · · · ):

\[
f = \exp\left\{ \alpha_i \sigma_i + \beta_{ij} \sigma_i \sigma_j + \gamma_{ijk} \sigma_i \sigma_j \sigma_k + \delta_{ijkl} \sigma_i \sigma_j \sigma_k \sigma_l + \cdots \right\}
\]

Matching terms order-by-order we identify:

\[
\alpha_i = 2\omega J_{iN}
\]
\[
\beta_{ij} = 2 J_{iN} J_{jN} (1 - \omega^2) + 2\omega J_{ijN}
\]
\[
\gamma_{ijk} = 4(1 - \omega^2) \left[ J_{iN} J_{jkN} - \frac{2}{3} \omega J_{iN} J_{jN} J_{kN} \right]
\]
\[
\delta_{ijkl} = (1 - \omega^2) \left[ 2 J_{ijN} J_{klN} - 6\omega J_{iN} J_{jN} J_{klN} - 2\omega J_{ijN} J_{kN} J_{lN} + \frac{4}{3} (3\omega^2 - 1) J_{iN} J_{jN} J_{kN} J_{lN} \right]
\]

Now that the expression for the decimated probability distribution has been collapsed back to the exponential form, we switch into the σ_i ∈ {−1, 1} representation and contract the sums on equal summation indices, since σ_i² = 1. This yields the final mapping for the coupling constants as the system is reduced by marginalizing over the N-th neuron:

\[
h_i \rightarrow h_i + \frac{1}{2}\alpha_i + \frac{3}{8}\sum_{jk}\gamma_{ijk} + \frac{4}{16}\sum_{jkl}\delta_{ijkl} + 3\sum_j \tilde\gamma_{ijj} - 2\tilde\gamma_{iii} \qquad (A.3)
\]
\[
\frac{1}{2} J_{ij} \rightarrow \frac{1}{2} J_{ij} + \frac{1}{4}\beta_{ij} + \frac{3}{8}\sum_{k}\gamma_{ijk} + \frac{6}{16}\sum_{kl}\delta_{ijkl} + \frac{6}{16}\sum_{k}\delta_{ijkk} - \frac{2}{16}\delta_{ijjj} \qquad (A.4)
\]
\[
\frac{1}{6} J_{ijk} \rightarrow \frac{1}{6} J_{ijk} + \frac{1}{8}\gamma_{ijk} + \frac{4}{16}\sum_{l}\delta_{ijkl} \qquad (A.5)
\]

where \tilde\gamma_{ijk} = \frac{1}{8}\gamma_{ijk} + \frac{4}{16}\sum_l\delta_{ijkl}.

The form of Eqs (A.3, A.4, A.5) explains both why a pairwise Ising model can successfully describe fewer than N neurons if it works for N neurons, and the scaling of the coupling constants seen in Fig A.6. If the parameter |ω| is close to 1, the factors of 1 − ω² that appear in the expressions for β, γ and δ will tend to drive β, γ and δ towards zero; the only sizeable contribution is therefore to α, which renormalizes the magnetic fields, and the contribution to β from the three-point coupling terms. However, as we postulate (and show on real data) that the two-point Ising model is a good description for N neurons, these three-point couplings are 0 for N neurons and can grow only through the γ and δ terms, which again are suppressed as 1 − ω². If θ → 0, then ω → −1, which happens when h_N − Σ_i J̃_{iN} is very negative. Because we can rewrite h_N − Σ_i J̃_{iN} ≈ h_N + Σ_i J̃_{iN}⟨σ_i⟩, we see that this limit is actually the requirement that, in a mean-field treatment, the effective bias experienced by spin N be such that ⟨σ_N⟩ ≈ −1.

7. Calculation of the entropy for 40 neurons. The entropy of the 40-neuron system can be estimated in two similar ways: in method A, one starts with magnetic fields that reproduce the observed 1-point marginals and zero pairwise couplings (in this state we know how to calculate the entropy), and then the magnetic fields and couplings are varied to their final values. In the process S = S_0 − Σ_µ ∫ d⟨O_µ⟩ g_µ.³ This process yields a multi-information of I = 0.689 bits. As a sanity check, for a 20-neuron system it yields I = 0.368 bits, compared to the exact value (by state enumeration) of I = 0.369 bits.

Method B for estimating the entropy involves heating the system up from low temperature (where it is in the ground state and the entropy is S = 0) to the final state at T = 1, using the fact that the heat capacity is C = T ∂S/∂T and C = ⟨(δE)²⟩/(k_B T²), which can be evaluated either analytically (for 20 neurons) or in an MC simulation by sampling. The entropy for 40 neurons is thus S = 5.466, the independent entropy S_0 = 6.1534 and the multi-information I = 0.6868, all in bits, in excellent agreement with method A. There exists an entropy bound, or Ma entropy estimate (Strong et al., 1998), that we can use for the binary patterns discussed here. On the 40-neuron dataset this estimate gives S = 4.66 bits, and on the 20-neuron dataset S = 3.36 bits (for this case, the sample-size corrected direct estimate from the data is S = 3.65 bits and the maximum entropy model gives S = 3.69 bits).

Figure A.7 shows the scaling of the entropy and multi-information with the size of the system. Notice that the expected exponents for the scaling are 1 for the entropy and 2 for the multi-information (in the limit of weak coupling). The point at which the curves would intersect, yielding a zero-entropy frozen system, is at N_c ≈ 300 (with the entropy starting to decrease at about N_c ≈ 200), cf. Schneidman et al. (2006). This point cannot actually be reached, since the entropy does not decrease with increasing size; the real system must either be composed of a smaller number of elements, or the extrapolation must lose its validity at some N < N_c. For fits based on up to 20 neurons only, the exponents are 0.96 for the entropy and 2.05 for the multi-information, and the intersection happens at about 200 neurons (Schneidman et al., 2006).

8. Ground states. The 40-neuron system has 5 one-spin-flip-stable ground states G_α found in the dataset, depicted (with their corresponding energies) in Fig A.8a.
These states were found by taking the real data sample and examining the stability of each distinct state.⁴ If such a state can be perturbed by a single spin flip so that the energy is lowered, this is done repeatedly until a local energy minimum is reached. Then the complete 190k data series can be projected onto its corresponding ground states, as shown in Figs A.8b, A.9.

³A quick way to show this is to take the derivative of the entropy with respect to the coupling g_µ, and write the entropy as S = \log Z - \sum_\mu g_\mu\langle\hat O_\mu\rangle.
⁴There are probably some stable states that have not been discovered, especially if they have high energies.

Figure A.7: Scaling of entropy and multi-information with system size. The fit exponents are shown in the legend (multi-information fit exponent 1.9; independent entropy S0 fit exponent 0.991). There is 1 data point for 40 neurons, around 100 reconstructed subnetworks for 20 and 15 neurons, 200 for 10 neurons and 400 for 5 neurons. The data point at N = 120 is an average over 3 instances of the synthetic network model, obtained by drawing correlations and means from their measured distributions. The intersection of the independent entropy S0 and the multi-information is around Nc ≈ 300.

For each of the ground states we can try to estimate the size of the “basin” of states that are assigned to it. The number of samples for each state but state 1 (silence) is small: out of 189950 total samples, 658 (383 distinct; the ground state appears 70 times, approximately 10–20 other states appear around 10 times each, and the others are single repeats) correspond to state 2; 295 (113 distinct; the ground state appears around 40 times, around 20 states appear around 5 times each, the others are single or double repeats) to state 3; 154 (57 distinct; the ground state appears 20 times, around 5 other states appear 10–15 times each, and the remaining are single repeats) to state 4; and 235 (152 distinct; the ground state appears around 25 times, on the order of 20 states appear 3–5 times each, and the remaining are single repeats) to state 5. One can try to use the Ma estimate of the entropy of these states (i.e. S[P({σ_i}|G_α)]) from Strong et al. (1998), to get 7.5 bits for state 2, 5.1 bits for state 3, 4 bits for state 4, 5.2 bits for state 5 and 4.6 bits for silence (which is similar to the total Ma entropy estimate for the whole sample). The information conveyed by the five possible ground state assignments about the time (which stands as a proxy for the stimulus) is small, I(G_α; t) ≈ 0.046 bit. This is bounded from above by the entropy of the ground state assignment distribution, which is very skewed by the frequent occurrence of the trivial ground state, S[p(G_α)] ≈ 0.07 bit. The brain thus learns a lot on the rare occasions when a specific non-trivial state occurs, but most of the time it does not learn much from such a small piece of the retina with this stable-state code.

9. Properties of low-lying stable states. For a network of 120 neurons we can generate a large sample (10·10^6 independent samples, drawn from a long MCMC run in which K spin flips are performed between successive draws), and identify the lowest-lying stable states by zero-temperature Monte Carlo (ZTMC starts with the pattern and iterates Eq (A.6) until no more spins are flipped):

\sigma_i \leftarrow \mathrm{sign}\Big(\sum_j J_{ij}\sigma_j + h_i\Big)   (A.6)

For the lowest 1000 stable states, arranged by energy, we want to compute several local statistics, such as the magnetization within the basin, the entropy of the basin, the average energy, etc. For this, every MC sample in a long simulation run is assigned to its basin using ZTMC.
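A minimal sketch of the ZTMC relaxation of Eq (A.6), assuming NumPy; the couplings and fields here are random placeholders rather than the values fitted to the data:

```python
import numpy as np

def ztmc_relax(sigma, J, h, max_sweeps=1000):
    """Zero-temperature Monte Carlo: repeatedly apply Eq (A.6),
    sigma_i <- sign(sum_j J_ij sigma_j + h_i), until no spin flips.
    sigma is a vector of +-1 spins; returns a 1-spin-flip-stable pattern."""
    sigma = sigma.copy()
    for _ in range(max_sweeps):
        flipped = False
        for i in np.random.permutation(len(sigma)):      # asynchronous updates
            field = J[i] @ sigma + h[i]
            new = 1 if field >= 0 else -1
            if new != sigma[i]:
                sigma[i] = new
                flipped = True
        if not flipped:
            break
    return sigma

# toy usage: relax a random pattern in a small random network (illustrative only)
rng = np.random.default_rng(1)
N = 40
J = rng.normal(0, 0.1, (N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0)
h = rng.normal(-0.5, 0.3, N)
stable = ztmc_relax(rng.choice([-1, 1], size=N), J, h)
```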


A.8a: Stable states of the 40-neuron system.    A.8b: Projection of the real data series onto the stable states.

Figure A.8: Left: firing (white) and silence (black) patterns of the non-trivial stable states. Index α of the stable state Gα runs along the vertical axis, and the neuron index i = 1,..., 40 along the horizontal. Right: projection of the real data series onto the stable states, averaged over 145 repeats. Time is plotted on the horizontal axis in units of ∆t = 20 ms; vertical axis shows the probability of the system being in a certain state that corresponds to one of the four non-trivial stable states.

Figure A.9: Left: zoom-in from Fig A.8b of the time course (horizontal axis, units of ∆t = 20 ms) of the probability of observing the non-trivial stable states G_α (vertical axis). Right: patterns across repeats in which state 2 appears at time-bin 480 (left half of the right-hand figure), followed by the appearance of state 3 in time-bin 481 (right half of the right-hand figure).

We hope to collect enough patterns in each basin to make the local statistics feasible to estimate. First, we have to determine K, the minimum number of spin flips in the Metropolis procedure that breaks the Markov chain correlations with the previous sample. We start the Markov chain in stable states that we have pre-identified, and measure the average number of spin flips needed to depart from that stable state and fall into the basin of another stable state. Figure A.10 shows this distribution in the lower right corner: we set K = 1000 as the number of spin flips between sample draws. With this value of K we can accumulate enough samples to estimate the local average energy of the basin of attraction G_i, as well as a naive estimate of the entropy in that basin. As a consistency check we then plot the log (empirical MC) probability of observing a pattern from basin G_i against the free energy F(G_i) = ⟨E⟩_{G_i} − S(G_i) and find a linear relationship with slope 1.07 between the two quantities, thereby confirming that our local estimates are reasonably good and properly sampled (note that we only track the 1000 lowest states out of many more, but these account for most of the probability weight [Fig 2.20]).

[Plots, Fig A.10: P(E*) of the average escape barrier (upper right); P(l) of the escape path length (lower right); basin free energy F = ⟨E⟩ − S versus −log p(basin of attraction) (left).]

Figure A.10: Properties of the 1000 lowest-lying stable states. Upper right: the average energy barrier for escape from a stable state to a nearby stable state, averaged across the “from” and “to” states. For each of the 1000 states studied, we attempted 1000 escapes to a neighboring state in the pattern space. The path length from the stable state to the separatrix between the two states is also measured and histogrammed in the lower right plot; from here we pick K ≈ 1000 as the number of flips between samples. Left: for each energy basin we can estimate the log probability of finding a state from that basin (horizontal axis) and compare this to the free energy of the basin, F(G_i), with a satisfactory linear match (black line).

10. Redundancy. How much information does a group of neurons contain about another neuron in the network or about the identity of the basin of attraction?

Figure A.11: The scaling of the information between a group of neurons and another neuron (black), and between a group of neurons and the identity of the basin of attraction (red). We repeatedly take groups of N neurons (horizontal axis) and compute the desired information; error bars are across repeats. The information is normalized by the output entropy, as shown in the legend [I(neuron j; n neurons)/S(neuron j) and I(local min; n neurons)/S(local min)].

Figure A.11 shows the scaling of these information quantities for small enough subgroups of the 120 neurons, for which we can generate enough MC samples using the synthetic model. As the number of sampled neurons grows, one learns about the identity of the stable state approximately twice as fast (with extrapolated “full knowledge” at 60 neurons, or one half of the total network size) as one learns about the state of a single neuron (extrapolated “full knowledge” at 120 neurons, i.e. the network is close to error-correcting for single-neuron fluctuations). Put differently, it is easier to infer the collective state of the network than the state of a single neuron from a subgroup of neurons, a property that might be desirable for reliable and robust downstream decoding of a population code.
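As an illustration of the quantities plotted in Fig A.11, the sketch below computes a plug-in (naive) estimate of I(neuron j; group)/S(neuron j) from binary samples; it assumes NumPy and ignores the sample-size corrections that a careful estimate would require:

```python
import numpy as np

def entropy_bits(labels):
    """Plug-in entropy estimate, in bits, of a discrete sample (no bias correction)."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def fractional_info(samples, group, target):
    """I(target neuron; group of neurons) / S(target neuron), estimated from
    binary samples of shape (num_samples, num_neurons)."""
    group_labels = np.array([''.join(map(str, row)) for row in samples[:, group]])
    target_labels = samples[:, target].astype(str)
    joint_labels = np.array([g + t for g, t in zip(group_labels, target_labels)])
    mi = entropy_bits(group_labels) + entropy_bits(target_labels) - entropy_bits(joint_labels)
    return mi / entropy_bits(target_labels)

# toy usage on independent random spins, where the answer should be close to zero
rng = np.random.default_rng(2)
samples = (rng.random((100000, 16)) < 0.05).astype(int)
print(fractional_info(samples, group=list(range(8)), target=15))
```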

A.3 Input and output noise in transcriptional regulation

1. Langevin noise model. We consider a simplified model of regulated gene expression, as schematized in Fig 3.2:

\partial_t c = D\nabla^2 c(x,t) - \dot n\,\delta(x - x_0) + S - D   (A.7)

\dot n = k_+ c(x_0, t)(1 - n) - k_- n + \xi_n   (A.8)
\dot e = R_e n - \tau_e^{-1} e + \xi_e   (A.9)
\dot g = R_g e - \tau_g^{-1} g + \xi_g.   (A.10)

Equation (A.7) describes the diffusion of the transcription factor that can be absorbed to or released from a binding site on the DNA located at x_0. These transcription factors are produced at sources S and degraded at sinks D, which can both be spatially distributed and can also contribute to the noise in c. Equation (A.8) describes the dynamics of the binding site occupancy; binding occurs with a second-order rate constant k_+ and unbinding with a first-order rate constant k_-, and the dissociation constant of the site is K_d = k_-/k_+. The Langevin term ξ_n induces stochastic (binomial) switching between the occupied and empty states of the site. Equations (A.9) and (A.10) describe the production and degradation of mRNA and protein, respectively, and include Langevin noise terms associated with these birth and death processes. This seems a good place to note that, while conventional, the assumption that transcription and translation are simple one-step processes is a bit strong. We hope to return to this point at another time.

Our goal is to compute the variance in protein copy number, σ_g²(c̄). For simplicity we will assume that the transcription factors are present at a fixed total number in the cell and that they do not decay, S = D = 0. We will see that even with this simplification, where the overall concentration of transcription factors does not fluctuate, we still get an interesting noise contribution from the randomness associated with diffusion in Eq (A.7). Our basic strategy is to find the steady state solution of the model, and then linearize around it to compute the response of the variables {n, e, g} to the various Langevin forces {ξ_n, ξ_e, ξ_g}. In the linear approximation, the steady states are also the mean values:

c = \bar c   (A.11)
\langle n\rangle = \frac{k_+\bar c}{k_+\bar c + k_-} = \frac{\bar c}{\bar c + K_d}   (A.12)
\langle e\rangle = R_e\tau_e\langle n\rangle   (A.13)

\langle g\rangle = R_g\tau_g\langle e\rangle = g_0\langle n\rangle,   (A.14)

where g_0 = R_e\tau_e R_g\tau_g is the maximum mean expression level. Notice that what we have called ḡ = ⟨g⟩/g_0 is just the mean occupancy, ⟨n⟩, of the transcription factor binding site.
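As a sanity check on the steady state of Eqs (A.11)–(A.14), one can simulate Eqs (A.8)–(A.10) directly at fixed TF concentration c̄ (so diffusion noise is absent). The sketch below is illustrative only: it assumes NumPy, treats the binding site as an exact two-state telegraph process, and uses a chemical-Langevin (Gaussian) approximation for the mRNA and protein birth–death noise; all parameter values are hypothetical.

```python
import numpy as np

def simulate(cbar, k_plus, k_minus, Re, taue, Rg, taug, T=500.0, dt=1e-3, seed=0):
    """Euler-Maruyama simulation of Eqs (A.8)-(A.10) at fixed TF concentration.
    n is a two-state (telegraph) process; e and g use Gaussian birth-death noise
    sqrt(birth rate + death rate)*dW, an assumption consistent at steady state
    with the spectra N_e = 2 Re <n>, N_g = 2 Rg <e> used in the text."""
    rng = np.random.default_rng(seed)
    steps = int(T / dt)
    n, e, g = 0, 0.0, 0.0
    gs = np.empty(steps)
    for t in range(steps):
        # binding site: 0 -> 1 with rate k_plus*cbar, 1 -> 0 with rate k_minus
        if n == 0 and rng.random() < k_plus * cbar * dt:
            n = 1
        elif n == 1 and rng.random() < k_minus * dt:
            n = 0
        e += (Re * n - e / taue) * dt + np.sqrt((Re * n + e / taue) * dt) * rng.normal()
        g += (Rg * e - g / taug) * dt + np.sqrt((Rg * e + g / taug) * dt) * rng.normal()
        e, g = max(e, 0.0), max(g, 0.0)   # crude reflection at zero
        gs[t] = g
    return gs

# hypothetical parameters; compare the mean with <g> = g0 <n> and the variance with Eq (A.36)
gs = simulate(cbar=1.0, k_plus=1.0, k_minus=1.0, Re=10.0, taue=1.0, Rg=10.0, taug=20.0)
burn = len(gs) // 5
print(gs[burn:].mean(), gs[burn:].var())
```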

Small departures from steady state are written in a Fourier representation:

c(x,t) = \bar c + \int\frac{d\omega}{2\pi}\int\frac{d^3k}{(2\pi)^3}\, e^{ik\cdot x}e^{-i\omega t}\,\delta\hat c(k,\omega)   (A.15)
n = \langle n\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\hat n(\omega)   (A.16)
e = \langle e\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\hat e(\omega)   (A.17)
g = \langle g\rangle + \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\hat g(\omega).   (A.18)

Similarly, each of the Langevin terms is written in its Fourier representation,

\xi_\mu = \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\hat\xi_\mu(\omega),   (A.19)

where µ = n, e, g. As a first step we use the Fourier representation to solve Eq (A.7) for δc(x_0, t), which we need to substitute into Eq (A.8) for the binding site occupancy:

\delta c(x_0, t) = \int\frac{d\omega}{2\pi}\, e^{-i\omega t}\,\delta\tilde c(x_0,\omega)   (A.20)
\delta\tilde c(x_0,\omega) = \int\frac{d^3k}{(2\pi)^3}\,\frac{1}{-i\omega + D|k|^2}\, i\omega\,\delta\hat n(\omega)   (A.21)
= \frac{i\omega\,\delta\hat n(\omega)}{\pi D a}.   (A.22)

The integral over k in Eq (A.21) is divergent at large |k| (ultraviolet). This arises, as explained in (Bialek and Setayeshgar, 2005), because we started with the assumption that the binding reaction occurs at a point—the delta function in Eq (A.7). In fact our description needs to be coarse grained on a scale corresponding to the size of the binding site, so we introduce a cutoff |k| ≤ k_max = 2π/a, where a is the linear size of the binding site. Linearizing Eq (A.8) for the dynamics of the site occupancy, we have

-i\omega\,\delta\hat n(\omega) = -(k_+\bar c + k_-)\,\delta\hat n(\omega) + k_+(1-\langle n\rangle)\,\delta\tilde c(x_0,\omega) + \hat\xi_n(\omega).   (A.23)

Substituting our result for δc̃(x_0, ω) from Eq (A.22), we find

-i\omega\,\delta\hat n(\omega) = -(k_+\bar c + k_-)\,\delta\hat n(\omega)   (A.24)
\qquad + k_+(1-\langle n\rangle)\,\frac{i\omega\,\delta\hat n(\omega)}{\pi D a} + \hat\xi_n(\omega)   (A.25)
-i\omega\left[1 + \frac{k_+(1-\langle n\rangle)}{\pi D a}\right]\delta\hat n(\omega) = -(k_+\bar c + k_-)\,\delta\hat n(\omega) + \hat\xi_n(\omega)   (A.26)
\delta\hat n(\omega) = \frac{\hat\xi_n(\omega)}{-i\omega(1+\Sigma) + (k_+\bar c + k_-)}   (A.27)

where Σ = k_+(1 − ⟨n⟩)/(πDa). The linearization of Eqs (A.9) and (A.10) takes the form

-i\omega\,\delta\hat e(\omega) = -\frac{1}{\tau_e}\,\delta\hat e(\omega) + R_e\,\delta\hat n(\omega) + \hat\xi_e(\omega)   (A.28)
-i\omega\,\delta\hat g(\omega) = -\frac{1}{\tau_g}\,\delta\hat g(\omega) + R_g\,\delta\hat e(\omega) + \hat\xi_g(\omega)   (A.29)

Each Langevin term is independent, and each frequency component ω is correlated only with the component at −ω, defining the noise power spectrum \langle\hat\xi_\mu(\omega)\hat\xi_\mu(-\omega')\rangle = 2\pi\delta(\omega-\omega')N_\mu(\omega) for µ = n, e, g. Solving the three linear equations, Eqs (A.27–A.29), we can find the power spectrum of the protein copy number fluctuations,

S_g(\omega) = \frac{N_g}{\omega^2 + 1/\tau_g^2} + R_g^2\,\frac{N_e}{(\omega^2+1/\tau_g^2)(\omega^2+1/\tau_e^2)}   (A.30)
\qquad + R_g^2 R_e^2\,\frac{N_n}{(\omega^2+1/\tau_g^2)(\omega^2+1/\tau_e^2)\left[(1+\Sigma)^2\omega^2 + 1/\tau_c^2\right]},   (A.31)

where 1/τ_c = k_+c̄ + k_-. This form has a very intuitive interpretation: each Langevin term represents a noise source; as this noise propagates from the point where it enters the dynamical system to the output, it is subjected both to the gain of each successive stage (prefactors R), and to filtering by factors of F_τ = (ω² + 1/τ²)^{-1}. The total variance in protein copy number is given by an integral over the spectrum,

\langle(\delta g)^2\rangle \equiv \sigma_g^2 = \int\frac{d\omega}{2\pi}\,S_g(\omega),   (A.32)

and the noise power spectra of the Langevin terms associated with the mRNA and protein dynamics have the simple forms N_e(ω) = 2R_e⟨n⟩ and N_g(ω) = 2R_g⟨e⟩, respectively. The spectrum N_n(ω) is more subtle. One way to derive it is to realize that since there is only one binding site and this site is either occupied or empty, the total variance of δn must be given by the binomial formula,

\langle(\delta n)^2\rangle = \langle n\rangle(1-\langle n\rangle).   (A.33)

Starting with Eq (A.27) and the analog of Eq (A.32), we can use this condition to set the magnitude of N_n. Alternatively, we can use the fact that binding and unbinding come to equilibrium, and hence the fluctuations in n are a form of thermal noise, like Brownian motion or Johnson noise, so that the spectrum N_n is determined by the fluctuation–dissipation theorem (Bialek and Setayeshgar, 2005). The result is that

N_n = \frac{2}{\tau_c}(1+\Sigma)\langle n\rangle(1-\langle n\rangle).   (A.34)
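Numerically, the variance of Eq (A.32) can be obtained by integrating the spectrum of Eqs (A.30–A.31) on a frequency grid; the sketch below (assuming NumPy, with hypothetical parameter values) does exactly that:

```python
import numpy as np

# illustrative (hypothetical) parameters
Re, taue = 10.0, 1.0            # mRNA synthesis rate and lifetime
Rg, taug = 10.0, 20.0           # protein synthesis rate and lifetime
kplus, kminus, cbar = 1.0, 1.0, 1.0
D, a = 1.0, 0.1                 # diffusion constant and binding-site size
n_mean = kplus * cbar / (kplus * cbar + kminus)
e_mean = Re * taue * n_mean
tauc = 1.0 / (kplus * cbar + kminus)
Sigma = kplus * (1 - n_mean) / (np.pi * D * a)

# Langevin noise spectra: N_g, N_e from birth-death, N_n from Eq (A.34)
Ng, Ne = 2 * Rg * e_mean, 2 * Re * n_mean
Nn = (2 / tauc) * (1 + Sigma) * n_mean * (1 - n_mean)

def Sg(w):
    """Protein copy-number noise spectrum, Eqs (A.30)-(A.31)."""
    fg, fe = w**2 + 1 / taug**2, w**2 + 1 / taue**2
    fc = (1 + Sigma)**2 * w**2 + 1 / tauc**2
    return Ng / fg + Rg**2 * Ne / (fg * fe) + Rg**2 * Re**2 * Nn / (fg * fe * fc)

w = np.linspace(-2000, 2000, 2_000_001)      # spectrum decays as 1/w^2, tail is negligible
dw = w[1] - w[0]
var_g = Sg(w).sum() * dw / (2 * np.pi)       # Eq (A.32)
print(var_g)
```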

For simplicity we consider the case where the protein lifetime τg is long compared with all other time scales in the problem. Then we can approximate Eq (A.31) as

S_g(\omega) \approx \frac{1}{\omega^2 + 1/\tau_g^2}\left[N_g + (R_g\tau_e)^2 N_e + (R_g\tau_e R_e\tau_c)^2 N_n\right].   (A.35)

Substituting the forms of the individual noise spectra N_µ and doing the integral over ω [Eq (A.32)], we find the variance in protein copy number

\sigma_g^2 = \tau_g\left[R_g\langle e\rangle + (R_g\tau_e)^2 R_e\langle n\rangle\right] + (R_g\tau_e R_e\tau_c)^2\,\frac{\tau_g}{\tau_c}\,(1+\Sigma)\langle n\rangle(1-\langle n\rangle).   (A.36)

We notice that the first term in this equation is R_g\tau_g\langle e\rangle, which is just the mean number of proteins ⟨g⟩ from Eq (A.14). The second term is

\tau_g(R_g\tau_e)^2 R_e\langle n\rangle = R_g\tau_g\,(R_e\tau_e\langle n\rangle)\,(R_g\tau_e)   (A.37)

= R_g\tau_g\langle e\rangle\,(R_g\tau_e)   (A.38)

= R_g\tau_e\langle g\rangle.   (A.39)

Thus, the first two terms together contribute (1 + R_g\tau_e)\langle g\rangle to the variance, and this corresponds to the output noise term in Eq (3.14). The third term in Eq (A.36) contains the contribution of input noise to the variance in protein copy number. To simplify this term we note that the steady state of Eq (A.8) is equivalent to

k_+\bar c\,(1-\langle n\rangle) = k_-\langle n\rangle.   (A.40)

Thus we can write

\frac{1}{\tau_c} \equiv k_+\bar c + k_-   (A.41)
= k_-\left(\frac{\langle n\rangle}{1-\langle n\rangle} + 1\right) = \frac{k_-}{1-\langle n\rangle}.   (A.42)

The term we are interested in is

\frac{\tau_g}{\tau_c}(R_g\tau_e R_e\tau_c)^2(1+\Sigma)\langle n\rangle(1-\langle n\rangle) = (R_g\tau_g R_e\tau_e)^2\,\frac{\tau_c}{\tau_g}\,(1+\Sigma)\langle n\rangle(1-\langle n\rangle)   (A.43)
= g_0^2\,\frac{1}{k_-\tau_g}\,(1+\Sigma)\langle n\rangle(1-\langle n\rangle)^2   (A.44)
= g_0^2\,\frac{\langle n\rangle(1-\langle n\rangle)^2}{k_-\tau_g} + g_0^2\,\frac{1}{k_-\tau_g}\,\frac{k_+(1-\langle n\rangle)}{\pi D a}\,\langle n\rangle(1-\langle n\rangle)^2   (A.45)
= g_0^2\,\frac{\langle n\rangle(1-\langle n\rangle)^2}{k_-\tau_g} + g_0^2\,\frac{\langle n\rangle^2(1-\langle n\rangle)^2}{\pi D a\,\bar c\,\tau_g},   (A.46)

where in the last step we once again use Eq (A.40) to rewrite the ratio k_+/k_- in terms of ⟨n⟩. We recognize the two terms in this result as the switching and diffusion terms in Eq (3.14).

2. Cooperativity. To generalize this analysis of noise to cooperative interactions among transcription factors it is useful to think more intuitively about the two terms in Eq (A.46), corresponding to switching and diffusion noise. Consider first the switching noise. We are looking at a binary variable n such that the number of proteins is g_0 n. The total variance in n must be ⟨(δn)²⟩ = ⟨n⟩(1 − ⟨n⟩) [Eq (A.33)]. This noise fluctuates on a time scale τ_c, so during the lifetime of the protein we see N_s = τ_g/τ_c independent samples. The current protein concentration is effectively an average over these samples, so the effective variance is reduced to

\langle(\delta n)^2\rangle_{\rm eff} = \frac{1}{N_s}\langle n\rangle(1-\langle n\rangle) = \frac{\tau_c}{\tau_g}\langle n\rangle(1-\langle n\rangle).   (A.47)

Except for the factor of g_0 that converts n into g, this is the first term in Eq (A.46). Now if h transcription factors bind cooperatively, we can still have two states, one in which transcription is possible and one in which it is blocked. For the case of activation, which we are considering here, the active state corresponds to all binding sites being filled, and so the rate at which the system leaves this state, k_-, shouldn't depend on the concentration of the transcription factors. The rate at which the system enters the active state does depend on concentration, but this doesn't matter, because with only two states we must always have an analog of Eq (A.40), which allows us to eliminate the "on rate" in favor of k_- and ⟨n⟩. The conclusion is that the first term in Eq (A.46), corresponding to switching noise, is unchanged by cooperativity as long as the system is still well approximated as having just two states of transcriptional activity that depend on the potentially many more states of binding site occupancy. For the diffusion noise term we use the ideas of Berg and Purcell (1977) and Bialek and Setayeshgar (2005, 2006). Diffusion noise should be thought of as an effective noise in the measurement of the concentration c, with a variance

\frac{\sigma_c^2}{\bar c^2} \sim \frac{1}{\pi D a\,\bar c\,\tau_g},   (A.48)

where again we identify the protein lifetime as the time over which the system averages. For the system with a single binding site,

\langle n\rangle = \frac{\bar c}{\bar c + K_d},   (A.49)

so that

\frac{\partial\langle n\rangle}{\partial c} = \frac{1}{\bar c}\langle n\rangle(1-\langle n\rangle).   (A.50)

The noise in concentration, together with this sensitivity of n to changes in the concentration, should contribute a noise variance

\langle(\delta n)^2\rangle_{\rm eff} = \left(\frac{\partial\langle n\rangle}{\partial c}\right)^2\sigma_c^2 = \frac{\langle n\rangle^2(1-\langle n\rangle)^2}{\pi D a\,\bar c\,\tau_g}.   (A.51)

This is (up to the factor of g_0² that converts the variance in n into a variance in g) the second term in Eq (A.46). Now the generalization to cooperative interactions is straightforward. If we have

\langle n\rangle = \frac{\bar c^h}{\bar c^h + K_d^h},   (A.52)

then

\frac{\partial\langle n\rangle}{\partial c} = \frac{h}{\bar c}\langle n\rangle(1-\langle n\rangle).   (A.53)

Since the effective noise in concentration is unchanged (Bialek and Setayeshgar, 2006), the only effect of cooperativity is to multiply the second term in Eq (A.46) by a factor of h². Thus, in the expression [Eq (3.14)] for the variance of protein copy number, cooperativity has no effect on the switching noise but actually increases the diffusion noise by a factor of h². When written as a function of the mean copy number and the transcription factor concentration, this leaves the functional form of the variance fixed, only changing the coefficients. The overall effect is to make the contribution of diffusion noise more important. One way to say this is that, when we refer the noise in copy number back to the input, cooperativity causes the equivalent concentration noise to become closer to the limit of Eq (A.48) set by diffusive shot noise (Bialek and Setayeshgar, 2006). Gregor et al. (2006a) also consider the possibility that noise is reduced by averaging among neighboring nuclei. This does not change the form of any of the noise terms, but does change the microscopic interpretation of the coefficients α and β. For example, averaging for a time τ_g over N nuclei is equivalent to having one nucleus with an averaging time Nτ_g.
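Collecting the results of this section, the sketch below evaluates the noise budget of Eq (A.36) as output, switching and diffusion contributions [Eqs (A.39), (A.46)], with the diffusion term multiplied by h² for a Hill-type cooperative element as derived above. It assumes NumPy, and the parameter values are hypothetical, chosen only to illustrate the relative sizes of the terms:

```python
import numpy as np

def protein_noise_budget(cbar, Kd, kminus, Re, taue, Rg, taug, D, a, h=1):
    """Variance of the protein copy number, decomposed as in Eqs (A.36), (A.46):
    output noise (1 + Rg*taue)<g>, switching noise, and diffusion noise.
    The Hill coefficient h multiplies the diffusion term by h**2;
    h = 1 recovers the single-binding-site result."""
    n_mean = cbar**h / (cbar**h + Kd**h)          # promoter occupancy, Eq (A.52)
    g0 = Re * taue * Rg * taug                     # maximal mean expression level
    g_mean = g0 * n_mean
    output = (1 + Rg * taue) * g_mean
    switching = g0**2 * n_mean * (1 - n_mean)**2 / (kminus * taug)
    diffusion = h**2 * g0**2 * n_mean**2 * (1 - n_mean)**2 / (np.pi * D * a * cbar * taug)
    return output, switching, diffusion

# hypothetical numbers, only to illustrate the relative sizes of the three terms
out, sw, diff = protein_noise_budget(cbar=1.0, Kd=1.0, kminus=1.0,
                                     Re=10.0, taue=1.0, Rg=10.0, taug=20.0,
                                     D=1.0, a=0.1, h=2)
print(out, sw, diff, out + sw + diff)
```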

A.4 Information flow in transcriptional regulation

A.4.1 Finding optimal channel capacities

If we treat the kernel p(g|c) on a discrete (c, g) grid we can easily choose p(c) so as to maximize the mutual information I(c; g) between the TF concentration and the expression level. The problem can be stated in terms of the following variational principle:

\mathcal{L}[p(c)] = \sum_{c,g} p(g|c)p(c)\log_2\frac{p(g|c)}{p(g)} - \Lambda\sum_c p(c),   (A.54)

where the multiplier Λ enforces normalization of p(c), and p(g) is a function of the unknown distribution, since p(g) = \sum_c p(g|c)p(c). The solution p^*(c) of this problem achieves the maximum capacity I(c; g) of the channel. The original idea behind the Blahut–Arimoto approach (Blahut, 1972) for finding the optimal p^*(c) was to realize that the maximization of Eq (A.54) over the variational objects p(c_i) is equivalent to the following maximization:

\max_{p(c)}\mathcal{L}[p(c)] \sim \max_{p(c)}\max_{p(c|g)}\mathcal{L}'[p(c), p(c|g)],   (A.55)

where

\mathcal{L}'[p(c), p(c|g)] = \sum_{g,c} p(c)p(g|c)\log\frac{p(c|g)}{p(c)} - \Lambda\sum_c p(c).   (A.56)

In words, finding the extremum in the variational object p(c) is equivalent to a double maximization of a modified Lagrangean, where both p(c) and p(c|g) are treated as independent variational objects. The extremum of the modified Lagrangean is achieved exactly when the consistency condition p(c|g) = \frac{p(g|c)p(c)}{\sum_c p(g|c)p(c)} holds. This allows us to construct an iterative algorithm, where Eq (A.56) is first solved for p(c) given some guess for p(c|g):

p(c) = \frac{1}{Z}\exp\Big\{\sum_g p(g|c)\log p(c|g)\Big\},   (A.57)

and the guess is then updated with the new p(c). Let us suppose that each input signal c carries some metabolic or time cost to the cell. Then we can introduce a cost vector v(c) that assigns a cost to each codeword c, and require of the solution the following:

\sum_c p(c)v(c) \le C_0,   (A.58)

where C_0 is the maximum allowed total expense. The constraint can be introduced into the functional [Eq (A.54)] through an appropriate Lagrange multiplier; the same approach can be extended to introduce the cost of coding for the output words, \sum_g\sum_c p(g|c)p(c)\tilde v(g), because it reduces to an additional "effective" cost for the input, v(c) = \sum_g p(g|c)\tilde v(g).

We demonstrate next how to include a smoothness constraint in the functional; the degree of "smoothness" of the resulting input distribution p^*(c) will then be controllable through an additional Lagrange multiplier, and both ways of computing the capacity explained in the main text – that of referring the limited input resolution σ_c(c̄) to the noise in the output, and that of including it as a smoothness constraint on the input distribution – will be possible within a single framework. For the capacities in the paper we assume that all relevant noise has already been included in the explicit output noise model. By analogy to field theories in which kinetic energy terms of the form \int |\nabla f(x)|^2\,dx constrain the gradient, we write the following functional:

\mathcal{L}[p(c)] = I(c;g) - \Lambda_0\sum_c p(c)   (A.59)
\qquad - \Phi_1\sum_c p(c)v_1(c) - \Phi_2\sum_g p(g)v_2(g)   (A.60)
\qquad - \Theta\sum_c\left(\frac{\Delta p}{\Delta c}\,\sigma(c)\right)^2.   (A.61)

Eq (A.59) maximizes the capacity with respect to the variational objects p(c) while keeping the distribution normalized through the Lagrange multiplier Λ_0; Eq (A.60) imposes cost v_1(c) on input symbols and cost v_2(g) on output symbols; finally, Eq (A.61) limits the derivative of the resulting solution. The difference operator ∆ is defined for an arbitrary function f(c):

\Delta f(c) = f(c_{i+1}) - f(c_i).   (A.62)

σ(c) assigns different weights to different ranges of the input c; there is arbitrariness in the selection of the scale Θ, as it can be absorbed into σ(c). This construction can be seen as placing limits on the resolution of the input, in the following way. If the input cannot be precisely controlled, but has an uncertainty of σ(c) at mean input level c, we require that the optimal probability distribution must not change much as the input fluctuates on the scale σ(c), or in other words,

|\delta p| = \left|\frac{\Delta p}{\Delta c}\right|\sigma(c) \ll 1.   (A.63)

The term in Eq (A.61) constrained by the Lagrange multiplier Θ can be seen as the sum of squares of such variations over all possible values of the input. By differentiating the functional with respect to p(c_i) we get the following equation:

0 = \sum_g p(g|c_i)\log p(c_i|g) - \log p(c_i) - \Lambda - \Phi_1 v_1(c_i) - \Phi_2\sum_g p(g|c_i)v_2(g)   (A.64)
\qquad + \Theta\left\{[p(c_{i+1}) - p(c_i)]\,\frac{\sigma^2(c_i)}{(c_{i+1}-c_i)^2} - [p(c_i) - p(c_{i-1})]\,\frac{\sigma^2(c_{i-1})}{(c_i-c_{i-1})^2}\right\}.   (A.65)

Let us denote by F(c, p(c)) = \Delta\!\left(\frac{\Delta p}{(\Delta c)^2}\,\sigma^2(c)\right) the term in braces. The solution for p(c) is given by:

p(c) = \frac{1}{Z}\exp\Big\{\sum_g p(g|c)\log p(c|g) - \Phi_1 v_1(c) - \Phi_2\sum_g p(g|c)v_2(g) + \Theta F\Big\}   (A.66)

We can now continue to use the Blahut–Arimoto trick of pretending that p(c|g) is an independent variational object, and that p(c) has to be solved for with p(c|g) held fixed; however, even in that case, Eq (A.66) is an implicit equation for p(c) which needs to be solved numerically. The complete iterative prescription is therefore as follows:

p^n(g) = \sum_c p(g|c)p^n(c)   (A.67)
p^n(c|g) = \frac{p(g|c)p^n(c)}{p^n(g)}   (A.68)
p^{n+1}(c) = \frac{1}{Z}\exp\Big\{\sum_g p(g|c)\log p^n(c|g) - \Phi_1 v_1(c)   (A.69)
\qquad\qquad - \Phi_2\sum_g p(g|c)v_2(g) + \Theta F(c, p^{n+1}(c))\Big\}   (A.70)

To reiterate, Eq (A.70) has to be solved on its own by numerical means, since the variational objects for iteration (n + 1) appear both on its left- and right-hand sides. The input and output costs of coding are neglected if one sets Φ_1 = Φ_2 = 0; likewise, the smoothness constraint is ignored for Θ = 0, in which case Eq (A.70) is the same as in the original Blahut–Arimoto derivation and gives the value of p^{n+1}(c) explicitly. If we take a fixed input-output relation P(g|c) and vary the Lagrange multiplier Θ, we trace out the so-called rate–distortion (RD) curve, shown in Fig A.12; this curve is parametrized by Θ and plots the achievable channel capacity as a function of the smoothness, which is the term conjugate to the multiplier Θ in Eq (A.61). Solutions that are forced to be smoother have a smaller capacity, and in the extreme case one would obtain the capacity for a distribution that is uniform in the input. Figure A.12 (left) shows that smoothing will remove features in the low concentration region, where the gene expression is turned off regardless of the precise shape of the distribution, at practically no cost in capacity;⁵ in contrast, the shape of the input near c/K_d ≈ 1 is important and is preserved if the smoothing is not too strong. Figure A.12 (right) shows a case of very high cooperativity. It is clear that as h → ∞, the input-output relation becomes a step function, and as such would be able to transmit at most 1 bit of information. For h large but not infinite, the system has to move in a controlled manner from zero occupancy to full occupancy within a very small concentration window; the optimal distribution must therefore be steep there, which is in direct contradiction with the imposed smoothing. In particular, the optimal distribution is "tuned" to the input-output relation, and if (in the case of extreme smoothing) the uniform input distribution were used (black dashed line), the capacity would fall considerably from its unconstrained maximum [Fig A.12].
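For the simplest case Φ_1 = Φ_2 = Θ = 0, the iteration of Eqs (A.67–A.70) reduces to the standard Blahut–Arimoto algorithm; a minimal sketch (assuming NumPy; the 8 × 8 kernel at the end is a hypothetical stand-in for the discretized P(g|c), not the gene-regulatory kernel used in the text):

```python
import numpy as np

def blahut_arimoto(p_g_given_c, tol=1e-9, max_iter=5000):
    """Channel capacity of a discrete kernel p(g|c) (rows: inputs c, columns: outputs g),
    using the iteration of Eqs (A.67)-(A.70) with Phi_1 = Phi_2 = Theta = 0.
    Returns (capacity in bits, optimal input distribution p*(c))."""
    n_c = p_g_given_c.shape[0]
    p_c = np.full(n_c, 1.0 / n_c)

    def log_ratio_and_exponent(p_c):
        p_g = p_c @ p_g_given_c                               # Eq (A.67)
        with np.errstate(divide='ignore'):
            log_ratio = np.where(p_g_given_c > 0, np.log(p_g_given_c / p_g), 0.0)
        return log_ratio, np.sum(p_g_given_c * log_ratio, axis=1)

    for _ in range(max_iter):
        _, D = log_ratio_and_exponent(p_c)                    # D_c = sum_g p(g|c) log[p(g|c)/p(g)]
        w = p_c * np.exp(D)                                   # Eqs (A.68)-(A.69) combined
        p_new = w / w.sum()
        converged = np.max(np.abs(p_new - p_c)) < tol
        p_c = p_new
        if converged:
            break

    log_ratio, _ = log_ratio_and_exponent(p_c)
    capacity_bits = np.sum(p_c[:, None] * p_g_given_c * log_ratio) / np.log(2)
    return capacity_bits, p_c

# toy usage on a hypothetical 8x8 kernel
rng = np.random.default_rng(3)
K = rng.random((8, 8)) + 4 * np.eye(8)
K /= K.sum(axis=1, keepdims=True)
print(blahut_arimoto(K)[0])
```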

A.4.2 Channel capacity at constrained expense

Here we take a look at how the optimization principle can be used to make non-trivial predictions about the regulatory element, especially when the metabolic cost of coding is a real concern. Suppose we take an input-output activator kernel with variable cooperativity (parametrized by the Hill coefficient h) that has the noise parameters of the blue system of Fig 4.3b. We assume explicitly that the diffusive noise strength scales with h², as expected from theoretical considerations, see Table 4.1. Conceptually, increasing the cooperativity increases the sensitivity to fluctuations in the input (especially at half-maximal induction), and thus reduces the channel capacity; on the other hand, increased cooperativity allows the system to explore the full dynamic range of promoter occupancy at a smaller cost in the input. Consequently there should be an optimal cooperativity h^*(C_0) that depends on the total expense for coding inputs and outputs. This kind of tradeoff is illustrated in Fig A.13.

⁵This is a consequence of the degeneracy of the input/output kernel. For high cooperativity, the input/output relation is flat in the regions c ≪ K_d and c ≫ K_d, and the local shape of p(c) there must be irrelevant for the channel capacity. It is not irrelevant for the smoothness penalty constraint, however.

[Plots, Fig A.12: histograms of the optimal input in log c versus log(c/K_d) (top) and the corresponding rate–distortion curves, capacity (bits) versus distortion (a.u.) (bottom), for the two systems described in the caption.]

Figure A.12: Imposing a smoothness constraint on the solutions. The left panel shows the optimal input distributions for the input-output relation P(g|c) of an activator system with h = 2 cooperativity whose output noise characteristics mimic those of the red system presented in Fig 4.3c. As the Lagrange multiplier conjugate to the smoothness term is increased, the solutions lose sharp features (and the color in the plot changes from black to red). The corresponding output distributions are plotted in the inset, and the rate-distortion curve is shown below the distributions. We see that the sharp features of p^*(c) in the c ≪ K_d region are smoothed out with no appreciable decrease in the channel capacity because the input/output kernel is flat (degenerate) there. The right panel shows similar computations carried out for a highly cooperative h = 15 kernel with the same noise characteristics as the blue system of Fig 4.3b. Here the decrease in capacity is much greater, because the input is forced to be precisely tuned to the extremely sharp input/output relation at c = K_d on one hand, but is also constrained to have a small derivative in p(c) on the other.

Figure A.13: Information capacity of the blue system of Fig 4.3b as the cooperativity (horizontal axis, Hill coefficient) is changed. Different curves correspond to different allowed values of the total expense, Eq (4.22), for coding inputs and outputs (the logarithm of the allowed cost increases by 0.25 for each curve between the thick black curves indicated in the legend, log(cost) = 6.5 and log(cost) = 8). Qualitatively similar results are obtained as the ratio of the costs of input and output is changed (i.e. the peak remains at a low h > 1). Note that although the blue system is similar to the Drosophila Bicoid/Hunchback system studied in the next section, we do not recover as optimal the observed Hill coefficient h = 5, presumably because we do not have the correct expense model or because the element in the fruit fly does not operate in isolation.

A.4.3 Capacity as the number of distinguishable discrete states

Here we discuss whether it is possible to make constructive use of the interpretation of the information capacity I(c; g) between the input c and the output g, namely that there are 2^{I(c;g)} levels of the input that have distinguishable outputs. In other words, despite the fact that both input and output are continuous variables in our problem, we are looking to find the smallest set of discrete c_i and the related optimal discrete distribution p(c_i), such that the channel capacity remains undegraded compared to its maximal value when the whole continuous range of c is used. This can also be seen as compressing the continuous range of input concentrations onto a discrete subset of concentrations. To tackle this task, we introduce the notion of an abstract signal s, i.e. a quantity that describes some abstract form of control over the concentration c, as shown in Fig A.14b. The new "handle" s defines a Markov chain s → c → g. As before, we want to maximize the information between the ultimate input, s, and the final output, g; additionally, we will now also limit the amount of information that can flow between s and c. This compound channel will be called the information bottleneck (IB) channel. The situation is similar, but not identical, to the information bottleneck problem. There, we look for a probabilistic mapping of a true signal x into the compressed form x′, such that x′ retains a fixed amount of information about y, as shown in Fig A.14a.


A.14a: Information bottleneck problem. A.14b: Information bottleneck channel.

Figure A.14: Left: information bottleneck problem. We are given the joint distribution p(x, y), and would like to compress the signal x into x′, so that the mutual information I(x′; x) is minimized. The compressed signal x′ has to retain a fixed amount of information about the relevant feature y. Right: information bottleneck channel. Diagram of the explicit influence of the signal control s over c, which couples to the expression level through the known p(g|c). The unknown quantities to be optimized for are in red (the distribution of signals and the conditional p(c|s)). We are maximizing the information flow through the system at fixed information flow through the first segment.

Note that there is more variational freedom in the new IB-channel formulation than in the classical IB problem, because we also have to compute the optimal signal distribution p(s). We expect that as the constraint between the signal s and the concentration c is tightened, c will become more and more "compressed" and will start to take non-zero values only on a smaller and smaller support, until only a few discrete states in c remain. If this can happen without significant loss of total capacity, I(s; g), we will have found the discrete states that we refer to when explaining the concept of mutual information. Clearly, if the constraint is increased still further, the total capacity will become degraded. How this happens is an interesting question in itself, because the IB compound channel can be seen as a problem of optimally coupling together two channels, i.e. I(s; c) and I(c; g); we will not pursue this – more general – issue further. For the IB channel the variational problem can be stated as follows:

\mathcal{L}[p(s), p(c|s)] = I(s;g) - \beta I(s;c) - \Phi_1\sum_c p(c)v(c)   (A.71)
\qquad - \sum_{c,s}\lambda_s\,p(c|s) - \lambda_0\sum_s p(s)   (A.72)
I(s;g) = \sum_{s,g} p(g|s)p(s)\log\frac{p(g|s)}{p(g)},   (A.73)
I(s;c) = \sum_{s,c} p(c|s)p(s)\log\frac{p(c|s)}{p(c)}.   (A.74)

The constraint β is conjugate to the constrained mutual information between s and c; the constraint Φ_1 is conjugate to the total expense C, and there are |s| + 1 normalization constraints {λ_0, λ_s}. Furthermore, p(g|s) = \sum_c p(g|c)p(c|s), and p(c) and p(g) are implicit functions of both p(s) and p(c|s) through marginalization. Let us first evaluate \frac{\delta\mathcal L}{\delta p(c|s)}. The first term can be computed as follows:

\frac{\delta I(s;g)}{\delta p(c|s)} = \sum_g p(g|c)\,p(s)\log\frac{p(g|s)}{p(g)}   (A.75)
\qquad + \Bigg\{\sum_{s',c',g} p(g|c')p(c'|s')p(s')\left[\frac{p(g)}{\sum_{c''}p(g|c'')p(c''|s')}\,\frac{p(g|c)}{p(g)}\,\delta_{ss'} - \frac{p(g|c)\,p(s)}{p(g)}\right]\Bigg\}   (A.76–A.77)
= \sum_g p(g|c)\,p(s)\log\frac{p(g|s)}{p(g)}.   (A.78)

The contributions in curly braces, which originate from taking the derivative under the logarithm, cancel out. Similarly,

\frac{\delta I(s;c)}{\delta p(c|s)} = p(s)\log\frac{p(c|s)}{p(c)}.   (A.79)

Inserting both results into Eq (A.72) gives us the first set of equations:

\sum_g p(g|c)\,p(s)\log\frac{p(g|s)}{p(g)} - \beta\,p(s)\log\frac{p(c|s)}{p(c)} - \lambda_s - \Phi_1\,p(s)v(c) = 0.   (A.80)

Now let us take the derivatives with respect to p(s), \frac{\delta\mathcal L}{\delta p(s)} = 0:

\frac{\delta I(s;g)}{\delta p(s)} = \sum_g p(g|s)\log\frac{p(g|s)}{p(g)}   (A.81)
\qquad - \sum_{g,s'} p(g|s')p(s')\,\frac{p(g)}{p(g|s')}\,\frac{p(g|s')}{p^2(g)}\sum_{c'} p(g|c')p(c'|s)   (A.82)
= \sum_g p(g|s)\log\frac{p(g|s)}{p(g)} - 1.   (A.83)

A similar calculation yields the result for the derivative of I(s; c). Absorbing constants into λ_0, we can write down the next set of equations:

\sum_g p(g|s)\log\frac{p(g|s)}{p(g)} - \beta\sum_c p(c|s)\log\frac{p(c|s)}{p(c)} - \Phi_1\sum_c p(c|s)v(c) = \Lambda_0.   (A.84)

In total, Eq (A.80) and Eq (A.84) define |s| + |s|·|c| equations; in addition, we have the marginalization and normalization consistency conditions. We can now generate an iterative scheme in a way similar to the modified Blahut–Arimoto algorithm presented above:⁶

p^n(s|g) = \frac{p^n(g|s)p^n(s)}{p^n(g)},   (A.85)
p^n(s|c) = \frac{p^n(c|s)p^n(s)}{p^n(c)}.   (A.86)

Finally, this gives the complete iterative scheme for the new value p^{n+1}(s):

p^{n+1}(s) = \frac{p^n(s)}{Z}\exp\Bigg\{\frac{1}{1-\beta}\sum_g p^n(g|s)\log\frac{p^n(g|s)}{p^n(g)}   (A.87)
\qquad - \frac{\beta}{1-\beta}\sum_c p^n(c|s)\log\frac{p^n(c|s)}{p^n(c)} - \frac{\Phi_1}{1-\beta}\sum_c p^n(c|s)v(c)\Bigg\}.   (A.88)

From Eq (A.80) we can similarly express the other unknown, p^{n+1}(c|s), as follows:

p^{n+1}(c|s) = \frac{p^n(c)}{Z}\exp\Bigg\{\frac{1}{\beta}\sum_g p(g|c)\log\frac{p^n(g|s)}{p^n(g)} - \frac{\Phi_1}{\beta}v(c)\Bigg\}.   (A.89)

Eqs (A.88, A.89), together with the consistency equations, form the iterative scheme. The scheme needs both initial conditions, p^0(s) and p^0(c|s), and the values of the Lagrange constraints β and Φ_1 in order to work, and is carried out until the desired convergence is achieved in the variational objects. At a given pair (β, Φ_1), the resulting distributions are constrained by (I(s;c), C(c)), where C is the expense of the distribution p(c). The rate-distortion curve in our case is a plot of I(s;g) against I(s;c). As a simple example consider the binary symmetric channel, which is fully characterized by the "switching" parameter p. The transition matrix for this channel can be written as:

p(g|c) = \begin{pmatrix} p & 1-p \\ 1-p & p \end{pmatrix}.

⁶We give no proof of convergence for these iterative equations.

We use this channel but constrain the input c by modifying it only through s. Figure A.15 shows how the total capacity through the IB channel depends on the capacity of the first, bottleneck, segment.


Figure A.15: Upper panel: as the constraint β varies between 0 and 1, the constrained capacity I(s;c) is plotted in green for three decreasing values of p, and the total channel capacity is plotted in blue. Lower panel: full channel capacity versus first-segment capacity for the three p values; this is the rate-distortion curve for the upper plot. The algorithm was executed with 3 different values of p, p = {0.25, 0.125, 0.0625}, with 10 iterations for each p value, zero cost function, cardinality of the set of s equal to 10, stopping condition equal to a step-by-step change below 10⁻⁷ in the variational objects, and random initialization. When the constraint is inactive (the first segment can transmit a maximum of 1 bit), the resulting total channel capacities are equal to the full capacities of the symmetric binary channel, i.e. I_max(p) = 1 + p log₂ p + (1 − p) log₂(1 − p), which gives the values {0.1887, 0.4564, 0.6627} for the different p, consistent with the numerical results (the points where the RD curve intersects the I(s;c) = 1 bit line).
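The unconstrained reference values quoted in the caption can be checked in a couple of lines (a sketch, assuming NumPy):

```python
import numpy as np

def bsc_capacity(p):
    """Capacity (bits) of the binary symmetric channel with switching parameter p:
    I_max(p) = 1 + p log2 p + (1 - p) log2 (1 - p)."""
    return 1 + p * np.log2(p) + (1 - p) * np.log2(1 - p)

for p in (0.25, 0.125, 0.0625):
    print(p, round(bsc_capacity(p), 4))   # 0.1887, 0.4564, 0.6627
```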

For a more realistic application of the information bottleneck channel we analyze the information transmission through a lac-like repressor element with the noise parameters measured by Elowitz et al. (2002). Figure A.16a shows the exact numerical solution and the small-noise approximation for the optimal distribution of output gene expression levels, along with the distribution of outputs computed with the IB channel, where the compression in the first segment of the IB channel was used to reduce the support of the input to just a few discrete TF concentrations. The resulting output distribution has distinguishable peaks. Figure A.16b shows the rate-distortion curve and two sample output distributions as the IB channel constraint is increased and only I(s;c) bits are allowed to flow through the first segment.


A.16a: Optimal output distributions. A.16b: Discrete states.

Figure A.16: Left: the optimal distribution of expression levels, p(g), black line, for the system with lac noise parameters as measured by Elowitz et al. (2002); the fractional noise is shown in the inset. Note the solution in gray, computed using the information bottleneck channel. Here we constrain the size of the support, |s|, just enough that the total capacity is not yet degraded compared to the unconstrained solution, and we see the emergence of ∼ 9–10 distinguishable peaks in the output. Right: as the capacity of the first segment of the IB channel, I(s;c), is varied by tuning the Lagrange multiplier β of Eq (A.72), we trace out the rate-distortion curve. 2^{I(s;c)} is shown on the horizontal axis; this is the number of discrete states of the input. The total IB channel capacity is plotted on the vertical axis. The two insets show output probability distributions for two choices of β: in the top inset there are about nine distinguishable levels of the input, while in the bottom we only see four (consequently the total capacity of the channel is 2 bits).

A.4.4 Solutions with alternative input-output relations

In this section we present channel capacities for two alternative noise models. We use activator kernels with no cooperativity and consider models with output and switching input noise [Fig A.17b] or a model with output and global noise [Fig A.17a]. In contrast to the input diffusion noise discussed in Section 4.4, changing the cooperativity does not affect the shapes of these kernels, and the sole effect of making the regulatory element cooperative is to achieve a higher dynamic range in occupancy at the same cost in input transcription factor concentration. A comparison with the results for the diffusive input noise in Section 4.4 shows that, at the same numerical noise strength (e.g. β = γ or β = δ), the diffusive noise is less limiting than the two alternative models, at least at h = 1. It should also be noted that as cooperativity is increased, the diffusive noise strength grows in proportion to h², while γ and δ do not change.

A.17a: Output and global noise.    A.17b: Output and switching noise.

Figure A.17: Left: information contours (color scale in bits; thick lines – full solution, dashed lines – small-noise approximation) for noise models where varying amounts of output noise (horizontal axis) and global noise (vertical axis) are present. See Table 4.1 for the definition of α. The global noise δ adds a term σ² = δ · g² to the total noise. Right: the same plot for models with output noise (horizontal axis) and input switching noise (vertical axis). For illustration purposes, the dots have been replotted here in the same positions as in Fig 4.3.

A.4.5 Validity of Langevin approximations

The Langevin approximation assumes that the fluctuations of a quantity around its mean are Gaussian and proceeds to calculate their variance. For the calculation of the exact channel capacity we must calculate the exact input-output relation, P(g|c). Even if the Langevin approach ends up giving the correct variance as a function of the input, σ_g(c), the shape of the distribution might be far from Gaussian. We expect such a failure when the number of mRNA is very small: the distribution of expression levels might then be multi-peaked, with peaks corresponding to b, 2b, 3b, . . . proteins, where b is the burst size (the number of proteins produced per transcript lifetime). In the model used in Eq (4.18), the parameter α = (1 + b)/g_0 determines the output noise; g_0 = b\bar m, where \bar m is the average number of transcripts produced during the integrating time (i.e. the longest averaging timescale in the problem, for example the protein lifetime or the cell doubling time). If b ≫ 1, then the output noise α is effectively determined only by the number of transcripts, α ≈ 1/\bar m. We should therefore be particularly concerned about what happens as \bar m gets small.

Our plan is therefore to solve for P(g|c) exactly by finding the stationary solution of the Master equation in the case where the noise consists of the output and switching input contributions. In this approach, we explicitly treat the fact that the number of transcribed messages, designated by m, is discrete. We start by calculating P_i(m|c, t). The state of the system is described by the index i, which can be 0 or 1, depending on whether the promoter is bound by the activator or not, respectively. Normalization requires that for each value of c:

\sum_{i=0,1}\sum_m P_i(m|c,t) = 1.   (A.90)

The time evolution of the system is described by the following set of equations for an activator:

\frac{\partial P_0(m|c,t)}{\partial t} = R_e\big(P_0(m-1|c,t) - P_0(m|c,t)\big) - \frac{1}{\tau}\big(mP_0(m|c,t) - (m+1)P_0(m+1|c,t)\big) - k_- P_0(m|c,t) + k_+ c\,P_1(m|c,t),   (A.91)
\frac{\partial P_1(m|c,t)}{\partial t} = -\frac{1}{\tau}\big(mP_1(m|c,t) - (m+1)P_1(m+1|c,t)\big) + k_- P_0(m|c,t) - k_+ c\,P_1(m|c,t),   (A.92)

where τ is the integrating time, k_- is the rate for switching into the inactive state (the off-rate of the activator), k_+ is the second-order on-rate, and R_e is the rate of mRNA synthesis. These constants combine to give \bar m = R_e\tau and the input switching noise strength γ = (k_-\tau)^{-1}, see Table 4.1. This set of equations is supplemented by the appropriate boundary conditions at m = 0. To find the steady state distribution P(m|c, t → ∞) = P(m|c), we set the left-hand side to zero and rewrite the set of equations (with a high enough cutoff value m_max) in matrix form: M(c)p(c) = b, where p = (P_0(0|c), P_1(0|c), P_0(1|c), P_1(1|c), · · · ) and b = (0, 0, · · · , 0, 1). The matrix M (with 2(m_max + 1) + 1 rows and 2(m_max + 1) columns) contains, in its last row, only ones, which enforces normalization. The resulting system is a non-singular band-diagonal system that can easily be solved. The input-output relation for the number of messages is given by P(m|c) = P_0(m|c) + P_1(m|c). Having found the distribution of the number of transcripts, we then convolve it with another Poisson process, P(g|⟨g⟩ = b m), i.e. P(g|c) = \sum_m P(m|c)P(g|⟨g⟩ = b m). Finally, the result is rediscretized such that the mean expression ḡ runs from 0 to 1. Note that the Langevin approximation depends on the burst size b and the mean number of transcripts \bar m only through the combination α; in contrast, the Master equation solution depends on both b and \bar m independently. The generalization of this calculation to repressors or Hill-coefficient-type cooperativity is straightforward.

Figure A.18c shows that the Langevin approximation yields the correct second moments of the output distribution; however, the Gaussian distributions themselves are, for large burst sizes and a small number of messages, inconsistent with the exact solutions, as can be seen in Fig A.18a. In the opposite limit, where the number of messages is increased and the burst size kept small [Fig A.18b], normal distributions are an excellent approximation. Despite these difficulties, the information capacity calculated with either the Gaussian or the Master-equation input-output relations differs by at most 12% over a large range of burst sizes b and values of α, as illustrated by Fig A.18d.
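A minimal sketch of the steady-state Master-equation solution for the transcript distribution P(m|c), following the recipe above (append a normalization row to the stationarity conditions and solve the linear system); it assumes NumPy, uses a least-squares solve rather than a band-diagonal solver, and the parameter values are hypothetical:

```python
import numpy as np

def promoter_mrna_steady_state(c, k_plus, k_minus, Re, tau, m_max=60):
    """Steady state P(m|c) of the two-state promoter / mRNA master equation,
    Eqs (A.91)-(A.92): states (i, m) with i = 0 (active, activator bound) or
    i = 1 (inactive). Solved by stacking the stationarity conditions with a
    normalization row, as described in the text."""
    n = 2 * (m_max + 1)
    idx = lambda i, m: 2 * m + i                     # interleave (P0(0), P1(0), P0(1), ...)
    Q = np.zeros((n, n))                             # Q[a, b] = transition rate a -> b
    for m in range(m_max + 1):
        a0, a1 = idx(0, m), idx(1, m)
        if m < m_max:
            Q[a0, idx(0, m + 1)] += Re               # transcription only in the active state
        if m > 0:
            Q[a0, idx(0, m - 1)] += m / tau          # mRNA degradation
            Q[a1, idx(1, m - 1)] += m / tau
        Q[a0, a1] += k_minus                         # active -> inactive
        Q[a1, a0] += k_plus * c                      # inactive -> active
    np.fill_diagonal(Q, -Q.sum(axis=1))

    A = np.vstack([Q.T, np.ones(n)])                 # stationarity Q^T p = 0, plus sum(p) = 1
    b = np.zeros(n + 1); b[-1] = 1.0
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p[0::2] + p[1::2]                         # P(m|c) = P0(m|c) + P1(m|c)

# hypothetical parameters: mean transcript number mbar = Re*tau*<n>
Pm = promoter_mrna_steady_state(c=1.0, k_plus=1.0, k_minus=1.0, Re=6.0, tau=1.0)
print(Pm.sum(), np.arange(len(Pm)) @ Pm)             # check normalization and mean m
```

The protein distribution P(g|c) would then be obtained by convolving P(m|c) with the translation-burst distribution P(g|⟨g⟩ = bm), as described above.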


Figure A.18: Exact solutions (black) for the input-output relations, P(g|c), compared to their Gaussian approximations (gray). Panel A shows the distribution of outputs at maximal induction, P(g|c_max), for a system with a large burst size, b = 4/5, and large output noise α = 1/6 (i.e. the average number of messages is 6, as is evident from the number of peaks, each of which corresponds to a burst of translation at a different number of messages). Panel B shows the same distribution for smaller output noise, b = 2/5 and α = 1/50; here the Gaussian approximation performs well. Both cases are computed with switching noise parameter γ = 1/50 and cooperativity h = 2. Panel C shows in color code the error made in computing the standard deviation of the output given c; the error measure we use is the maximum difference between the exact and Gaussian results over the full range of concentrations: max_c |σ_g(c)_Master − σ_g(c)_Gaussian|. As expected, the error decreases with decreasing output noise. Panel D shows that the capacity is overestimated by using the approximate kernel, but the error again decreases with decreasing noise, as the Langevin result becomes an increasingly good approximation to the true distribution. In the worst case the approximation is about 12% off. The Gaussian computation depends only on α and not separately on the burst size, so we plot only one curve, for b = 1.

A.4.6 Fine-tuning of optimal distributions

To examine the sensitivity to perturbations of the optimal input distributions of Fig 4.6 we need to generate an ensemble of perturbations. We pick an ad hoc prescription, whereby the optimal solution is taken and we add to it the 5 lowest sine/cosine waves on the input domain, each with a weight that is uniformly distributed over some range. The range determines whether the perturbation is small or not. The resulting distribution is clipped to be positive and renormalized. This choice was made to induce low-frequency perturbations (high-frequency perturbations mostly just average out because the kernel is smooth). Then, for an ensemble of 100 such perturbations, p_i(c), i = 1, . . . , 100, and for every system in the information plane of Fig 4.3, the divergence of the perturbed input distribution from the true solution, d_i = D_JS(p_i(c), p^*(c)), is computed, as well as the channel capacity, I_i = I[p(g|c), p_i(c)]. Figure A.19 plots the (d_i, I_i) scatter plots for 3 × 3 representative systems with varying amounts of output (α) and input (β) noise, taken from Fig 4.3 uniformly along the horizontal and vertical axes.
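A sketch of the perturbation ensemble and the quantities (d_i, I_i) described above; it assumes NumPy, and the kernel and "optimal" distribution in the usage stub are placeholders (in practice p^*(c) would come from the capacity optimization of Section A.4.1):

```python
import numpy as np

def perturb(p_opt, x, amplitude, n_waves=5, rng=None):
    """Ad hoc perturbation described in the text: add the lowest sine/cosine waves
    on the input domain with uniformly distributed weights, then clip to be
    positive and renormalize. `amplitude` sets the size of the weights."""
    rng = rng or np.random.default_rng()
    q = p_opt.copy()
    L = x[-1] - x[0]
    for k in range(1, n_waves + 1):
        q += rng.uniform(-amplitude, amplitude) * np.sin(2 * np.pi * k * (x - x[0]) / L)
        q += rng.uniform(-amplitude, amplitude) * np.cos(2 * np.pi * k * (x - x[0]) / L)
    q = np.clip(q, 0, None)
    return q / q.sum()

def js_divergence(p, q):
    """Jensen-Shannon divergence (bits) between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0
        return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def mutual_info_bits(p_c, p_g_given_c):
    """I[p(g|c), p(c)] for a discrete kernel of shape (n_c, n_g)."""
    p_g = p_c @ p_g_given_c
    ratio = np.where(p_g_given_c > 0, p_g_given_c / p_g, 1.0)
    return np.sum((p_c[:, None] * p_g_given_c) * np.log2(ratio))

# toy usage with a hypothetical kernel and a stand-in for the optimal input p*(c)
rng = np.random.default_rng(4)
x = np.linspace(-2, 2, 50)                         # e.g. a log(c/Kd) grid (illustrative)
K = rng.random((50, 40)); K /= K.sum(axis=1, keepdims=True)
p_opt = np.full(50, 1 / 50)
pts = []
for _ in range(100):
    q = perturb(p_opt, x, 0.02, rng=rng)
    pts.append((js_divergence(p_opt, q), mutual_info_bits(q, K)))
```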


Figure A.19: Robustness of the optimal solutions to perturbations of the input distribution. Activator systems with no cooperativity are plotted; their parameters are taken from a uniformly spaced 3 × 3 grid of points in the information plane of Fig 4.3, such that the output noise increases along the horizontal edge of the figure and the input noise along the vertical edge. Each subplot shows a scatter plot of 100 perturbations from the ideal solution; the Jensen-Shannon distance from the optimal solution, d_i, is plotted on the horizontal axis and the channel capacity (normalized to the maximum when there is no perturbation), I_i/I_max, on the vertical axis. Red lines are best linear fits.

Figure A.19 shows that as we move towards systems with higher capacity (lower left corner), perturbations to the optimal solution that are at the same distance from the optimum as in the low-capacity systems (upper right corner) will cause a greater relative loss (and therefore an even greater absolute loss) in capacity. As expected, higher capacity systems must be better tuned, but even for the highest capacity system considered, a perturbation of around d_JS ≈ 0.2 will only cause an average 15% loss in capacity.

Bibliography
