Importance of Randomness in Biological Networks: a Random Matrix Analysis

PRAMANA – journal of physics, © Indian Academy of Sciences
Vol. 84, No. 2, February 2015, pp. 285–293

SARIKA JALAN 1,2
1 Complex Systems Lab, Discipline of Physics, Indian Institute of Technology Indore, IET-DAVV Campus, Khandawa Road, Indore 452 017, India
2 Centre for Bio-Sciences and Bio-Engineering, Indian Institute of Technology Indore, IET-DAVV Campus, Khandawa Road, Indore 452 017, India
E-mail: [email protected]

DOI: 10.1007/s12043-015-0940-9; ePublication: 29 January 2015

Abstract. Random matrix theory, initially proposed to understand the complex interactions in nuclear spectra, has demonstrated its success in diverse domains of science ranging from quantum chaos to galaxies. We demonstrate the applicability of random matrix theory to networks, providing a new dimension to complex systems research. We show that, in spite of the huge differences between these interaction networks, which represent real-world systems, and random matrix models, the spectral properties of the underlying matrices of these networks follow random matrix theory, bringing them into the same universality class. We further demonstrate the importance of randomness in interactions for deducing crucial properties of the underlying system. This paper provides an overview of the importance of the random matrix framework in complex systems research, with biological systems as examples.

Keywords. Network theory; biological systems; spectra of matrices

PACS Nos 64.60.aq; 02.10.Ud; 87.19.xj

1. Introduction

The field of network analysis helps us look at every individual component and its interactions as part of a complex social structure [1].
It yields explanations for various phenomena in a wide variety of disciplines, ranging from physics to psychology to economics, and attempts to explain the formation of specific network ties, or the importance of an individual's position in a network in determining the opportunities and constraints that the individual encounters, which in turn affect its outcome [2,3]. Causal relations between structural attributes and success factors, which seemed thoroughly random to researchers until a decade ago, have been analysed under the network theory framework [4].

The post-genomic era aims to understand the role of proteomics and genomics in human health and diseases [5]. The ample availability of data in functional genomics and proteomics has been made possible by the development of high-throughput data-collection techniques, which have resulted from the move from the basic gene-based traditional molecular biology approach to a systems approach of network biology [6,7]. It has been increasingly realized that dissecting the genetic and chemical circuitry prevents us from further understanding the biological processes as a whole [8–10]. In order to understand the complexities involved, all reactions and processes should be analysed together [11]. Network biology provides such a framework, in which biological processes are considered as complex networks of interactions between numerous components of the cell rather than as independent interactions involving only a few molecules [12]. In this paper we provide an overview of recent developments in understanding complex biological systems achieved through random matrix analysis of the underlying networks.
Random matrix theory (RMT), proposed by Wigner to explain the statistical properties of nuclear spectra, has had remarkable success in understanding complex systems, including disordered systems, quantum chaotic systems, spectra of large complex atoms, etc. [13,14]. Further studies illustrate the usefulness of RMT in understanding the statistical properties of the empirical cross-correlation matrices used in the study of multivariate time series of price fluctuations in the stock market [15], EEG data of the brain [16], variation of various atmospheric parameters [17], etc. In this paper, we review the recent extension of this theory to biological networks. The spectra of any real-world network can be divided into three parts: the first consisting of extremal eigenvalues at both ends of the spectra, the second comprising the smooth middle region, and the third consisting of degenerate eigenvalues, mostly found at values 0 and −1. In the following, we explore the properties of these three segments of spectra and their corresponding eigenvectors in detail in order to gain a deeper understanding of biological systems under a mathematical framework.

2. Methods and techniques

2.1 Construction of networks

A network consists of nodes (or vertices) which are connected through edges (or links). The adjacency matrix A of a network is constructed as in eq. (1):

$$A_{ij} = \begin{cases} 1, & \text{if } i \sim j, \\ 0, & \text{otherwise.} \end{cases} \tag{1}$$

Apart from the simple manner of network construction mentioned above, different types of networks can be constructed based on the nature of the connections. For example, [11] considers a gene coexpression network generated from the gene coexpression data of six brain regions relevant to Alzheimer's disease. A binary network is then created in the following manner.
Based on Pearson's product–moment correlation value calculated for each probe set-pair expression level on the microarray (where one gene is represented by one or more probe sets), a threshold value can be chosen; if the coexpression strength is greater than the threshold, the value one is assigned to the corresponding element in the matrix. A coexpression value less than the threshold yields a zero entry in the matrix. The use of a threshold generates a network with far fewer edges, which might result in many disconnected components; in such cases one analyses the properties of the largest connected component.

2.2 Nearest-neighbour spacing distribution (NNSD)

The spectrum of the corresponding adjacency matrix is denoted by $\lambda_i$, $i = 1, \ldots, N$, with $\lambda_1 > \lambda_2 > \lambda_3 > \cdots > \lambda_N$. Random matrix studies of eigenvalue spectra consider two kinds of properties: (1) global properties, such as the spectral density distribution of eigenvalues, defined as

$$\rho(\lambda) = \frac{1}{N} \sum_{j=1}^{N} \delta(\lambda - \lambda_j), \quad \text{where } \delta(x) = \begin{cases} 1, & \text{if } x = 0, \\ 0, & \text{if } x \neq 0, \end{cases} \tag{2}$$

is the delta function; and (2) local properties, such as eigenvalue fluctuations around $\rho(\lambda)$.

In order to calculate local properties in RMT, it is customary to unfold the data by the transformation $\bar{\lambda}_i = \bar{N}(\lambda_i)$, where $\bar{N}(\lambda) = \int_{\lambda_{\min}}^{\lambda} \rho(\lambda')\, d\lambda'$ is the averaged integrated eigenvalue density [14]. In the absence of an analytical form for $\bar{N}$, we unfold the spectrum numerically by polynomial curve fitting. Using the unfolded spectra, we calculate the distribution of the nearest-neighbour spacings $s^{(i)} = \bar{\lambda}_{i+1} - \bar{\lambda}_i$ (NNSD) and fit it by the Brody distribution (eq. (3)), characterized by the parameter $\beta$ as follows [18]:

$$P_\beta(s) = A s^{\beta} \exp\left(-\alpha s^{\beta+1}\right), \tag{3}$$

where $A$ and $\alpha$ are determined by the parameter $\beta$ as

$$A = (1 + \beta)\alpha, \qquad \alpha = \left[\Gamma\left(\frac{\beta + 2}{\beta + 1}\right)\right]^{\beta + 1}.$$

As $\beta$ goes from zero to one, the Brody formula changes smoothly from Poisson statistics, $P(s) = \exp(-s)$, to Gaussian orthogonal ensemble (GOE) statistics, characterized by $P(s) = \frac{\pi}{2} s \exp\left(-\frac{\pi s^2}{4}\right)$. The GOE represents a universality class of chaotic systems with time-reversal symmetry, yielding level repulsion at small spacings and a Gaussian fall-off at large spacings. The Brody distribution does not model pseudointegrable systems, which are non-integrable as well as non-chaotic.

2.3 $\Delta_3(L)$ statistics

We analyse the long-range correlations in eigenvalues using the $\Delta_3(L)$ statistic, which measures the least-square deviation of the spectral staircase function, representing the average integrated eigenvalue density $N(\bar{\lambda})$, from the best-fitted straight line over a finite interval of length $L$ of the spectrum, given by [18]

$$\Delta_3(L; x) = \frac{1}{L} \min_{a,b} \int_{x}^{x+L} \left[ N(\bar{\lambda}) - a\bar{\lambda} - b \right]^2 d\bar{\lambda}, \tag{4}$$

where $a$ and $b$ are regression coefficients obtained from the least-square fit. Averaging over several choices of $x$ gives $\Delta_3(L)$, the spectral rigidity. In the case of GOE statistics, $\Delta_3(L)$ depends logarithmically on $L$:

$$\Delta_3(L) \sim \frac{1}{\pi^2} \ln L. \tag{5}$$

2.4 Inverse participation ratio (IPR)

Let $u_k^l$ be the $l$th component of the $k$th eigenvector $u^k$. The eigenvector components of a GOE random matrix are Gaussian-distributed random variables. The distribution of $r = |u_k^l|^2$, in the limit of large matrix dimension, is given by the Porter–Thomas distribution [19]. The inverse participation ratio (IPR) of an eigenvector is defined as

$$I_k = \sum_{l=1}^{N} \left[ u_k^l \right]^4. \tag{6}$$

The meaning of $I_k$ is illustrated by two limiting cases: (1) a vector with identical components $u_k^l \equiv 1/\sqrt{N}$ has $I_k = 1/N$, whereas (2) a vector with one component $u_k^1 = 1$ and the remainder zero has $I_k = 1$. Thus, the IPR quantifies the reciprocal of the number of eigenvector components that contribute significantly.
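The two limiting cases of the IPR can be checked numerically. The following is a minimal sketch, assuming NumPy and unit-normalised eigenvectors stored as columns; the function name `inverse_participation_ratio` and the choice N = 100 are illustrative and do not come from the paper.

```python
import numpy as np

def inverse_participation_ratio(vectors):
    """Return the IPR of each column: the sum of fourth powers of its
    components, after normalising every column to unit Euclidean length."""
    v = np.asarray(vectors, dtype=float)
    v = v / np.linalg.norm(v, axis=0, keepdims=True)
    return np.sum(v ** 4, axis=0)

N = 100  # illustrative dimension (assumption)

# Case (1): all components identical, u_l = 1/sqrt(N), so the IPR is 1/N.
uniform = np.full((N, 1), 1.0 / np.sqrt(N))

# Case (2): a single non-zero component, so the IPR is 1.
localized = np.zeros((N, 1))
localized[0, 0] = 1.0

ipr_uniform = inverse_participation_ratio(uniform)[0]
ipr_localized = inverse_participation_ratio(localized)[0]
```

For a network, the same function could be applied to the eigenvector matrix returned by `np.linalg.eigh(A)`, whose columns are the eigenvectors of the symmetric adjacency matrix.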
For a vector with components following the Porter–Thomas distribution, the IPR takes the value 3/N.

3. Universal spacing distribution

All undirected networks have real eigenvalues. The density distribution ρ(λ) calculated using eq. (2) for most biological networks, as well as for those considered here, resembles a triangular distribution with a peak at the zero eigenvalue. The scale-free degree distribution followed by the underlying networks is known to be one of the reasons for the triangular shape of the spectral density of the corresponding matrices [20]. Further, the sparseness of real-world networks has been argued to bring about high degeneracy at the zero eigenvalue [21,22]. While calculating the NNSD of the networks, we exclude the flat region of the spectra as well as the extremal eigenvalues and analyse only the smooth part of the spectra.
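The unfolding and spacing procedure of §2.2, restricted to the smooth part of the spectrum, can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: a random symmetric (GOE-like) matrix stands in for a network adjacency matrix, eigenvalues are sorted in ascending order, the unfolding polynomial has degree 7, the outer 10% of eigenvalues at each end are treated as extremal, and the spacings are rescaled to unit mean. For a real network one would also drop the degenerate eigenvalues before unfolding. All function names are hypothetical.

```python
import numpy as np
from math import gamma

def unfold(eigenvalues, degree=7):
    """Unfold a spectrum by fitting a polynomial to the cumulative
    eigenvalue count (the spectral staircase) and evaluating the fit."""
    lam = np.sort(np.asarray(eigenvalues, dtype=float))
    counts = np.arange(1, lam.size + 1)      # staircase values
    x = (lam - lam.mean()) / lam.std()       # rescale for a stable fit
    coeffs = np.polyfit(x, counts, degree)
    return np.polyval(coeffs, x)

def nearest_neighbour_spacings(unfolded):
    """Spacings between consecutive unfolded eigenvalues, rescaled to unit mean."""
    s = np.diff(unfolded)
    return s / s.mean()

def brody_pdf(s, beta):
    """Brody distribution of eq. (3) with A = (1 + beta) * alpha."""
    alpha = gamma((beta + 2.0) / (beta + 1.0)) ** (beta + 1.0)
    return (1.0 + beta) * alpha * s ** beta * np.exp(-alpha * s ** (beta + 1.0))

# Stand-in spectrum: a random real symmetric matrix (assumption, not network data).
rng = np.random.default_rng(0)
n = 400
m = rng.normal(size=(n, n))
h = (m + m.T) / 2.0
lam = np.linalg.eigvalsh(h)                  # ascending eigenvalues

# Keep only the smooth bulk: discard 10% of eigenvalues at each end.
bulk = lam[n // 10 : n - n // 10]
s = nearest_neighbour_spacings(unfold(bulk))

# Level repulsion: very small spacings should be rare for GOE-like spectra.
small_fraction = np.mean(s < 0.1)
```

A histogram of `s` can then be compared against `brody_pdf(s_grid, beta)` for trial values of `beta`; `beta = 0` reproduces the Poisson form and `beta = 1` the GOE form quoted in §2.2.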
