Network Motifs Bioinformatics: Sequence Analysis

COMP 571 - Spring 2015 Luay Nakhleh, Rice University Motifs

✤ Not all subgraphs occur with equal frequency

✤ Motifs are subgraphs that are over-represented compared to a randomized version of the same network

✤ To identify motifs:

✤ Identify all subgraphs of n nodes in the network

✤ Randomize the network, while keeping the number of nodes, edges, and distribution unchanged

✤ Identify all subgraphs of n nodes in the randomized version

✤ Subgraphs that occur significantly more frequently in the real network, as compared to the randomized one, are designated to be the motifs Outline

✤ Motifs in cellular networks: case studies

✤ Efficient sampling in networks

✤ Comparing the local structure of networks

✤ Motif evolution Motifs in Cellular Networks: Case Studies Motifs in Transcription Regulation Networks: The Data

✤ Research group: and co-workers

✤ Organism: E. coli

✤ Nodes of the network: 424 operons, 116 of which encode transcription factors

✤ (Directed) Edges of the network: 577 interactions (from an operon that encodes a TF to an operon that is regulated by that TF)

✤ Source: mainly RegulonDB database, but enriched with other sources Motifs in Transcription Regulation Networks: Findings

✤ Alon and colleagues found that much of the network is composed of repeated appearances of three highly significant motifs

✤ feedforward (FFL)

✤ single input module (SIM)

✤ dense overlapping regulons (DOR)

✤ Each network motif has a specific function in determining gene expression, such as generating “temporal expression programs” and governing the responses to fluctuating external signals

✤ The motif structure also allows an easily interpretable view of the entire known transcriptional network of the organism Motifs in Transcription Regulation Networks: Motif Type (1): Feedforward loops

general TF

feedforward loop (FFL) specific TF

effector operon

coherent if the direct effect of X on Z has the same indirect effect of X on Z through Y FFL is { incoherent otherwise FFL Types Relative abundance of the eight FFL types in the transcription networks of yeast and E. coli. FFL types are marked C and I for coherent and incoherent, respectively. Motifs in Transcription Regulation Networks: Motif Types (2) and (3): Variable-size motifs

Single input module (SIM) Dense overlapping regulon (DOR)

* All operons Z1,...,Zn are regulated with the same sign * None is regulated by a TF other than X * X is usually autoregulatory

Motifs in Transcription Regulation Networks: Functional Roles of Motifs Motifs in Other Networks

✤ Following their success at identifying motifs in transcription regulation network in E. coli, Alon and co-workers analyzed other types of networks: gene regulation (in E. coli and S. cerevisiae), neurons (in C. elegans), food webs (in 7 ecological systems), electronic circuits (forward logic chips and digital fractional multipliers), and WWW Motifs in Other Networks Motif Types

Issues with the Null Hypothesis

✤ In analyzing the neural-connectivity map of C. elegans, Alon and co-workers generated randomized networks in which the probability of two neurons connecting is completely independent of their relative positions in the network

✤ However, in reality, two neighboring neurons have a greater chance of forming a connection than two distant neurons at opposite ends of the network

✤ Therefore, the test performed by Alon and co-worker was not null to this form of localized aggregation and would misclassify a completely random but spatially clustered network as one that is nonrandom and that has significant network motifs

✤ In this case, a is more appropriate

✤ The issue of null models hold also for regulatory networks... ANALYSIS

The evolution of genetic networks by non-adaptive processes

Michael Lynch Abstract | Although numerous investigators assume that the global features of genetic networks are moulded by natural selection, there has been no formal demonstration of the adaptive origin of any genetic network. This Analysis shows that many of the qualitative features of known transcriptional networks can arise readily through the non-adaptive processes of genetic drift, mutation and recombination, raising questionsNeutral forces acting on intragenomic variability shape about whether natural selection is necessary or even sufficient for the origin of many aspects of gene-network topologies. The widespread reliance on computationalthe Escherichia coli regulatory procedures that are devoid of population-genetic details to generate hypotheses for the evolution of network configurations seems to be unjustified. Troy Ruths1 and Luay Nakhleh1 Neutral forces acting on intragenomic variabilityDepartment shape of Computer Science, Rice University, Houston, TX 77251 Although the ubiquity of genetic pathways underlying be of adaptive value, the physical mechanisms that the Escherichia coli regulatory network topologyEdited by Sean B. Carroll, University of Wisconsin, Madison, WI, and approved March 27, 2013 (received for review October 9, 2012) metabolic and developmental processes is beyond dis- give rise to genome architectural features are logically 1 1 pute, the mechanisms by whichTroy genetic Ruths and networks Luay Nakhleh become distinct from the adaptive processes that Cisutilize-regulatory such networks (CRNs) play a central role in cellular deci- underlying genome. This coupling allows us to incorporate knowl- Department of Computer Science, Rice University, Houston, TX 77251 13 sion making. Like every other biological system, CRNs undergo evo- edge about genomes and their features, which is currently much established evolutionarily areEdited byfar Sean from B. Carroll, clear. University ofMany Wisconsin, Madison,features WI, and approved as evolutionary March 27, 2013 (received resources for review October. Contrary 9, 2012) to some REF. 14 physicists, engineers and computerCis-regulatory scientists, networks (CRNs) and play some a central rolesuggestions in cellular deci- (forunderlying example, genome. This coupling), allowsthere us tois incorporatelution,no evidence knowl- which shapes their properties by a combination of adaptive richer than our knowledge of CRNs. In particular, an important sion making. Like every other biological system, CRNs undergo evo- edge about genomes and their features, which is currently much insight into improving the quantifiability of neutral trends is that cell and developmental biologists,lution, which are shapes convinced their properties that by a combination that ofgenetic adaptive pathwaysricher than our emerge knowledge of de CRNs. novo In particular, in andresponse an importantnonadaptive to evolutionary forces. Teasing apart these forces is fi biological networks exhibit propertiesand nonadaptive evolutionarythat could forces. only Teasing aparta selective these forces is challenge.insight into improving Rather, the pathway quanti ability ofevolution neutral trends prob is that - promoter length and the spontaneous gain and loss rates of TF an important step toward functional analyses of the different com- promoter length and the spontaneous gain andan loss important rates of TF step toward functional analyses of the different com- be products of natural selectionponents (for of example, CRNs, designing REFS regulatory 1–5 perturbation); ably experiments, proceedsbinding by a sites gradual (TFBS) vary augmentation, substantially within a ponentsmaking genome and thatuse of CRNs, designing regulatory perturbation experiments, binding sites (TFBS) vary substantially within a genome and that and constructing synthetic networks. Although tests of neutrality reducing each distribution to one value potentially eclipses impor- however, the matter has rarelyand selection been based examined on molecular sequencein the data exist,of mutational no such tests tant variation emergent properties arising and struct independentlyure at the networkand level. constructing Previousat dif- synthetic networks. Although tests of neutrality reducing each distribution to one value potentially eclipses impor- are currently available based on CRNs. In this work, we present work assumed all promoters were the same length (10, 12, 15, 16), whereas the current work incorporates variability in promoter context of well-establisheda evolutionary unique genotype model principles. of CRNs that is grounded ferent in a genomic loci, as occurs in nearly all other evolutionary tant emergent properties and structure at the network level. Previous lengths. Finally, by subjecting a population ofand individuals selection whose based on molecular sequence data exist, no such tests context and demonstrate its use in identifying portions of the Five popular concepts in biology today — redundancy, processes. genotypes are thus constructed to nonadaptive forces of evolu- work assumed all promoters were the same length (10, 12, 15, 16), CRN with properties explainable by neutral evolutionary forces tion, we provide a simulation framework for generatingare currently data cor- available based on CRNs. In this work, we present robustness, , complexityat the system, subsystem,and evolvability and operon levels. We leverageQualitative our model responding observations to a null model suggest of only neutral that forces. the Wecomplex leveraged - whereas the current work incorporates variability in promoter against experimentally derived data from Escherichia coli. The this framework to analyze and quantify emergenta properties unique in an genotype model of CRNs that is grounded in a genomic fi — invoke a vision of the cellresults as ofan this electronic analysis show statisticallycircuit, signi cantity andof substan-regulatoryEscherichia and coli CRN.protein-interaction networks lengths. Finally, by subjecting a population of individuals whose tial neutral trends in properties previously identified as adaptive It is important to note that graph-theoretic techniques,context such asand demonstrate its use in identifying portions of the fi genotypes are thus constructed to nonadaptive forces of evolu- designed by and for .in origin —Termsdegree distribution, like plug-ins, clustering coef cient,increases and motifs —fromthe edge-rewiringprokaryotes model, to control unicellular for certain networkCRN eukaryotes properties, with properties explainable by neutral evolutionary forces within the E. coli CRN. Our model captures the tightly coupled ge- such as the number of edges, and in- and out-degrees, to produce tion, we provide a simulation framework for generating data cor- nucleation kernels, input andnome output–interactome switches, of an organism capaci and enables- to analyses multicellular of how an “ acceptableeukaryotes,” null model. with One simple of the strengths autoregulatoryat of the our model system, is subsystem, and operon levels. We leverage our model evolutionary events acting at the genome level, such as mutation, that by incorporating well-studied and quantifiable information at responding to a null model of only neutral forces. We leveraged tance and hard wiring abound.and at the Alon population states level, suchthat as genetic it is drift, giveloops rise to being neutral morethe sequence common level, network and propertiesmulticomponent becomeagainst emergent loops prop- experimentally derived data from Escherichia coli. The patterns that we can quantify in CRNs. erties rather than control parameters. this framework to analyze and quantify emergent properties in an “…wondrous that the solutions found by evolution have being less commonOur analysis in microbes, reveals surprising although results. First,results the several existing subgraph of this analysis show statistically significant and substan- noncoding DNA | population6 genetics | ncDNA | binding sites types, such as the feed-forward loop, which were previously iden- Escherichia coli CRN. much in common with good engineering,” and Adami data are knowntified asto network be subject motifs, follow to nonadaptive numerous trends.tial Second,potential neutral using trends in properties previously identified as adaptive our model highlighted other subgraph types that seem to arise It is important to note that graph-theoretic techniques, such as major cellular process underlying the central dogma15–17 of mo- states that digital organisms allow “…researchers to biases . Moreover,unexpectedly it with is an high open frequency. question Third, as aas whole, to whether the E. coli fi lecular biology is cis-regulation. This process involves the in origin—, clustering coef cient, and motifs— A CRN follows neutral patterns, as reflected by the degree distri- the edge-rewiring model, control for certain network properties, binding of specialized proteins, called transcription factors (TFs), address fundamental questions about the genetic basis pathway complexitybution, the number is a of necessary edges, and clustering prerequisite coefficient properties for to binding sites, in non-protein–coding DNA (ncDNA) regions within the E. coli CRN. Our model captures the tightly coupled ge- such as the number of edges, and in- and out-degrees, to produce of the evolution of complexity,upstream genome of target genes.organization, The links between TFsthe and evolution their target that of are complex very similar tophenotypes, those emerging in or our modelwhether at both the system and the operon level. Fourth, if we discardnome the information–interactome of an organism and enables analyses of how binding7 sites form the cis-regulatory network (CRN) in the cell. an “acceptable” null model. One of the strengths of our model is robustness and evolvability.”Reconstructing a CRN from experimental data,genome elucidating architectures its on the variability of multicellular in promoter lengths species and use, instead, are simply a single length for all promoters (which is not supported byevolutionary empirical data), events acting at the genome level, such as mutation, fi dynamic and topological properties, and understanding how these fi that by incorporating well-studied and quanti able information at Is this reduction of the fieldproperties of evolution emerge during developmentto a sub and- evolutionmore are conducive major all results to changethe passive signi cantly. In summary, using of our network model, we established that nonadaptive forces, in combinationand with atE. the coli- population level, such as genetic drift, give rise to neutral endeavors in experimental and computational biology (1–5). fi the sequence level, network properties become emergent prop- discipline of engineering justified?The complexity For some, of CRNs, the coupled answer with observed connections.“unexpected” speci Givenc genomic the features, large could numbers explain much of of thetranscrip organization - of the E. coli CRN. 8,9 trends in their properties, such as scale-freeness (6), high degree patterns that we can quantify in CRNs. erties rather than control parameters. is far from a definitive yes . The population-level proc- tion factors– in most cells and their reliance on simple of clustering (7), and overrepresented subgraphs (3, 8 10), has Model led to several hypotheses of adaptive origins and explanations of Our analysis reveals surprising results. First, several subgraph esses and cellular limitations that dictate a genome’s binding sites Ourthat model are consists subject of operons to stochastic and transcription mutational factors, where CRNs and their properties. Central to most of these studies was noncoding DNA population genetics ncDNA binding sites types, such as the feed-forward loop, which were previously iden- the use of simplistic graph-theoretic models, such as randomly transcription factors are operons with additional binding-site | | | capacity for evolution are quiterewiring distinct the connectivity from of athe biological con network,- turnover, to serve as a null theremotif are information. many For plausible each operon, a nonzeromechanisms promoter length for is tified as network motifs, follow nonadaptive trends. Second, using Department of Biology, model for CRN connectivity maps and their properties (11). provided in base pair units, and for each transcription factor, its straints that govern the humanHowever, design it has of been non-biological shown that when subjecting the CRNs emergence to the binding-site of novel motifs intracellular are provided in International transactions Union of Pure by our model highlighted other subgraph types that seem to arise Indiana University, constructs10,11. In addition,various although neutral evolutionary there forcesis little and tracing effectively their trajectory in neutraland Applied processes Chemistry13,18–20 (IUPAC). code from RegulonDBmajor (17). cellular process underlying the central dogma of mo- Bloomington, time, many of these topological patterns may simply arise sponta- IUPAC code describes ambiguous sites in a sequence motif, where unexpectedly with high frequency. Third, as a whole, the E. coli question that adaptive exploitationneously due to theof forcesnetworks of mutation, has recombination, The gene dupli-followingeach IUPAC analyses character show may correspond the ease to one, Awith two,lecular three,which or biology is cis-regulation. This process involves the fl Indiana 47405, USA. cation, and genetic drift (10, 12). These studies call into question CRN follows neutral patterns, as re ected by the degree distri- e-mail: [email protected] occurred in numerous instancesarguments (for that example, were made in favorREF. of 12 adaptive), explanationscommonly for the observed features of genetic pathwaysbinding can of specialized proteins, called transcription factors (TFs), fi emergence and conservation of CRN properties (8, 13–15) and Author contributions: T.R. and L.N. designed research; T.R. and L.N. performed research; – bution, the number of edges, and clustering coef cient properties doi:10.1038/nrg2192 and the final products of pathwaysidentify important may almost parameters always that may signifiemergecantly affect without the T.R. contributed any direct new reagents/analytic selection tools; T.R. for analyzed such data;to and properties, T.R. binding and L.N. wrote sites, in non-protein coding DNA (ncDNA) regions evolution of CRNs from a neutral perspective. Specifically (12), the paper. that are very similar to those emerging in our model at both the fl upstream of target genes. The links between TFs and their target they highlighted the role that promoter length, binding-site size, The authors declare no con ict of interest. system and the operon level. Fourth, if we discard the information and population size may play in forming certain topological pat- This article is a PNAS Direct Submission. binding sites form the cis-regulatory network (CRN) in the cell. terns known as motifs. Nonetheless, a lingering question remains: Freely available online through the PNAS open access option. on the variability in promoter lengths and use, instead, a single fi 1 NATURE REVIEWS | GENETICS Which speci c parts of a CRN arise due to nonadaptive forces and, To whom correspondence VOLU may beM addressed.E 8 | E-mail:OCT [email protected] 2007 or nakhleh@ | 803 a CRN from experimental data, elucidating its moreover, can we quantify these patterns to allow statistical testing? rice.edu. length for all promoters (which is not supported by empirical data), To investigate this question, we developed a unique model This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10.dynamic and topological properties, and understanding how these fi of a CRN genotype that couples an individual’s CRN with its 1073/pnas.1217630110/-/DCSupplemental. properties emerge during development and evolution are major all results change signi cantly. In summary, using our model, we established that nonadaptive forces, in combination with E. coli- 7754–7759 | PNAS | May 7, 2013 | vol. 110 | no. 19 www.pnas.org/cgi/doi/10.1073/pnas.1217630110endeavors in experimental and computational biology (1–5). fi The complexity of CRNs, coupled with observed “unexpected” speci c genomic features, could explain much of the organization trends in their properties, such as scale-freeness (6), high degree of the E. coli CRN. – of clustering (7), and overrepresented subgraphs (3, 8 10), has Model led to several hypotheses of adaptive origins and explanations of CRNs and their properties. Central to most of these studies was Our model consists of operons and transcription factors, where the use of simplistic graph-theoretic models, such as randomly transcription factors are operons with additional binding-site rewiring the connectivity of a , to serve as a null motif information. For each operon, a nonzero promoter length is model for CRN connectivity maps and their properties (11). provided in base pair units, and for each transcription factor, its However, it has been shown that when subjecting CRNs to the binding-site motifs are provided in International Union of Pure various neutral evolutionary forces and tracing their trajectory in and Applied Chemistry (IUPAC) code from RegulonDB (17). time, many of these topological patterns may simply arise sponta- IUPAC code describes ambiguous sites in a sequence motif, where neously due to the forces of mutation, recombination, gene dupli- each IUPAC character may correspond to one, two, three, or cation, and genetic drift (10, 12). These studies call into question arguments that were made in favor of adaptive explanations for the emergence and conservation of CRN properties (8, 13–15) and Author contributions: T.R. and L.N. designed research; T.R. and L.N. performed research; identify important parameters that may significantly affect the T.R. contributed new reagents/analytic tools; T.R. analyzed data; and T.R. and L.N. wrote evolution of CRNs from a neutral perspective. Specifically (12), the paper. they highlighted the role that promoter length, binding-site size, The authors declare no conflict of interest. and population size may play in forming certain topological pat- This article is a PNAS Direct Submission. terns known as motifs. Nonetheless, a lingering question remains: Freely available online through the PNAS open access option. Which specific parts of a CRN arise due to nonadaptive forces and, 1To whom correspondence may be addressed. E-mail: [email protected] or nakhleh@ moreover, can we quantify these patterns to allow statistical testing? rice.edu. To investigate this question, we developed a unique model This article contains supporting information online at www.pnas.org/lookup/suppl/doi:10. of a CRN genotype that couples an individual’s CRN with its 1073/pnas.1217630110/-/DCSupplemental.

7754–7759 | PNAS | May 7, 2013 | vol. 110 | no. 19 www.pnas.org/cgi/doi/10.1073/pnas.1217630110 Efficient Sampling in Networks The Issue

✤ Identifying network motifs requires computing subgraph concentrations

✤ The number of subgraphs grows exponentially with their number of nodes

✤ Hence, exhaustive enumeration of all subgraphs and computing their concentrations are infeasible for large networks

✤ In this part, we describe mfinder, an efficient method for estimating subgraph concentrations and detecting network motifs Subgraph Concentrations

✤ Let Ni be the number of appearances of subgraphs of type i

✤ The concentration of n-node subgraphs of type i is the ratio between their number of appearances and the total number of n-node connected subgraphs in the network: Subgraphs Sampling

✤ The algorithm samples n-node subgraphs by picking random connected edges until a set of n nodes is reached Sampling Probability

To sample an n-node subgraph, an ordered set of n-1 edges is iteratively randomly picked. In order to compute the probability, P, of sampling the subgraph, we need to check all such possible ordered sets of n-1 edges [denoted as (n-1)-permutations] that could lead to sampling of the subgraph

The probability of sampling the subgraph is the sum of the probabilities of all such possible ordered sets of n-1 edges: Pr P = ! " [Ej = ej|(E1,...,Ej−1)=(e1,...,ej−1)] σ∈Sm Ej ∈σ

where Sm is the set of all (n-1)-permutations of the edges from the specific subgraph edges that

could lead to a sample of the subgraph. Ej is the j-th edges in a specific (n-1)-permutation (σ) Correction for Non-uniform Sampling

✤ Different probabilities of sampling different subgraphs

After each sample, a weighted score of W=1/P is added to the score of the relevant subgraph type Calculating the Concentrations of n-node Subgraphs

✤ Define score Si for each subgraph of type i

✤ Initialize Si to 0 for all i

✤ For every sample, add the weighted score W=1/P to the accumulated score Si of the relevant type i

✤ After ST samples, assuming we sampled L different subgraph types, calculate the estimated subgraph concentrations:

Accuracy Running Time Convergence How Many Samples Are Enough?

✤ It is a hard problem

✤ Further, the number of samples required for good estimation with a high probability is hard to approximate when the concentration distribution is not known a priori

✤ Alon and co-workers used an approach similar to adaptive sampling

✤ Let and be the vectors of estimated subgraphs concentration after the iterations i and i-1, respectively. The average instantaneous convergence rate is

and the maximal instantaneous convergence rate is

By setting the thresholds CGavg, CGmax and the value of Cmin, the required accuracy of the results and the minimum concentration of subgraphs can be adjusted Comparing the Local Structure of Networks ✤ To understand the design principles of complex networks, it is important to compare the local structure of networks from different fields

✤ The main difficulty is that these networks can be of vastly different sizes

✤ In this part, we introduce an approach for comparing network local structure based on the significance profile (SP) Significance Profile

• For each subgraph i, the statistical significance is described by the Z score: Nreali −⟨Nrandi⟩ Zi = std(Nrandi) where

Nreali is the number of times subgraph type i appears in the network ⟨Nrandi⟩ is the mean of its appearances in the randomized network ensemble

std(Nrandi) is the standard deviation of its appearances in the randomized network ensemble

• The SP is the vector of Z scores normalized to length 1: Z SP = i i 2 !"i Zi

The correlation coefficient matrix of the triad significance profiles for the directed networks on the previous slide The Subgraph Ratio Profile (SRP)

• When analyzing subgraphs (particularly 4-node subgraphs) in undirected graphs, the normalized Z scores of the subgraphs showed a significant dependence on the network size • In undirected networks, an alternative measure is the SRP Nreali −⟨Nrandi⟩ ∆i = Nreali + ⟨Nrandi⟩ + ε where ε ensures that |∆| is not misleadingly large when the subgraph appears very few times in both the real and random networks • The SRP is the vector of ∆ i scores normalized to length 1: ∆i SRPi = ∆2 !"i i

Motif Evolution Motif Conservation

✤ Wuchty et al. recently showed that in S. cerevisiae, proteins organized in cohesive patterns of interactions are conserved to a substantially higher degree than those that do not participate in such motifs.

✤ They found that the conservation of proteins in distinct topological motifs correlates with the interconnectedness and function of that motif and also depends on the structure of the overall interactome topology.

✤ These findings indicate that motifs may represent evolutionary conserved topological units of cellular networks molded in accordance with the specific biological function in which they participate. Experimental Setup

✤ Test the correlation between a protein’s evolutionary rate and the structure of the motif it is embedded in

✤ Hypothesis: if there is evolutionary pressure to maintain specific motifs, their components should be evolutionarily conserved and have identifiable orthologs in other organisms

✤ They studied the conservation of 678 S. cerevisiae proteins with an ortholog in each of five higher eukaryotes (Arabidopsis thaliana, C. elegans, D. melanogaster, Mus musculus, and Homo sapiens) from the InParanoid database

Convergent Evolution

✤ Convergent evolution is a potent indicator of optimal design

✤ Conant and Wagner recently showed that multiple types of trascriptional regulation circuitry in E. coli and S. cerevisiae have evolved independently and not by duplication of one or a few ancestral circuits (a) Two indicators of common ancestry for gene circuits. Each of n = 5 circuits of a given type (a feed-forward loop for illustration) is represented as a node in a circuit graph. Nodes are connected if they are derived from a common ancestor, that is, if all k pairs of genes in the two circuits are pairs of duplicate genes. A = 0 if no circuits share a common ancestor (the graph has n isolated vertices); A 1 if all circuits share one common ancestor (the graph is fully connected). The number C of connected components indicates the number of common ancestors (two in the middle panel) from which the n circuits derive. Fmax is the size of the largest family of circuits with a single common ancestor (the graph's largest ). (b) Little common ancestry in six circuit types. We considered two circuits to be related by common ancestry if each pair of genes at corresponding positions in the circuit had significant sequence similarity. Each row of the table shows values of C, A and Fmax for a given circuit type, followed in parentheses by their average values standard deviations and P values

A = 1-(C/n) n: number of circuits (nodes in the graph) C: number of components in the graph

Fmax is size of largest family

A Textbook Focused on Network Motifs

✤ “An Introduction to Systems Biology: Design Principles of Biological Circuits”, by Uri Alon, Chapman & Hall/CRC, 2007. Acknowledgments

✤ Materials in this lecture are mostly based on:

✤ “Superfamilies of evolved and designed networks”, by Milo et al.

✤ “Network motifs: simple building blocks of complex networks”, by Milo et al.

✤ A comment on the above two by Artzy-Randrup et al.

✤ “Network motifs in the transcriptional regulation network of Escherichia coli”, by Shen-Orr et al.

✤ “Efficient sampling algorithm for estimating subgraph concentrations and detecting network motifs”, by Kashtan et al.

✤ “Convergent evolution of gene circuits”, by Conant and Wagner.

✤ “Evolutionary conservation of motif constituents in the yeast protein interaction network”, by Wuchty et al.