<<

4

Chapter 1 Chapter 2 Chapter 3 Chapter 4 Chapter 5 Chapter 6

Quorum Chemo- Descriptor Spial sensing relationships

Conclusions Introduction and perspectives

Atomic Pathway Cellular level level level level

Chapter 4: Chemogenomics 4-1

Contents of chapter 4

Contents of chapter 4...... 1 Summary of chapter 4...... 2 Introduction ...... 2 Chemogenomic experiments...... 4 Heterozygous deletions ...... 5 Homozygous deletions...... 5 Overexpression...... 6 Beyond deletion and overexpression libraries ...... 7 Detection methods...... 7 Non-competitive arrays...... 7 Competitive mutant pools ...... 7 Data interpretation and analysis...... 8 Clustering...... 8 Matrix operations ...... 9 Applications ...... 10 Chemogenomic networks...... 12 Results and discussion...... 13 Interpretation of a chemogenomic network...... 14 Properties of the chemogenomic network...... 15 Materials and methods ...... 16 Data acquisition ...... 16 Construction of the chemogenomic network...... 16 Computation of network parameters and overlap with other networks ...... 17 Linking the chemical and biological properties of small molecules ...... 17 Results and discussion...... 18 What differentiates bioactive small molecules from others? ...... 18 What differentiates small molecules that hit S. cerevisiae from those that hit Sc. pombe?...... 20 Conclusions ...... 22 Materials and Methods ...... 22 References...... 23

Chapter 4: Chemogenomics 4-2

Summary of chapter 4

A robust knowledge of the interactions between small molecules and specific proteins aids the development of new biotechnological tools, the identification of targets, and the gain of specific biological insights. Such knowledge can be obtained through chemogenomic screens. In these screens, each from a is applied to each cell type from a library of cells, and the resulting are recorded. Chemogenomic screens have recently become very common and will continue to generate large amounts of data. Therefore methods to investigate these data become important. In this chapter, I develop computational methods for the analysis of chemogenomic data. For this, I focus on the fungus Saccharomyces cerevisiae, and, to a lesser extent, Schizosaccharomyces pombe.

Parts of this chapter appear in the following articles:

Wuster, A. and M. Madan Babu (2008). “Chemogenomics and biotechnology.” Trends Biotechnol 26(5): 252-8.

Wuster, A. (2008). “Networking with .” Mol BioSyst 4(1): 14-7

The section on linking the chemical and biological properties of SMs would not have been possible without the help of Laura Schuresko-Kapitzky and Nevan Krogan of the University of California, San Francisco, who generated much of the data this section is based on.

Introduction

The effects of small molecules (SMs) on cells were central to the research of Paul Ehrlich (1854-1915; figure 4-1). For much of his career he strived to identify 'magic bullets', SMs that would allow him to target specific tissues or microbes whilst sparing others (Ehrlich 1911). In order to find a cure for syphilis, he systematically screened a library of hundreds of SMs for their effect against Treponema pallidum, the causative agent of the disease. After testing 605 different compounds, he eventually identified arsphenamine, which he later marketed as Salvarsan 606 (Gensini, Conti et al. 2007). With this strategy, he initiated a whole new approach to which has persisted until today (Lipinski and Hopkins 2004). The aim of Paul Ehrlich's chemical screen was to find an SM that was active against a single pathogen. The data Chapter 4: Chemogenomics 4-3 structure created by his screen was one-dimensional and can be encoded by a single vector of the 606 compounds tested. Nowadays, chemical screens are more complex and wider in scope. In this chapter, I mostly work with the data obtained through two- dimensional chemogenomic screens. As in Paul Ehrlich's screen, the first dimension in such screens is a chemical library. The second dimension is a library of different cell types. For the data I work with in this chapter, the cell types are well-defined mutants in a library of S. cerevisiae deletion strains, where in each strain a different gene has been deleted. Alternatively, the cell types could be defined in other ways, such as in a library of cancer cell lines or of meiotic recombinants (Perlstein, Deeds et al. 2007; Perlstein, Ruderfer et al. 2007). The resulting data structure is a two- dimensional matrix where each data point has two coordinates and one specific associated value. The chemical coordinate specifies the SM that was applied, whilst the genetic coordinate specifies the cell type. The value of each data point is a measurement of the of interest, such as viability, growth rate, or cell size and shape. The results of a large number of chemogenomic screens with different designs have been published (table 4-B). They all have in common the data structure described above, but the experimental designs and aims, such as the identification of cellular targets of SMs, or the characterisation of cellular pathways, vary between them.

In this chapter, I describe the different ways in which chemogenomic data can be created, and then introduce the methods that I have developed to extract information from this data. For this purpose, I have divided the results of this chapter into two parts. The first part, on chemogenomic networks, takes a gene-centric view, whilst the second part, on linking the chemical and biological properties of SMs, takes an SM-centric view. I have only tested the methods in this chapter on S. cerevisiae, and, to a lesser extent, on Sc. pombe. Nevertheless, they can potentially also be applied to other systems, such as humans.

Figure 4-1. Paul Ehrlich (1854-1915) on a former D-Mark banknote. The molecular structure to the left of the portrait symbolises arsphenamine. The six-membered ring in the centre consists of arsenic atoms. It was not shown until 16 years after the issue of the banknote that arsphenamine is actually a mixture of compounds with three- and five-membered arsenic rings (Lloyd, Morgan et al. 2005). Source of the image: http://tinyurl.com/l3r9np Chapter 4: Chemogenomics 4-4

Chemogenomic experiments

An in-depth understanding of the subtly different ways in which chemogenomic experiments are designed is necessary in order to develop the appropriate methods to extract meaningful information from them. Although for all methods of carrying out chemogenomic screens the resulting data structure is similar, its interpretation depends on the design of the experiment. For yeast, there are at least three different types of mutant libraries that can be generated, such as heterozygous deletions, homozygous deletions, and overexpression libraries (table 4-1).

Screen A. Homozygous deletants B. Heterozygous C. Overexpression design → deletants

↓ Detection

Non- competitive (Baetz, McHardy et al. 2004) arrays 1 SM > 5000 mutants aim: determine (Luesch, Wu et al. 2005) (Dudley, Janse et al. 2005) for drugs 1’000 SMs 21 conditions 7’296 mutants 4’710 mutants (Chang, Bellaoui et al. 2002) aim: identify protein targets for aim: determine gene function 1 SM SMs well or 5’000 mutants agar aim: determine genes with plate certain function

(Parsons, Lopez et al. 2006) 82 SMs Competitive 4’111 mutants aim: determine mode of action barcoded for drugs pools (Butcher, Bhullar et al. 2006) (Hillenmeyer, Fung et al. 2008) (Giaever, Flaherty et al. 2004) 2 SMs 178 SMs and conditions 10 SMs 3’929 mutants 5’000 mutants 6’000 mutants aim: identify protein targets for aim: determine gene function aim: determine functional SMs interactions of SMs

(Hillenmeyer, Fung et al. 2008) 354 SMs and conditions 6’000 mutants aim: determine gene function

Table 4-1. This table shows examples of chemogenomic screens carried out in the recent past. The columns correspond to three different types of mutant libraries (screen design). Expression of a gene product can either be avoided altogether by deleting the gene altogether (homozygous deletion; empty red arrows), or expression can be reduced by deleting the gene on only one chromosome (heterozygous deletion; filled and empty red arrows), or expression can be increased by introducing extra copies of the gene into the cell (overexpression; multiple red arrows). The rows of the table correspond to two different detection methods that can be used to observe the phenotype of the mutants when treated with SMs. The SM can be added to a well or agar plate where each mutant occupies a separate well or spot on the agar. The effect is then observed directly (non-competitive arrays). Alternatively, the barcoded mutants can all be placed into a flask containing an SM. The effect on the growth of each individual mutant is then observed by measuring the abundance of the different mutants using microarrays whose probes are complementary to the barcodes of the mutants. The intensities of specific probes are then thought to be proportional to the abundance of the mutant strains in the pool (competitive barcoded pools). Chapter 4: Chemogenomics 4-5

In the following sections I discuss how each of these library types can be used to generate chemogenomic data. For each mutant library, the measured phenotype refers to changes in growth rate as a proxy for viability or fitness. The measured phenotypes can be caused by a variety of molecular interactions between proteins and SMs, whose interpretation depends on the choice of experimental setup.

Heterozygous deletions

This library type consists of diploid yeast cells that have two copies of each chromosome. Almost every single one of the 6’500 yeast genes can be deleted independently, as long as the deletion is only on one of the two chromosomes of a chromosome pair (heterozygous) (Deutschbauer, Jaramillo et al. 2005). In such a library, even cells with a deletion in an essential gene will normally survive as they still have another copy of the gene on the second chromosome. Data from a heterozygous deletion library and a homozygous deletion library, which is described in the next section, need to be interpreted in different ways (Baetz, McHardy et al. 2004; Giaever, Flaherty et al. 2004; Parsons, Lopez et al. 2006). The rationale behind screening against a heterozygous deletion library is that a cell that has only one copy of a particular gene produces less of the gene product. Such a cell is rendered more sensitive to an SM that is capable of disrupting the function of this gene product than a wildtype cell. This effect is also referred to as haploinsufficiency, and the method has also been described as haploinsufficiency profiling (Giaever, Flaherty et al. 2004). Data points obtained from such a screen that show decreased viability might therefore directly point to an interaction between a chemical compound and a gene product.

Homozygous deletions

This type of library consists of yeast cells in which each non-essential gene is deleted from both chromosomes of a diploid chromosome pair. For obvious reasons, in such a homozygous deletion library only non-essential genes can be deleted as the deletion of any of the approximately 1000 essential yeast genes will by definition be lethal (Deutschbauer, Jaramillo et al. 2005). The interpretation of such a chemogenomics screen is more complex for a homozygous deletion library than for a heterozygous deletion library. Clearly, an SM cannot act on a gene product after the respective gene has been knocked out. Therefore, if a homozygous deletion strain is sensitive to a particular SM, this can be interpreted in different ways. One possibility is that the deleted gene may be different from the gene targeted by the SM but has a similar function. In the absence of SMs, disruption of one of the two genes is Chapter 4: Chemogenomics 4-6 expected to have little effect as the second gene provides a back-up function. However, the disruption of the function of both genes, the one by the SM and the other by deletion, might result in decreased viability. This has been verified in a study which compared the effect of knocking out one gene and targeting the other with an SM with the effect of knocking out both genes, a method which is also referred to as a synthetic genetic array (SGA) (Parsons, Brost et al. 2004). This study considered the biosynthesis gene ERG11, a target of the SM fluconazole, as an example. There are 27 genes which cause synthetic lethality when mutated together with ERG11. 13 of these 27 mutants also cause lethality when ERG11 is not mutated but when fluconazole is added. Comparing the lethality profile of other SMs to the synthetic lethal profiles of their targets, it was found that SGA profiles generally tend to have a significant overlap with the chemogenomic profiles (Parsons, Brost et al. 2004). This shows that the effect of deleting ERG11 is similar to the effect of disrupting its function with fluconazole. In an approach which is complementary to the homozygous deletion library described above, it would be possible to screen for deletions which confer resistance to a compound whose addition would be deleterious to the wildtype strain. An example of this would be if a compound which is not normally toxic to the cell becomes toxic due to the action of a cellular modifying it. In this case, deletion of the genes encoding the modifying enzyme would confer resistance to the cell. For example, azidothymine becomes active against DNA polymerases and reverse transcriptases only after it has undergone several cycles of phosphorylation (Brenner 2004). Mutating the genes responsible for this phosphorylation could potentially decrease the sensitivity of the cell towards azidothymine. In this way, genes that are responsible for causing side effects to certain drugs could be identified.

Overexpression

A third approach investigates the effect of SMs under conditions of increased gene dosage, which can be amended by increasing gene copy number (Butcher, Bhullar et al. 2006; Luesch 2006). One such study (Luesch, Wu et al. 2005) used an overexpression library to screen for genes which confer increased resistance to SMs when multiple copies of a gene, and therefore presumably higher level of the gene product, are present. In this way it was possible to show, amongst other things, that the breast cancer therapeutic tamoxifen disrupts calcium homeostasis (Luesch, Wu et al. 2005). Chapter 4: Chemogenomics 4-7

Beyond deletion and overexpression libraries

Alternative approaches for generating libraries of different cell types include the exploration of the relationship of essential genes with SMs via titrable promoter alleles (Mnaimneh, Davierwala et al. 2004), and Decreased Abundance by mRNA Perturbation (DAmP). DAmP enables fine-tuning of protein concentration by disrupting the 3′ untranslated region of a gene by insertion of an antibiotic resistance marker, which greatly destabilises the corresponding mRNA (Muhlrad and Parker 1999). DAmP leads to proteins that are synthesised at substantially lower levels than in the wildtype but under their natural transcriptional regulation. DAmP can therefore be used to gain insights into the deleterious and alleviating effects SMs have at differential protein levels (Schuldiner, Collins et al. 2005).

Detection methods

Chemogenomic screens can be carried out in two fundamentally different designs: Non-competitive arrays and competitive mutant pools (table 4-1).

Non-competitive arrays

In this design, a set of arrays is constructed that contain each of the deletion or overexpression strains separate from the others, so that they can be observed individually. This can either be achieved by locating each strain into a separate well of a well plate, or by spotting them onto an agar surface. After each compound from the chemical library is added, changes in growth can be measured independently for each strain and each SM. Since each mutant strain is physically isolated, strains do not interact or compete for resources in this design.

Competitive mutant pools

In this design, each mutant has a unique DNA sequence also referred to as a bar code. All bar-coded mutant strains are initially grown together in the same flask. This mutant pool is then divided into multiple flasks and each compound from the chemical library is added to a different flask. In each flask, the different strains compete for resources and fitter strains in the presence of the SM increase in concentration. For analysis, the genetic material of the contents of the flasks is hybridised to microarrays, which contain probes that are complementary to the bar codes of the individual deletion strains. The observed differential intensities of the probes correspond to the differential concentration and therefore the differential viability of the respective strains in the presence of the SM. Alternatively, the differential concentration of the strains could be established using high-throughput Chapter 4: Chemogenomics 4-8 sequencing. However, in this screen design, the intrinsically different growth rates of mutants compared to wild type cells need to be taken into account when interpreting the obtained data. It is therefore important that the data are normalised by the growth rate of each mutant without any SM treatment.

It has been observed that the results of the array-based design and the pool-based design overlap but differ (Giaever, Shoemaker et al. 1999). A possible explanation for this is interactions between strains in mutant pools. For example, certain deletion strains could accumulate intermediates in a metabolic pathway from which an enzyme has been deleted, and these intermediates, when released, might interfere with the viability of other strains in the same flask. Alternatively, yeast strains grown in flasks might behave differently from yeast strains grown in arrays due to the selection pressures of interstrain competition for resources, which is not a limitation in the array-based detection method.

Data interpretation and analysis

To understand and interpret chemogenomic data is not a trivial task, as the observed phenotypic effects can have many causes and can also be indirect. Here I discuss the most common methods to interpret chemogenomic data, before introducing my own methods later in this chapter.

Clustering

Application of clustering algorithms is a standard way of analysing multidimensional data. In chemogenomic screens, data clustering can be applied to both the chemical and the genetic dimension. A cluster in the chemical dimension will result in groups of SMs that have similar effects on the different cell types (table 4-2). It has been shown that clusters in the chemical dimension often contain SMs which are chemically similar (Giaever, Flaherty et al. 2004). Therefore, the action of untested SMs could be predicted based on similarity to tested SMs from previous screens. Accordingly, a cluster in the genetic dimension groups cell types that are affected by the SMs in a similar way. Such a cell type cluster can then be expected to be enriched for mutated genes (deletion or overexpression) that have a certain function in common. In order to identify such clusters, a number of supervised or unsupervised algorithms that are also applied to microarray data have been described. They include hierarchical clustering, k-means, and self-organising maps (Madan Babu 2004). Alternatively, biclustering, an approach that analyses both dimensions at the same time, can be employed (Cheng and Church 2000; Madeira Chapter 4: Chemogenomics 4-9 and Oliveira 2004). The biclusters obtained in this way contain a subset of mutated genes which interact similarly with a subset of SMs, and vice versa. The aim of biclustering is to sort both dimensions simultaneously in such a way that functionally significant blocks can be identified amongst the overall pattern (figure 4-2C). Another important advantage of biclustering is that it can identify mutated genes or SMs that are associated with more than one cluster. This is a significant improvement over simpler clustering procedures since many genes may have more than one function. Biclustering approaches have successfully been applied to chemogenomic data (Dudley, Janse et al. 2005; Parsons, Lopez et al. 2006). Non-negative or Probabilistic sparse matrix factorisation (Greene, Cagney et al. 2008) are alternative algorithms to biclustering and is has been shown that in some cases they are more sensitive than hierarchical clustering. For example, the SMs verrucarin and neomycin sulphate are clustered together by matrix factorisation but not by hierarchical clustering (Parsons, Lopez et al. 2006). A detailed review of the various clustering methods and how they are implemented, can be found in references (D'Haeseleer 2005; Xu and Wunsch 2005).

le 1 le 2 le 3 u u ec ol olecu ll m m a sm smallsmall molec. . small molecule m ∆ gene 1 ∆ gene 2 ∆ gene 3 . . . ∆ gene n

A. Data structure B. One-dimensional C. Biclustering clustering Figure 4-2. The chemogenomic data structure and clustering methods. (A) A hypothetical data structure obtained from a chemogenomic screen. One dimension (columns) refers to the SMs added, whilst the other dimension (rows) refers to cell type, in this case defined deletion (∆) mutants. The values, here represented as white, grey and black boxes, refer to the growth or viability of the specific mutant when treated with a specific SM. (B) The columns of the matrix (SMs) have been clustered according to the similarity of their effect on different cell types. (C) Rows and columns of the matrix have been biclustered (see text). The resulting biclusters are highlighted in green.

Matrix operations

Although clustering is a helpful tool for the interpretation of chemogenomic data, it is not the only available method. Weinstein and coworkers (Scherf, Ross et al. 2000) have taken an interesting approach by integrating two different two-dimensional matrices obtained from a chemogenomics screen and this strategy could also prove valuable for the interpretation of other datasets. Their first matrix had SMs as its first dimension, cell lines as its second dimension, and the associated values referred to Chapter 4: Chemogenomics 4-10 differential growth. In a second matrix, genes were represented in the first dimension, cell lines in the second dimension, and the associated values referred to transcript levels as measured by microarrays. The common dimension in both matrices was the cell lines. The authors then correlated the expression of each gene in each cell line with its response to each SM and thereby obtained a third two-dimensional matrix, which has SMs as one dimension and genes as the other dimension. From this matrix, the authors could identify the genes that are expressed at a differential level in cell lines that have differential sensitivity to a specific SM. In this way, they could show how the transcript levels of particular genes relate to sensitivity to specific SMs.

Matrices can in many cases also be visualised and interpreted in networks. SM-gene networks have also been used to visualise and analyse chemogenomic-like datasets (Haggarty, Koeller et al. 2003; Sharom, Bellows et al. 2004; Lamb, Crawford et al. 2006; Ma'ayan, Jenkins et al. 2007; Yildirim, Goh et al. 2007) (figure 4-3). One conclusion that has been possible from the analysis of such drug-target networks is that there has been a trend for more rational over the last decade (Yildirim, Goh et al. 2007; Wuster 2008). Later in this chapter I introduce the concept of chemogenomic networks, which make use of the idea that matrices can be visualised and interpreted as networks.

target protein small molecule (SM)

target protein SM- SM network target network network

Figure 4-3. A bipartite network of drugs and their target proteins (drug-target network; centre panel) contains the information to derive both the target protein network and the drug network (Yildirim, Goh et al. 2007). In the target protein network (left panel) two proteins are linked if they are targeted by the same SM. In the drug network (right panel) two drugs are linked if they target the same protein. The giant component of each network is highlighted in light green.

Applications

The immediate purpose of a chemogenomic screen is to characterise the effect a set of SMs has at the level of genes or proteins. From a biotechnological point of view, such chemogenomics data can allow for the identification of proteins as novel drug targets (Bredel and Jacoby 2004). In a screen of SMs against a library of heterozygous yeast deletions, the gene products that interact with each of the SMs can be identified. For example, in a chemogenomic screen of mammalian cell lines Chapter 4: Chemogenomics 4-11

STAT3 could be identified as a potential target for apratoxin A, an SM that is cytotoxic against tumour cells (Luesch, Chanda et al. 2006). Many yeast genes have orthologues in humans (Hohmann 2005), which raises the possibility that interactions with SMs might also be conserved and that orthologues could be potential drug targets or might aid in gaining an understanding of the side effects of drugs at the molecular level (Luesch 2006). Nevertheless, it is not clear how well the responses to SMs are conserved between distantly related organisms. I address this question later in this chapter. Rapamycin is an example of an SM for which it is known that the molecular is conserved between S. cerevisiae and humans (Butcher, Bhullar et al. 2006). This is of particular interest as the majority of drugs are SMs (Lipinski and Hopkins 2004).

SMs that are found to be active against specific proteins could also serve as alternatives to genetic methods. In reverse , in order to disrupt the function of a protein, the DNA sequence that encodes the protein is mutated. Although this approach has generated a large variety of useful data, it has some drawbacks. First of all, mutagenesis can be difficult to implement in higher animals such as mammals. Furthermore, mutagenesis strategies do not exhibit a large degree of spatial and temporal flexibility, since it is difficult to mutate a gene for only a transient period of time or in a limited and clearly defined set of cells. An additional problem is that in multifunctional proteins it can be difficult to disrupt only one protein function whilst leaving others intact. Chemical genetics has the potential to overcome these problems (Stockwell 2000; Stockwell 2004; Spring 2005; Florian, Hümmer et al. 2007). The ultimate goal of chemical genetics is to identify SMs that are able to modulate the function of as many proteins in a cell as possible, either by simulating a gain-of-function or of a loss-of-function mutation. It has been estimated that between 8 and 14 percent of proteins in eukaryotic are 'druggable' in this way (Hopkins and Groom 2002). Chemical genetics can be more easily employed in mammalian cells than mutagenesis strategies and in a manner that is specific to a set of clearly defined cells, or during a clearly defined time window by adding and removing the compounds used in the screen. Furthermore, in a multifunctional protein potentially only one specific function could be modulated without affecting the others.

The potential of chemical genetics for drug development is well recognised. Recent applications include the targeting of cancer cell lines that carry mutations in cancer pathways with specific SMs (Dolma, Lessnick et al. 2003; Rognan 2007) or the use Chapter 4: Chemogenomics 4-12 of SMs to direct differentiation of stem cells (Huangfu, Maehr et al. 2008; Borowiak, Maehr et al. 2009; Chen, Borowiak et al. 2009). However, other areas of biotechnology, including green biotechnology might also benefit from chemical genetics. Crop plants could be improved with the use of SMs that target specific proteins, for example those involved in drought resistance pathways (Raikhel and Pirrung 2005). Since such SMs can be applied in a manner similar to pesticides, the introduction of genetically modified crops that is often associated with environmental, legal, or technical difficulties, could be avoided.

Besides these practical applications, chemogenomics data can also offer fundamental biological insights. Grouping genes according to their chemogenomic profiles can lead to the identification of clusters that are enriched for functional categories as defined by the Gene Ontology (GO) database (Ashburner, Ball et al. 2000). If a gene of previously unknown function is found in such a cluster, it is likely that it has a similar function to the other genes in the same cluster. Chemogenomics could therefore contribute to the determination of the function of genes whose function was previously unknown (Dudley, Janse et al. 2005). Moreover, it might even be possible to define novel functional classes of genes. More specifically, if an SM causes decreased viability when a set of genes is deleted, this set of genes can be interpreted as being necessary for conferring resistance to the SM. Therefore, these genes might have a previously unknown function in common. For example, deletion of members of the protein complexes SAGA, Swi/Snf and Ino80 each caused sensitivity to cadmium, cycloheximide, hydroxyurea, and glycerol (Dudley, Janse et al. 2005). Therefore, it has been assumed that these complexes share a function, which is to regulate genes that are required in the presence of these SMs.

Chemogenomic networks

Understanding and interpreting the data obtained from a chemogenomic screen is not trivial, as often the observed effects are not direct. However, in most cases it is true that if a defined mutant is less viable in the presence of an SM than in its absence, then the gene that has been mutated has some importance for resistance to the SM.

This makes a novel method for interpreting chemogenomic data possible that involves the construction of chemogenomic networks (figure 4-4). Nodes in chemogenomic networks are genes. Two genes are linked by an edge if they have one or more SMs in common that cause decreased viability when either of the two Chapter 4: Chemogenomics 4-13 genes is deleted. Because two genes can be deleterious when deleted in the presence of more than one different SM, edges in a chemogenomic network have a weight, which is an integer greater or equal to one. For example, if the edge connecting two genes has a weight of four, this means that there are four different SMs that cause a decrease in viability if either of the genes is deleted.

Although it is often assumed that drugs that have the highest selectivity for the protein they bind are the most effective, this may not be the case (Hopkins, Mason et al. 2006; Hopkins 2008; Nobeli, Favia et al. 2009). Instead, many of the most effective drugs bind multiple proteins instead of single targets. Chemogenomic networks may provide a way of rationally identifying clusters of gene products that are targeted by the same SMs. If such a cluster is implicated in a certain disease, SMs that target all or most of the genes of the cluster will be strong candidates for further investigation, although they may not have been identified as being candidates when only considering one SM-protein interaction at a time.

small 1 2 3 1 2 3 1 2 3 Chemo- molecules SMs 123 genomic A network B C A strain F A F A F A F D library B E B E B E B E E strain library strain F C D C D C D C D

Figure 4-4. As any network, chemogenomic networks consist of nodes and edges that connect the nodes. In chemogenomic networks, the nodes are genes or strains whilst the edges are relationships between these genes. Two genes are connected by an edge if they are sensitive to the same SM (SM). An edge is therefore an SM that causes a decrease in viability if either of the two genes it connects is deleted. Therefore, the edges in a chemogenomic network are weighted and undirected. This figure shows how a chemogenomic network is constructed. Starting with the chemogenomic data structure (panel 1; decreases in viability are symbolised by dark green), each SM is considered in turn. If the SM causes decreased viability in more than two strains, the edge weight between the two strains is increased by one.

Results and discussion

The properties of a chemogenomic network are dependent on the dataset that is used to generate it. In a chemogenomic dataset that uses twenty different SMs, the maximum possible weight of an edge is twenty. Furthermore, the network constructed from such a dataset can be expected to have a lower graph density (the proportion of all possible edges that is actually observed) than a network constructed from a dataset which uses one thousand different SMs. For comparing chemogenomic networks, it is therefore important to keep these issues in mind and to counteract them, for example by filtering the network by only including edges above a threshold weight. Chapter 4: Chemogenomics 4-14

Interpretation of a chemogenomic network

Two genes are linked by an edge in a chemogenomic network if both are needed for resistance to the same SM (see Materials and Methods). This suggests that the two genes have complementary functions. Figure 4-5 shows a chemogenomic network that has been generated using previously published data (Parsons, Lopez et al. 2006). In the figure, clusters of genes with a grey background are highlighted because they share common functions: ARN1 and ARN2 both are transporters that are specific to siderophore-iron chelates; RIP1 is cytochrome-c reductase whilst COX9 is a cytochrome-c oxidase; and DUG2, APE3, and PCA1 all are peptidases or hydrolases (http://www.yeastgenome.org/; last accessed 20 July 2009). The function of open reading frame YBR284W is unknown, but due to its association with three hydrolases in the chemogenomic network it is plausible to assume that it is one as well. It is also interesting to note that all four genes of this cluster are encoded within 15 open reading frames of each other on chromosome II. This demonstrates that edges in chemogenomic networks can be used to identify functional similarities between genes.

SHU1

ARN2

ARN1 SOP4 YIL077C UNG1

NUC1 DAL5 PIN4 VAM3 RPL12B GMH1 YAR029W SNX3 POR1 YIL092W YBR246W YPT7 VAM10 SPO22 YIR016W ECM34 APL5 ADA2 END3 WSC4 SKT5 YCR045C LUG1 PAN3 VAM7 RPS8A CHS5 RIM21 VPS24 MLP1 PEP7 YGR122W PEP8 VPS9 YKL023W SUR4 SNF8 MCH5 CAF40 APQ12 MIG1 RBK1 OCA5 LSM1 VPS27 ATG15 TOM5 VPS30 SRN2 BUD25 PAT1 ZAP1

WHI3 MDM20 VPS52 DUG2 APE3 LAS21 YPL105C TEF4 SMI1 YPT6 YBR284W URE2 VPS21

CDC73 PHO86

PCA1 REG1 GOS1 BEM1

VPS71 FTR1 COG6 PET130 YPL158C

YDJ1 SWA2 MOG1 CTR1 SHE4 RIP1 GCN5 COX9

FET3

Figure 4-5. Chemogenomic network filtered for edges with a weight of 20 or more. That is to say, two genes are only connected if they are important for resistance to the same 20 or more SMs. Only the largest connected component of the network is shown. The clusters of genes with a grey background are highlighted because it is known that they also share common functions. Chapter 4: Chemogenomics 4-15

Properties of the chemogenomic network

To ascertain the overlap between two networks, such as the chemogenomic network and the protein-protein interaction network, I first created a list of nodes which appear in both networks. Next, I went through both networks and retained only those edges which connected two nodes which were both on the list. The overlap between the two networks was then measured by counting the number of edges they had in common. To assess the significance of this result, I generated a null distribution of overlapping edge counts by permutating the node labels of the graphs. The Z score of the observed overlapping edge count relative to this distribution is given below (table 4-2).

min edge weight ≥ 1 ≥ 2 ≥ 5 edge count 2’084’597 817’683 149’374 connected node count 3’189 2’497 1’391 graph density 0.41 0.26 0.16 min degree 44 2 1 mean degree 1’307.367 654.93 214.77 max degree 2’870 2’169 1’046 min weight 1 2 5 mean weight 1.916 3.336 6.977 max weight 29 29 29 clustering coefficient 0.67 0.58 0.54 protein interaction edge overlap – absolute 4’870 2’579 819 – Z score 5.43 8.89 13.30 transcriptional interaction edge overlap – absolute 123’155 47’087 7’615 – Z score 1.58 -0.30 -1.46

Table 4-2. Parameters of the chemogenomic network at different minimum edge weights. The edge weight refers to how many SMs both nodes connect by the edge interact with. Whilst it is clear that there is a larger than expected overlap between the chemogenomic network and the protein interaction network, no such effect can be seen for the transcription interaction network, in which two genes are connected if they are regulated by the same transcription factor (target gene co-regulatory network).

In many cases a decrease in growth rate or fitness in a certain combination of SM and heterozygous deletant will not be indicative of a direct and physical interaction between the SM and the gene whose dosage has been increased. The gene may instead encode a protein that compensates for the direct effect of the SM (Venancio, Balaji and Aravind 2009; Mol BioSyst (in print)). The extent to which such “buffer genes” contribute to the interactions observed in chemogenomic data is not known. As a widespread occurrence of this compensation effect could severely skew the interpretation of chemogenomic data, more research in this area is needed.

In the chemogenomic networks described above, genes that respond to the same SMs are connected. Alternatively, one can envisage a network that has SMs as nodes. These nodes are connected if they interact with the same gene (figure 4-3). In such a way, it may be possible to identify clusters of SMs with similar biological activities. In the next section (Linking the chemical and biological properties of SMs), Chapter 4: Chemogenomics 4-16

I approach this problem from a different angle by asking if SMs that have similar chemical properties also have similar biological properties.

Materials and methods

Data acquisition

The chemogenomic data from which the chemogenomic network used in this analysis was constructed was published by Parsons et al. (Parsons, Lopez et al. 2006). The dataset describes the sensitivity of a haploid S. cerevisiae mutant library to 82 SMs.

The protein-protein interaction network was obtained from the BioGRID database (Stark, Breitkreutz et al. 2006), which combines information on protein-protein interaction from various sources and types of experiments. The transcription regulatory network was based on ChIP-chip data (Lee, Rinaldi et al. 2002; Harbison, Gordon et al. 2004; Balaji, Madan Babu et al. 2006), which identifies direct in vivo DNA binding events for transcription factors. According to this network, 144 transcription factors regulate 4’347 target genes via 12’230 regulatory interactions. In order to make the transcription regulatory network comparable to the chemogenomic network, it was processed in such a way that target genes that are regulated by the same transcription factor are connected by edges. The resulting network is, strictly speaking, not a transcription regulatory network, but a target gene co-regulatory network.

Construction of the chemogenomic network

The chemogenomic network is constructed by collapsing a bipartite network with two types of nodes into a simple network with only one type of node. In the initial bipartite network, the two types of nodes are SMs and genes or deletion strains. In the resulting simple network, genes are the only type of node. In this network, two genes are connected by an edge if they have one or more SMs they interacted with in the initial network in common. The edge weight corresponds to the number of those common SMs. The chemogenomic network can be constructed by considering one SM at a time, and connecting all genes that interact with it to each other (figure 4-4). This means that the chemogenomic network will be densely connected if SMs tend to interact with many genes in the initial bipartite graph. Therefore, it can be useful to filter edges that have a low weight (that is to say, that symbolise interaction of genes with only a few common SMs). Chapter 4: Chemogenomics 4-17

Computation of network parameters and overlap with other networks

Network parameters were computed using the statistical language R and the package igraph (http://cneurocvs.rmki.kfki.hu/igraph/). Overlap with the protein- protein interaction network and the transcription regulatory network was assessed using in-house Perl scripts. For this, all nodes that do not appear in both networks to be compared were deleted. Then, the number of common edges between the networks was counted. The significance of the overlap between networks was assessed by randomising the networks by permutating the node labels and repeating the analysis 1’000 times.

Linking the chemical and biological properties of small molecules

Not all sections of chemical space are equally likely to contain SMs that have a certain . This is the basic assumption of quantitative structure- activity relationships (QSAR), which describes how chemical structure is associated with biological activity. The central tenet of QSAR is that similar SMs bind similar proteins (Klabunde 2007). This insight enables us to define areas of chemical space that are occupied by SMs with specific affinity for certain protein families.

By focussing on those areas of chemical space that contain SMs with a biological activity of interest, it is possible to make the process of drug lead identification more rapid and efficient. By analysing the chemical properties of approved drugs, Lipinski's rule of five was discovered (Lipinski, Lombardo et al. 2001). This “rule” states that most drugs that have been approved by the American Food and Drug Administration (FDA) have less than five hydrogen-bond donors, less than ten hydrogen-bond acceptors, a log P value (a measure of hydrophobicity) of less than five, and a molecular weight of less that 1’500 Da. Therefore, SMs with known structures can be pre-screened in silico prior to a chemogenomic experiment to select those that are most likely to have certain biological activities.

However, Lipinski’s rule is limited in its usefulness to chemogenomic screens carried out in yeasts such as S. cerevisiae and Schizosaccharomyces pombe as it has been developed to describe the of approved drugs in humans. It is not clear if the same limitations to molecular properties also apply to those SMs that show bioactivity in yeasts. Here, I address this question by first computing the molecular properties of SMs, including the predicted octanol-water (log P), Chapter 4: Chemogenomics 4-18 the polar surface area, the number of atoms, the molecular mass and volume, the number of hydrogen bonds and acceptors, the number of rotable bonds, and the number of violations of Lipinski’s rule. I then compare those properties between SMs that are bioactive in S. cerevisiae and those that are not. Furthermore, I ask whether there are properties that distinguish SMs that are active in S. cerevisiae from those that are active in Sc. pombe. In the following, I use the term bioactivity to mean the ability of a compound to inhibit the growth of a yeast colony.

In order to determine whether SMs that are bioactive in S. cerevisiae tend to have different molecular properties from those SMs that are not bioactive, I analysed two distinct datasets. The first dataset (dataset A) has until now not been published and describes the IC50 values of 2’961 SMs in both S. cerevisiae and Sc. pombe. The second dataset (dataset B) has been created by a high-throughput screen by an NCI yeast anticancer drug screen and separates 87’264 SMs into those that have an effect on the growth of S. cerevisiae and those that do not (see Materials and Methods).

Results and discussion

In order to assess whether any molecular properties are significantly distinct for the SMs that are bioactive in S. cerevisiae and Sc. pombe compared to those SMs that are not, I analysed two distinct datasets, A and B (see Materials and Methods).

What differentiates bioactive small molecules from others?

For dataset A, there is a clear difference between active and non-active SMs for a number of chemical descriptors. The strongest predictor of biological activity is log P, a measure of hydrophobicity (table 4-3), which was more than 80% larger in active SMs than in inactive ones (p < 5.54 · 10-12).

Bioactive SMs on average tend to be more hydrophobic, have a lower polar surface area, have a higher molecular weight, a lower number of hydrogen bond acceptors, a lower number of hydrogen bond donors, and a higher volume than the other SMs screened. Furthermore, a bigger proportion of bioactive compounds, compared to compounds that are not bioactive, violate one or more of Lipinski's rules. Chapter 4: Chemogenomics 4-19

Property Explanation Mean Mean Wilcoxon active inactive test p-value logP predicted octanol-water partition coefficient log P 3.59 1.98 5.5 · 10-12 (hydrophobicity) PSA topological molecular polar surface area 55.32 80.97 1.7 · 10-9 (important to predict drug transport properties) natoms number of atoms 22.99 22.63 0.10 MW molecular weight 352.32 326.90 1.5 · 10-4 nON number of hydrogen bond acceptors 4.06 5.47 3.1 · 10-8 nOHNH number of hydrogen bond donors 1.15 2.14 6.3 · 10-9 nviolations how many of Lipinski's rules are violated 0.37 0.33 0.08 nrotb number of rotable bonds (measure of molecular 4.54 3.92 0.40 flexibility) volume molecular volume in Å3 299.00 284.20 4.6 · 10-3

Table 4-3. Active SMs are defined as those that have an IC50 value of less than 100 μM in both species. Inactive SMs are those that have a higher IC50 value in either or both species. The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for SMs that hit S. cerevisiae and Sc. pombe, and SMs that hit neither.

These rules state that the majority of approved drugs have less than five hydrogen bond donors, less than ten hydrogen bond acceptors, a log P value of less than five, and a molecular weight of less than 500 Da (Lipinski, Lombardo et al. 2001). In all cases, the majority of SMs that I defined as being bioactive against the two fungi follow these rules. However, in the case of log P and molecular weight, the bioactive SMs are closer to Lipinski's threshold values than the drugs that are not bioactive.

In order to confirm the chemical differences between SMs that are bioactive and those that are not, I performed principal component analysis using the nine descriptors mentioned above as initial dimensions. After dimensionality reduction, the difference between the two types of molecules could be confirmed. It has previously been shown that there are differences between SMs from combinatorial synthesis, and drugs as well as natural products (Feher and Schmidt 2003). My analysis shows that there are also significant differences within classes of SMs according to their bioactivity.

In order to confirm these results, I repeated the analysis using a previously published dataset that was larger than the one used above. In the initial stage of an anticancer drug screen, 87’264 SMs were tested against six different strains of S. cerevisiae. Of these, 12’068 had a strong effect on yeast growth and 75’196 did not. The results of this screen have been made available online (http://dtp.nci.nih.gov/yacds/index.html; last accessed 20 July 2009) and constitute dataset B. As before, I determined the molecular properties of the tested drugs to see which ones are predictive of activity.

Chapter 4: Chemogenomics 4-20

PCA of molecular properties PC2 -200 -150 -100 -50 0 50 100

-800 -600 -400 -200 0 200 400

PC1

Figure 4-6: Plot of the first two principal components (PCs) of the molecular properties of the 2’961 SMs of dataset A. Active SMs (IC50 < 100 μM) are in red, inactive SMs in grey. The first principal component (PC1) explains 92% of the variance in the data, PC2 explains 6%.

The same descriptors are over- and underrepresented according to this dataset compared to the 2’961 SMs in dataset A.

Property Explanation Mean Mean Wilcoxon active inactive test p-value logP predicted octanol-water partition coefficient log P 3.41 2.44 < 2.2 · 10-16 (hydrophobicity) PSA polar surface area (important to predict drug 61.07 66.16 < 2.2 · 10-16 transport properties) natoms number of atoms 23.40 20.54 < 2.2 · 10-16 MW molecular weight 351.91 297.96 < 2.2 · 10-16 nON number of hydrogen bond acceptors 4.42 4.70 < 2.2 · 10-16 nOHNH number of hydrogen bond donors 1.22 1.38 < 2.2 · 10-16 nviolations how many of Lipinski's rules are violated 0.39 0.25 < 2.2 · 10-16 nrotb number of rotable bonds (measure of molecular 4.90 4.39 < 2.2 · 10-16 flexibility) volume molecular volume in Å3 300.91 262.36 < 2.2 · 10-16

Table 4-4. The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for SMs that hit S. cerevisiae and those that do not. This table is similar to table 4-3, but is based on a different dataset (87’264 SMs from an NCI cancer drug screen).

What differentiates small molecules that hit S. cerevisiae from those that hit Sc. pombe?

Having established that there are chemical properties that allow distinguishing between SMs that are bioactive and those that are not, it is of interest to establish whether these properties vary between organisms. I therefore compared the chemical properties of SMs that are bioactive in S. cerevisiae to the properties of SMs that are bioactive in Sc. pombe. There are several important differences between the two species that would explain why such differences in sensitivity to Chapter 4: Chemogenomics 4-21

certain SMs exist, including differences in the length of cell cycle phases, or the presence of RNAi machinery in Sc. pombe but not in S. cerevisiae.

S. S. cerevisiae pombe

59 115 54

2733

Figure 4-7: Of 2’961 NCI SMs, the majority that has an IC50 value of less than 100 μM in one species also has an IC50 value of less than 100 μM in the other species. This indicates that an SM that is bioactive to one species is also likely to be bioactive in the other species.

IC50 < 1’000 μM IC50 < 100 μM IC50 < 10 μM IC50 < 1 μM hits S. cerevisiae only 84 59 20 2 hits Sc. pombe only 55 115 48 4 hits S. cerevisiae and Sc. pombe 132 54 48 29 hits neither 2’690 2’733 2’845 2’926 p-value < 2.2 · 10-16 < 2.2 · 10-16 < 2.2 · 10-16 1.89 · 10-7

Table 4-5: As figure 4-7, but at multiple different IC50 cut-offs. The strong overlap between SMs that hit Sc. pombe and S. cerevisiae does not depend on the IC50 cut-off used.

It could be that some SMs are bioactive in one organism but not the other because they have different molecular properties. As before, I obtained the molecular structures of each of the SMs that have different bioactivity in the two species and computed nine properties for each of them in order to assess whether one of these properties was significantly distinct for the SMs that hit one organism but not the other. No significant result could be found, indicating that bioactivity in S. cerevisiae is a good predictor of bioactivity in Sc. pombe and vice versa.

Property Explanation Mean S. Mean Wilcoxon cerevisiae Sc. test p-value pombe logP predicted octanol-water partition coefficient log 2.83 3.13 0.46 P (hydrophobicity) PSA polar surface area (important to predict drug 66.01 61.46 0.71 transport properties) natoms number of atoms 22.75 23.04 0.85 MW molecular weight 334.30 336.76 0.97 nON number of hydrogen bond acceptors 4.52 4.21 0.66 nOHNH number of hydrogen bond donors 1.63 1.74 0.91 nviolations how many of Lipinski's rules are violated 0.27 0.43 0.09 nrotb number of rotable bonds (measure of molecular 3.62 4.28 0.99 flexibility) volume molecular volume in Å3 288.12 296.88 0.56

Table 4-6. Properties of drugs that only hit S. cerevisiae versus those that only hit Sc. pombe (IC50 < 1’000 μM). The Wilcoxon test p-value indicates how likely it is that the descriptor is the same for drugs that hit S. cerevisiae and drugs that hit Sc. pombe. Chapter 4: Chemogenomics 4-22

Conclusions

Some of the above observations on the differences between biologically active and inactive SMs could be due to the way the chemical libraries that were tested were generated. The removal of chiral centres, introducing additional flexibility into the molecule and decreasing its size often lead to lower activity (Feher and Schmidt 2003). As chemical libraries are often created doing exactly those things, this may explain why, for example, the molecular weight is larger for SMs that are active compared to those that are inactive. Filtering chemical libraries for SMs with properties that are enriched in the bioactive set could therefore increase the efficiency of such screens.

Materials and Methods

Two datasets measuring the bioactivity of SMs in yeast were used.

The first dataset (dataset A) was created by Laura Schuresko-Kapitzky of the University of California, San Francisco and Scott Lokey of the University of California,

Santa Cruz (unpublished). It describes the IC50 values of 2’961 NCI compounds in both wild-type S. cerevisiae and wild-type Sc. pombe. The data was obtained by high-throughput yeast halo assays (Gassner, Tamble et al. 2007).

The second dataset (dataset B) describes the bioactivity of 87’264 NCI compounds on S. cerevisiae strains. In the initial stage of the anticancer drug screen that produced this dataset, 87’264 drugs were tested against six different strains of S. cerevisiae. Of these, 12’068 had a strong effect on yeast growth and 75’196 did not (http://dtp.nci.nih.gov/yacds/index.html).

I downloaded the molecular structures of each of the SMs (ftp://ftp.ncbi.nih.gov/pubchem/) and computed nine properties for each of them in order to assess whether one of these properties was significantly distinct for the SMs that affect growth. SM properties were computed using Molinspiration's mib tool (http://www.molinspiration.com/). A local version of the mib tool was kindly provided by John J. Irwin of the University of California, San Francisco.

All data was analysed using a combination of in-house Perl scripts and the statistical programming language R. Chapter 4: Chemogenomics 4-23

References

Ashburner, M., C. A. Ball, et al. (2000). "Gene ontology: tool for the unification of biology. The Gene Ontology Consortium." Nat Genet 25(1): 25-9. Baetz, K., L. McHardy, et al. (2004). "Yeast -wide drug-induced haploinsufficiency screen to determine drug mode of action." Proc Natl Acad Sci U S A 101(13): 4525-30. Balaji, S., M. Madan Babu, et al. (2006). "Comprehensive analysis of combinatorial regulation using the transcriptional regulatory network of yeast." J Mol Biol 360(1): 213-27. Borowiak, M., R. Maehr, et al. (2009). "Small molecules efficiently direct endodermal differentiation of mouse and human embryonic stem cells." Cell Stem Cell 4(4): 348-58. Bredel, M. and E. Jacoby (2004). "Chemogenomics: an emerging strategy for rapid target and drug discovery." Nat Rev Genet 5(4): 262-75. Brenner, C. (2004). "Chemical genomics in yeast." Genome Biol 5(9): 240. Butcher, R. A., B. S. Bhullar, et al. (2006). "Microarray-based method for monitoring yeast overexpression strains reveals small-molecule targets in TOR pathway." Nat Chem Biol 2(2): 103-9. Chang, M., M. Bellaoui, et al. (2002). "A genome-wide screen for methyl methanesulfonate-sensitive mutants reveals genes required for S phase progression in the presence of DNA damage." Proc Natl Acad Sci U S A 99(26): 16934-9. Chen, S., M. Borowiak, et al. (2009). "A small molecule that directs differentiation of human ESCs into the pancreatic lineage." Nat Chem Biol 5(4): 258-65. Cheng, Y. and G. M. Church (2000). "Biclustering of expression data." Proc Int Conf Intell Syst Mol Biol 8: 93-103. D'Haeseleer, P. (2005). "How does gene expression clustering work?" Nat Biotechnol 23(12): 1499-501. Deutschbauer, A. M., D. F. Jaramillo, et al. (2005). "Mechanisms of haploinsufficiency revealed by genome-wide profiling in yeast." Genetics 169(4): 1915-25. Dolma, S., S. L. Lessnick, et al. (2003). "Identification of genotype-selective antitumor agents using synthetic lethal chemical screening in engineered human tumor cells." Cancer Cell 3(3): 285-96. Dudley, A. M., D. M. Janse, et al. (2005). "A global view of pleiotropy and phenotypically derived gene function in yeast." Mol Syst Biol 1: 2005 0001. Ehrlich, P. (1911). "Aus Theorie und Praxis der Chemotherapie." Folia Serologica VII(7): 697-714. Feher, M. and J. M. Schmidt (2003). "Property distributions: differences between drugs, natural products, and molecules from combinatorial chemistry." J Chem Inf Comput Sci 43(1): 218-27. Florian, S., S. Hümmer, et al. (2007). "Chemical genetics: reshaping biology through chemistry." HSFP Journal 1(2): 104-114. Gassner, N. C., C. M. Tamble, et al. (2007). "Accelerating the discovery of biologically active small molecules using a high-throughput yeast halo assay." J Nat Prod 70(3): 383-90. Gensini, G. F., A. A. Conti, et al. (2007). "The contributions of Paul Ehrlich to infectious disease." J Infect 54(3): 221-4. Giaever, G., P. Flaherty, et al. (2004). "Chemogenomic profiling: identifying the functional interactions of small molecules in yeast." Proc Natl Acad Sci U S A 101(3): 793-8. Giaever, G., D. D. Shoemaker, et al. (1999). "Genomic profiling of drug sensitivities via induced haploinsufficiency." Nat Genet 21(3): 278-83. Chapter 4: Chemogenomics 4-24

Greene, D., G. Cagney, et al. (2008). "Ensemble non-negative matrix factorization methods for clustering protein-protein interactions." 24(15): 1722-8. Haggarty, S. J., K. M. Koeller, et al. (2003). "Multidimensional chemical genetic analysis of diversity-oriented synthesis-derived deacetylase inhibitors using cell-based assays." Chem Biol 10(5): 383-96. Harbison, C. T., D. B. Gordon, et al. (2004). "Transcriptional regulatory code of a eukaryotic genome." Nature 431(7004): 99-104. Hillenmeyer, M. E., E. Fung, et al. (2008). "The chemical genomic portrait of yeast: uncovering a phenotype for all genes." Science 320(5874): 362-5. Hohmann, S. (2005). "The Yeast Network: mating communities." Curr Opin Biotechnol 16(3): 356-60. Hopkins, A. L. (2008). "Network : the next paradigm in drug discovery." Nat Chem Biol 4(11): 682-90. Hopkins, A. L. and C. R. Groom (2002). "The druggable genome." Nat Rev Drug Discov 1(9): 727-30. Hopkins, A. L., J. S. Mason, et al. (2006). "Can we rationally design promiscuous drugs?" Curr Opin Struct Biol 16(1): 127-36. Huangfu, D., R. Maehr, et al. (2008). "Induction of pluripotent stem cells by defined factors is greatly improved by small-molecule compounds." Nat Biotechnol 26(7): 795-7. Klabunde, T. (2007). "Chemogenomic approaches to drug discovery: similar receptors bind similar ligands." Br J Pharmacol 152(1): 5-7. Lamb, J., E. D. Crawford, et al. (2006). "The Connectivity Map: using gene- expression signatures to connect small molecules, genes, and disease." Science 313(5795): 1929-35. Lee, T. I., N. J. Rinaldi, et al. (2002). "Transcriptional regulatory networks in Saccharomyces cerevisiae." Science 298(5594): 799-804. Lipinski, C. and A. Hopkins (2004). "Navigating chemical space for biology and medicine." Nature 432(7019): 855-61. Lipinski, C. A., F. Lombardo, et al. (2001). "Experimental and computational approaches to estimate solubility and permeability in drug discovery and development settings." Adv Drug Deliv Rev 46(1-3): 3-26. Lloyd, N. C., H. W. Morgan, et al. (2005). "The composition of Ehrlich's salvarsan: resolution of a century-old debate." Angew Chem Int Ed Engl 44(6): 941-4. Luesch, H. (2006). "Towards high-throughput characterization of small molecule mechanisms of action." Mol Biosyst 2(12): 609-20. Luesch, H., S. K. Chanda, et al. (2006). "A approach to the mode of action of apratoxin A." Nat Chem Biol 2(3): 158-67. Luesch, H., T. Y. Wu, et al. (2005). "A genome-wide overexpression screen in yeast for small-molecule target identification." Chem Biol 12(1): 55-63. Ma'ayan, A., S. L. Jenkins, et al. (2007). "Network analysis of FDA approved drugs and their targets." Mt Sinai J Med 74(1): 27-32. Madan Babu, M. (2004). Introduction to microarray data analysis. : Theory and Application. R. P. Grant, Horizon Press (UK): 225-249. Madeira, S. C. and A. L. Oliveira (2004). "Biclustering algorithms for biological data analysis: a survey." IEEE/ACM Trans Comput Biol Bioinform 1(1): 24-45. Mnaimneh, S., A. P. Davierwala, et al. (2004). "Exploration of essential gene functions via titratable promoter alleles." Cell 118(1): 31-44. Muhlrad, D. and R. Parker (1999). "Aberrant mRNAs with extended 3' UTRs are substrates for rapid degradation by mRNA surveillance." Rna 5(10): 1299-307. Nobeli, I., A. D. Favia, et al. (2009). "Protein promiscuity and its implications for biotechnology." Nat Biotechnol 27(2): 157-67. Chapter 4: Chemogenomics 4-25

Parsons, A. B., R. L. Brost, et al. (2004). "Integration of chemical-genetic and genetic interaction data links bioactive compounds to cellular target pathways." Nat Biotechnol 22(1): 62-9. Parsons, A. B., A. Lopez, et al. (2006). "Exploring the mode-of-action of bioactive compounds by chemical-genetic profiling in yeast." Cell 126(3): 611-25. Perlstein, E. O., E. J. Deeds, et al. (2007). "Quantifying fitness distributions and phenotypic relationships in recombinant yeast populations." Proc Natl Acad Sci U S A 104(25): 10553-8. Perlstein, E. O., D. M. Ruderfer, et al. (2007). "Genetic basis of individual differences in the response to small-molecule drugs in yeast." Nat Genet 39(4): 496-502. Raikhel, N. and M. Pirrung (2005). "Adding precision tools to the plant biologists' toolbox with chemical genomics." Plant Physiol 138(2): 563-4. Rognan, D. (2007). "Chemogenomic approaches to rational drug design." Br J Pharmacol 152(1): 38-52. Scherf, U., D. T. Ross, et al. (2000). "A gene expression database for the molecular pharmacology of cancer." Nat Genet 24(3): 236-44. Schuldiner, M., S. R. Collins, et al. (2005). "Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile." Cell 123(3): 507-19. Sharom, J. R., D. S. Bellows, et al. (2004). "From large networks to small molecules." Curr Opin Chem Biol 8(1): 81-90. Spring, D. R. (2005). "Chemical genetics to chemical genomics: small molecules offer big insights." Chem Soc Rev 34(6): 472-82. Stark, C., B. J. Breitkreutz, et al. (2006). "BioGRID: a general repository for interaction datasets." Nucleic Acids Res 34(Database issue): D535-9. Stockwell, B. R. (2000). "Frontiers in chemical genetics." Trends Biotechnol 18(11): 449-55. Stockwell, B. R. (2004). "Exploring biology with small organic molecules." Nature 432(7019): 846-54. Wuster, A. (2008). "Networking with Drugs." Mol BioSyst 4: 14-17. Xu, R. and D. Wunsch, 2nd (2005). "Survey of clustering algorithms." IEEE Trans Neural Netw 16(3): 645-78. Yildirim, M. A., K. I. Goh, et al. (2007). "Drug-target network." Nat Biotechnol 25(10): 1119-1126.