Integrative Computational Biology for Cancer Research
Total Page:16
File Type:pdf, Size:1020Kb
Hum Genet (2011) 130:465–481 DOI 10.1007/s00439-011-0983-z REVIEW PAPER Integrative computational biology for cancer research Kristen Fortney · Igor Jurisica Received: 3 February 2011 / Accepted: 2 April 2011 / Published online: 22 April 2011 © The Author(s) 2011. This article is published with open access at Springerlink.com Abstract Over the past two decades, high-throughput Introduction (HTP) technologies such as microarrays and mass spec- trometry have fundamentally changed clinical cancer Since the commercialization of DNA microarray technol- research. They have revealed novel molecular markers of ogy in the late 1990s, high-throughput (HTP) data relevant cancer subtypes, metastasis, and drug sensitivity and resis- to cancer research have been accumulating at an ever- tance. Some have been translated into the clinic as tools for increasing rate. These data have led to crucial insights into early disease diagnosis, prognosis, and individualized treat- fundamental cancer biology, including the mechanisms of ment and response monitoring. Despite these successes, tumorigenesis, metastasis, and drug resistance (Rhodes and many challenges remain: HTP platforms are often noisy Chinnaiyan 2005). They have also had enormous clinical and suVer from false positives and false negatives; optimal impact, e.g., several cancers can now be fractionated into analysis and successful validation require complex work- therapeutic subsets with unique prognostic outcomes Xows; and great volumes of data are accumulating at a based on their molecular phenotypes (Buyse et al. 2006; rapid pace. Here we discuss these challenges, and show Dhanasekaran et al. 2001; Lowe et al. 2010; Pegram et al. how integrative computational biology can help diminish 1998; Slamon and Press 2009; Spentzos et al. 2004; Zhu them by creating new software tools, analytical methods, et al. 2010b). Despite these successes, many cancers still and data standards. have a high mortality rate and no eVective treatment. Looking at 1.9 million patients from 31 countries and 5 continents, the CONCORD study found that current treatments achieve a 5-year survival rate for less than 50% of diagnosed can- Electronic supplementary material The online version of this cers (Coleman et al. 2008). For many cancers, survival article (doi:10.1007/s00439-011-0983-z) contains supplementary rates have not changed in decades—pancreatic cancer material, which is available to authorized users. remains almost 100% lethal, and the overall survival rate K. Fortney · I. Jurisica for lung cancer has improved only from 13% to 16%. Most Department of Medical Biophysics, cancers still lack any eVective early disease biomarkers, University of Toronto, Toronto, ON, Canada and predictive signatures are limited to a few known muta- I. Jurisica tions, such as EGFR or kRAS in lung cancer or HER2 in Department of Computer Science, breast cancer. Predictive and prognostic biomarkers are University of Toronto, Toronto, ON, Canada often inconsistent from study to study (i.e., they show poor overlap), and cannot be validated by other methods or in I. Jurisica (&) Ontario Cancer Institute, Princess Margaret Hospital, new cohorts of patients (Diamandis 2010; Dupuy and University Health Network, Toronto, ON, Canada Simon 2007; Lau et al. 2007). e-mail: [email protected] The key diYculty is that cancer is a complex and hetero- geneous disease: many genes are ampliWed, deleted, I. Jurisica Campbell Family Institute for Cancer Research, mutated, and up- or down-regulated. Many pathways are Toronto, ON, Canada activated or suppressed. These changes vary substantially 123 466 Hum Genet (2011) 130:465–481 in diVerent cancers, in diVerent patients with the same can- The challenges facing HTP cancer biology cer, and even in diVerent tumor samples from the same patient (Axelrod et al. 2009; Bachtiary et al. 2006; Black- A wealth of genomic and proteomic cancer data is now hall et al. 2004). To get the full picture, we will need to available from HTP screens. While these data have combine information from diverse experimental platforms improved our understanding of basic cancer biology and and other sources that oVer diVerent perspectives on the some have even translated into improved patient diagnosis problem, e.g., gene and protein expression, protein–protein and treatment, signiWcant challenges remain. In this section interactions (PPIs) and pathways, chromosomal aberra- we brieXy review some of the major obstacles to the pro- tions, mutation events, epigenetic changes, and clinical gress of HTP cancer research. information from drug trials and the bedside—leading to integrative computational biology. Noise The challenges fall into three main categories. The Wrst is noise: HTP platforms are inherently noisy—results vary Cancer is a heterogeneous disease, and HTP platforms are substantially from run to run and from lab to lab, and are noisy, i.e., the resulting data sets have false positives and prone to false positives and negatives. The second chal- false negatives. Consequently, there can be large variation lenge is volume: there is a vast quantity of relevant data, in results from lab to lab (Bell et al. 2009; Irizarry et al. new data are piling up at an increasing rate and old data 2005); methods are needed to control this noise and to inte- need to be constantly reinterpreted and updated in the light grate diVerent data sources to make them more reliable. The of new Wndings or reanalyzed with improved algorithms. main noise issues that plague HTP cancer research are false The third challenge is analysis: simple methods—such as negatives and false positives, intra- and inter-sample heter- diVerential expression analyses of microarray data—often ogeneity, and platform bias. miss much of the signal in the data. Integrative computational methods will continue to play False negatives and false positives in HTP data a central role in addressing these challenges. We need new designs for databases; new software and workXows to com- HTP screens suVer from substantial noise—both false bine and continuously update heterogeneous and distrib- positives and false negatives—which must be resolved uted data; new analytical methods to identify complex through complementary experiments and computational signals in diVerent data sources; and new standards for analysis (AuVray et al. 2009; Augen 2001). For example, generating, maintaining, and sharing data. These methods while HTP PPI screens can identify thousands of protein will depend on advances in many areas such as statistics, interactions at once, they do so at the cost of either high false knowledge representation and ontology, machine learning, discovery rates or poor sensitivity. The false discovery rate is data mining, graph theory, and visualization. Integrative the proportion of detected interactions that are false, and sen- analyses may ultimately lead to a better understanding of sitivity is the proportion of true interactions that are success- cancer, earlier diagnoses and true personalized medicine, fully detected. When interactions detected by two HTP where therapies are individually tailored based on combina- studies were tested in small-scale screens, they were found to tions of single nucleotide polymorphisms, and gene, protein have false discovery rates of 22% (Rual et al. 2005) and 38% and microRNA expression levels (AuVray et al. 2009; (Stelzl et al. 2005). An evaluation of Wve HTP methods Augen 2001; Cervigne et al. 2009; Reis et al. 2010; Zhu found that their sensitivity rates ranged from only 21 to 36% et al. 2010a, b). at false discovery rates of 0–11% (Braun et al. 2009). Simi- Integrative computational biology shares many tools and larly, mass spectrometry analyses of human serum typically goals with the closely related Weld of systems biology, the produce many false negatives (Gstaiger and Aebersold discipline that attempts to explain the structure and behav- 2009). One problem is that human serum has a high dynamic ior of complex biological systems as a function of their range—protein concentrations are estimated to vary over ten simpler components (Kirschner 2005). The term systems orders of magnitude (Gstaiger and Aebersold 2009). biology may be applied to describe anything from the out- Mass spectrometers have a much smaller range of detection, put of an ‘omics’ experiment to explicit mechanical models leading to false negatives: low abundance proteins are not of tumor growth (Deisboeck et al. 2009; Hornberg et al. detected. This challenge may be somewhat diminished by 2006). Integrative computational biology, in contrast, spe- extensive sample fractionation (Kislinger et al. 2006). ciWcally concerns the computational interrogation and inte- gration of diverse HTP molecular biology data. Tumors are heterogeneous In this review, we discuss the major challenges of HTP cancer studies and give examples of integrative computa- Often only a single sample from each tumor is available for tional methods that help to meet them. analysis. But tumors are highly heterogeneous, so these 123 Hum Genet (2011) 130:465–481 467 samples may not be representative of the whole tumor use every year; and there will always be new diVerences (Axelrod et al. 2009; Bachtiary et al. 2006; Blackhall et al. and conXicts to resolve. The challenge is to have infrastruc- 2004). Tumors comprise cells belonging to several distinct ture set-up to compare and integrate new technologies as subpopulations—e.g., tumor regions can be hypoxic to they come along, and systematically identify