Quick viewing(Text Mode)

Platforms for Biomarker Analysis Using High- Throughput Approaches In

Platforms for Biomarker Analysis Using High- Throughput Approaches In

unit 2. : practical aspects

chapter 7. Platforms for analysis using high-

throughput approaches in Unit 2 Chapter 7 Chapter , transcriptomics, , , and B. Alex Merrick, Robert E. London, Pierre R. Bushel, Sherry F. Grissom, and Richard S. Paules

Summary

Global biological responses that transcriptomics, proteomics and ultimate aim of genomic platforms reflect disease or exposure metabolomics that can be performed is to fully map this landscape to are kinetic and highly dynamic at a cell-, tissue-, or organism- more completely describe all of phenomena. While high-throughput wide basis. Bioinformatics is the the biological interactions within a DNA sequencing continues to discipline that derives knowledge living system, during disease and drive genomics, the possibility of from the large quantity and diversity , and define the behaviour more broadly measuring changes of biological, genetic, genomic and and relationships of all the in expression has been a data by integrating components of a biological system. recent development manifested by computer science, mathematics, The development of databases and a diversity of technical platforms. statistics and graphic arts. Gene, knowledge bases will support the Such technologies measure protein and expression integration of data from multiple transcripts, proteins and small profiles can be thought of as domains, as well as computational biological molecules, or , snapshots of the current, poorly- modelling. This chapter will and respectively define the fields of mapped molecular landscape. The describe the technical platform

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 121 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics methods involving DNA sequencing, that allow comparison of the genetic blueprint of an individual , nuclear genetic organization, evolution and is relatively static, the various levels magnetic resonance combined function. Nearly 300 genomes have of gene expression to form and with separation systems, and been completely sequenced and operate a complex organism are bioinformatics to derive genomic range from unicellular organisms, dynamically regulated, structurally and gene expression data and like E. coli and S. cervisiae, to complex and spatially determined. include the relevant bioinformatic model invertebrate organisms, such At any point in time, only a portion tools for analysis. These genomic, as Drosophila melanogaster and of a is expressed in or platforms should have C. elegans, to several mammalian specific cells and tissues. At the wide application to epidemiological species for which the completed mRNA level of gene expression, the studies. and ongoing genome projects are all represents all available online (3). transcribed at any one moment, and Introduction Although the conception of the the is the complement of idea for sequencing the human proteins making up cells and tissues. The sequencing of the human genome is relatively recent, the Small molecules and metabolites genome stands as one of the major project could not have occurred comprise the . The scientific achievements of the without the preceding decades global study of each gene expression twentieth century. It embodies a of biological and technological level is suffixed with “omics,” such defining moment in modern biology developments. Particularly as transcriptomics, proteomics and by which most high-throughput noteworthy of these contributions are metabolomics. Figure 7.1 suggests technologies are compared for size early cytogenetics and chromosomal a sequence of gene expression scope, and complexity. Beginning in studies at the beginning of the based on genomic DNA sequences 1990, it took roughly a decade for the twentieth century by Morgan and that are dynamically reflected in first draft of the human genome to colleagues, the discovery of DNA changes of transcripts, proteins be completed. By 2003, about 99% structure by Watson and Crick in and metabolites. Each level of of all gene-containing regions were 1953, DNA cloning in 1973 by Berg gene expression (represented by described, numbering about 20 500 and Cohen, the DNA sequencing upward, curved, dotted lines) has genes (1), although some regions of reaction in 1975 by Sanger, reverse the opportunity to feed back and the genome, such as centromeres, transcriptase in 1970 and restriction influence other levels reflective telomeres and gene deserts, endonucleases in 1971, and the of highly integrated, multicellular continue to undergo characterization polymerase chain reaction (PCR) processes in cells, tissues and and study. Data from the human by Mullis in 1983 (4). A new century organisms. Studies of each gene has provided a now begins with an era of “omics,” expression area utilize very generalized human map of the those fields describing a multitude of different technical platforms to three billion nucleotides comprising genomic functions aimed at further maximize large scale coverage of the DNA of a few human subjects. deciphering the biological meaning the transcriptome, proteome and However, studies on the variations of sequences in the human genome. metabolome. Technical platforms (polymorphisms) in human DNA The purpose of this chapter is may involve mass parallel analysis sequences are currently underway; to describe the technical platforms using robotics, miniaturization, samples from 270 individuals of in genomics, transcriptomics, automation and computer multiethnic backgrounds are being proteomics, metabolomics and processing. The integration of the used in a consortium called the bioinformatics that could be useful many levels of gene expression is International HapMap Project for in epidemiologic studies. These often referred to as haplotype mapping (http://www. analytical platforms favour high by bioinformatics or computational hapmap.org) (2). The goal is to sample throughput and generation biology. Bioinformatics represents identify the patterns of single of large data sets. an applied field of mathematics to nucleotide polymorphism (SNP) and groups, called haplotypes or haps, Omes and omics using statistics, computer science among individual human beings. In and artificial intelligence to design addition, interpretation of the human Gene expression constantly algorithms to derive biological genome has been greatly enhanced changes during health, adaptation, meaning from gene expression by the DNA sequencing of many other toxicity, disease and aging. While data.

122 Figure 7.1. The interdependence of the cellular metabolome, proteome, transcriptome, chromosomal aberrations and and genome. Each type of characterization provides a functional indication of the mutations that affect . activity of the proceeding set of molecules (solid lines). Conversely, there will be DNA sequencing does have inherent some degree of feedback regulation built into the system (dotted lines). advantages in achieving single- base resolution and importantly for de novo analysis of samples without the prior knowledge of existing DNA sequence required for fabricated sequence platforms (8). New sequencing technologies that are high-throughput and low-cost while maintaining high accuracy and completeness are in

continued development (9). New Unit 2 platforms often integrate real-time

(RT) PCR and may incorporate 7 Chapter microelectrophoresis, sequencing by hybridization, mass spectrometry, high-density oligonucleotide arrays, or incorporation of nanopore technology to bring within sight the goal of routine human genome sequencing (10) for (11). There are several alternatives to whole-genome assessment of chromosomal abnormalities that do not involve traditional DNA sequencing. Alternative platforms involve selective genotyping of very focused genomic loci (short Genomics such as those in in the Li- sections of DNA) that are potentially Fraumeni syndrome (5) and ATM in related to disease susceptibility. Chromosomal abnormalities are ataxia telangiectasia (6), have been Haplotype mapping data have responsible for many developmental found by sectional resequencing been extremely useful in providing defects and malignancies, and of genomic DNA or PCR-amplified candidate genomic regions with high include rearrangements in genomic DNA or RNA. However, de novo polymorphic variation. Hundreds DNA or changes in copy number, sequencing of individuals using of thousands of loci can be very such as deletions, duplications automated Sanger-based capillary rapidly genotyped using BeadArray and amplifications. Identification of electrophoresis systems has so far platforms, a technology based genomic changes and mutations been practical for only small regions upon direct hybridization of whole- that underlie disease rely on of the human genome-containing genome-amplified (WGA) genomic comparisons of DNA sequences candidate genes. DNA to BeadArrays of locus- between affected and unaffected Recent advances in nucleic specific, 50mer oligonucleotide individuals. Finding disease- acid sequencing technologies using sequences (12). As such, genome- causing chromosomal abnormalities massive parallel sequencing, called wide association studies (GWAS) by genomic analysis is confounded next-generation sequencing, now comprise an important evolving field by the fact that many sequence allow sequencing of much larger in genetic epidemiology in which polymorphisms are functionally genomic intervals (7). Sequencing more than 450 GWAS have been irrelevant and produce no observable of entire genomes can take place published, and the associations of biological consequence. Detection within a matter of several weeks, greater than 2000 single nucleotide of many disease-causing mutations, in a comprehensive search for polymorphisms (SNPs) or genetic

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 123 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics loci have been reported so far (13). to perform basic functions of the or oligonucleotides, have been used Equally as important are high- cell, regardless of cell type or extensively since the late 1990s, density oligonucleotide microarrays tissue, but a proportion of the particularly by academia-based for SNP detection in linkage expressed genes contribute to research scientists (see Figure analysis of susceptibility genes a cell’s unique phenotype and 7.2). Commercial oligonucleotide often used in cancer studies (14) or specialized functions. Beginning arrays provide highly reproducible (15). In addition, around 1989, DNA microarrays, platforms representing the entire fluorescence in situ hybridization consisting of thousands of high- genome. Oligonucleotides from (FISH) provides a visual map to density cDNAs or oligonucleotides 25–70 bp in length are arrayed by examine all the chromosomes of on support surfaces (called chips), either spotting pre-synthesized a patient for abnormalities using were introduced and have evolved oligonucleotides directly onto fluorescent probes for specific into powerful and versatile platforms glass, chemically synthesizing genes or whole chromosome probes for transcriptomic analysis (20–23). directly onto glass substrates (16). Although high-throughput Spotted microarrays, either cDNAs (e.g. Agilent Technologies, Inc.), sequencing methodologies have Figure 7.2. Platforms for gene expression in transcriptomics, proteomics and been developed to accommodate metabolomics. Transcriptomic platforms are cDNA or oligonucleotides bound the demand for sequence output, to glass slides or microbeads for analysis of mRNA. Metabolomic platforms are they consume large amounts of nuclear magnetic resonance (NMR) or mass spectrometry (MS) instruments for a valuable and potentially limiting small biologic molecules or metabolites. Proteomic platforms can be gel-based or liquid -based (e.g. linear column gradients or multidimensional genomic DNA. WGA can potentially chromatography (MuDPIT)) for separation of proteins before identification (ID) by remove DNA as a limiting factor mass spectrometry. Use of stable isotopes greatly facilitates protein quantitation for genomic analyses (17) that (ICAT, isotope coded affinity tags; iTRAQ, isobaric tags for relative and absolute include multiple displacement quantitation; SILAC, stable isotope labelling by amino acids in culture), while non- isotopic or label-free methods can also be used that include spectral counting and amplification (MDA), primer ion precursor signal measurement. Retentate chromatography mass spectrometry extension preamplification (PEP), (i.e. surface enhanced laser detection ionization (SELDI)) has been used for rapid and degenerate oligonucleotide profiling of biofluid samples using chemically reactive surfaces for separation and primed PCR (DOP) (18). However, MALDI for generating protein mass spectra. Alternatives for MS-based proteomics involve affinity arrays, such as microarrays or fluorescently tagged antibody genomic amplification technologies bound bead suspensions (i.e. Luminex technology). generate a certain level of replication error in sequence, which should be considered during verification studies. In summary, bead or chip DNA arrays or FISH platforms exemplify whole-genome technologies for high-throughput, compared to DNA sequencing for detection of genetic variation that can be linked to disease-based foci, protein, biomarkers and pharmacogenomic responses.

Transcriptomics

Transcriptomics studies the full, global complement of mRNA molecules expressed in cells and tissues. Some 20 500 genes are present within the human genome, of which about 10–15 000 are expressed at any one time in any particular tissue (19). Many expressed genes are necessary

124 or by synthesizing directly onto manufacturer. Investigators have fixed, paraffin-embedded material quartz wafers by photolithographic been allowed to shift resources for gene expression analysis. technology (e.g. Affymetrix, Inc.) away from multiple analyses of single An accessible, biological (24). In addition, oligonucleotides samples to focusing on expanding the material fluid of principal interest to can be covalently linked with micro- numbers of experimental samples. several clinical research scientists beads that can then be used in a In turn, this has resulted in significant is blood. Many researchers are 96-well microtiter dish format or on improvements in the confidence of interested in testing the utility a glass substrate (e.g. Illumina, Inc.). results from microarray experiments. of gene expression profiling of With ever increasing technological In fact, several large consortium peripheral blood leukocytes to advances, microarrays have efforts have demonstrated that generate biomarkers as surrogates progressed from chips with only comparable biological affects could for other tissues or organs affected several hundred probes to modern be revealed in carefully controlled in disease or injury processes DNA chips reflecting expression experiments in which multiple (28). The utility of this approach commercial microarray platforms, as has been demonstrated in studies of thousands, to even millions, of Unit 2 features per array. well as rigorously quality-controlled of inflammatory responses and

The strength of any gene spotted cDNA arrays, were used to diseases in both animal models and 7 Chapter expression analysis, and the ability analyse the same biological material humans, of neurological disorders, to determine expression profiles as (25–27). of angiomyolipoma (AML) and potential biomarkers of exposure Additional technological renal cell cancers, and of cardiac or effect, is dependent on proper improvements have allowed for the injury (29–36). Recent studies have experimental design and careful reduction in the starting amounts used gene expression analysis execution to minimize sources of of mRNA required in the labelling of blood samples to generate variance and error and maximize processes for the commercial molecular profiles as biomarkers useful biological information. Thus platforms. Most labelling protocols of exposure and exposure-induced it is critical, in any microarray used a single, PCR-based linear injury to arsenic, benzene, tobacco experiment, to design proper amplification of sample mRNA, which smoke and hepatotoxic levels of controls to include with samples of is used to incorporate a nucleotide acetaminophen (37–41). biological interest. Proper sampling conjugated with a fluorescent dye, A common application and integrity of the RNA obtained biotin, or some other chemical in transcriptomics, useful in from those samples are vital in modification. This amplification step epidemiology studies, is to compare determining the success of the has reduced the starting material transcript outputs between normal analysis. Once RNA is isolated, for a sample to be analysed to only and diseased tissues in what has proper labelling, hybridization, a few micrograms of total mRNA been termed transcript profiling washing and scanning can or less. Furthermore, protocols or expression profiling. Transcript dramatically influence the integrity have been developed for additional expression studies can query all of the resulting data. Since transcript rounds of PCR-based amplification known or predicted genes in an expression is generally expressed of starting mRNA samples that organism, providing an abundance in relative terms of a fold change make it possible to analyse very of information that represents a of a particular gene expressed small quantities in the range of snapshot of the expression status in one sample relative to a value nanograms, and even picograms, of of a tissue at any given time. One in a control or normal sample, mRNA. These developments have can gain considerable insight it is critical that the investigator facilitated gene expression profiling into molecular mechanisms from structure the experiment in such a of samples derived from laser properly structured microarray way as to minimize variations other capture microdissection (LCM), experiments, both on the level of than the one variable to be tested. for example, as well as biopsy individual genes and on the level of Fortunately, commercial microarray samples, and other clinically derived biological pathways and processes. providers have added increasingly samples that are limited in quantity. The potential for mRNA degradation more stringent quality control In addition, recent technological makes expression profiling most measurements in the production developments, particularly using applicable to freshly isolated tissues, facilities. This has resulted in high bead-based microarrays (e.g. cultured cells or flash-frozen tissue reproducibility and low variation Illumina BeadChip), have opened sections, but not paraffin-embedded in microarrays coming from a up the possibility of using formalin- tissue. While microarray approaches

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 125 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics can be used to interrogate the entire localization of transcription factor during any single expression genome on a single microarray binding sites can be accomplished analysis (52). Proteins exhibit many chip, focused arrays representing by chromatin immunoprecipitation attributes of interest to biomarker distinct gene subsets have been (ChIP) analysed on a microarray development in epidemiology used to focus upon changes in chip that forms the so-called ChIP- studies, including determination of specific pathways or processes. on-chip technique (45). The method protein sequence identity, quantity, These include both glass slide- can be innovatively combined with post-translational modifications based microarrays (e.g. the National different types of DNA arrays, (PTM), protein–protein interactions, Institute of Environmental Health such as SNP chips, to form “ChIP- structure and function. Some of the Sciences’ Human ToxChip (42)), on-SNP” (46). The future for array challenges in proteomic analysis and PCR-based gene expression technologies will also bring about a include: defining the identities and analyses (e.g. SuperArrays). revolution in clinical DNA diagnostics quantities of an entire proteome in DNA arrays can also reflect (47), develop pharmaceuticals a particular spatial location, such epigenetic effects upon gene in pharmacogenomics (48), and as serum or subcellular structures expression. Epigenetics is defined personalized medicine (49). like mitochondria; the existence as heritable changes in gene of multiple protein forms and expression that are not due to DNA Proteomics complexes; the evolving structural sequence alterations. Methylation and functional annotations of the is the most common epigenetic The field for describing protein human and rodent ; change and is detected by bisulfite expression on a global scale is and integration of proteomics conversion, methylation-sensitive proteomics, which aims to detail data with transcriptomics or other restriction , methyl-binding the structure and functions of all expression data. Primary aims of proteins, and anti-methylcytosine proteins in an organism over time. proteomic analysis are to achieve . Combining these The wide application of proteomics maximal proteome identification, techniques with DNA microarrays has generated great interest in many quantitative high-throughput protein and high-throughput sequencing established disciplines of exposure measurement, timely analysis, has made the mapping of DNA biology and medicine, including and discovery-oriented platforms. methylation feasible on a genome- the field of epidemiology (50,51). Proteomic platforms represent wide scale. Genomic DNA Chemical or toxicant exposure can combinations of technologies methylation occurs particularly at bind to or modify proteins, produce that describe protein attributes cytosines in clusters of cytosine- changes in protein expression, by the separation, quantitation guanine dinucleotides, or CpG and dysregulate critical biological and identification of all proteins islands (p is the phosphodiester pathways and processes that lead to in a biological sample. Proteomic bond between C and G bases). toxicity and disease, which in theory analysis includes four broad Methylation of CpG islands in should be detectable by proteomic categories of proteomic platforms: promoter regions frequently results analysis. Primary aims in proteomic mass spectrometry has played a in gene silencing, which normally analysis are the discovery of key central role in proteomic platform occurs during development (43), modified proteins, the determination development in large part because but is often observed as an early of affected pathways, and the of its sensitive and versatile ability alteration in some cancers by development of biomarkers for to identify proteins; the ability to causing inactivation of tumour association with and eventual separate proteins greatly determines suppressors genes, such as von prediction of disease. the designation of platform type Hippel-Lindau disease (VHL), The complexity of a proteome, by gel-based separation or liquid inhibitor of cyclin-dependent kinase represented by the total protein chromatophic separation linked 4a (p16INK4a), and breast cancer expression of a specific cell, organ, to mass spectrometry (53); solid gene 1 (BRCA1) (44). tissue or biofluid, presents numerous phase adsorption, based on DNA microarrays have also been challenges for comprehensive partitioning of peptides and proteins developed for expression beyond analysis. Proteins are more due to specific chemical properties, profiling. In addition to SNP and complex than nucleic acids, and has been exploited in rententate comparative genomic hybridization therefore proteomic analysis chromatography combined with (CGH) applications mentioned in involves measurement of just some mass spectrometry; and finally, the previous section, genome-wide of the many attributes of proteins affinity chromatography, which

126 sorts and identifies proteins in one reproducibility and quantitation Advantages of this newer platform reaction, is exemplified by use than conventional image analysis are the potential for detection and of antibodies in various formats for comparison of multiple 2D gels. identification of low abundance (54). The following proteomic Thus, separation of proteins by 2D proteins that may not be observed platforms represent some of gels, using single stains or multiple in gel staining-based methods. the primary technologies being fluors (i.e. DIGE), can be combined One drawback is that LC-MS/MS used for separating, identifying, with mass spectrometry for ready platforms, like MuDPIT, are only and quantifying proteins during protein identification to form a semiquantitative and somewhat toxicoproteomic studies (Cf. Figure versatile and discovery-oriented low-throughput in capacity. 7.2). platform for use in proteomic studies The issue of protein quantitation (56). In addition, some protein in proteomics is an important one, 2D PAGE and DIGE samples are sufficiently limited in since changes in protein expression their protein content that a simple may be a matter of altering existing

Two-dimensional polyacrylamide size separation (one dimension) by gene and protein expression rather Unit 2 gel electrophoresis (2D PAGE) SDS–PAGE can be used to identify than turning them on (induction) systems have been combined with protein bands of interest by mass or off (repression), which makes 7 Chapter mass spectrometry in an established spectrometry in 1D-Gel-MS. quantitation of proteins crucial in and adaptable platform. Since normal and diseased or control and 1975, 2D PAGE has been the most Multidimensional LC-MS/MS experimental states. The use of commonly used proteomic platform stable isotopes, as detailed in the to separate and comparatively Proteomic platforms incorporating section below, takes advantage of quantitate protein samples (55). liquid chromatography (LC) as the high resolution power of mass Current state-of-the-art 2D gels the primary means of separation spectrometers to discriminate use immobilized pH gradient (IPG) (versus gel-based separations) protein samples stable-isotopically gels to separate proteins by charge. have become the preferred means labelled for quantitative comparison. They are then resolved by mass of analysis. There are many However, rapid developments spectrometry using sodium dodecyl different types of LC separations are being made using “label-free” sulfate (SDS) gel electrophoresis often termed “multidimensional.” approaches to quantitation if sample for effective separation of complex In this proteomic platform, LC is mass spectral data are sufficiently protein samples in μg to mg used to separate protein digests detailed (58). The two main quantities. Either visible stains, by exploiting different biophysical approaches used are ion precursor such as Coomassie Blue or silver, properties of proteins before signal intensities and spectral or fluorescent staining are used for identification by tandem mass counting. Though they deliver sensitive protein detection. After spectrometry (MS/MS). One of the relative sample quantitation, versus electronic alignment (registration) of most notable multidimensional LC- absolute protein measurement, stained proteins in 2D gels by image MS/MS platforms is Multidimensional they are simpler and less costly analysis software, intensities of Protein Identification Technology than stable isotopic methods, but identical protein spots are compared (MuDPIT). MuDPIT attempts to not as precise. Label-free protein among treatment groups and a ratio identify all proteins in a sample quantitation methods should be (fold change) is calculated for each by two-dimensional separation of considered at the outset of designing protein. A relatively new variation of protein digests by charge (strong LC-MS/MS proteomic studies and the 2D PAGE technique, difference anion exchange matrix) and weighed against the considerable gel electrophoresis (DIGE), allows hydrophobicity (C18 column) with advantages and choices of stable an investigator to measure three online LC immediately before entry isotopic approaches. samples per gel that have been into a tandem mass spectrometer labelled with Cy2, Cy3 and Cy5 (MS/MS) for protein identification Stable isotope LC-MS/MS fluorescent dyes, which reduces (57). The platform has also been platforms: ICAT, iTRAQ some of the error associated called shotgun proteomics, as entire and SILAC with electronic registration during protein lysates are trypsin digested multiple gel alignment. This strategy into thousands of peptide fragments A primary goal of proteomics is allows for direct comparison of without any prior fractionation to comprehensively analyse all samples on one gel for better before separation and identification. proteins in a sample, or as many

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 127 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics as possible. However, the prospect (iTRAQ), or metabolic incorporation (SELDI-TOF-MS) instrument (63). of quantifying protein levels of isotopically tagged amino acids Analysis of samples is relatively for comparison among protein in cell culture (stable isotope rapid (100/day), and only a few μl of samples has been a difficult aspect labelling with amino acids in cell sample is necessary. A downside is of proteomic analysis. Protein culture (SILAC)). The applications that protein identification of peaks quantitation can be considered either of stable isotopes in proteomics is not readily accomplished without in relative terms as a proportion of have been recently reviewed for additional conventional separation treatment (test) samples compared their sensitive detection of proteins and analysis. to control samples, or in absolute in a quantitative and comparative terms as the number of molecules fashion (60). As mentioned above, Antibody arrays (moles) or concentration (molarity). continuing improvements in spectral Internal standards are useful, but counting for use in LC-MS/MS Protein microarrays represent a not realistic, for complex protein platforms should have wide utility promising new proteomic tool that samples of unknown composition in as a versatile, isotope-free method closely emulates the design for most proteomic studies. Since many of protein quantitation when stable parallel analysis of DNA microarray proteomic platforms are based in isotope use is not feasible (61). technology (64). Protein microarray mass spectrometry, comparison formats can be divided into two of intensity signals seems the SELDI-TOF mass types of multianalyte sensing most direct means for comparative spectrometry formats: forward phase arrays and measurements; however, intensities reverse phase arrays. In the forward are subject to many interfering Retentate chromatography-mass phase array format, the analyte is factors. Quantitation by mass spectrometry (RC-MS) is a high- captured from the solution phase by spectrometry has generally been throughput proteomic platform different capture molecules, such regarded as semiquantitative under that creates a laser-based mass as an antibody immobilized on a the best of circumstances. spectrum (based on matrix- substratum (i.e. glass slides). In The use of stable isotopes for assisted laser desorption ionization contrast, the reverse array format tagging proteins has made great time-of-flight (MALDI-TOF) mass immobilizes the individual test strides in proteomics for determining spectrometry) from a chemically- samples in each array spot so that, the relative amounts of proteins absorptive surface. The principle for example, hundreds of different among samples (59). Stable isotopes of this approach is the adsorptive patient blood or tissue samples of an element differ in mass due to retention of a subset of sample are arrayed and probed with one the number of neutrons, but have proteins on a thin, chromatographic detection protein, such as an antibody the same elemental and chemical support (i.e. hydrophobic, normal and a single analyte endpoint are characteristics as the element. phase, weak cation exchange, measured for comparison across Stable isotopes are not radioactive. strong anion exchange or multiple samples. Such microarray Common stable elements and their immobilized metal affinity supports). studies have been carried out for stable isotopes are 1H and 2H; 12C The absorptive surfaces are placed metastatic ovarian cancer (65). and 13C; 14N and 15N; 16O and 18O; on thin metal chips which can be Many different types of capture 32S and 34S. A unique feature of inserted into a MALDI-type mass molecules can be arrayed, including high-resolution mass spectrometers spectrometer. The laser rapidly peptides (i.e. peptide substrates for is the ability to finely distinguish desorbs proteins from each sample kinases on phosphorylation arrays), between small differences in mass, on a metal chip to create a mass proteins (protein–protein interaction even to the point of resolving spectrum profile. RC-MS can be arrays) and oligonucleotides (i.e. the relative abundance of stable performed upon any protein sample, transcription factor binding arrays isotopes in otherwise identical but thus far this platform has found to oligonucleotides), but the most samples. Several proteomic greatest utility in the analysis of prevalent are antibody arrays in platforms for protein quantitation serum and plasma for disease the forward phase format. Antibody and identification have been built (62). The lead arrays can directly separate proteins around the use of isotopic tagging commercial platform of RC-MS from complex biological fluids like of proteins (isotope coded affinity proteomic platforms is the surface- plasma, serum or cell lysates by tagging (ICAT), peptides (isotope tag enhanced laser desorption ionization affinity binding to specific antigenic for relative and absolute quantitation time-of-flight mass spectrometry sites on target proteins.

128 Generally, current commercial based on five genes and 10 proteins density LC-MS/MS platforms with antibody array platforms fall into with an accuracy of 91.7%, sensitivity stable isotope labelled peptides, three classes based on the targeted of 96.2% and specificity of 80% (67). spectral counting, and parallel proteins: cytokine/chemokine In a different approach, a molecular use of complementary proteomic arrays, cellular function protein epidemiological study used SELDI- platforms, such as tissue arrays arrays and cell signalling arrays. TOF MS for in vivo studies of (71). Study designs that remove Although not all proteins for any humans exposed to benzene. By abundant proteins from biofluids, given cell type or biofluid (i.e. blood, using two sets of 10 exposed and enrich subcellular structures, and serum, plasma, urine, cerebral spinal 10 unexposed subjects, researchers include cell-specific isolation from fluid) are currently represented on identified with chemically-reactive heterogeneous tissues will greatly antibody arrays, they do provide a surfaces and validated with antibody- increase differential expression rapid screen for protein alterations coated chips three differentially capabilities. Advancement in that may be relevant to tissue injury expressed proteins in the serum mechanistic insights and biomarker development using proteomics or disease (54). Antibodies can be of benzene-exposed individuals, Unit 2 placed in ordered array on glass two of which were identified as will be furthered by completely slides or on a fluorescent microbead PF4 and CTAP-III, both members defining plasma (serum) proteome 7 Chapter format (i.e. Luminex technology) of the CXC-chemokine family and circulating microparticles in for multiplexed separation, (68). The same can be done with humans and rodent species as identification and quantitation (66). peptides (instead of antibodies) as accessible biofluids (72). Some of For example, in a study investigating the affinity ligand. This method was the representative biomarkers and the chemotherapeutic and applied in the development of two patterns of protein and message radiotherapy of patients with rectal diagnostic antibodies against avian expression are shown in Table 7.1. cancer, 40 tumour samples were influenza detection for epidemiologic Reviews on using proteomics to analysed by DNA microarray and studies, in which the epitopes develop biomarkers and further plasma samples were analysed by of two monoclonal antibodies mechanistic insights have been antibody (Luminex) bead microarray (mAbs) against avian influenza published (73–75). platforms. Using a kernel-based nucleoprotein (NP) were found using method with Least Squares truncated NP recombinant proteins Metabolomics Support Vector Machines to predict and peptide array techniques (69). rectal cancer regression grade, Future developments in Analogous to the genomic investigators found that combining proteomics will see incorporation characterization of cellular DNA and integrating of microarray and of more sophisticated methods of and the proteomic characterization proteomics data improved predictive quantitation in proteomic analysis of the set of proteins expressed power leading to the best model (70), combining higher data at a given time, cells also can be

Table 7.1. Recently identified biomarkers and signatures of toxicity using transcriptomics and proteomics

Biomarker/Signature Identification Condition Reference No.

KIM-1 DNA microarray Renal toxicity (145)

Adipsin DNA microarray GI toxicity, functional gamma secretase inhibitors (146) (FGSIs)

CXC-chemokines SELDI Benzene exposure (68)

Troponin I,T 2D gel-MS Myocardial ischemia, infarction (147)

Aminopeptidase-P 2D gel-MS Radioimmunotherapy (148) Annexin A1

12 -gene signature DNA microarray -induced phospholipidosis (149)

Multigene blood signature DNA microarray Systemic sepsis (32)

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 129 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics characterized in terms of the set of approaches for clinical diagnosis Other more specialized low-molecular-weight metabolites of metabolic disorders (78,79) techniques, such as Fourier (typically < 1500 D) that comprise the summarize methods for metabolite transform infrared (FTIR) cellular “metabolome.” The cellular analysis and provide good spectroscopy and arrayed metabolome provides a functional examples of the application of MS electrochemical detection, have readout of the cellular proteome (Cf. to metabolomic analysis (Cf. Figure been used in some cases (81,82). Figure 7.1). Although the analysis of 7.2). Mass spectrometric analysis The main limitation of FTIR is the homogeneous cell populations in typically requires preparation of low level of detailed molecular culture, receiving identical nutrients the metabolic components using identification that can be achieved. and oxygenation, and exposed either (GC) In one study, MS was also employed to the same levels of excreted after chemical derivatization, or for metabolite identification (82). waste products, represents the LC, with the newer method of ultra- The extensive application of most ideal system for metabolomic performance liquid chromatography nuclear magnetic resonance (NMR) characterization, the approach has (UPLC) increasingly used. The use for metabolic characterization been extended to the analysis of of capillary electrophoresis (CE) of urine and other body fluids extracts and fluids derived from coupled to MS also has shown has been developed primarily by higher organisms. Urinary and some promise. It was reported that Nicholson and coworkers (83–85). blood metabolites have been a combination of approaches for The primary advantage of the NMR among the most frequent targets metabolite extraction produced over approach (Cf. Figure 7.2) is the lack for metabolomic characterization, 10 000 unique metabolite features, of required preparatory separations, but analyses of other fluids, such indicating both the complexity which in turn leads to an unbiased as (CSF), of the human metabolome and and potentially quantitative measure bronchoalveolar lavage fluid (76) the potential of metabolomics in of the constituents of the sample. and saliva (77), and of cellular biomarker discovery (80). Alternatively, while in principle it extracts also have been performed. Typical 1H NMR spectra illustrating Figure 7.3. Typical 1H NMR spectra illustrating the metabolite composition of urine the different metabolite composition (A), saliva (B) and plasma (C). Used with permission from (77). of urine, blood, and saliva are shown in Figure 7.3. Currently, metabolomic characterization is being used for a wide range of objectives in human nutrition and , and for the development of pharmaceuticals and agricultural products. The underlying objectives of these studies include: discovery of metabolite signatures as prognostic indicators, diagnostics or biomarkers of disease states; establishing toxicological markers for drug development and environmental toxicology; understanding mechanisms of metabolic diseases; and correlation of metabolite (metabotype) with genotype and environmental input (e.g. nutrition). The screening of neonates for genetic disorders in intermediary is an application that predates the more recent interest in metabolomic characterization. TMAO, trimethylamine oxide; DMA, dimethylamine. The recent reviews of analytical

130 is possible to obtain a molecular compound and all of its metabolites reveal clustering behaviour generally mass corresponding to each and conjugates can be identified, degrades cluster quality, and unfragmented metabolite observed these can simply be ignored or concluded that PCA was only useful in a mass spectrum, in an NMR eliminated from the analysis. One in special cases (91). Hence, there spectrum, molecular information is general source of toxicity resulting is a critical need for the identification distributed among the resonances from the administration of high of metabolic biomarkers that provide of a compound, and the position levels of test compounds is the universally quantifiable indications of these resonances can depend depletion of sulfur-containing amino of organ function and toxicity. critically on pH, salt concentration, acids that results from the excretion Another critical issue related to divalent ions and other physical of glutathione and cysteine the identification of biomarkers is the parameters. Additional caveats conjugates. Thus, it was reported need to separate acute and chronic discussed in various reviews include that rats receiving high levels of effects of illnesses or toxins. It is not loss of more volatile metabolites, acetaminophen excreted significant unusual for a particular metabolite amounts of pyroglutamic acid. This to become elevated during the metabolite contributions derived Unit 2 from intestinal bacteria (86), potential effect of excess acetaminophen acute phase of a toxic response, bacterial growth in stored fluids, and was prevented/reversed by but to become depressed as the 7 Chapter other factors (84,87). Some have supplementation with methionine chronic effects become significant. proposed using dimethylamine and (89). The glutathione analogue Alternatively, in the absence of its nitroso metabolite as biomarkers ophthalmic acid recently has been chronic effects, the metabolite level for small bowel bacterial overgrowth found to accumulate after high may return to pretreatment values. (86), while others have suggested dosage with acetaminophen, and For this reason, plots of data obtained monitoring 4-hydroxyphenylacetate may also function as a biomarker at different times after dosage, as a potential screening method for oxidative stress and glutathione or trajectory plots for individual for small bowel disease and depletion (90). subjects, can provide critical bacterial overgrowth syndromes Various multivariate analyses information on the time-dependent (88). Significant levels of ethanol of metabolite composition have response. Several of the issues in plasma or urinary samples been applied to detect differences discussed above are illustrated in generally indicate either bacterial among subject groups, such as studies identifying the association contamination of the sample or those receiving different treatments of the oxidative stress biomarker small bowel bacterial overgrowth or different chemical exposures. 8-oxoguanosine with Parkinson's (87). Such conditions typically This type of analysis, termed disease (92). In this study, elevation accompany renal failure or other “metabonomics” by the Nicholson of the 8-oxoguanosine level was serious illnesses. group, most frequently utilizes observed in cerebrospinal fluid In studies of chemical toxins principal component analysis (CSF), but not in the serum of or pharmaceutics, the xenobiotic (PCA) of the spectral data to patients with Parkinson. Further, and its metabolites and conjugates reveal clustering behaviour that there was a significant negative typically constitute an important differentiates treated from control correlation between the level of source of variation of the subjects. The clusters of data points the biomarker and the duration of metabolome. While of interest in PC plots reveal the uniformity the disease. Finally, as indicated in from a metabolic perspective, of the control and treated groups, Table 7.2, 8-oxoguanosine is also these compounds are not directly as well as the extent to which the elevated in other conditions, e.g. indicative of organ toxicity or treated group yields a distinct amyotrophic lateral sclerosis (93). therapeutic response, so that in metabolic phenotype. Since the Another important limitation on the general these metabolites are not axes of the PC plot are dependent on identification of some biomarkers relevant to the study. One approach the data set and do not correspond relates to the chemical reactivity, to dealing with this issue involves to independent variables, the ability which can deplete the free metabolite stable isotope labelling of the of different laboratories to utilize pool and lead to heterogeneous compounds under study, so that the the published information is limited. adduct formation and difficulties of compound and metabolites derived Interestingly, a recent study that detection. Homocysteine, which has from it will exhibit characteristic evaluated statistical methodology long been linked to cardiovascular features in the NMR or mass for the analysis of gene expression disease, provides one example (94). spectrum. Alternatively, if the test data found that the use of PCA to

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 131 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics Much of the early NMR work biomarkers that have been related to phosphate-containing metabolites evaluating the effects of various metastatic growth are proteins, but and in phospholipid composition toxins noted changes in the levels new metabolomic markers continue have been correlated with tumour of tricarboxylic acid cycle (TCA) to be developed. The metabolite stage and metastatic spread. In metabolites and other abundant 12(S)-hydroxyeicosatetraenoic studies of extracts from tumour cell molecules that may be present in acid (12(S)-HETE) has been lines, elevations in phosphorylcholine the nutrient source and that are not demonstrated to play a pivotal role or other membrane-related organ-specific (85). More recently, in experimental melanoma invasion phosphomonoesters have frequently several more specific metabolomic and metastasis, suggesting that been observed, and suggested to be biomarkers have been correlated 12-lipoxygenase expression may be correlated with metastatic potential with various diseases or treatments important in early human melanoma (98,99). Increases have been (Table 7.2). Notably, most of the carcinogenesis (95–97). Changes in observed in aspartyl-4-phosphate

Table 7.2. Biomarkers recently identified using metabolomics

Biomarker Sample Analysis Condition Reference No.

NAN; NMR – human and rodent urine Type 2 diabetes mellitus (110) 2-PY NAN; NMR – rat urine peroxosome proliferation (111,112) 2-PY ADMA MS –human blood plasma Renal failure; atherosclerosis (108,109)

Ophthalmic acid MS – mouse serum, liver extract Acetaminophen-induced hepatotoxicity (90)

Pyroglutamic acid NMR – rat urine APAP-induced deficiency of sulfur- amino acids (89)

3-nitrotyrosine HPLC - human CSF Amyotrophic lateral sclerosis (150)

8-oxoguanosine HPLC – human CSF Alzheimer's disease (93)

8-oxoguanosine HPLC – human CSF Parkinson's disease (92)

Modified nucleosides LC-IT-MS of human urine Breast cancer (102)

12(S)-HETEa HPLC - tumor cell extracts Human melanoma (95-97)

Aspartyl-4-phosphate DESI-MS/NMR – murine urine Lung cancer/ tumour growth (100)

Phosphorylcholine 31P NMR – cell extracts Breast cancer cell extracts (98)

Depressed lysophosphatidyl 31P NMR – blood plasma Renal cell carcinoma (101) choline levels Elevated xanthine, GC-MS – human urine Lesch-Nyhan syndrome (103) hypoxanthine, urate Glc-Gal-pyridinoline HPLC - human urine Synovial degradation – RA (104-106)

4-hydroxyphenyl acetate GC-LC – human urine SBBO (88)

Dimethylamine, GC – human serum; SBBO (86) nitrosodimethylamine GC – whole blood

a In most reported studies, concentration or enzymatic activity of 12-lipoxygenase, rather than 12(S)-HETE, has been determined. 2-PY, N-methyl-2-pyridone-5-carboxamide; ADMA, NG,NG-dimethylarginine; APAP, Acetaminophen; DESI, Desorption ; GC-LC, Gas chromatography- liquid chromatography; GC-MS, Gas chromatography-mass spectrometry; HPLC, High performance liquid chromatography; LC-IT-MS, Liquid chromatography-ion trap-mass spectrometry; NAN, N-methylnicotinamide; NMR, Nuclear magnetic resonance; RA, ; SBBO, small bowel bacterial overgrowth.

132 that may correlate with tumour to arise from incompletely mining methods and other analytical growth (100). Phosphorus-31 NMR derivatized pyroglutamic acid tools are typically employed in studies of blood plasma derived and serine (113,114), and the a bioinformatics framework to from patients with advanced renal quantitative abnormalities of these effectively manage, analyse and cell carcinoma have been found metabolites in urine from patients summarize the plethora of data. For to exhibit depressed levels of with chronic fatigue syndrome/ example, proteomics approaches, lysophosphatidylcholine (101). A myalgic encephalomyelitis has been such as SELDI-TOF and mass significantly improved discrimination reported to be artefactual. Early spectrometry in conjunction with of breast cancer patients based on analyses supporting the use of 1H bioinformatics tools, have greatly metabolomic analysis of modified NMR of blood sera to diagnose facilitated the discovery of new and nucleosides present in urine was coronary artery disease (115) have better serum biomarkers to detect recently reported (102). subsequently been found to be more cancer (120). A metabolomic approach using a equivocal than originally suggested The bioinformatics processes to

combination of gas chromatography and to compare unfavourably with translate omics data into clinically Unit 2 and MS has identified elevations in angiography-based diagnosis (116). useful biomarkers can comprise a the levels of the metabolites xanthine, As more specific biomarkers are myriad of steps, beginning with initial 7 Chapter hypoxanthine, urate and guanine identified, the power of this approach analysis of the data to validation of in patients with Lesch-Nyhan will continue to evolve, providing the biomarkers (121). This multistep syndrome (103). The urinary cross useful diagnostic information process typically involves discovery, link product, Glc-Gal-pyridinoline for pathological, environmental, data integration, predictive modeling, (104), has been identified as a toxicological, pharmaceutical and and delivery of the biomarkers to biomarker for synovial degradation nutritional research, as well as the clinic in a format to facilitate observed in osteoarthritis (105); enhancing the value of metabolomic implementation (122). Bioinformatic treatment with ibuprofen lowers the analysis for basic research into analyses may take different level of this excreted metabolite mechanisms of toxicity. Future approaches with several checkpoints (106). Endogenously formed NG,NG- developments in mass spectrometry along the way. A flowchart is dimethylarginine, also referred to platforms in metabolomics will described for the application of as asymmetric dimethylarginine increase the detectable coverage bioinformatics strategies to improve (ADMA), is a potent inhibitor of of the metabolome in clinical the identification of candidate nitric oxide synthase (107). Plasma specimens and experimental biomarkers from cancer genome- levels increase as a consequence species, and permit better wide expression analyses (123). The of renal failure (108), and ADMA has identification of metabolites in the process proceeds with acquisition of been identified as a biomarker for process of converting raw data to gene expression data from cancer atherosclerosis (109). Metabolomic biological knowledge (117). tissues, followed by the identification analyses have identified several of candidate genes as biomarkers. pyridine derivatives in urine Bioinformatics The next step entails meta-mining from diabetic rats (110), and the public cancer data sets of the same same derivatives have shown The wealth of data generated type of pathophysiology to reduce up as biomarkers of peroxisome through high-throughput omics the biomarker false-positive rate. The proliferation (Table 7.2) (111,112). approaches has become last step before use of the biomarkers The presence of these compounds increasingly complex and too vast in clinical trials involves validation of indicates a perturbation of the for conventional biomarker analysis the candidates by RT–PCR, ELISA, tryptophan-nicotinamide adenine strategies. Bioinformatics has tissue arrays, immunohistochemistry, dinucleotide (NAD) pathway. played a crucial role in biomarker and other types of bioassays. As is typical for new discovery and validation (Figure What are these bioinformatics technologies, some reports of 7.4). It is a multidisciplinary field tools and processes and how are putative biomarkers have proven involving biology, computer science, they used to aid in the discovery and controversial. Early identification of mathematics and statistics to validation of disease biomarkers? the partly-characterized metabolites derive knowledge from biological, It is helpful to appreciate that a CFSUM1 and CFSUM2, associated genomics and genetic data (118,119). useful biomarker must be objective, with chronic fatigue syndrome, Database systems, computational highly accurate and very reliable were subsequently demonstrated algorithms, statistical models, data in determining disease states and

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 133 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics Figure 7.4. Sequential scheme for the integration of omics with bioinformatics in the biomarker discovery process

assessing risk. In other words, they normalization (SVN) approach was methods. The underlying omics data must generalize well (i.e. extend) developed specifically to remove should be rich enough to be narrowed to broad cases, exhibit significance systematic error from microarray down to a core set of features that in their reporting, and be precise gene expression data (129). Baseline best represent the biology of the in their utility. Unfortunately, omics subtraction, signal smoothing and system for biomarker selection. data possesses systematic variation normalization methodologies were Caution must be taken with respect due to the experimental error in employed in the preprocessing of to the preprocessing of omics data the data acquisition process (124). mass spectrometry proteomics data for biomarker identification, as the There are technical limitations in the to reduce the noise and to make the use of a particular combination of sensitivity of detecting biomarkers analysis of spectral data comparable methods will surely add variability to that are lowly expressed or non- (130). In general, once the omics the set of indicators selected. In other abundant; however, biomarkers do data is made unbiased and adjusted words, the preprocessing and filtering not need to be highly expressed to make fair comparisons across steps are sources of variability in and or in large abundance. Thus, the samples, all the data can be used of themselves. The US Food and challenge in biomarker identification to mine for biomarkers or be filtered Drug Administration-led MicroArray is successfully mining omics data first to remove uninformative or Quality Control Phase II (MAQC- with inherent error to find the redundant information. Some believe II) consortium set out to address features that reliably, accurately that the inclusion of uninformative this by trying to understand the and objectively relate to the features in the biomarker selection limitations of various bioinformatics pathophysiology of a disease (125). process will severely degrade the data analysis methods in developing Data normalization and performance of the predictor model and validating microarray-based dimension reduction (data (131). Thus, it has been suggested predictive models, and determining condensing) techniques have been to remove variables that do not if best practices for development and widely used to preprocess omics contribute to a biological response validation of predictive models based data before biomarker discovery. of interest before the selection of on microarray gene expression and Standard ways of dealing with data biomarkers. Filtering of the data genotyping data can be derived normalization have been adopted can be based on a signal or relative for biomarker discovery and for omics data. Robust Multiarray level, fold change, a confidence personalized medicine (132). Average (RMA), loess and quantile level, standard deviation from the The development and utilization normalization methods have mean of the distribution, P-values, of classification and prediction seemed to pass the test of time mutual information, full or partial methods for analysis of omics data (126–128). A systematic variation correlations, or more elaborate have transcended the process

134 for identifying biomarkers from small set of relevant indicators with Finally, a host of techniques and molecular signatures. One of the minimum redundancy, and 2) the software to integrate omics data most widely used methods for validation of the predictors using are summarized, to shed additional classification is clustering. The a classifier and cross-validation light on the complex molecular process works by using a dissimilarity strategy. To balance false-positives interactions that take place on a measure for the feature profiles in the and false-negatives in the selection systems biology level (142). omics data to iteratively form groups of biomarkers, a clever method was Repositories of omics data of samples that are tightly clustered. proposed to use common peaks from various studies that can be Features that cluster well together in mass spectrometry data as the queried may bring about improved and can distinguish between predictive indicators (137). The means for detecting biomarkers groups of samples that differ in procedure applies AdaBoost (a of a clinical process or phenotype pathophysiology are considered to form of ensemble classifier training) than one which is isolated or from be potential biomarkers. Clustering to perform the classification and a small group of data sets (143). to select the informative common This realization has motivated the and an F-test-like score based on Unit 2 within- and between-sample gene peaks. generators of omics data to store expression variance measures were Bioinformatics approaches them in repositories for meta-mining 7 Chapter used effectively to identify an intrinsic to discover biomarkers can purposes. Figure 7.5 represents gene subset (i.e. a molecular portrait) take on more sophisticated a brief list of some of the publicly that has a high predictive score implementations. For instance, a accessible databases that store, for human breast tumours (133). dependence (interaction) network distribute and permit querying of More sophisticated bioinformatics modeling scheme was suggested omics data (a more comprehensive methods have been developed to for identifying biomarkers from list is presented in (142)). A plasma identify potential biomarkers. A groups of genes or proteins (138). proteome database at the Institute hybrid approach was developed Very clear differences were of Bioinformatics that stores based on the genetic algorithm observed in the dependence comparisons of human plasma (GA) and k-nearest neighbours networks for cancer and non-cancer protein concentration levels along (KNN) classifier that is capable samples. On the other hand, a gene with their isoforms in normal and of identifying gene and protein selection algorithm was used based disease states should be useful for molecular signatures of diseases on Gaussian processes to discover discovery of novel biomarkers (144). based on microarray and proteomics consistent gene expression patterns Another database, ONCOMINE, data, respectively (134,135). The associated with ordinal clinical stores a collection of curated cancer GA serves as a search tool to phenotypes (139). The method was gene expression profiles integrated choose small subsets of predictors, able to identify subsets of genes as with a therapeutic target database whereas the KNN functions as a potential biomarkers for colon and and biological resources, such as non-parametric (no distribution prostate cancers. The integration Gene Ontology, so that the data can model assumed) pattern recognition of time-course microarray gene be mined for putative biomarkers method to evaluate the discriminative expression data with cytotoxicity (123). ability of the subsets. More recently, measurements, by way of a partial The use of bioinformatics a hybrid approach for biomarker least squares objective criterion, tools has increased mechanistic discovery from microarray gene has been shown to be useful for understanding and development expression data was developed identifying biomarkers in primary rat of biomarkers in the analysis of to distinguish between types of hepatocytes exposed to cadmium massive genomics, proteomics cancer (136). This approach is (140). The approach demonstrates and metabolomics data (Cf. Figure based on Fisher’s ratio (a measure the value of integrating omics data 7.4). Bioinformatics techniques for the linear discriminative power with associated biological data to will continue to be useful in of variables) to select features glean more information about the organizing and extracting candidate “wrapped” with a classifier (hence, biomarker’s diagnostic utility. More biomarkers for chemical exposures the procedure is called FR-Wrapper) recently, a bioinformatics approach and disease for epidemiology, to perform predictions. With these was introduced that takes into clinical and experimental studies. hybrid approaches, the two main account the inherent correlation of However, mere access to objectives in biomarker discovery genes when using gene expression sophisticated bioinformatics tools are met: 1) the identification of a data for biomarker discovery (141). will be insufficient to grapple with

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 135 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics the identification of biomarkers from omics data (Figure 7.5). An ongoing and vigorous debate has emerged over the use and reproducibility of bioinformatics approaches and omics data for biomarker discovery and clinical applications (145,146). Clearly there is a need for rigorous quality control in the field of bioinformatics for the use of omics type data in clinical, diagnostic and http://www.genomesonline.org/index http://www.ncbi.nlm.nih.gov/Genbank http://www.ebi.ac.uk/embl/ http://www.ncbi.nlm.nih.gov/geo http://www.ebi.ac.uk/arrayexpress http://www.fda.gov/ScienceResearch/ BioinformaticsTools/Arraytrack/default.htm http://genome-www.stanford.edu/microarray http://www.oncomine.org http://edge.oncology.wisc.edu/ http://ca.expasy.org/world-2dpage http://www.plasmaproteomedatabase.org regulatory settings (Lyle Burgoon, URL personal communication). A fundamental understanding of the inherent problems and issues with omics data, and knowing how, where and when to apply which type of bioinformatics approach, are essential to effectively translating omics biomarkers into clinically useful diagnostic tools and epidemiological markers. An integrated solution for managing, analysing and and analysing managing, solution for An integrated MIAME-compliantinterpreting microarray gene expression data from pharmaco- and studies Description Gene expression/ molecular abundance repository supporting MIAME-compliant data A dynamic portal to query simultaneously worldwide proteomics databases Genetic sequence database sequence Genetic Nucleotide sequence database Repository MIAME-compliant for microarray data Microarray-based MIAME-compliant expression gene expression gene examining and mining for Database cancerin Toxicology-related gene expression data for mapping transcriptional changes from chemical exposure A comprehensive resource for all human plasma proteins along with their isoforms Completed and ongoing genome projects genome ongoing and Completed FDA-NCTR Host NIH-NCBI Expert Protein Analysis System (ExPASy) NIH-NCBI EMBL-EBI EMBL-EBI Michigan of University Wisconsin-Madison of University Pandey Lab and Institute of Bioinformatics Genomesonline.org TM Resource Gene ExpressionGene (GEO) Omnibus World-2DPAGE GenBank EMBL-Bank ArrayExpress ArrayTrack Stanford Microarray (SMD) Database Stanford University ONCOMINE Gene and Environment, Expression (EDGE) Database Protein Plasma Genomes OnLine (GOLD) Database Data Type Data Transcriptomics Proteomics Genomics Figure Online bioinformatics 7.5. resources various omics for fields, including genomics, transcriptomics, proteomics, metabolomics and

136 http://apropos.icmb.utexas.edu/OPD http://metlin.scripps.edu http://www.hmdb.ca/ http://cebs.niehs.nih.gov/ http://www.pharmgkb.org/ http://www.genome.jp/kegg URL Unit 2 Chapter 7 Chapter Description A repositoryA spectral mass data for metabolite Integrates study design, clinical pathology and histopathology, gene expression and proteomics data discrimination enables critical and studies of from all factorsstudy Database of small metabolites found in the human body body human the in metabolites found small of Database chemical,containing linking clinical or molecular and biology/biochemistry data Collects, encodes and disseminates knowledge about the impact of human genetic variations on drug responses. Provides curated primary genotype and phenotype data, annotated gene variants, and gene- and review, literature drug-disease via relationships summarizes important PGx genes and drug pathways KEGG is an integrated database that relates genes to functional outlines relationships metabolic pathways; among genes; disease-associated genes; pharmaceuticals-targeted genes For storing and disseminating mass spectrometry- mass disseminating and storing For based proteomics data Host The Scripps Research Institute Scripps Research The NIH-NIEHS Genome Alberta and Genome Canada Stanford Japan University, Kyoto Resource METLIN Metabolite Database Chemical Effects in Biological (CEBS) Systems Human MetabolomeHuman Database (HMDB) PharmGKB genes of encyclopaedia Kyoto and geomics pathways (KEGG) Open Proteomics Database (OPD) University at Austin of Texas Data Type Data Metabolomics Integrated

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 137 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics References

1. Mundy C (2001). The : 14. Middeldorp A, Jagmohan-Changur S, 25. Bammler T, Beyer RP, Bhattacharya a historical perspective. Pharmacogenomics, Helmer Q et al. (2007). A procedure for the S et al.; Members of the Toxicogenomics 2:37–49.doi:10.1517/14622416.2.1.37 PMID: detection of linkage with high density SNP Research Consortium (2005). Standardizing 11258196 arrays in a large pedigree with colorectal global gene expression analysis between cancer. BMC Cancer, 7:6.doi:10.1186/1471- laboratories and across platforms. Nat 2. Thorisson GA, Smith AV, Krishnan L, Stein 2407-7-6 PMID:17222328 Methods, 2:351–356.doi:10.1038/nmeth0605- LD (2005). The International HapMap Project 477a PMID:15846362 Web site. Genome Res, 15:1592–1593. 15. Giacomini KM, Brett CM, Altman RB et doi:10.1101/gr.4413105 PMID:16251469 al.; Pharmacogenetics Research Network 26. Irizarry RA, Warren D, Spencer F et al. (2007). The pharmacogenetics research (2005). Multiple-laboratory comparison of 3. Liolios K, Tavernarakis N, Hugenholtz P, network: from SNP discovery to clinical drug microarray platforms. Nat Methods, 2:345– Kyrpides NC (2006). The Genomes On Line response. Clin Pharmacol Ther, 81:328–345. 350.doi:10.1038/nmeth756 PMID:15846361 Database (GOLD) v.2: a monitor of genome doi:10.1038/sj.clpt.6100087 PMID:17339863 projects worldwide. Nucleic Acids Res, 34 27. Ulrich RG, Rockett JC, Gibson GG, Pettit Database issue;D332–D334.doi:10.1093/nar/ 16. Mateuca R, Lombaert N, Aka PV et al. SD (2004). Overview of an interlaboratory gkj145 PMID:16381880 (2006). Chromosomal changes: induction, collaboration on evaluating the effects of model detection methods and applicability in hepatotoxicants on hepatic gene expression. 4. Csako G (2006). Present and future of rapid human biomonitoring. Biochimie, 88:1515– Environ Health Perspect, 112:423–427. and/or high-throughput methods for nucleic acid 1531.doi:10.1016/j.biochi.2006.07.004 doi:10.1289/ehp.6675 PMID:15033591 testing. Clin Chim Acta, 363:6–31.doi:10.1016/j. PMID:16919864 cccn.2005.07.009 PMID:161027 38 28. Whitney AR, Diehn M, Popper SJ et al. 17. Gunderson KL, Steemers FJ, Ren H et al. (2003). Individuality and variation in gene 5. Royds JA, Iacopetta B (2006). p53 and (2006). Whole-genome genotyping. Methods expression patterns in human blood. Proc Natl disease: when the guardian angel fails. Cell Enzymol, 410:359–376.doi:10.1016/S0076- Acad Sci USA, 100:1896–1901.doi:10.1073/ Death Differ, 13:1017–1026.doi:10.1038/sj. 6879(06)10017-8 PMID:16938560 pnas.252784499 PMID:12578971 cdd.4401913 PMID:16557268 18. Pinard R, de Winter A, Sarkis GJ et 29. Bennett L, Palucka AK, Arce E et al. (2003). 6. Taylor AM, Byrd PJ (2005). Molecular al. (2006). Assessment of whole genome Interferon and granulopoiesis signatures in pathology of ataxia telangiectasia. J Clin amplification-induced bias through high- systemic lupus erythematosus blood. J Exp Pathol, 58:1009–1015.doi:10.1136/jcp.2005. throughput, massively parallel whole Med, 197:711–723.doi:10.1084/jem.20021553 026062 PMID:16189143 genome sequencing. BMC Genomics, 7:216. PMID:12642603 doi:10.1186/1471-2164-7-216 PMID:16928277 7. Nowrousian M (2010). Next-generation 30. Borovecki F, Lovrecic L, Zhou J et al. sequencing techniques for eukaryotic 19. Jongeneel CV, Iseli C, Stevenson BJ et (2005). Genome-wide expression profiling microorganisms: sequencing-based solutions al. (2003). Comprehensive sampling of gene of human blood reveals biomarkers for to biological problems. Eukaryot Cell, expression in human cell lines with massively Huntington’s disease. Proc Natl Acad Sci 9:1300–1310.doi:10.1128/EC.00123-10 PMID: parallel signature sequencing. Proc Natl USA, 102:11023–11028.doi:10.1073/pnas.05 20601439 Acad Sci USA, 100:4702–4705.doi:10.1073/ 04921102 PMID:16043692 pnas.0831040100 PMID:12671075 8. Rando OJ (2007). Chromatin structure in 31. Burczynski ME, Twine NC, Dukart G et al. the genomics era. Trends Genet, 23:67–73. 20. Brown PO, Botstein D (1999). Exploring (2005). Transcriptional profiles in peripheral doi:10.1016/j.tig.2006.12.002 PMID:17188397 the new world of the genome with DNA blood mononuclear cells prognostic of clinical 9. Metzker ML (2010). Sequencing microarrays. Nat Genet, 21 Suppl;33–37.doi: outcomes in patients with advanced renal cell technologies - the next generation. Nat Rev 10.1038/4462 PMID:9915498 carcinoma. Clin Cancer Res, 11:1181–1189. PMID:15709187 Genet, 11:31–46.doi:10.1038/nrg2626 PMID: 21. Davis TN (2004). Protein localization in 19997069 proteomics. Curr Opin Chem Biol, 8:49–53. 32. Fannin RD, Auman JT, Bruno ME et 10. Bentley DR (2006). Whole-genome doi:10.1016/j.cbpa.2003.11.003 PMID:150361 al. (2005). Differential gene expression re-sequencing. Curr Opin Genet Dev, 56 profiling in whole blood during acute systemic in lipopolysaccharide- 16:545–552.doi:10.1016/j.gde.2006.10.009 22. Lockhart DJ, Dong H, Byrne MC et al. PMID:17055251 treated rats. Physiol Genomics, 21:92–104. (1996). Expression monitoring by hybridization doi:10.1152/physiolgenomics.00190.2004 11. Bates S (2010). Progress towards to high-density oligonucleotide arrays. Nat PMID:15781589 personalized medicine. Drug Discov Today, Biotechnol, 14:1675–1680.doi:10.1038/nbt12 15:115–120.doi:10.1016/j.drudis.2009.11.001 96-1675 PMID:9634850 33. Heller RA, Schena M, Chai A et PMID:19914397 al. (1997). Discovery and analysis of 23. Schena M, Shalon D, Davis RW, Brown inflammatory disease-related genes using 12. Steemers FJ, Gunderson KL (2007). Whole PO (1995). Quantitative monitoring of gene cDNA microarrays. Proc Natl Acad Sci USA, genome genotyping technologies on the expression patterns with a complementary 94:2150–2155.doi:10.1073/pnas.94.6.2150 BeadArray platform. Biotechnol J, 2:41–49. DNA microarray. Science, 270:467–470. PMID:9122163 doi:10.1002/biot.200600213 PMID:17225249 doi:10.1126/science.270.5235.467 PMID:75 69999 34. Liew CC, Dzau VJ (2004). Molecular 13. Ku CS, Loy EY, Pawitan Y, Chia KS (2010). genetics and genomics of . Nat The pursuit of genome-wide association 24. Fodor SP, Read JL, Pirrung MC et al. Rev Genet, 5:811–825.doi:10.1038/nrg1470 studies: where are we now? J Hum Genet, (1991). Light-directed, spatially addressable PMID:15520791 55:195–206.doi:10.1038/jhg.2010.19 PMID: parallel chemical synthesis. Science, 20300123 251:767–773.doi:10.1126/science.1990438 PMID:1990438

138 35. Sullivan PF, Fan C, Perou CM (2006). 48. Hardiman G (2008). Applications 61. Mueller LN, Brusniak MY, Mani DR, Evaluating the comparability of gene of microarrays and biochips in Aebersold R (2008). An assessment of expression in blood and brain. Am J Med Genet pharmacogenomics. Methods Mol Biol, 448: software solutions for the analysis of mass B Neuropsychiatr Genet, 141B:261–268. 21–30.doi:10.1007/978-1-59745-205-2_2 spectrometry based quantitative proteomics doi:10.1002/ajmg.b.30272 PMID:16526044 PMID:18370228 data. J Proteome Res, 7:51–61.doi:10.1021/ pr700758r PMID:18173218 36. Tang Y, Lu A, Aronow BJ, Sharp FR 49. Allison M (2008). Is personalized medicine (2001). Blood genomic responses differ after finally arriving? Nat Biotechnol, 26:509 – 517. 62. Petricoin EF, Liotta LA (2004). SELDI- stroke, seizures, hypoglycemia, and hypoxia: doi:10.1038/nbt0508-509 PMID:18464779 TOF-based serum proteomic pattern blood genomic fingerprints of disease. Ann diagnostics for early detection of cancer. Neurol, 50:699–707.doi:10.1002/ana.10042 50. Vineis P, Perera FP (2007). Molecular Curr Opin Biotechnol, 15:24–30.doi:10.1016/j. PMID:11761467 epidemiology and biomarkers in etiologic copbio.2004.01.005 PMID:15102462 cancer research: the new in light of the old. 37. Argos M, Kibriya MG, Parvez F et al. Cancer Epidemiol Biomarkers Prev, 16:1954– 63. Engwegen JY, Gast MC, Schellens JH, (2006). Gene expression profiles in peripheral 1965.doi:10.1158/1055-9965.EPI-07-0457 Beijnen JH (2006). Clinical proteomics: lymphocytes by arsenic exposure and skin PMID:17932342 searching for better tumour markers with lesion status in a Bangladeshi population. SELDI-TOF mass spectrometry. Trends Cancer Epidemiol Biomarkers Prev, 15:1367– 51. Wild CP (2009). Environmental exposure Pharmacol Sci, 27:251–259.doi:10.1016/j.tips. 1375.doi:10.1158/1055-9965.EPI-06-0106 measurement in cancer epidemiology. 2006.03.003 PMID:16600386 PMID:16835338 Mutagenesis, 24:117–125.doi:10.1093/mutage /gen061 PMID:19033256 64. Cutler P (2003). Protein arrays: the

38. Bushel PR, Heinloth AN, Li J et al. (2007). current state-of-the-art. Proteomics, 3:3–18. Unit 2 Blood gene expression signatures predict 52. Wetmore BA, Merrick BA (2004). doi:10.1002/pmic.200390007 PMID:12548629 exposure levels. Proc Natl Acad Sci USA, Toxicoproteomics: proteomics applied to 104:18211–18216.doi:10.1073/pnas.0706 toxicology and pathology. Toxicol Pathol, 65. Sheehan KM, Calvert VS, Kay EW et 7 Chapter 987104 PMID:17984051 32:619–642.doi:10.1080/019262304905182 al. (2005). Use of reverse phase protein 44 PMID:15580702 microarrays and reference standard 39. Forrest MS, Lan Q, Hubbard AE et al. development for molecular network analysis (2005). Discovery of novel biomarkers by 53. Kline KG, Sussman MR (2010). Protein of metastatic ovarian carcinoma. Mol Cell microarray analysis of peripheral blood quantitation using isotope-assisted mass Proteomics, 4:346–355.doi:10.1074/mcp.T50 mononuclear cell gene expression in spectrometry. Annu Rev Biophys, 39:291–308. 0003-MCP200 PMID:15671044 benzene-exposed workers. Environ Health doi:10.1146/annurev.biophys.093008.131339 Perspect, 113:801–807.doi:10.1289/ehp.7635 PMID:20462376 66. Schwenk JM, Lindberg J, Sundberg PMID:15929907 M et al. (2007). Determination of binding 54. Sanchez-Carbayo M (2010). Antibody specificities in highly multiplexed bead-based 40. Lampe JW, Stepaniants SB, Mao M array-based technologies for cancer protein assays for antibody proteomics. Mol Cell et al. (2004). Signatures of environmental profiling and functional proteomic analyses Proteomics, 6:125–132. PMID:17060675 exposures using peripheral leukocyte using serum and tissue specimens. Tumour gene expression: tobacco smoke. Cancer Biol, 31:103–112.doi:10.1007/s13277-009-00 67. Daemen A, Gevaert O, De Bie T et al. Epidemiol Biomarkers Prev, 13:445–453. 14-z PMID:20358423 (2008). Integrating microarray and proteomics PMID:15006922 data to predict the response on 55. Righetti PG, Castagna A, Antonucci in patients with rectal cancer. Pac Symp 41. Fannin RD, Russo M, O’Connell TM et F et al. (2004). Critical survey of Biocomput, 166–177. PMID:18229684 al. (2010). Acetaminophen dosing of humans quantitative proteomics in two-dimensional results in blood transcriptome and metabolome electrophoretic approaches. J Chromatogr A, 68. Vermeulen R, Lan Q, Zhang L et al. (2005). changes consistent with impaired oxidative 1051:3–17.doi:10.1016/j.chroma.2004.05.106 Decreased levels of CXC-chemokines in phosphorylation. Hepatology, 51:227–236. PMID:15532550 serum of benzene-exposed workers identified PMID:19918972 by array-based proteomics. Proc Natl Acad 56. Yates JR 3rd (2004). Mass spectral Sci USA, 102:17041–17046.doi:10.1073/pnas. 42. Afshari CA, Nuwaysir EF, Barrett JC (1999). analysis in proteomics. Annu Rev Biophys 0508573102 PMID:16286641 Application of complementary DNA microarray Biomol Struct, 33:297–316.doi:10.1146/ technology to carcinogen identification, annurev.biophys.33.111502.082538 PMID: 69. Yang M, Berhane Y, Salo T et al. (2008). toxicology, and drug safety evaluation. Cancer 15139815 Development and application of monoclonal Res, 59:4759–4760. PMID:10519378 antibodies against avian influenza virus 57. Macdonald N, Chevalier S, Tonge R et nucleoprotein. J Virol Methods, 147:265–274. 43. Zilberman D, Henikoff S (2007). Genome- al. (2001). Quantitative proteomic analysis doi:10.1016/j.jviromet.2007.09.016 PMID:180 wide analysis of DNA methylation patterns. of mouse liver response to the peroxisome 06085 Development, 134:3959–3965.doi:10.1242/ proliferator diethylhexylphthalate (DEHP). dev.001131 PMID:17928417 Arch Toxicol, 75:415–424.doi:10.1007/s0020 70. Cho WC (2007). Proteomics technologies 40100259 PMID:11693183 and challenges. Genomics Proteomics 44. Esteller M (2008). Epigenetics in cancer. Bioinformatics, 5:77–85.doi:10.1016/S1672- N Engl J Med, 358:1148–1159.doi:10.1056/ 58. Neubert H, Bonnert TP, Rumpel K et al. 0229(07)60018-7 PMID:17893073 NEJMra072067 PMID:18337604 (2008). Label-free detection of differential protein expression by LC/MALDI mass 71. Chung JY, Braunschweig T, Tuttle K, 45. Hoheisel JD (2006). Microarray spectrometry. J Proteome Res, 7:2270–2279. Hewitt SM (2007). Tissue microarrays as a technology: beyond transcript profiling and doi:10.1021/pr700705u PMID:18412385 platform for proteomic investigation. J Mol genotype analysis. Nat Rev Genet, 7:200– Histol, 38:123–128.doi:10.1007/s10735-006- 210.doi:10.1038/nrg1809 PMID:16485019 59. Ong SE, Pandey A (2001). An 9049-2 PMID:16953460 evaluation of the use of two-dimensional gel 46. McCann JA, Muro EM, Palmer C et al. electrophoresis in proteomics. Biomol Eng, 72. Merrick BA (2008). The plasma proteome, (2007). ChIP on SNP-chip for genome-wide 18:195–205.doi:10.1016/S1389-0344(01)000 adductome and idiosyncratic toxicity in analysis of human histone H4 hyperacetylation. 95-8 PMID:11911086 toxicoproteomics research. Brief Funct BMC Genomics, 8:322.doi:10.1186/1471- Genomic Proteomic, 7:35–49.doi:10.1093/ 2164-8-322 PMID:17868463 60. Turner SM (2006). Stable isotopes, bfgp/eln004 PMID:18270218 mass spectrometry, and molecular fluxes: 47. Beaudet AL, Belmont JW (2008). Array- applications to toxicology. J Pharmacol 73. Kumar S, Mohan A, Guleria R (2006). based DNA diagnostics: let the revolution Toxicol Methods, 53:75–85.doi:10.1016/j. Biomarkers in cancer screening, research begin. Annu Rev Med, 59:113–129. vascn.2005.08.001 PMID:16213756 and detection: present and future: a review. doi:10.1146/annurev.med.59.012907.101800 Biomarkers, 11:385–405.doi:10.1080/13547 PMID:17961075 500600775011 PMID:16966157

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 139 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics 74. Lemley KV (2007). An introduction to 86. Dunn S, Simenhoff M, Ahmed K et al. 98. Aiken NR, Gillies RJ (1996). biomarkers: applications to chronic kidney (1998). Effect of oral administration of freeze- Phosphomonoester metabolism as a function disease. Pediatr Nephrol, 22:1849–1859. dried Lactobacillus acidophilus on small bowel of cell proliferative status and exogenous doi:10.1007/s00467-007-0455-9 PMID:17394 bacterial overgrowth in patients with end precursors. Anticancer Res, 16 3B;1393– 023 stage kidney disease: reducing uremic toxins 1397. PMID:8694507 and improving nutrition. Int Dairy J, 8:545– 75. Merrick BA, Bruno ME (2004). Genomic 553 doi:10.1016/S0958-6946(98)00081-8. 99. Singer S, Souza K, Thilly WG (1995). and proteomic profiling for biomarkers and Pyruvate utilization, phosphocholine and signature profiles of toxicity. Curr Opin Mol 87. London R, Houck D. Introduction to (ATP) are markers Ther, 6:600–607. PMID:15663324 metabolomics and metabolic profiling. of human breast tumor progression: a 31P- In: Hamadeh HK, Afshari CA, editors. and 13C-nuclear magnetic resonance (NMR) 76. Azmi J, Connelly J, Holmes E et al. (2005). Toxicogenomics: principles and applications. spectroscopy study. Cancer Res, 55:5140– Characterization of the biochemical effects Hoboken (NJ): Wiley-LSS; 2004. p. 299–340. 5145. PMID:7585561 of 1-nitronaphthalene in rats using global metabolic profiling by NMR spectroscopy and 88. Chalmers RA, Valman HB, 100. Chen H, Pan Z, Talaty N et al. (2006). pattern recognition. Biomarkers, 10:401–416. Liberman MM (1979). Measurement of Combining desorption electrospray ionization doi:10.1080/13547500500309259 PMID:1630 4-hydroxyphenylacetic aciduria as a mass spectrometry and nuclear magnetic 8265 screening test for small-bowel disease. Clin resonance for differential metabolomics Chem, 25:1791–1794. PMID:476929 without sample preparation. Rapid Commun 77. Walsh MC, Brennan L, Malthouse JP et al. Mass Spectrom, 20:1577–1584.doi:10.1002/ (2006). Effect of acute dietary standardization 89. Ghauri FY, McLean AE, Beales D et al. rcm.2474 PMID:16628593 on the urinary, plasma, and salivary (1993). Induction of 5-oxoprolinuria in the metabolomic profiles of healthy humans. Am J rat following chronic feeding with N-acetyl 101. Süllentrop F, Moka D, Neubauer S et Clin Nutr, 84:531–539. PMID:16960166 4-aminophenol (paracetamol). Biochem al. (2002). 31P NMR spectroscopy of blood Pharmacol, 46:953–957.doi:10.1016/0006- plasma: determination and quantification of 78. Chace DH, Kalas TA, Naylor EW (2002). 2952(93)90506-R PMID:8373447 phospholipid classes in patients with renal The application of tandem mass spectrometry cell carcinoma. NMR Biomed, 15:60–68. to neonatal screening for inherited disorders 90. Soga T, Baran R, Suematsu M et al. (2006). doi:10.1002/nbm.758 PMID:11840554 of intermediary metabolism. Annu Rev Differential metabolomics reveals ophthalmic Genomics Hum Genet, 3:17–45.doi: 10.1146/ acid as an oxidative stress biomarker 102. Frickenschmidt A, Frohlich H, Bullinger annurev.genom.3.022502.103213 PMID:1214 indicating hepatic glutathione consumption. D et al. (2008). Metabonomics in cancer 2359 J Biol Chem, 281:16768–16776.doi:10.1074/ diagnosis: mass spectrometry-based jbc.M601876200 PMID:16608839 profiling of urinary nucleosides from breast 79. Sim KG, Hammond J, Wilcken B (2002). cancer patients. Biomarkers, 13:435–449. Strategies for the diagnosis of mitochondrial 91. Yeung KY, Ruzzo WL (2001). Principal doi:10.1080/13547500802012858 PMID:1848 beta-oxidation disorders. Clin component analysis for clustering gene 4357 Chim Acta, 323:37–58.doi:10.1016/S0009- expression data. Bioinformatics, 17:763–774. 8981(02)00182-1 PMID:12135806 doi:10.1093/bioinformatics/17.9.763 PMID:11 103. Ohdoi C, Nyhan WL, Kuhara T (2003). 590094 Chemical diagnosis of Lesch-Nyhan 80. Want EJ, O’Maille G, Smith CA et al. syndrome using gas chromatography-mass (2006). Solvent-dependent metabolite 92. Abe T, Isobe C, Murata T et al. spectrometry detection. J Chromatogr B distribution, clustering, and protein extraction (2003). Alteration of 8-hydroxyguanosine Analyt Technol Biomed Life Sci, 792:123–130. for serum profiling with mass spectrometry. concentrations in the cerebrospinal fluid and doi:10.1016/S1570-0232(03)00277-0 PMID: Anal Chem, 78:743–752.doi:10.1021/ac0513 serum from patients with Parkinson’s disease. 12829005 12t PMID:16448047 Neurosci Lett, 336:105–108.doi:10.1016/ S0304-3940(02)01259-4 PMID:12499051 104. Seibel MJ, Gartenberg F, Silverberg 81. Gamache PH, Meyer DF, Granger MC, SJ et al. (1992). Urinary hydroxypyridinium Acworth IN (2004). Metabolomic applications 93. Abe T, Tohgi H, Isobe C et al. (2002). cross-links of collagen in primary of electrochemistry/mass spectrometry. J Remarkable increase in the concentration of hyperparathyroidism. J Clin Endocrinol Am Soc Mass Spectrom, 15:1717–1726. 8-hydroxyguanosine in cerebrospinal fluid Metab, 74:481–486.doi:10.1210/jc.74.3.481 doi:10.1016/j.jasms.2004.08.016 PMID:15589 from patients with Alzheimer’s disease. J PMID:1740480 749 Neurosci Res, 70:447–450.doi:10.1002/jnr. 10349 PMID:12391605 105. Gineyts E, Garnero P, Delmas PD (2001). 82. Kaderbhai NN, Broadhurst DI, Ellis DI et Urinary excretion of glucosyl-galactosyl al. (2003). via metabolic 94. Nekrassova O, Lawrence NS, Compton pyridinoline: a specific biochemical marker footprinting: monitoring metabolite secretion RG (2003). Analytical determination of of synovium degradation. Rheumatology by Escherichia coli tryptophan metabolism homocysteine: a review. Talanta, 60:1085– (Oxford), 40:315–323.doi:10.1093/ mutants using FT-IR and direct injection 1095.doi:10.1016/S0039-9140(03)00173-5 rheumatology/40.3.315 PMID:11285380 electrospray mass spectrometry. Comp Funct PMID:18969134 Genomics, 4:376–391.doi:10.1002/cfg.302 106. Gineyts E, Mo JA, Ko A et al. (2004). 95. Chen YQ, Duniec ZM, Liu B et al. (1994). PMID:18629082 Effects of ibuprofen on molecular markers of Endogenous 12(S)-HETE production by cartilage and synovium turnover in patients 83. Lindon JC, Holmes E, Nicholson JK (2006). tumor cells and its role in metastasis. Cancer with knee osteoarthritis. Ann Rheum Dis, Metabonomics techniques and applications Res, 54:1574–1579. PMID:7511046 63:857–861.doi:10.1136/ard.2003.007302 to pharmaceutical research & development. PMID:15194584 Pharm Res, 23:1075–1088.doi:10.1007/s110 96. Liu B, Marnett LJ, Chaudhary A et al. (1994). 95-006-0025-z PMID:16715371 Biosynthesis of 12(S)-hydroxyeicosatetraenoic acid by B16 amelanotic melanoma cells is a 107. MacAllister RJ, Whitley GSJ, Vallance 84. Nicholson JK, Connelly J, Lindon JC, determinant of their metastatic potential. Lab P (1994). Effects of guanidino and uremic Holmes E (2002). Metabonomics: a platform Invest, 70:314–323. PMID:8145526 compounds on nitric oxide pathways. Kidney for studying drug toxicity and gene function. Int, 45:737–742.doi:10.1038/ki.1994.98 PMID: Nat Rev Drug Discov, 1:153–161.doi:10.1038/ 97. Winer I, Normolle DP, Shureiqi I et al. (2002). 7515129 nrd728 PMID:12120097 Expression of 12-lipoxygenase as a biomarker for melanoma carcinogenesis. Melanoma 108. Zoccali C, Bode-Böger SM, Mallamaci 85. Shockcor JP, Holmes E (2002). Res, 12:429–434.doi:10.1097/00008390- F et al. (2001). Plasma concentration of Metabonomic applications in toxicity screening 200209000-00003 PMID:12394183 asymmetrical dimethylarginine and mortality and disease diagnosis. Curr Top Med Chem, in patients with end-stage renal disease: 2:35–51.doi:10.2174/1568026023394498 a prospective study. Lancet, 358:2113 – PMID:11899064 2117.doi:10.1016/S0140-6736(01)07217-8 PMID:11784625

140 109. Miyazaki H, Matsuoka H, Cooke JP et 121. Azuaje F. Bioinformatics and biomarker 133. Perou CM, Sørlie T, Eisen MB et al. al. (1999). Endogenous nitric oxide synthase discovery: “omic” data analysis for (2000). Molecular portraits of human breast inhibitor: a novel marker of atherosclerosis. personalized medicine. Hoboken (NJ): Wiley- tumours. Nature, 406:747–752.doi:10.1038/35 Circulation, 99:1141–1146. PMID:10069780 Blackwell; 2010. 021093 PMID:10963602

110. Salek RM, Maguire ML, Bentley E et al. 122. Ginsburg GS, Haga SB (2006). 134. Li L, Umbach DM, Terry P, Taylor JA (2007). A metabolomic comparison of urinary Translating genomic biomarkers into clinically (2004). Application of the GA/KNN method changes in type 2 diabetes in mouse, rat, useful diagnostics. Expert Rev Mol Diagn, to SELDI proteomics data. Bioinformatics, and human. Physiol Genomics, 29:99–108. 6:179–191.doi:10.1586/14737159.6.2.179 20:1638–1640.doi:10.1093/bioinformatics/ PMID:17190852 PMID:16512778 bth098 PMID:14962943

111. Ringeissen S, Connor SC, Brown 123. Rhodes DR, Yu J, Shanker K et al. 135. Li L, Weinberg CR, Darden TA, Pedersen HR et al. (2003). Potential urinary and (2004). ONCOMINE: a cancer microarray LG (2001). Gene selection for sample plasma biomarkers of peroxisome database and integrated data-mining classification based on gene expression data: proliferation in the rat: identification of platform. Neoplasia, 6:1–6. PMID:15068665 study of sensitivity to choice of parameters of N-methylnicotinamide and N-methyl-4- the GA/KNN method. Bioinformatics, 17:1131– pyridone-3-carboxamide by 1H nuclear 124. Baggerly KA, Coombes KR, Morris JS 1142.doi:10.1093/bioinformatics/17.12.1131 magnetic resonance and high performance (2005). Bias, randomization, and ovarian PMID:11751221 liquid chromatography. Biomarkers, 8:240– proteomic data: a reply to “producers and 271.doi:10.1080/1354750031000149124 consumers.”. Cancer Inform, 1:9–14. 136. Peng Y (2006). A novel ensemble for robust microarray data PMID:12944176 125. Liu J, Zheng S, Yu JK et al. (2005). classification. Comput Biol Med, 36:553–573. Unit 2 112. Connor SC, Hodson MP, Ringeissen S Serum protein fingerprinting coupled with doi:10.1016/j.compbiomed.2005.04.001 et al. (2004). Development of a multivariate artificial neural network distinguishes glioma PMID:15978569 Chapter 7 Chapter statistical model to predict peroxisome from healthy population or brain benign tumor. proliferation in the rat, based on urinary J Zhejiang Univ Sci B, 6:4–10.doi:10.1631/ 137. Fushiki T, Fujisawa H, Eguchi S (2006). 1H-NMR spectral patterns. Biomarkers, jzus.2005.B0004 PMID:15593384 Identification of biomarkers from mass spectrometry data using a “common” peak 9:364–385.doi:10.1080/13547500400006005 126. Yang YH, Dudoit S, Luu P et al. (2002). PMID:15764299 approach. BMC Bioinformatics, 7:358.doi:10. Normalization for cDNA microarray data: a 1186/1471-2105-7-358 PMID:16869977 113. Chalmers RA, Jones MG, Goodwin CS, robust composite method addressing single Amjad S (2006). CFSUM1 and CFSUM2 and multiple slide systematic variation. 138. Qiu P, Wang ZJ, Liu KJ et al. (2007). in urine from patients with chronic fatigue Nucleic Acids Res, 30:e15.doi:10.1093/nar/30. Dependence network modeling for biomarker syndrome are methodological artefacts. Clin 4.e15 PMID:11842121 identification. Bioinformatics, 23:198–206. Chim Acta, 364:148–158.doi:10.1016/j.cccn. doi:10.1093/bioinformatics/btl553 PMID:1707 127. Irizarry RA, Hobbs B, Collin F et al. (2003). 7095 2005.05.036 PMID:16095585 Exploration, normalization, and summaries of 114. Jones MG, Cooper E, Amjad S et al. high density oligonucleotide array probe level 139. Chu W, Ghahramani Z, Falciani F, Wild (2005). Urinary and plasma organic acids and data. Biostatistics, 4:249–264.doi:10.1093/ DL (2005). Biomarker discovery in microarray amino acids in chronic fatigue syndrome. Clin biostatistics/4.2.249 PMID:12925520 gene expression data with Gaussian processes. Bioinformatics, 21:3385–3393. Chim Acta, 361:150–158.doi:10.1016/j.cccn. 128. Bolstad BM, Irizarry RA, Astrand 2005.05.023 PMID:15992788 doi:10.1093/bioinformatics/bti526 PMID:1593 M, Speed TP (2003). A comparison of 7031 115. Brindle JT, Antti H, Holmes E et al. normalization methods for high density (2002). Rapid and noninvasive diagnosis of oligonucleotide array data based on variance 140. Tan Y, Shi L, Hussain SM et al. (2006). the presence and severity of coronary heart and bias. Bioinformatics, 19:185–193. Integrating time-course microarray gene disease using 1H-NMR-based metabonomics. doi:10.1093/bioinformatics/19.2.185 PMID:12 expression profiles with cytotoxicity for Nat Med, 8:1439–1445.doi:10.1038/nm802 538238 identification of biomarkers in primary PMID:12447357 rat hepatocytes exposed to cadmium. 129. Chou JW, Paules RS, Bushel PR Bioinformatics, 22:77–87.doi:10.1093/ 116. Kirschenlohr HL, Griffin JL, Clarke SC et (2005). Systematic variation normalization bioinformatics/bti737 PMID:16249259 al. (2006). Proton NMR analysis of plasma is in microarray data to get gene expression a weak predictor of coronary artery disease. comparison unbiased. J Bioinform Comput 141. Zuber V, Strimmer K (2009). Gene ranking Nat Med, 12:705–710.doi:10.1038/nm1432 Biol, 3:225–241.doi:10.1142/S021972000500 and biomarker discovery under correlation. PMID:16732278 1028 PMID:15852502 Bioinformatics, 25:2700–2707.doi:10.1093/ bioinformatics/btp460 PMID:19648135 117. Dunn WB (2008). Current trends 130. Petricoin EF 3rd, Ardekani AM, Hitt BA et and future requirements for the mass al. (2002). Use of proteomic patterns in serum 142. Joyce AR, Palsson BO (2006). The model spectrometric investigation of microbial, to identify ovarian cancer. Lancet, 359:572– organism as a system: integrating ‘omics’ mammalian and plant . Phys 577.doi:10.1016/S0140-6736(02)07746-2 data sets. Nat Rev Mol Cell Biol, 7:198–210. Biol, 5:011001.doi:10.1088/1478-3975/5/1/01 PMID:11867112 doi:10.1038/nrm1857 PMID:16496022 1001 PMID:18367780 131. Tan CS, Ploner A, Quandt A et al. 143. Waters M, Stasiewicz S, Merrick BA et al. 118. Luscombe NM, Greenbaum D, Gerstein (2006). Finding regions of significance in (2008). CEBS–Chemical Effects in Biological M (2001). What is bioinformatics? A proposed SELDI measurements for identifying protein Systems: a public data repository integrating definition and overview of the field. Methods biomarkers. Bioinformatics, 22:1515–1523. study design and toxicity data with microarray Inf Med, 40:346–358. PMID:11552348 doi:10.1093/bioinformatics/btl106 PMID:1656 and proteomics data. Nucleic Acids Res, 36 7365 Database;D892–D900.doi:10.1093/nar/gkm 119. Mount DW, Pandey R (2005). Using 755 PMID:17962311 bioinformatics and genome analysis for new 132. Shi L, Campbell G, Jones WD et al.; therapeutic interventions. Mol Cancer Ther, MAQC Consortium (2010). The MicroArray 144. Muthusamy B, Hanumanthu G, Suresh 4:1636–1643.doi:10.1158/1535-7163.MCT- Quality Control (MAQC)-II study of common S et al. (2005). Plasma Proteome Database 05-0150 PMID:16227414 practices for the development and validation as a resource for proteomics research. of microarray-based predictive models. Nat Proteomics, 5:3531–3536.doi:10.1002/pmic. 120. Li J, Zhang Z, Rosenzweig J et al. (2002). Biotechnol, 28:827–838.doi:10.1038/nbt.1665 200401335 PMID:16041672 Proteomics and bioinformatics approaches for PMID:20676074 identification of serum biomarkers to detect breast cancer. Clin Chem, 48:1296–1304. PMID:12142387

Unit 2 • Chapter 7. Platforms for biomarker analysis using high-throughput approaches 141 in genomics, transcriptomics, proteomics, metabolomics, and bioinformatics 145. Amin RP, Vickers AE, Sistare F et al. 148. Oh P, Li Y, Yu J et al. (2004). Subtractive 150. Tohgi H, Abe T, Yamazaki K et al. (2004). Identification of putative gene based proteomic mapping of the endothelial surface (1999). Remarkable increase in cerebrospinal markers of renal toxicity. Environ Health in lung and solid tumours for tissue-specific fluid 3-nitrotyrosine in patients with Perspect, 112:465–479.doi:10.1289/ehp.6683 therapy. Nature, 429:629–635.doi:10.1038/ sporadic amyotrophic lateral sclerosis. PMID:15033597 nature02580 PMID:15190345 Ann Neurol, 46:129–131.doi:10.1002/1531- 8249(199907)46:1<129::AID-ANA21>3.0.CO; 146. Searfoss GH, Jordan WH, Calligaro 149. Sawada H, Takami K, Asahi S (2005). 2-Y PMID:10401792 DO et al. (2003). Adipsin, a biomarker A toxicogenomic approach to drug-induced of gastrointestinal toxicity mediated by a phospholipidosis: analysis of its induction functional gamma-secretase inhibitor. J Biol mechanism and establishment of a novel Chem, 278:46107–46116.doi:10.1074/jbc. in vitro screening system. Toxicol Sci, M307757200 PMID:12949072 83:282–292.doi:10.1093/toxsci/kfh264 PMID: 15342952 147. McDonough JL, Arrell DK, Van Eyk JE (1999). Troponin I degradation and covalent complex formation accompanies myocardial ischemia/reperfusion injury. Circ Res, 84:9– 20. PMID:9915770

142