Statistical Methods for Comparative Phenomics Using High-Throughput Phenotype Microarrays
The International Journal of Biostatistics
Volume 6, Issue 1, 2010, Article 29

Joseph Sturino, Texas A&M University
Ivan Zorych, Texas A&M University
Bani Mallick, Texas A&M University
Karina Pokusaeva, Texas A&M University
Ying-Ying Chang, Texas A&M University
Raymond J. Carroll, Texas A&M University
Nikolay Bliznuyk, Texas A&M University

Recommended Citation: Sturino, Joseph; Zorych, Ivan; Mallick, Bani; Pokusaeva, Karina; Chang, Ying-Ying; Carroll, Raymond J.; and Bliznuyk, Nikolay (2010) "Statistical Methods for Comparative Phenomics Using High-Throughput Phenotype Microarrays," The International Journal of Biostatistics: Vol. 6: Iss. 1, Article 29. DOI: 10.2202/1557-4679.1227

Abstract

We propose statistical methods for comparing phenomics data generated by the Biolog Phenotype Microarray (PM) platform for high-throughput phenotyping. Instead of the routinely used visual inspection of data with no sound inferential basis, we develop two approaches. The first approach is based on quantifying the distance between mean or median curves from two treatments and then applying a permutation test; we also consider a permutation test applied to areas under mean curves. The second approach employs functional principal component analysis. Properties of the proposed methods are investigated on both simulated data and data sets from the PM platform.

KEYWORDS: functional data analysis, principal components, permutation tests, phenotype microarrays, high-throughput phenotyping, phenomics, Biolog

Author Notes: Zorych and Bliznyuk were supported by the Texas A&M Postdoctoral Training Program of the National Cancer Institute (CA90301).
The research of Carroll was supported by a grant from the National Cancer Institute (CA57030). Carroll and Mallick were also supported by Award Number KUS-CI-016-04, made by King Abdullah University of Science and Technology (KAUST). The research of Sturino was supported by the United States Department of Agriculture, Cooperative State Research, Education and Extension Service, Hatch project TEX 09436. Acquisition of the Biolog Omnilog Phenotype Microarray was supported by the State of Texas Permanent University Fund with matching funds from Texas AgriLife Research and Texas A&M University. In addition, the authors would like to thank the editor and anonymous reviewers for their valuable comments.

1 Introduction

Phenomics is the systematic study of global cellular phenotypes that arise as a function of genotype (or metagenotype) and its environmental context (Gowen and Fong, 2009). Differentiation of biological systems as a function of observable phenotype predates the discovery of their molecular components, which include DNA (and its systems biology sub-discipline, genomics), epigenetic heritability (epigenomics), RNA (transcriptomics), proteins (proteomics), and metabolites (metabolomics). Nevertheless, due to the repetitive and labor-intensive nature of phenotypic studies, cellular phenomics has struggled to become a vibrant functional discipline (Bochner, 2009; Joyce and Palsson, 2006).

Assay miniaturization coupled with process automation has proven to be a foundational advance that has enabled the rational design and implementation of high-throughput phenotyping (HTP) platform chemistries. Automated liquid handling systems can be used to accurately and precisely prepare low-volume microplate-based assays, while high-capacity and temperature-controlled automated microplate readers enable effectively parallel data collection (Gabrielson et al., 2002; Bochner, 2003).
These technological advancements have paved the way for the development and commercialization of standardized platform chemistries for HTP; see Gowen and Fong (2009) for a review.

The Phenotype MicroArray (PM) platform for HTP was developed by Bochner et al. (2001) and commercialized by Biolog, Inc., whose web site is http://www.biolog.com/; for reviews, see Bochner et al. (2009) and Bochner (2003a). The complete Phenotype MicroArray for microbial cells comprises twenty pre-formulated 96-well microplates (PM1 to PM20). When used together, they enable researchers to simultaneously assay up to 1,920 different cellular phenotypes (20 plates × 96 wells) as a function of time (i.e., kinetic response). Individual PM microplates contain a large and heterogeneous collection of functionally related chemical compounds or combinations thereof (e.g., up to 96 per PM microplate). These compounds may serve as a source of carbon (PM1-2), nitrogen (PM3, PM6-PM8), and phosphorus or sulfur (PM4). Other PM microplates are used to determine sensitivity to environmental stresses, such as ion or osmolyte stress (PM9), pH (PM10), and chemical agents (PM11-20) (Bochner, 2009).

Using this platform, microbial strains may be assayed individually or in parallel (e.g., a wild-type strain versus an isogenic mutant) and subsequently compared for quantitative phenotypic differences as a function of time (i.e., kinetic activity). Many microplate-based HTP platforms monitor the accumulation of biomass directly as a function of time, which necessitates cellular growth and division. In contrast, the PM platform employs a universal colorimetric reporter system to monitor cellular metabolism that is effective even in the absence of the accumulation of biomass.
This colorimetric reporter indirectly measures cellular metabolism by directly measuring the irreversible reduction of tetrazolium-based dyes (colorless) to formazan (purple; Gabrielson et al., 2002; Bochner, 2003a). Conceptually, maintenance of a respiration-competent state requires fewer physiological systems than does the maintenance of a replication-competent state. As such, this platform may afford a distinct advantage over cultivation-based platforms for the study of microorganisms that may be viable but not cultivable ex vivo, which includes most of the microorganisms present in the gastrointestinal tract of humans and animals (Savage, 1977; Xu and Gordon, 2003).

Standardized commercial chemistries for high-throughput phenotyping (HTP), such as the Biolog Phenotype Microarray (PM) platform, provide a universal platform to facilitate meta-analysis of phenomics data generated from the query of disparate biological systems. These recent advancements have resulted in the generation of novel high-dimensional phenomic data sets. The OmniLog PM system software is used to visualize phenotype curves and provides several basic analytical functions. For example, the PM Kinetic Analysis Module can be used to generate a mean kinetic phenotype curve and to (optionally) amend it (e.g., subtract background signal, crop or trim early or late time points, and several others). The PM Parametric Analysis Module calculates summary values for each mean phenotype curve (e.g., area under the curve, minimum/maximum signal intensity, maximum slope, lag time, etc.), which enables two microorganisms to be compared. For example, the software allows users to select an ad hoc threshold (e.g., a fixed cut-off k) that distinguishes wells that show "striking" relative phenotypic differences between microorganisms with respect to a given summary value (e.g., a k-fold change).
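To make the ad hoc thresholding described above concrete, the following is a minimal Python sketch, not the OmniLog software itself and not one of the methods proposed in this paper. It assumes the summary value is the area under the mean curve computed with the trapezoid rule, that signals are strictly positive, and that data are stored as arrays of shape (replicates, wells, time points); the function names `auc` and `flag_wells` and the toy data are hypothetical.

```python
import numpy as np

def auc(t, curves):
    """Trapezoid-rule area under each curve; curves has shape (..., n_times)."""
    dt = np.diff(t)
    return np.sum((curves[..., 1:] + curves[..., :-1]) * dt / 2.0, axis=-1)

def flag_wells(t, curves_a, curves_b, k=2.0):
    """Flag wells whose mean-curve AUCs differ by at least a factor of k.

    curves_a, curves_b: arrays of shape (n_replicates, n_wells, n_times).
    Assumes strictly positive AUCs (otherwise the ratio is undefined).
    """
    auc_a = auc(t, curves_a.mean(axis=0))  # AUC of the mean curve, per well
    auc_b = auc(t, curves_b.mean(axis=0))
    ratio = np.maximum(auc_a, auc_b) / np.minimum(auc_a, auc_b)
    return ratio >= k

# Toy data (hypothetical): one replicate, two wells, three time points.
t = np.array([0.0, 1.0, 2.0])
a = np.array([[[1.0, 1.0, 1.0], [2.0, 2.0, 2.0]]])
b = np.array([[[1.0, 1.0, 1.0], [6.0, 6.0, 6.0]]])

flags_k2 = flag_wells(t, a, b, k=2.0)  # second well differs 3-fold: flagged
flags_k4 = flag_wells(t, a, b, k=4.0)  # stricter cut-off: nothing flagged
```

In this toy case the same data yield one flagged well at k = 2 and none at k = 4, and no measure of uncertainty accompanies either answer.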
The number of wells that differ under such an approach depends on the summary value and the stringency of the selected threshold. Analysis of PM data using ad hoc summary values without statistical support has been reported extensively (e.g., Bochner et al., 2001; Bochner, 2003 (2); Zhou et al., 2003; Mukherjee et al., 2006). Although this approach may be used judiciously to guide biological research, ad hoc experimenter-selected thresholds have much the same flavor as the 2-fold expression-change cut-offs that used to be applied to microarrays, and in that context the ad hoc threshold has given way to robust statistical analysis (Slonim, 2002).

As such, there is an urgent need to design robust statistical methodologies to interrogate these data and to enable sound biological inferences to be made therefrom. In this paper, we propose several simple yet effective hypothesis testing frameworks to compare phenomic data that were generated using the PM platform for high-throughput phenotyping.

The paper is organized as follows. In Section 2, we describe the experiment undertaken. Section 3 gives the methods used. Section 4 gives the results of simulation studies, while Section 5 describes the analysis of the experimental data. Section 6 has concluding remarks.

2 Experimental Design

Sodium (Na+)/proton (H+) antiporter proteins are critical for maintaining intracellular pH, cell volume, and osmotic homeostasis (Padan et al., 2001). The Escherichia coli (E. coli) nhaA gene encodes the NhaA Na+/H+ antiporter, which protects the cell from sodium ion toxicity in high-sodium environments (Padan et al., 1989). Escherichia coli K-12 BW25113 (relevant genotype: nhaA+) and its isogenic nhaA-deletion derivative E. coli ECK0020 (relevant genotype: ∆nhaA) were obtained from the Keio Collection (Baba et al., 2006) curated by the Escherichia coli Genetic Stock Center at Yale University. All PM procedures were performed as indicated by the manufacturer. In brief, strains were cultivated at 37 °C on R2A agar.