Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: an Epidemiological Perspective

H OH metabolites OH Review Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: An Epidemiological Perspective 1, , 1, 1 2 3 Su H. Chu y * , Mengna Huang y, Rachel S. Kelly , Elisa Benedetti , Jalal K. Siddiqui , Oana A. Zeleznik 1, Alexandre Pereira 4, David Herrington 5, Craig E. Wheelock 6 , Jan Krumsiek 2 , Michael McGeachie 1, Steven C. Moore 7, Peter Kraft 8, Ewy Mathé 3 , 1, Jessica Lasky-Su y and on behalf of the Consortium of Metabolomics Studies Statistics Working Group 1 Channing Division of Network Medicine, Brigham and Women’s Hospital and Harvard Medical School, Boston, MA 02115, USA; [email protected] (M.H.); [email protected] (R.S.K.); [email protected] (O.A.Z.); [email protected] (M.M.); [email protected] (J.L.-S.) 2 Institute for Computational Biomedicine, Englander Institute for Precision Medicine, Department of Physiology and Biophysics, Weill Cornell Medicine, New York, NY 10021, USA; [email protected] (E.B.); [email protected] (J.K.) 3 Department of Biomedical Informatics, College of Medicine, The Ohio State University, Columbus, OH 43210, USA; [email protected] (J.K.S.); [email protected] (E.M.) 4 Department of Genetics and Molecular Medicine, University of Sao Paulo Medical School, Sao Paulo 01246-903, Brazil; [email protected] 5 Department of Internal Medicine, Wake Forest School of Medicine, Winston-Salem, NC 27101, USA; [email protected] 6 Division of Physiological Chemistry 2, Department of Medical Biochemistry and Biophysics, Karolinska Institute, 171 77 Stockholm, Sweden; [email protected] 7 Division of Cancer Epidemiology and Genetics, National Cancer Institute, Rockville, MD 20850, USA; [email protected] 8 Department of Epidemiology, Harvard School of Public Health, Boston, MA 02115, USA; [email protected] * Correspondence: [email protected]; Tel.: +1-617-525-0997 These authors contributed equally to this manuscript. y Received: 8 May 2019; Accepted: 14 June 2019; Published: 18 June 2019 Abstract: It is not controversial that study design considerations and challenges must be addressed when investigating the linkage between single omic measurements and human phenotypes. It follows that such considerations are just as critical, if not more so, in the context of multi-omic studies. In this review, we discuss (1) epidemiologic principles of study design, including selection of biospecimen source(s) and the implications of the timing of sample collection, in the context of a multi-omic investigation, and (2) the strengths and limitations of various techniques of data integration across multi-omic data types that may arise in population-based studies utilizing metabolomic data. Keywords: multi-omic integration; systems biology; epidemiology; study design; integrative analysis Metabolites 2019, 9, 117; doi:10.3390/metabo9060117 www.mdpi.com/journal/metabolites Metabolites 2019, 9, 117 2 of 27 1. Introduction Metabolites 2019, 9, x FOR PEER REVIEW 2 of 27 Long considered a ‘link between genotype and phenotype’ [1], the metabolome can offer a unique viewindirectly, not only intowith the the catalogue small molecules of end products comprising of biochemical the metabolome reactions underlying[2,3]. In this complex sense, traits,as both butendogenous also into the potentialand exogenous environmental perturbations contributors can whichbe captured may interact, by metabolomic directly or indirectly, snapshots, with the the smallmetabolome molecules can comprising serve as a the rich metabolome resource for [2, 3identifying]. In this sense, biomarkers as both endogenous for the prediction, and exogenous prognosis, perturbationsdiagnosis canor subtyping be captured of byhuman metabolomic disease. snapshots,If, however, the the metabolome goal of a study can serve is to as identify a rich resourceor estimate for identifyingbiological mechanisms biomarkers forand the relationships, prediction, prognosis,the study diagnosisof metabolomics or subtyping alone ofmay human produce disease. limited If, however,insights. the goal of a study is to identify or estimate biological mechanisms and relationships, the studyThe of metabolomicsnumber of studies alone mayintegrating produce multiple limited omic insights. data types continues to grow, increasingly demonstratingThe number of that studies including integrating metabolomic multiple informat omic dataion can types contribute continues important to grow, functional increasingly insight demonstratinginto the interactions that including between metabolomic the preceding information omic levels can (Figure contribute 1), in important addition to functional revealing insightinfluences intoexerted the interactions by the betweenenvironment the preceding [2]. How omic these levels inte (Figurer-omic1), relationships in addition to are revealing revealed, influences however, exertednecessarily by the environment depends on [the2]. study How thesedesign inter-omic used to re relationshipscruit subjects, are source revealed, and however, collect biosamples, necessarily and dependswhich on omics the study are measured design used [4]. to While recruit current subjects, integrative source and studies collect biosamples,are frequently and limited which by omicsconvenience are measured with [respect4]. While to the current availability integrative of multi-omic studies aremeasurements, frequently limitedthe increasing by convenience turn towards withsystems respect biology to the availability approaches of warrants multi-omic both measurements, careful consideration the increasing of how turn complex towards layers systems of omics biologyshould approaches be integrated warrants and thoughtful both careful discussion consideration regarding of how thecomplex limits of layersinference ofomics exerted should by different be integratedstudy designs. and thoughtful Previous discussion review regardingarticles on the multi- limitsomics of inference integration exerted have by focused different on study its designs.advantages Previousover single-omic review articles investigation on multi-omics in elucidating integration disease have focused mechanism on its [5], advantages potential over impact single-omic on medical investigationactionability in elucidating [6], and experimental disease mechanism design, [5 ],analytical potential methods, impact on tools, medical and actionability challenges [ 6[7],], and while experimentaldiscussion design, of the analytical importance methods, of epidemiologic tools, and challenges study design [7], while and discussionits implications of the importancein the context of of epidemiologicintegrative study omics design has been and its inadequate. implications Therefore, in the context in ofthis integrative review we omics discuss has been (1) inadequate.epidemiologic Therefore,principles in this of study review design, we discuss including (1) epidemiologic selection of the principles biospecimen of study source(s) design, and including the implications selection of of thethe biospecimen timing of sample source(s) collection, and the in implications the context ofof thea multi-omic timing of sampleinvestigation, collection, and in (2) the the context strengths of a and multi-omiclimitations investigation, of various and multi-omic (2) the strengths data integration and limitations techniques of various that have multi-omic been used data integrationin population- techniquesbased studies that have using been metabolomics used in population-based data. The studiesgeneral usingoutline metabolomics of the multi-omic data. The investigative general outlineframework of the multi-omic we propose investigative is illustrated framework in Figure we2. propose is illustrated in Figure2. Figure 1. A systems biology view of complex trait etiology. Red arrows reflect all potential causal mechanistic pathways that may be captured by the metabolome assuming a central dogma framework. Gray arrows depict biological pathways that do not act through changes to the metabolome. Blue Figurearrows 1. depict A systems potential biology sources view of reverseof complex causation, trait etiology. or mechanisms Red arrows that involve reflect all time-dependent potential causal mechanisticfeedback between pathways omics andthat do may not strictlybe captured adhere toby the the central metabolome dogma; the assuming arrows depicted a central here dogma are non-exhaustive of all potential reverse causation/time-dependent paths. The environment is depicted framework. Gray arrows depict biological pathways that do not act through changes to the as a potential force across all of the omic stages; the microbiome is included as a component of the metabolome. Blue arrows depict potential sources of reverse causation, or mechanisms that environment, but does not necessarily exert its effects across all omic stages. Image made with Biorender. involve time-dependent feedback between omics and do not strictly adhere to the central dogma; the arrows depicted here are non-exhaustive of all potential reverse causation/time- dependent paths. The environment is depicted as a potential force across all of the omic stages; the microbiome is included as a component of the environment, but does not necessarily exert its effects across all omic stages. Image made with Biorender. Metabolites 2019, 9, x 117 FOR PEER REVIEW 33 of of 27 Figure 2. Framework for multi-omic study design and analysis. Explicitly defined research questions Figurein a2.

Integration of Metabolomic and Other Omics Data in Population-Based Study Designs: an Epidemiological Perspective

Serum N-Glycomics Stratifies Bacteremic Patients Infected With

Glycomics Meets Genomics, Epigenomics and Other High Throughput Omics for System Biology Studies

Phylogenetic Analysis of the Human Antibody Repertoire Reveals Quantitative Signatures of Immune Senescence and Aging

The 'Omics Revolution: How an Obsession with Compiling Lists Is

Technology Development for Single-Molecule Protein Sequencing

From Proteomics to Systems Biology Outline and Learning Objectives

Impacts of Genomics and Other 'Omics' for the Crop, Forestry, Livestock

Computational Proteomics: High-Throughput Analysis for Systems Biology

New Technologies and Their Impact on 'Omics' Research

Insights of New Tools in Glycomics Research Denong Wang1 * and Srinubabu Gedela2,3

Molecular Biologist's Guide to Proteomics

Proteome-By-Phenome Mendelian Randomisation Detects 38 Proteins with Causal