Big Challenges for Statisticians
Hongtu Zhu, Ph.D.
Department of Biostatistics† and Biomedical Research Imaging Center‡
The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Thank NSF and SAMSI! Thank organizers! Thank you!

Science Statistics

Part 1. Technical Challenges

Imaging Science
From Wikipedia, the free encyclopedia
Imaging Science is a multidisciplinary field concerned with the generation, collection, duplication, analysis, modification, and visualization of images. As an evolving field, it includes research and researchers from Physics, Mathematics, Statistics, Electrical Engineering, Computer Vision, Computer Science and Perceptual Psychology.

Three key components
• Image acquisition: studies the physical mechanisms, mathematical models, and algorithms by which imaging devices generate image observations.
• Image interpretation/application: sees, monitors, and interprets the targeted world/patterns being imaged.
• Image processing: any linear or nonlinear operator that operates on the images and produces targeted patterns.
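The "linear or nonlinear operator" wording in the image processing component can be illustrated with a small sketch. It is not from the talk: the 3x3 mean and median filters, the border clamping, and every function name below are assumptions chosen for illustration. The check is whether an operator satisfies op(a + b) = op(a) + op(b) on two random images.

```python
import random
from statistics import median


def neighborhood(image, r, c):
    """Return the 3x3 window around (r, c), clamping indices at the border."""
    rows, cols = len(image), len(image[0])
    return [
        image[min(max(r + dr, 0), rows - 1)][min(max(c + dc, 0), cols - 1)]
        for dr in (-1, 0, 1)
        for dc in (-1, 0, 1)
    ]


def apply_filter(image, reduce_window):
    """Apply a window-reducing function at every pixel of a 2D list image."""
    return [
        [reduce_window(neighborhood(image, r, c)) for c in range(len(image[0]))]
        for r in range(len(image))
    ]


def mean_filter(image):
    """Linear operator: each output pixel is the average of its 3x3 window."""
    return apply_filter(image, lambda window: sum(window) / len(window))


def median_filter(image):
    """Nonlinear operator: each output pixel is the median of its 3x3 window."""
    return apply_filter(image, median)


def add_images(a, b):
    """Pixel-wise sum of two images of the same shape."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def max_abs_diff(a, b):
    """Largest absolute pixel-wise difference between two images."""
    return max(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


random.seed(0)
a = [[random.gauss(0, 1) for _ in range(6)] for _ in range(6)]
b = [[random.gauss(0, 1) for _ in range(6)] for _ in range(6)]

# Gap between op(a + b) and op(a) + op(b); zero (up to rounding) means additive.
mean_gap = max_abs_diff(mean_filter(add_images(a, b)),
                        add_images(mean_filter(a), mean_filter(b)))
median_gap = max_abs_diff(median_filter(add_images(a, b)),
                          add_images(median_filter(a), median_filter(b)))

print(f"mean filter additivity gap:   {mean_gap:.2e}")
print(f"median filter additivity gap: {median_gap:.2e}")
```

The mean filter passes the additivity check up to floating-point rounding, while the median filter generally does not, which is one simple way to tell the two classes of operator apart.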
Level 1: Imaging Data Structure
[Figure: imaging modalities. Labels: Functional MRI (task), Functional MRI (resting), Structural MRI, Diffusion MRI, PET, EEG/MEG, CT, Calcium]
Overview
• MRI
• Structural MRI
• Functional MRI
• Diffusion MRI
• Complementary techniques
Topics: variety of acquisitions, measurement basics, limitations & artefacts, analysis principles, acquisition tips

Image Processing
[Diagram. Stage labels: Image Acquisition, Signal Models & Noise Sources, Image Representation, Preprocessing, Segmentation, Registration, Data Analysis, Statistical Modeling & Inference, Interpretation. Discipline labels: Mathematics & Statistics, Computer Science/Engineer]

Individual Imaging Analysis
• Imaging Construction
• Image Segmentation
• Multimodal Analysis
[Figure labels: DTI, FLAIR. Credit: Marc]

Group Imaging Analysis
• Registration
• Prediction
• NC/Diseased
• Group Differences
• Longitudinal/Family
• Brain Imaging Genetics
[Credits: Hibar, Dinggang, Martin]

FDA: Functional Data Analysis
[Diagram: the operator $T$ maps $f$ to $\hat{F} = T[f]$]

FDA: Functional Data Analysis
[Diagram labels: Images, Registration, Smoothing, Voxel-wise Statistical Models, Multiple Comparisons, Estimation, Prediction]

Ill-posed inverse problems
[Diagram: the operator $T$ maps $f$ to $\hat{F} = T[f]$; $F$ is the target]
$d(F, \hat{F}) \to 0$?

Level 2: A Multiscale Physical System
stimulus – activity – measurement chain
The van Essen diagram
[Credit: Robinson]

A Multi-modal Approach
• Different models at different scales.
• Ladder of overlapping models.
• Must be testable against multiple phenomena.
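The question on the ill-posed inverse problems slide, whether d(F, F̂) → 0, can be made concrete with a toy sketch. Nothing below comes from the talk: it assumes a hypothetical diagonal forward operator with singular values k^-2, a hypothetical truth F with coefficients k^-1.5, observed data f equal to the damped truth plus Gaussian noise, and an L2 distance for d. Two choices of the reconstruction operator T are compared: naive inversion and Tikhonov (ridge) regularization. The constants K, SIGMA and LAM are illustrative only.

```python
import math
import random

K = 50         # number of basis coefficients (hypothetical)
SIGMA = 1e-3   # noise standard deviation (hypothetical)
LAM = 1e-4     # Tikhonov penalty (hypothetical)

random.seed(0)

# Assumed forward operator: diagonal, with rapidly decaying singular values,
# so high-index coefficients are almost erased before noise is added.
singular = [k ** -2.0 for k in range(1, K + 1)]

# Assumed true signal F and noisy observation f = (forward operator) F + noise.
F_true = [k ** -1.5 for k in range(1, K + 1)]
f_obs = [s * F + SIGMA * random.gauss(0, 1) for s, F in zip(singular, F_true)]


def naive_inverse(f):
    """T = exact inverse of the forward operator; divides noise by tiny numbers."""
    return [y / s for y, s in zip(f, singular)]


def tikhonov(f, lam=LAM):
    """T = Tikhonov-regularized inverse; shrinks poorly determined coefficients."""
    return [s * y / (s * s + lam) for y, s in zip(f, singular)]


def distance(u, v):
    """L2 distance between two coefficient vectors, used here as d(F, F_hat)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


d_naive = distance(F_true, naive_inverse(f_obs))
d_tikhonov = distance(F_true, tikhonov(f_obs))

print(f"d(F, F_hat) with naive inversion: {d_naive:.3f}")
print(f"d(F, F_hat) with Tikhonov:        {d_tikhonov:.3f}")
```

In this setup the naive inverse amplifies the noise in the weakly observed coefficients, so its distance to F is typically much larger than that of the regularized estimate. Whether d(F, F̂) actually tends to zero depends on how the noise level and the penalty shrink together, which this sketch does not explore.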
Level 3: Data Integration
[Slide reproduces page 2 of Ritchie et al. (2015), Nature Reviews Genetics (www.nature.com/reviews/genetics, © 2015 Macmillan Publishers Limited): the section "Why integrate data?", a glossary, and Figure 1.]

Glossary (from the reproduced page)
• Meta-dimensional analysis: An approach whereby all scales of data are combined simultaneously to produce complex models defined as multiple variables from multiple scales of data.
• Multi-staged analysis: A stepwise or hierarchical analysis method that reduces the search space through different stages of analysis.
• Systems genomics: An analysis approach that models the complex inter- and intra-individual variations of traits and diseases using data from next-generation omic data.
• Data integration: The incorporation of multi-omic information in a meaningful way to provide a more comprehensive analysis of a biological point of interest.

Data types shown in Figure 1, by level
• Genome: SNP, CNV, LOH, genomic rearrangement, rare variant
• Epigenome: DNA methylation, histone modification, chromatin accessibility, TF binding, miRNA
• Transcriptome: gene expression, alternative splicing, long non-coding RNA, small RNA
• Proteome: protein expression, post-translational modification, cytokine array
• Metabolome: metabolite profiling in serum, plasma, urine, CSF, etc.
• Phenome: cancer, metabolic syndrome, psychiatric disease

Figure 1 | Biological systems multi-omics from the genome, epigenome, transcriptome, proteome and metabolome to the phenome. Heterogeneous genomic data exist within and between levels, for example, single-nucleotide polymorphism (SNP), copy number variation (CNV), loss of heterozygosity (LOH) and genomic rearrangement, such as translocation, at the genome level; DNA methylation, histone modification, chromatin accessibility, transcription factor (TF) binding and micro RNA (miRNA) at the epigenome level; gene expression and alternative splicing at the transcriptome level; protein expression and post-translational modification at the proteome level; and metabolite profiling at the metabolome level. Arrows indicate the flow of genetic information from the genome level to the metabolome level and, ultimately, to the phenome level. The red crosses indicate inactivation of transcription or translation. CSF, cerebrospinal fluid; Me, methylation; TFBS, transcription factor-binding site.

[Figure from Zhao and Castellanos (2016), "Discovery science strategies in studies of the pathophysiology of child and adolescent psychiatric disorders: promises and limitations". Caption: Figure 1. A simplified flow chart for psychiatric disorders: from genes to symptoms. Stage labels: Genes, Expression, Molecules, Cells, Brain, Symptoms, Endophenotypes, with four "feedback" arrows. Content labels: RNA genes, protein-coding genes; RNA, proteins, metabolites; neuron development, organelle; structure, circuits, physiology; behavioral tests. Field labels: Genomics, Epigenomics, Transcriptomics, Proteomics, Metabolomics, Interactomics, Cell biology, Neuroscience, Brain interactome, Imaging, Diagnosis, Self-report. Base label: Environmental, social and psychological factors.]

Big Data Integration in Health Informatics
[Diagram with nodes E, I, G, D and the label "Selection"]
• E: environmental factors
• I: imaging/device
• G: genetic/genomics
• D: disease
http://en.wikipedia.org/wiki/DNA_sequence

Part 2.
Career Challenges

Career Development
• Start with simple projects
• Learn from others
• Try hard to get involved in some large studies
• Think about how to do it better, in what sense?
• More papers.
• Develop new tools and packages.
• Write more grants

Training
• SAMSI videos and slides for summer schools and lectures.
• Short Courses in major conferences.
• New Graduate Courses

Collaborations
• Good Mentors: Theory and Applications.
• Good Collaborators: Radiology, Neuroscience, Psychiatry, Psychology, Computer Science, …