Big Challenges for Statisticians
Hongtu Zhu, Ph.D Department of Biostatistics† and Biomedical Research Imaging Center‡ The University of North Carolina at Chapel Hill, Chapel Hill, NC 27599, USA
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Thank NSF and SAMSI! Thank organizers! Thank you!
Science
Statistics
Part 1. Technical Challenges
Imaging Science
From Wikipedia, the free encyclopedia
Imaging Science is a multidisciplinary field concerned with the generation, collection, duplication, analysis, modification, and visualization of images.
As an evolving field, it includes research and researchers from
Physics, Mathematics, Statistics, Electrical Engineering, Computer Vision, Computer Science and Perceptual Psychology.
Three key components
• Image acquisition: studies the physical mechanisms, mathematical models, and algorithms by which imaging devices generate image observations.
• Image interpretation/application: seeing, monitoring, and interpreting the targeted world/patterns being imaged.
• Image processing: any linear or nonlinear operator that acts on images and produces targeted patterns.
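To make the "linear or nonlinear operator" view of image processing concrete, here is a minimal sketch. The two operators (a box filter and a threshold), the function names, and the random test images are illustrative assumptions, not tools named in the slides.

```python
import numpy as np

def box_smooth(image, k=3):
    """Linear operator: k x k moving-average (box) filter with edge padding."""
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.zeros_like(image, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += padded[di:di + image.shape[0], dj:dj + image.shape[1]]
    return out / (k * k)

def threshold(image, level):
    """Nonlinear operator: binary mask of pixels above `level`."""
    return (image > level).astype(float)

# Hypothetical test images (pure noise), used only to check linearity
rng = np.random.default_rng(0)
a = rng.normal(size=(16, 16))
b = rng.normal(size=(16, 16))

# Linearity check: T[a + 2b] == T[a] + 2 T[b] holds for the box filter ...
linear_ok = np.allclose(box_smooth(a + 2 * b), box_smooth(a) + 2 * box_smooth(b))
# ... but not for thresholding
nonlinear_ok = np.allclose(threshold(a + 2 * b, 0.5),
                           threshold(a, 0.5) + 2 * threshold(b, 0.5))
print("box filter linear:", linear_ok, "| threshold linear:", nonlinear_ok)
```

Smoothing is a typical linear step, while segmentation-style steps such as thresholding are nonlinear, so superposition no longer applies to them.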
Level 1: Imaging Data
Overview
• Structural MRI
• Diffusion MRI
• Functional MRI
- Variety of acquisitions
- Measurement basics
- Limitations & artefacts
- Analysis principles
- Acquisition tips
• Complementary techniques
Modalities: Structural MRI, Diffusion MRI, Functional MRI (task), Functional MRI (resting), PET, EEG/MEG, CT, Calcium
Image Processing
• Image Acquisition: Signal Models & Noise Sources
• Image Preprocessing: Representation, Segmentation, Registration
• Data Analysis: Statistical Modeling
• Interpretation & Inference
Disciplines: Mathematics & Statistics, Computer Science/Engineering
Individual Imaging Analysis
Imaging Construction
Image Segmentation
Multimodal Analysis
DTI FLAIR
Marc
Group Imaging Analysis
Registration
Prediction
NC/Diseased
Group Differences Longitudinal/Family Brain Imaging Genetics
Hibar, Dinggang, Martin
FDA: Functional Data Analysis
f → T → $\hat{F} = T[f]$
FDA: Functional Data Analysis
• Images
• Registration
• Smoothing
• Voxel-wise Statistical Models
• Multiple Comparisons
• Estimation
• Prediction
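The voxel-wise modelling and multiple-comparisons steps can be sketched as follows. This is a minimal illustration on simulated data: the group sizes, the effect size, the normal approximation to the test statistic, and the choice of the Benjamini-Hochberg procedure are all assumptions made for the example, not methods prescribed by the slides.

```python
import math
import numpy as np

def voxelwise_z_pvalues(group_a, group_b):
    """Welch-type two-sample statistic per voxel, with a normal-approximation p-value."""
    na, nb = group_a.shape[0], group_b.shape[0]
    diff = group_a.mean(axis=0) - group_b.mean(axis=0)
    se = np.sqrt(group_a.var(axis=0, ddof=1) / na + group_b.var(axis=0, ddof=1) / nb)
    z = diff / se
    # Two-sided p-value under N(0, 1): P(|Z| > |z|) = erfc(|z| / sqrt(2))
    return np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

def benjamini_hochberg(pvals, q=0.05):
    """Boolean mask of voxels rejected by the Benjamini-Hochberg step-up rule."""
    m = pvals.size
    order = np.argsort(pvals)
    passed = pvals[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.max(np.nonzero(passed)[0])
        reject[order[:k + 1]] = True
    return reject

# Simulated data (hypothetical): 50 subjects per group, 1000 voxels,
# with a true group difference in the first 50 voxels only
rng = np.random.default_rng(1)
n_subjects, n_voxels, n_signal = 50, 1000, 50
controls = rng.normal(size=(n_subjects, n_voxels))
patients = rng.normal(size=(n_subjects, n_voxels))
patients[:, :n_signal] += 1.0

pvals = voxelwise_z_pvalues(patients, controls)
rejected = benjamini_hochberg(pvals)
print("true positives:", rejected[:n_signal].sum(),
      "false positives:", rejected[n_signal:].sum())
```

Testing each voxel separately ignores spatial dependence, which is one reason the smoothing and registration steps listed above matter for the final inference.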
Ill-posed inverse problems
f → T → $\hat{F} = T[f]$
$F$: does $d(F, \hat{F}) \to 0$?
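A small numerical sketch of why the question "does the distance between F and its estimate T[f] go to zero?" is delicate for ill-posed problems. Everything here is an assumption for illustration: the Gaussian blur forward operator `A`, the sine-wave truth, the noise level, and the use of Tikhonov regularization as the reconstruction map T.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = np.linspace(0.0, 1.0, n)
F = np.sin(2 * np.pi * x)  # "true" signal F (assumed for illustration)

# Forward operator A: row-normalized Gaussian blur, which is badly conditioned
A = np.exp(-0.5 * ((x[:, None] - x[None, :]) / 0.05) ** 2)
A /= A.sum(axis=1, keepdims=True)

# Observed data f: blurred truth plus a small amount of noise
f = A @ F + 1e-3 * rng.normal(size=n)

def tikhonov(f, A, lam):
    """T[f]: minimize ||A g - f||^2 + lam * ||g||^2 over g."""
    return np.linalg.solve(A.T @ A + lam * np.eye(A.shape[1]), A.T @ f)

def dist(u, v):
    """d(u, v): root-mean-square distance between two discretized functions."""
    return float(np.sqrt(np.mean((u - v) ** 2)))

F_naive = np.linalg.lstsq(A, f, rcond=None)[0]  # unregularized inversion
F_reg = tikhonov(f, A, lam=1e-3)                # regularized reconstruction

d_naive = dist(F, F_naive)
d_reg = dist(F, F_reg)
print("d(F, naive):", d_naive, "| d(F, regularized):", d_reg)
```

The unregularized inverse amplifies noise along the directions that the blur nearly destroys, so a stable T has to trade some bias for variance; the value `lam=1e-3` is an arbitrary choice here and would normally be tuned.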
Level 2: A Multiscale Physical System
stimulus – activity – measurement chain
The van Essen diagram
Robinson
A Multi-modal Approach
• Different models at different scales.
• Ladder of overlapping models.
• Must be testable against multiple phenomena.
Level 3: Data Integration
Ritchie et al. (2015), Nature Reviews Genetics:

In this Review, we describe the principles of meta-dimensional analysis and multi-staged analysis, and provide an overview of some of the approaches that are used to predict a given quantitative or categorical outcome, the tools available to implement these analyses, and the various strengths and weaknesses of these strategies. In addition, we describe the analytical challenges that emerge with data sets of this magnitude, and provide our perspective on how such systems genomic analyses might develop in the future.

Why integrate data?
Data integration can have numerous meanings; however, in this Review, we use it to mean the process by which different types of omic data are combined as predictor variables to allow more thorough and comprehensive modelling of complex traits or phenotypes (which are likely to be the result of an elaborate interplay among biological variation at various levels of regulation) through the identification of more informative models. Data integration methods are now emerging that aim to bridge the gap between our ability to generate vast amounts of data and our understanding of biology, thus reflecting the complexity within biological systems. The primary motivation behind integrated data analysis is to identify key genomic factors, and importantly their interactions, that explain or predict disease risk or other biological outcomes. The success in understanding the genetic and genomic architecture of complex phenotypes has been modest, and this could be due to our limited exploration of the interactions among the genome, transcriptome, metabolome and so on. Data integration may provide improved power to identify the important genomic factors and their interactions (BOX 1). In addition, modelling the complexity of, and the interactions between, variation in DNA, gene expression, methylation, metabolites and proteins may improve our understanding of the mechanism or causal relationships of complex-trait architecture.

There are two main approaches to data integration: multi-staged analysis, which involves integrating information using a stepwise or hierarchical analysis approach; and meta-dimensional analysis, which refers to the concept of integrating multiple different data types to build a multivariate model associated with a given outcome [16–18].

Glossary
• Meta-dimensional analysis: An approach whereby all scales of data are combined simultaneously to produce complex models defined as multiple variables from multiple scales of data.
• Multi-staged analysis: A stepwise or hierarchical analysis method that reduces the search space through different stages of analysis.
• Systems genomics: An analysis approach that models the complex inter- and intra-individual variations of traits and diseases using data from next-generation omic data.
• Data integration: The incorporation of multi-omic information in a meaningful way to provide a more comprehensive analysis of a biological point of interest.
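The contrast between the two integration approaches described above can be sketched on toy data. This is only one possible reading: concatenating data types into a single ridge regression stands in for a meta-dimensional model, and a correlation filter followed by a second association step stands in for a multi-staged analysis. The simulated SNP and expression matrices, the thresholds, and all function names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p_snp, p_expr = 200, 30, 20
snp = rng.integers(0, 3, size=(n, p_snp)).astype(float)  # toy genotype counts 0/1/2
expr = rng.normal(size=(n, p_expr))                       # toy expression features
expr[:, 0] += 0.8 * snp[:, 0]                             # SNP 0 drives expression feature 0
y = 1.5 * expr[:, 0] + rng.normal(size=n)                 # outcome depends on that feature

def standardize(X):
    """Center each column and scale it to unit standard deviation."""
    return (X - X.mean(axis=0)) / X.std(axis=0)

def ridge_fit(X, y, lam=1.0):
    """Closed-form ridge coefficients on standardized features and centered y."""
    Xs = standardize(X)
    return np.linalg.solve(Xs.T @ Xs + lam * np.eye(Xs.shape[1]), Xs.T @ (y - y.mean()))

def abs_corr(X, Z):
    """Absolute Pearson correlations between columns of X and columns of Z."""
    return np.abs(standardize(X).T @ standardize(Z)) / X.shape[0]

# Meta-dimensional (concatenation-style): all data types enter one model at once
combined = np.hstack([snp, expr])
beta = ridge_fit(combined, y)
top_feature = int(np.argmax(np.abs(beta)))  # index into the concatenated matrix

# Multi-staged: stage 1 keeps SNPs associated with any expression feature,
# stage 2 relates only the surviving SNPs to the outcome
stage1_keep = np.nonzero(abs_corr(snp, expr).max(axis=1) > 0.3)[0]
stage2_corr = abs_corr(snp[:, stage1_keep], y[:, None]).ravel()

print("top feature in joint model:", top_feature)
print("SNPs kept after stage 1:", stage1_keep, "| stage 2 |corr| with y:", stage2_corr)
```

The staged version shrinks the search space before the outcome is examined, while the joint model lets features from different data types compete directly; which is preferable depends on the data and the question.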
Genome: SNP, CNV, LOH, genomic rearrangement, rare variant
Epigenome: DNA methylation, histone modification, chromatin accessibility, TF binding, miRNA
Transcriptome: gene expression, alternative splicing, long non-coding RNA, small RNA
Proteome: protein expression, post-translational modification, cytokine array
Metabolome: metabolite profiling in serum, plasma, urine, CSF, etc.
Phenome: cancer, metabolic syndrome, psychiatric disease

Ritchie et al. (2015). Nature Reviews Genetics.

Figure 1 | Biological systems multi-omics from the genome, epigenome, transcriptome, proteome and metabolome to the phenome. Heterogeneous genomic data exist within and between levels, for example, single-nucleotide polymorphism (SNP), copy number variation (CNV), loss of heterozygosity (LOH) and genomic rearrangement, such as translocation, at the genome level; DNA methylation, histone modification, chromatin accessibility, transcription factor (TF) binding and micro RNA (miRNA) at the epigenome level; gene expression and alternative splicing at the transcriptome level; protein expression and post-translational modification at the proteome level; and metabolite profiling at the metabolome level. Arrows indicate the flow of genetic information from the genome level to the metabolome level and, ultimately, to the phenome level. The red crosses indicate inactivation of transcription or translation. CSF, cerebrospinal fluid; Me, methylation; TFBS, transcription factor-binding site.
© 2015 Macmillan Publishers Limited. All rights reserved

Endophenotypes
Genes → (expression) → Molecules → Cells → Brain → Symptoms, with feedback between adjacent levels
• Genes: RNA genes, protein-coding genes (Genomics, Epigenomics)
• Molecules: RNA, proteins, metabolites (Transcriptomics, Proteomics, Metabolomics, Interactomics)
• Cells: neuron development, organelle (Cell biology, Neuroscience)
• Brain: structure, circuits, physiology (Neuroscience, Imaging, Brain interactome)
• Symptoms: behavioral tests (Diagnosis, Self-report)
Environmental, social and psychological factors
Figure 1. A simplified flow chart for psychiatric disorders: from genes to symptoms
Zhao and Castellanos (2016) Discovery science strategies in studies of the pathophysiology of child and adolescent psychiatric disorders: promises and limitations
Big Data Integration in Health Informatics
E: environmental factors
I: imaging/device
G: genetic/genomics
D: disease
Selection
http://en.wikipedia.org/wiki/DNA_sequence
Part 2. Career Challenges
Career Development
Start with simple projects
Learn from others
Try hard to get involved in some large studies
Think about how to do it better, and in what sense it is better.
More papers.
Develop new tools and packages.
Write more grants
Training
SAMSI videos and slides for summer schools and lectures.
Short Courses in major conferences.
New Graduate Courses
Collaborations
Good Mentors: Theory and Applications.
Good Collaborators: Radiology, Neuroscience, Psychiatry, Psychology, Computer Science, …
Data Sets
Big Public Data Sets:
• Alzheimer’s Disease Neuroimaging Initiative (ADNI)
• NIH MRI Study of Normal Brain Development
• National Database for Autism Research
• Human Connectome Project
• The Cancer Genome Atlas (TCGA)
• UK Biobank
https://en.wikipedia.org/wiki/List_of_neuroscience_databases
UK Biobank Project
The Human Connectome Project
The HCP aims to elucidate the neural pathways that underlie brain function and behavior.
The Heavily Connected Brain
Peter Stern, “Connection, connection, connection…”, Science, Nov. 1 2013: Vol. 342 no. 6158 p. 577
Resting-state fMRI (rfMRI) and dMRI provide information about brain connectivity. Task-evoked fMRI reveals much about brain function. Structural MRI captures the shape of the highly convoluted cerebral cortex. Behavioral data relate brain circuits to individual differences in cognition, perception, and personality. Magnetoencephalography (MEG) combined with electroencephalography (EEG) yields information about brain function on a millisecond time scale.
Software http://www.nitrc.org/
NITRC = The Source for Neuroimaging Tools and Resources
• Statistical Parametric Mapping (SPM)
• FMRIB Software Library (FSL)
• Analysis of Functional NeuroImages (AFNI)
• 3D Slicer
• FreeSurfer
• ……
Conferences
• Human Brain Mapping (HBM)
• ISMRM conference
• SNF conference
• Information Processing in Medical Imaging (IPMI)
• SIAM Conference on Imaging Science (IS)
• Medical Image Computing and Computer Assisted Intervention (MICCAI)
• International Symposium on Biomedical Imaging (ISBI)
• IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• Neural Information Processing Systems Foundation (NIPS)
Publications
• NeuroImage
• Medical Image Analysis
• IEEE Transactions on Medical Imaging
• Human Brain Mapping
• IEEE Transactions on Signal Processing
• IEEE Transactions on Image Processing
• IEEE Signal Processing Magazine
• SIAM Journal on Imaging Sciences
• IEEE Transactions on Pattern Analysis and Machine Intelligence
• Annals of Applied Statistics
• Biometrics
• Biostatistics
• Journal of the American Statistical Association (ACS)
Part 3. Software Challenges
Software Development http://www.nitrc.org/
NITRC = The Source for Neuroimaging Tools and Resources
Our community lacks a good, widely used statistical software package for neuroimaging data analysis.
Software Development
Start a Neuroconduct project
• Share responsibilities and information
• Common input and output files compatible with major packages
• Build small Rcpp and Matlab packages
• Release them through your own websites, our neuroconduct website and http://www.nitrc.org/
• Focus on a few key tools and expand from them
• Encourage other groups to download and use them.
Software Development
1. Simulators for different imaging modalities
• Evaluate image processing tools
• Evaluate statistical methods (group analysis, reliability)
2. Standardize all image processing and analysis pipelines
• fMRI and resting fMRI
• EEG/MEG
• DTI
• CT
• Calcium
• PET
3. Develop new tools to do multi-modal analysis
4. Develop new tools to integrate imaging, genetic, and clinical data
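As a rough illustration of item 1, a simulator with known ground truth lets a statistical method be scored directly. The sketch below is a toy under stated assumptions: Gaussian-noise 2D "images", a square activation region, a one-sample z-map threshold as the method under evaluation, and hypothetical function names throughout. It does not model any real imaging modality.

```python
import numpy as np

def simulate_group(n_subjects, shape=(20, 20), effect=0.5, rng=None):
    """Toy simulator: Gaussian noise images plus a known square 'activation'."""
    rng = np.random.default_rng() if rng is None else rng
    truth = np.zeros(shape, dtype=bool)
    truth[5:10, 5:10] = True
    images = rng.normal(size=(n_subjects,) + shape) + effect * truth
    return images, truth

def detect(images, z_cut=3.0):
    """Method under evaluation: one-sample z-map thresholded at z_cut."""
    n = images.shape[0]
    z = images.mean(axis=0) / (images.std(axis=0, ddof=1) / np.sqrt(n))
    return z > z_cut

def evaluate(n_rep=200, n_subjects=30, seed=4):
    """Average sensitivity and false-positive rate over repeated simulations."""
    rng = np.random.default_rng(seed)
    sens, fpr = [], []
    for _ in range(n_rep):
        images, truth = simulate_group(n_subjects, rng=rng)
        found = detect(images)
        sens.append(found[truth].mean())    # fraction of true voxels detected
        fpr.append(found[~truth].mean())    # fraction of null voxels flagged
    return float(np.mean(sens)), float(np.mean(fpr))

sensitivity, false_positive_rate = evaluate()
print("sensitivity:", sensitivity, "| false-positive rate:", false_positive_rate)
```

Swapping `detect` for another method, or `simulate_group` for a more realistic modality-specific simulator, keeps the same evaluation loop, which is the main appeal of a shared simulation framework.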