Bayesian Analysis of fMRI Data and RNA-Seq Time Course Experiment Data

A Dissertation presented to the Faculty of the Graduate School at the University of Missouri

In Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy

by

YUAN CHENG

Advisor: Marco A. R. Ferreira

December 2015

The undersigned, appointed by the Dean of the Graduate School, have examined the dissertation entitled:

Bayesian Analysis of fMRI Data and RNA-Seq Time Course Experiment Data

presented by Yuan Cheng, a candidate for the degree of Doctor of Philosophy and hereby certify that, in their opinion, it is worthy of acceptance.

Dr. Marco A. R. Ferreira

Dr. Paul Speckman

Dr. TieMing Ji

Dr. Subharup Guha

Dr. Jeff Rouder

ACKNOWLEDGMENTS

I would never have been able to finish my dissertation without the guidance of my advisor and my committee members, help from friends, and support from my family and husband. I would like to express my sincere gratitude and thanks to my advisor, Marco A. R. Ferreira, for introducing me to research in Bayesian statistics and all its challenging yet interesting topics. Without his continuous encouragement and inspiration, none of this work would have been possible. I also thank him for his kind help and wise guidance throughout my PhD study, which made my research life here much easier and more enjoyable. I am grateful to Dr. Rouder for his help in conquering the hard background material of functional MRI and for his suggestions in the development of this work. I extend my thanks to my committee members, Dr. Paul Speckman, Dr. Subharup Guha, and Dr. Tieming Ji, for their insightful comments and suggestions on this work. I would like to thank Dr. Shiqi Cui for his tremendous help in my research related to RNA-Seq Time Course experiments. I would also like to thank my parents, who always supported me and encouraged me with their best wishes. Finally, I would like to thank my husband, Yuelei Sui, who was always there to cheer me up and stood by me through the good times and bad.

TABLE OF CONTENTS

ACKNOWLEDGMENTS

LIST OF TABLES

LIST OF FIGURES

ABSTRACT

CHAPTER

1 Introduction
1.1 Introduction of Functional MRI
1.1.1 Background
1.2 Introduction of Time Course RNA-Seq Experiments
1.2.1 Background

Part I: Inference

2 A New Hemodynamic Response Function for Modeling Functional MRI Data
2.1 Modeling the Hemodynamic Response Function
2.1.1 fMRI Experiment Designs and Hemodynamic Response Function
2.1.2 Reviews on Hemodynamic Response Function Models
2.2 New HRF Model
2.2.1 General Linear Model and Assumptions
2.2.2 The Triple Gamma Hemodynamic Response Function
2.3 Inference
2.3.1 Mixture Priors for β
2.3.2 Model Selection on β by Zellner's g-prior
2.3.3 The MCMC Algorithm for Estimating HRF Parameters

3 Bayesian Model Selection
3.1 Bayesian Model Selection and Nonlocal Priors
3.2 Model Selection on β
3.2.1 Nonlocal Positive Truncated pMOM
3.2.2 Bayesian Model Selection for β
3.2.3 The MCMC Algorithm for Posterior Simulations
3.3 Multiple Tests and FDR Control
3.4 Simulation Study
3.5 Real Data Applications
3.5.1 A Preliminary Experiment
3.5.2 Main Experiment

4 Timing Brain Activation
4.1 Model Selection Strategy for Timing Brain Activation
4.1.1 Model Selection on β and c2
4.1.2 Model Selection Criteria
4.1.3 Comparing DIC and Marginal Likelihood
4.2 Testing the Model Selection Results
4.2.1 Linear Model Fitting Exploring
4.3 One Model Fitting Procedure of Selected Voxels

5 Bayesian Model Selection by Using Nonlocal Prior in Time Course RNA-seq Experiments
5.1 Introduction
5.2 Principal Component Regression Model
5.3 Prior Specification
5.4 Posterior Inference
5.4.1 Full Conditionals and MCMC Algorithm
5.5 Application to Vaccination Data
5.5.1 Preprocessing and Empirical Priors
5.5.2 Differentially Expressed Identification

6 Summary

BIBLIOGRAPHY

VITA

LIST OF TABLES

3.1 Comparison MSE of γ
3.2 Comparison MSE of β

4.1 Different β Combined Models
4.2 Model Selection Result 1
4.3 Model Selection Result 2
4.4 Model Selection Result 3
4.5 Linear Fixed Model Fitting Result
4.6 Linear Mixed Model Fitting Result
4.7 Estimated d3 of Visual and Motor of Subject 7
4.8 Estimated d3 of Visual and Motor of Subject 8
4.9 Estimated d3 of Visual and Motor of Subject 9
4.10 Estimated d3 of Visual and Motor of Subject 10
4.11 Estimated d3 of Visual and Motor of Subject 11

5.1 DE Genes for Vaccinated Subjects 1
5.2 DE Genes for Vaccinated Subjects 2
5.3 DE Genes for Vaccinated Subjects 3
5.4 DE Genes for Non-vaccinated Subjects

LIST OF FIGURES

1.1 fMRI terminology
1.2 fMRI experiment set-up and BOLD response data

2.1 Canonical HRF
2.2 Visual and motor HRF
2.3 Linear Time Invariant System (convolution model)
2.4 Triple Gamma HRF

3.1 Nonlocal pMOM
3.2 Noisy Synthetic Data
3.3 Map of Posterior Probability of Alternative Model
3.4 Simulated Left HRF
3.5 Simulated MiddleBack HRF
3.6 Simulated Right HRF
3.7 Standardized γ for canonical HRF GLM
3.8 Standardized β for canonical HRF GLM
3.9 Standardized γ for new method
3.10 Standardized β for new method
3.11 BIC-scanner
3.12 PCA Result on HRF parameters' estimations from angle 1
3.13 PCA Result on HRF parameters' estimations from angle 2
3.14 PCA Result on HRF parameters' estimations from angle 3
3.15 PCA Result on HRF parameters' estimations from angle 4
3.16 HRF Shapes Correspond to PCA Results
3.17 First Session Histograms of d3 and c2 of Subject 7
3.18 3D Plot of First Session c2 of Subject 7
3.19 3D Plot of First Session d3 of Subject 7
3.20 Second Session Histograms of d3 and c2 of Subject 7
3.21 3D Plot of Second Session c2 of Subject 7
3.22 3D Plot of Second Session d3 of Subject 7
3.23 First Session Histograms of d3 and c2 of Subject 8
3.24 3D Plot of First Session c2 of Subject 8
3.25 3D Plot of First Session d3 of Subject 8
3.26 Second Session Histograms of d3 and c2 of Subject 8
3.27 3D Plot of Second Session c2 of Subject 8
3.28 3D Plot of Second Session d3 of Subject 8
3.29 First Session Histograms of d3 and c2 of Subject 9
3.30 3D Plot of First Session c2 of Subject 9
3.31 3D Plot of First Session d3 of Subject 9
3.32 Second Session Histograms of d3 and c2 of Subject 9
3.33 3D Plot of Second Session c2 of Subject 9
3.34 3D Plot of Second Session d3 of Subject 9
3.35 First Session Histograms of d3 and c2 of Subject 10
3.36 3D Plot of First Session c2 of Subject 10
3.37 3D Plot of First Session d3 of Subject 10
3.38 Second Session Histograms of d3 and c2 of Subject 10
3.39 3D Plot of Second Session c2 of Subject 10
3.40 3D Plot of Second Session d3 of Subject 10

4.1 DIC Selection of Sub7 Session A
4.2 BF Selection of Sub7 Session A
4.3 DIC Selection of Sub7 Session B
4.4 BF Selection of Sub7 Session B
4.5 DIC Selection of Sub8 Session A
4.6 BF Selection of Sub8 Session A
4.7 DIC Selection of Sub8 Session B
4.8 BF Selection of Sub8 Session B
4.9 DIC Selection of Sub9 Session A
4.10 BF Selection of Sub9 Session A
4.11 DIC Selection of Sub9 Session B
4.12 BF Selection of Sub9 Session B
4.13 DIC Selection of Sub10 Session A
4.14 BF Selection of Sub10 Session A
4.15 DIC Selection of Sub10 Session B
4.16 BF Selection of Sub10 Session B
4.17 DIC Selection of Sub11 Session A
4.18 BF Selection of Sub11 Session A
4.19 DIC Selection of Sub11 Session B
4.20 BF Selection of Sub11 Session B
4.21 Boxplots of selected voxels d3 of Subject 7
4.22 Histograms of posterior distributions of d3 for Subject 7
4.23 Estimated HRF of Subject 7
4.24 Boxplots of selected voxels d3 of Subject 8
4.25 Histograms of posterior distributions of d3 for Subject 8
4.26 Estimated HRF of Subject 8
4.27 Boxplots of selected voxels d3 of Subject 9
4.28 Histograms of posterior distributions of d3 for Subject 9
4.29 Estimated HRF of Subject 9
4.30 Boxplots of selected voxels d3 of Subject 10
4.31 Histograms of posterior distributions of d3 for Subject 10
4.32 Estimated HRF of Subject 10
4.33 Boxplots of selected voxels d3 of Subject 11
4.34 Histograms of posterior distributions of d3 for Subject 11
4.35 Estimated HRF of Subject 11

5.1 Density Plot of a Mixture Prior
5.2 Time Plot of First and Second Principal Components
5.3 Empirical plot of βg
5.4 Histogram of estimated FDR
5.5 Empirical density plot of posterior FDR
5.6 Scatter plots of β1 vs β2
5.7 Venn Diagram of DE Genes
5.8 Estimated Temporal Trends of Top 10 Genes
5.9 Scatter plot of average β1 vs average β2
5.10 Scatter plots of β1 vs β2
5.11 Scatter plots of β1 vs β2
5.12 Scatter plots of β1 vs β2

Bayesian Analysis of fMRI Data and RNA-Seq Time Course Experiment Data

Yuan Cheng

ABSTRACT

The present dissertation contains two parts. In the first part, we develop a new Bayesian analysis of functional MRI data. We propose a novel triple gamma Hemodynamic Response Function (HRF) that includes a component describing the initial dip, and we use the HRF to make inferences about voxel-wise neuronal activity. We then devise a new model selection procedure with a nonlocal pMOM prior for joint detection of neuronal activation and estimation of the HRF, in order to estimate the difference in activation time between the visual and motor areas of the brain. In the second part, we develop a new Bayesian analysis of RNA-Seq Time Course experiment data. We propose a Bayesian Principal Component regression model and, based on it, devise a model selection procedure using the nonlocal piMOM prior in order to identify differentially expressed genes. Most existing methods for RNA-Seq Time Course experiment data take a static point of view and cannot predict temporal patterns. Our method estimates the posterior probability of differential expression for each gene by borrowing information across all subjects. The use of a nonlocal prior in the model selection procedure reduces the number of falsely discovered differentially expressed genes.

Chapter 1

Introduction

This dissertation is divided into two parts. The first part is dedicated to Bayesian modeling of functional magnetic resonance imaging (fMRI) data; the second part is dedicated to Bayesian modeling of RNA-seq Time Course experiment data. Research on both topics has been growing rapidly in recent years, and in both areas statistics plays a central role in analyzing the data and interpreting the results. Because of their special and complicated data structures, classical statistical models and existing methods are not easy to adapt and apply; the high dimensionality and complexity of these data require specifically developed statistical methods. Bayesian methods provide ways to predict or estimate temporal patterns in both settings. Bayesian inference makes it possible to fit more complicated models, to incorporate prior information, and to interpret results in a more direct and intuitive way. As Kershaw et al. pointed out in (Kershaw et al., 1999): "With [Bayesian] methodology it is possible to derive a relevant statistical test for activation in an fMRI time series no matter how complicated the parameters of the model are. The derivation is usually quite straightforward and results may be extracted from Bayesian models without first having to find estimates for all the parameters." We find that similar benefits of Bayesian methods hold for RNA-seq Time Course experiment data.

This dissertation has six chapters. The first chapter introduces background on fMRI and on RNA-seq Time Course experiments. The second chapter develops a new triple gamma hemodynamic response function (HRF) with an initial dip component to account for hemodynamic variability in activation detection. The third chapter performs joint activation detection and parameter estimation based on our new HRF and a non-local prior. The fourth chapter develops a more complete model selection scheme on both the types of neuronal response and the initial dip, in order to differentiate the time difference between visual and motor activations in a region-of-interest (ROI) analysis. The fifth chapter develops a new Bayesian model and a model selection scheme for RNA-seq Time Course experiment data. Finally, Chapter 6 concludes with a short discussion of the main findings of this work.

1.1 Introduction of Functional MRI

We focus on Bayesian modeling of functional MRI data from Chapter 2 to Chapter 4. In this section we describe the structure of this first part of the dissertation and introduce related background on functional MRI.

Functional MRI is based on the fact that the vascular response in the brain correlates with neuronal activity. Specifically, fMRI measures the blood oxygenation level-dependent (BOLD) contrast (Ogawa et al., 1992) to study local changes in deoxyhemoglobin concentration in the brain. The primary goal of fMRI research is to use the information provided by the BOLD signal to make inferences about the underlying unobserved neuronal activity. Therefore, the ability to accurately model and estimate the evoked hemodynamic response to a neural activation is a key point in fMRI data analysis.

Chapter 2 of this dissertation proposes a new triple gamma hemodynamic response function (HRF) which models three important characteristics of the HRF: the initial dip, the peak, and the undershoot. In our new HRF, all the parameters may be directly interpreted in terms of changes of the neuronal activity. In addition, we develop an MCMC algorithm to estimate all these parameters, assuming they vary across voxels.

Chapter 3 develops a model under a Bayesian framework for jointly detecting activation and estimating parameters by using a non-local product moment (pMOM) prior. We design a model selection scheme within the MCMC algorithm that uses marginal likelihood ratios between activation and non-activation, so that the sampler chooses the best model by jumping among the candidate models. We apply this algorithm to both a simulation study and a real data set.

Furthermore, in Chapter 4, we devise a more complete model selection and estimation scheme based on regions of interest (ROI) analysis. It improves the temporal resolution beyond what the scanner alone can achieve, and it allows us to compare the activation times of areas of interest in the brain. We apply this method to a real data set. The biggest difference between Chapter 3 and Chapter 4 is that while in Chapter 3 we perform model selection using a reversible jump MCMC approach, in Chapter 4 we fit several models individually and compare them using Bayes factors computed with the Metropolis-Laplace approximation in the regions of interest. We then pool all the selected voxels of a specific region in order to estimate one HRF for that region, and we compare the difference in activation time between the visual and motor areas using the estimated initial dips.

1.1.1 Background

In this section, we provide some background on fMRI techniques and introduce fMRI terminology, the structure of fMRI data, and the preprocessing of fMRI data.

Nature of fMRI Study

There exist different neuroimaging techniques for studying the physiological activity of the brain; functional magnetic resonance imaging (fMRI) is one of them. It provides information about the brain with relatively high spatial resolution but relatively low temporal resolution, because the brain's vascular response is slow. In recent years, interest in using fMRI for neuroimaging studies has grown quickly. It is an advanced yet noninvasive technique for studying various functions of the brain, and a powerful tool that allows us to study dynamic mental processes and cognitive activities of the human brain. The main way to learn about brain function is to detect neuronal activation in various parts of the brain under different types of external stimuli; visual stimuli may be a series of pictures, and motor stimuli may be instructions to press a button. Physiologically, neuronal activity is accompanied by increased oxygenated cerebral blood flow around the activated regions, which supplies energy to the activated neurons. Functional MRI measures the blood oxygenation level-dependent (BOLD) contrast response (Ogawa et al., 1992) to study local changes of deoxy-hemoglobin concentration in the brain. Because deoxygenated hemoglobin is paramagnetic, it suppresses the MR signal in the fMRI scanner; oxygenated hemoglobin, on the other hand, is diamagnetic and does not suppress the MR signal. Functional MRI technology takes advantage of this difference in local magnetic susceptibility between oxygenated and deoxygenated blood and measures MR signals to produce the fMRI BOLD response data. Thus, we can study the activation occurring in the brain during the execution of different mental tasks. More importantly, the relationship between the underlying unobserved neuronal activity and the changes in the BOLD response data can be expressed as the hemodynamic response (HDR), or modeled as the hemodynamic response function (HRF). The BOLD response is much slower than the actual temporal rate of the neuronal activity.

Terminology of fMRI Experiment and fMRI Data Structure

During an fMRI experiment, the participant lies down with his/her head inside the fMRI scanner and performs a set of designed tasks. During those tasks, the scanner scans the whole brain every 2 seconds to acquire a series of brain images. An fMRI experiment may have a single participant or multiple participants; each of them goes through several scanning sessions, and each session consists of multiple runs or blocks. Within each block or run, the scanner scans the whole brain several times. The time interval between two successive scans is called the repetition time (TR). At each time point, the whole-brain scan is a volume that contains the BOLD response of the brain at that time. Each volume consists of a number of uniformly spaced, equal-sized cubes, or voxels. The volume can be divided into multiple slices, and each slice contains a matrix of voxels.

Figure 1.1: fMRI terminology (from http://www.fil.ion.ucl.ac.uk/spm/.../course/slides08-zurich/Ged_preproc.ppt)

More specifically, each voxel has an imaging intensity at each time point. Given the TR and the number of slices in each volume, we directly obtain the acquisition time of each slice in a volume, which is important in the later modeling of the fMRI BOLD response data. Typically, in our experiment, brain volumes are 64 × 64 × 32 with 246 scans in each block and 2 differently designed sessions with 8 blocks for each participant. In another experiment, brain volumes are 64 × 64 × 32 with 246 scans in each block and 2 differently designed sessions with 4 blocks for each participant. An fMRI data set may thus be as large as hundreds of gigabytes.

Figure 1.2 shows a simple experimental set-up where a participant alternates between periods of looking at a visual stimulus for 30 seconds and resting with closed eyes for another 30 seconds.

Figure 1.2: The top left is a graphic description of the fMRI experiment set-up; the top right is an example of the visual-stimulus activation area of the brain for closed and open eyes; the bottom middle is the BOLD contrast response data of that activated visual cortex area. (Fox and Raichle, 2007)

Preprocessing on fMRI Data

fMRI data are very noisy: participants may move their heads during the experiment, and there may be uncontrollable artifacts. Preprocessing attempts to increase the signal-to-noise ratio (SNR) and remove artifacts in order to validate the model assumptions. Typical fMRI preprocessing steps include slice time correction, motion correction (realignment), spatial normalization, and smoothing.

• Slice time correction accounts for sequential acquisition timing. It consists of shifting the signal in time by a given amount to temporally align the data. The reference slice is usually the slice acquired in the middle of the sequence, which limits the maximum interpolation to TR/2.

• Motion correction determines the rigid-body transformation that minimizes some cost function; the transformation is defined by 3 translations in the X, Y, and Z directions and 3 rotations around the X, Y, and Z axes. After applying motion correction, we have 6 motion-correction parameter values for each block of each subject.

• The spatial normalization step warps each individual subject into a standard space based on templates. More than one subject can be entered into a single normalization step.

• The smoothing process applies a smoothing filter to the images. This mitigates spatial differences between subjects, especially in group analysis.

We do not perform slice timing correction, because we will use a new HRF model to increase the temporal resolution, and our goal is to test the time differences between areas of interest. We use cubic splines as the smoothing filter to eliminate or minimize low-frequency drifts due to physiological artifacts (Zarahn et al., 1997) and to average out random thermal noise. Furthermore, we perform the motion correction step for each participant and standardize the locations of brain regions. All functional images from different sessions and runs can be registered with their anatomical structural images by co-registration, so that each data point in a specific voxel's time series consists only of signal from that voxel (i.e., to correct for movements of the participant between measurements). For group analysis of multiple participants, we can register the structural images of different participants with the same template and normalize the functional images, so that each voxel is located at the same anatomical position for all participants. Comparisons of different sessions and runs for one participant, and across different participants, can thus be achieved by co-registration and normalization.
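To make the spline smoothing step concrete, here is a minimal Python sketch of cubic-spline drift removal for a single voxel time series; the function name, the knot spacing, and the synthetic series are illustrative assumptions rather than the exact filter used in our pipeline.

import numpy as np
from scipy.interpolate import LSQUnivariateSpline

def remove_drift(y, tr=2.0, knot_spacing=60.0):
    """Fit a cubic spline with widely spaced knots to one voxel's BOLD
    series and subtract it, removing low-frequency drift while leaving
    the faster task-related signal largely intact."""
    t = np.arange(len(y)) * tr                            # acquisition times (s)
    knots = np.arange(knot_spacing, t[-1], knot_spacing)  # interior knots
    drift = LSQUnivariateSpline(t, y, knots, k=3)(t)
    return y - drift + y.mean()                           # keep the mean level

# Illustrative use on a synthetic series: slow drift plus thermal noise
rng = np.random.default_rng(0)
y = 100 + 0.02 * np.arange(246) + rng.normal(0, 1, size=246)
y_detrended = remove_drift(y)

Because the knots are spaced much more widely than the stimulus-related fluctuations, the fitted spline tracks only the slow drift.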

1.2 Introduction of Time Course RNA-Seq Experiments

The second part of the dissertation is dedicated to Bayesian modeling and analysis of RNA-seq Time Course experiment data. In Chapter 5, we develop a Bayesian model selection framework using the non-local piMOM prior (Johnson and Rossell, 2012) for a Bayesian Principal Component regression model of RNA-seq Time Course experiment data.

1.2.1 Background

Profiling of gene expression via high-throughput methods plays an important role in biomedical research. Microarrays have been used intensively for gene expression profiling over the last two decades. Microarrays contain thousands of DNA sequences (probe sets) that potentially match complementary sequences in the sample, making available a profile of all transcripts being expressed. Microarrays therefore rely on good knowledge of an organism's genome and have very limited base coverage. However, the transcriptome of a cell changes all the time, and whole-genome variation differs among individuals even of the same species, so resequencing is necessary. The high demand for low-cost sequencing has driven the development of high-throughput sequencing (NGS) technologies, which parallelize the sequencing process and produce thousands or millions of sequences simultaneously (Hall, 2007).

Next-generation sequencing (NGS) has attracted much attention in recent years, with applications to genome sequencing, genome resequencing, transcriptome profiling (RNA-Seq), DNA-protein interactions (ChIP-sequencing), and epigenome characterization (de Magalhães et al., 2010). Recent developments in NGS allow increased base coverage of a DNA sequence as well as higher sample throughput. NGS facilitates sequencing of the RNA transcripts in a cell, providing the ability to look at alternative gene spliced transcripts, post-transcriptional modifications, gene fusion, mutations/SNPs, and changes in gene expression (Maher et al., 2009). In addition to mRNA transcripts, the advancement of RNA-seq (Nagalakshmi et al., 2008) technology now gives us more powerful tools to study the whole genome and transcriptome. This new massively parallel sequencing method, named RNA-sequencing (RNA-seq), is greatly improving our understanding of gene regulation and signaling networks. It can reveal a snapshot of RNA presence and quantity from a genome at a given time point.

RNA-seq allows us to assess the whole transcriptome under different conditions, at lower cost and with better reproducibility than microarray methods. When we study a dynamic biological process, such as the response to a treatment, data can be acquired at different time points in a Time Course experiment. In other words, we sample data at several time points in order to capture the regulatory network involved and identify the responsible genes. There are different Time Course experiment designs. In our work, we apply our method to a dataset (Henn et al., 2013) from controlled Time Course experiments (Oh et al., 2013), in which one group of subjects is vaccinated and another group is not. Data from RNA-seq Time Course experiments are complex and high dimensional: tens of thousands of gene expression levels are measured at multiple time points for several subjects. Specific statistical analysis methods for RNA-seq Time Course data are therefore needed. Although some methods have been successfully applied to microarray data, only a few have been modified and implemented for RNA-seq Time Course data, because of this complexity (Oh et al., 2014).

Methods

Most published studies, statistical methods, and corresponding R packages for RNA-seq Time Course experiments take a static point of view and cannot predict dynamic temporal patterns; examples include DESeq (Anders and Huber, 2010) and edgeR (Oshlack et al., 2010). Differentially expressed genes are identified separately by pairwise comparisons between neighboring time points, sometimes pooling all possible pairwise comparisons into a union set or an intersection set.

To overcome the small sample size and the large number of variables in RNA-seq Time Course data, we propose an Empirical Bayes approach. First, we reduce the high dimension of the RNA-seq Time Course data by using Bayesian Principal Component regression, which captures the major variation in the temporal patterns of the gene expression profiles across subjects. Given a gene's expression profile across all the subjects and prior knowledge, we can estimate the posterior probability that the gene is differentially expressed for the temporal pattern. In addition, we introduce the non-local piMOM prior into the mixture prior of the gene expression model. By using a non-local prior, the Bayesian model selection method is improved and the number of falsely identified differentially expressed genes is reduced.

Chapter 2

A New Hemodynamic Response Function for Modeling Functional MRI Data

This chapter proposes a novel methodology for fMRI analysis based on a triple gamma Hemodynamic Response Function (HRF) and Bayesian model selection. The chapter is divided into three sections. Section 2.1 reviews the most commonly used HRF models. Section 2.2 proposes our new voxel-wise HRF model for event-related fMRI experiments. Section 2.3 shows how to estimate this new HRF model by using a spike and slab prior for each voxel.

2.1 Modeling the Hemodynamic Response Function

In this section, we introduce different fMRI experiment designs, review different HRF models, and give some intuitive ideas about our new HRF model.

2.1.1 fMRI Experiment Designs and Hemodynamic Response Function

Before we review current methods for modeling the hemodynamic response function, it is necessary to introduce fMRI experimental designs. There are two major types of statistical designs utilized in fMRI cognitive experiments: block designs and event-related designs (Donaldson and Buckner, 2001; Buckner et al., 1996); sometimes they are mixed in one design. Block designs present only one type of stimulus in a block (epoch) and alternate the different types of stimuli in other blocks of the same length; within one block, each stimulus can be assumed to be presented to the participant for an extended period of time. Another block design mixes periods of stimulus and rest (fixation). The block design has more statistical power to detect activation and may be appropriate if we want to find subtle differences in the BOLD response across different types of stimuli (Friston et al., 1999). Event-related designs, on the other hand, permit randomized presentations of different types of stimuli, and thus theoretically reduce confounding due to stimulus order. Event-related designs also allow a different inter-stimulus interval (ISI), the time between two successive stimulus presentations, for different runs. Unlike the block design, the event-related design allows the experimenter to estimate the hemodynamic response function from a single event type: the hemodynamic response can be identified by averaging data acquired after many discrete events. This approach is more powerful than the block design because it allows considerable flexibility in determining, for example, responses to novel or aperiodically presented stimuli, or in exploring changes over time (Chee et al., 2003).

For the statistical analysis of fMRI data, there are two primary goals: detection of activation in the brain and estimation of the hemodynamic response function. In other words, we want to use the BOLD information to make inferences about the underlying unobserved evoked neural activity, which happens faster than the actually observed BOLD contrast response. Therefore, it is important to accurately model and estimate the hemodynamic response function, which may also increase the temporal resolution. We will use a linear time invariant (LTI) system and the convolution model of the HRF, so block designs are not suitable and event-related designs are preferred. With a long enough ISI, we can better estimate the HRF, since the response has time to return to baseline. Moreover, we combine detection and estimation in one model, where they can improve each other through voxel-wise analysis.

2.1.2 Reviews on Hemodynamic Response Function Models

There are many ways to model the HRF; broadly, the methods are parametric, non-parametric, or semi-parametric. The choice of HRF could be a single canonical HRF, the combination of the canonical HRF and its temporal derivative, or the combination of the canonical HRF and its temporal and dispersion derivatives (Friston et al., 2002, 1998b). The HRF could also be a basis set of smooth functions (Friston et al., 1998a), a nonparametric model such as the finite impulse response basis set (FIR) (Glover, 1999), or a semi-parametric smooth FIR model (Goutte et al., 2000). Moreover, the choice could be a nonlinear model with multiple parameters, such as non-linear estimation of a double gamma function or an inverse logit function (Kruggel et al., 2001; Kruggel and Von Cramon, 1999; Lindquist and Wager, 2007; Miezin et al., 2000). Each of these methods has advantages and disadvantages. Fixed parametric models are simple to estimate but may not capture real characteristics of the HRF for different brain regions, e.g. the initial dip. Much research shows that HRFs differ from region to region in the brain, or even from voxel to voxel, though some neighboring areas may possess similar HRF patterns (Chaari et al., 2012). Since our goal is to model a flexible and accurate HRF to improve the temporal resolution, the assumption of a single HRF for all voxels is not appropriate in our case. Our estimation of the HRF is voxel-wise, so non-parametric and semi-parametric models are too flexible and have too many degrees of freedom; that is why most of them are applied to pre-determined regions. Since we desire to perform joint detection of activation and estimation, we propose a new parametric triple gamma function which captures a more realistic and flexible HRF shape with a limited number of parameters to estimate. Under a Bayesian framework, this new HRF enables new scientific discoveries about neural activity and brain function.

Before we introduce our new HRF model, we should look at where the idea comes from. It is common to use a single fixed standard shape of HRF, sometimes called a canonical HRF. Figure 2.1 shows the plot of the canonical HRF that is commonly used in the general linear model (GLM) implemented in SPM. SPM's canonical HRF has parameters (a1, a2, b1, b2, c1), each of them with a default fixed value (Glover, 1999). The expression of the canonical HRF is

$$h(t) = \left(\frac{t}{d_1}\right)^{a_1} \exp\left(-\frac{t-d_1}{b_1}\right) - c_1\left(\frac{t}{d_2}\right)^{a_2} \exp\left(-\frac{t-d_2}{b_2}\right), \qquad (2.1)$$

where d1 = a1 × b1 and d2 = a2 × b2. Neural activity increases the metabolic rate, and the active areas need oxygen, which comes from the oxygenated blood; in fact, more oxygen is supplied than is needed. Therefore the deoxy-hemoglobin in those areas decreases and the MR signal intensity increases. The BOLD response starts to rise approximately 2 seconds after the onset of the stimulus presentation and peaks at about 6 seconds (Aguirre et al., 1998); it takes about 6 additional seconds to decrease below the baseline and reach its lowest point. This effect, which lasts about 10 seconds until the BOLD response goes back to baseline, is called the post-stimulus undershoot. The undershoot occurs because blood flow decreases faster than oxygen consumption, so the larger amount of deoxy-hemoglobin makes the signal decrease rapidly. Some studies have also shown evidence of an immediate decrease in oxygen after the onset of neural activity that lasts for about 2 seconds, after which the BOLD response starts rising to the peak. This initial decrease is called the initial dip or negative dip. Voxels that have the initial dip in their HRF are more specific to activated areas than voxels that do not (Duong et al., 2000; Thompson et al., 2004). The initial dip possesses natural spatial structure because it appears distinctly across different functional areas of the brain. The ratio of the depth of the dip to the peak depends on the strength of the magnet and is roughly 20% at 3 Tesla (Yacoub and Hu, 1999).

Figure 2.1: Canonical HRF plot

Figure 2.2(B) shows an empirical example of HRFs for the visual and motor cortices corresponding to visual and motor stimuli. Those two HRFs belong to the most activated areas during an experiment. Figure 2.2(C) shows in more detail the difference in their small initial dips (Lindquist et al., 2008).

2.2 New HRF Model

In this section, we describe the general linear model with model assumptions under a Bayesian framework and propose a new triple gamma HRF model.

Figure 2.2: (B) Example of empirical HRFs measured over the visual and motor cortices in response to a visual-motor task. (C) The initial 2 seconds of the empirical HRFs give strong indication of an initial decrease in signal immediately following activation. (Lindquist et al., 2008)

2.2.1 General Linear Model and Assumptions

First, we model the BOLD response for each voxel of each subject with a standard general linear model (Friston et al., 1994). The fMRI scanner scans the whole brain every 2 seconds, which gives us a volume at each of multiple time points. The whole volume of the brain is sliced into small cubes called voxels, and each voxel has an intensity at each time point, which gives us a time series. For a given subject, we define the BOLD response for voxel V_j as y_j = (y_{j,t_n}), n = 1, 2, ..., N, through all the runs in a session, from acquisition time point t_1 to t_N. Each voxel has a specific triple gamma HRF h(p) = h(d1, d2, d3, b1, b2, b3, c1, c2), and these parameters vary from voxel to voxel. In addition, we assume a linear time invariant (LTI) system for the relationship between the stimuli and the HRF. Each column of the design matrix X(p) corresponds to the convolution of the HRF h(p) and the Dirac delta function δ(t) of the stimulus onset times, as follows:

$$\delta(t) = \begin{cases} 1, & \text{if } t = \tau \\ 0, & \text{if } t \neq \tau, \end{cases} \qquad (2.2)$$

where τ is the vector of onset times for one type of stimulus. For each type of stimulus, the (i, k)th element of the design matrix is coded as

$$x_{ik} = \{\boldsymbol{\tau}_k * h(\boldsymbol{p})\}(t_i) = x_k(t_i) = \sum_{\tau_k \le t_i} h(t_i - \tau_k \mid \boldsymbol{p}), \qquad (2.3)$$

where τ k is the vector of onset times for the kth type of stimulus.
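To illustrate how Equation (2.3) is computed, the following Python sketch builds the design matrix by summing the HRF over past onsets. Here hrf stands for any HRF evaluated at a vector of lags (for instance the triple gamma function of Section 2.2.2); the function names are our own illustrative choices.

import numpy as np

def stimulus_column(times, onsets, hrf):
    """One design-matrix column x_k: the HRF summed over all onsets
    tau_k <= t_i, as in Equation (2.3)."""
    col = np.zeros(len(times))
    for tau in onsets:
        past = times >= tau          # only onsets at or before t_i contribute
        col[past] += hrf(times[past] - tau)
    return col

def design_matrix(times, onset_lists, hrf):
    """Stack one column per stimulus type to form the N x K matrix X(p)."""
    return np.column_stack([stimulus_column(times, o, hrf)
                            for o in onset_lists])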

Figure 2.3: Illustration of the linear convolution of a canonical HRF with random presentations of stimuli (from Friston, 2003)

Figure 2.3 illustrates an LTI system: the convolution of a random sequence of stimulus presentations with a canonical HRF. Based on the matrix form, we model the BOLD response of voxel V_j, j = 1, 2, ..., J, as

$$y_j = Z_j\gamma_j + X_j(\boldsymbol{p})\beta_j + \epsilon_j, \qquad (2.4)$$

where γ_j represents the intercepts for the different runs and the motion-correction parameter values of each run, and X_j(p) is an N × K matrix corresponding to N acquisition time points for all runs and a total of K types of stimuli. Let ε_j model Gaussian white noise with voxel-specific observation variance σ_j², i.e., ε_j ∼ N(0, σ_j² I). Here are some assumptions and notations for this model.

1. β_j is a K × 1 vector of regression coefficients that represents the neural response corresponding to each type of stimulus or experimental condition. We will use β_j in the hypothesis test to determine whether the voxel has been activated by a given type of stimulus: detecting activation amounts to testing whether β_j = 0 or not.

2. Estimation of β_j also affects estimation of the HRF. Estimating the HRF is meaningless for non-activated voxels because of the small SNR. Later, we will perform Bayesian model selection on β_j.

3. We apply a cubic spline smoothing filter to the BOLD response data in the preprocessing steps to eliminate low-frequency drifts from physiological artifacts and systematic noise (Zarahn et al., 1997).

4. We consider only the simplest noise structure in this model, i.e., ε_j ∼ i.i.d. N(0, σ_j² I_{N×N}) with σ_j² unknown, although some research shows that fMRI data may be highly autocorrelated (Woolrich et al., 2001; Worsley et al., 2002). We focus on estimation of the HRF, which is not much affected by the temporal correlation structure of the noise (Marrelec et al., 2003). Thus, we estimate one specific noise variance for each voxel.

Therefore, the likelihood function of voxel V_j is

$$L(y_j \mid \boldsymbol{p}, \gamma, \beta, \sigma^2) = N(y_j \mid Z_j\gamma_j + X(\boldsymbol{p}_j)\beta_j,\ \sigma_j^2 I) = (2\pi\sigma_j^2)^{-N/2}\exp\left(-\frac{1}{2\sigma_j^2}\,(y_j - Z_j\gamma_j - X(\boldsymbol{p}_j)\beta_j)^T(y_j - Z_j\gamma_j - X(\boldsymbol{p}_j)\beta_j)\right). \qquad (2.5)$$

2.2.2 The Triple Gamma Hemodynamic Response Function

In this work, we propose a new parametric form for the HRF, h(t, p). This new HRF contains three gamma components and exhibits three desired characteristics of HRF shapes: the initial dip, the peak, and the undershoot. Under a Bayesian framework, we can estimate this parametric HRF voxel by voxel. The new triple gamma HRF has more flexibility and variability than the single canonical HRF because of the natural characteristic of the initial dip. Compared to non-parametric and semi-parametric methods, however, a parametric HRF has far fewer degrees of freedom in the estimation. Thus, we can cluster activated voxels based on these characteristic HRF parameters, which may indicate natural spatial structures of activation in the brain, e.g. the initial dip (Lindquist et al., 2009). Our proposed triple gamma HRF is

$$h(t, d_1, d_2, d_3, b_1, b_2, b_3, c_1, c_2) = \left(\frac{t}{d_1}\right)^{a_1}\exp\left(-\frac{t-d_1}{b_1}\right) - c_1\left(\frac{t}{d_2}\right)^{a_2}\exp\left(-\frac{t-d_2}{b_2}\right) - c_2\left(\frac{t}{d_3}\right)^{a_3}\exp\left(-\frac{t-d_3}{b_3}\right), \qquad (2.6)$$

where t is the time in seconds. Here, d_j = a_j b_j for j = 1, 2, 3; d1 is the time to the peak, d2 is the time to the lowest point of the undershoot, and d3 is the time to the lowest point of the initial dip. Parameters b1, b2, and b3 are the reciprocals of the dispersions of the peak, the undershoot, and the initial dip; c1 is the ratio of the amplitude of the undershoot to the amplitude of the peak; and c2 is the ratio of the amplitude of the initial dip to the amplitude of the peak.
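As a computational companion to Equation (2.6), the Python sketch below evaluates the triple gamma HRF directly. The parameter values in the example are the prior means from the priors given later in this subsection (with c1 = c2 = 0.2, the mean of an Exp(5) prior) and are used purely for illustration.

import numpy as np

def triple_gamma_hrf(t, d1, d2, d3, b1, b2, b3, c1, c2):
    """Triple gamma HRF of Equation (2.6): peak minus undershoot minus
    initial dip, with shape parameters a_j = d_j / b_j."""
    t = np.asarray(t, dtype=float)
    def component(d, b):
        a = d / b
        return np.where(t > 0, (t / d) ** a * np.exp(-(t - d) / b), 0.0)
    return component(d1, b1) - c1 * component(d2, b2) - c2 * component(d3, b3)

# Evaluate on a 30-second grid at illustrative (prior mean) parameter values
t = np.linspace(0.0, 30.0, 301)
h = triple_gamma_hrf(t, d1=6.0, d2=12.0, d3=2.0,
                     b1=0.9, b2=0.9, b3=0.9, c1=0.2, c2=0.2)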

Figure 2.4: The new triple gamma HRF plot

Under a Bayesian framework, we choose informative priors for the HRF parameters implied by the default values of the canonical HRF (Glover, 1999). In addition, we constrain the parameters and assign variances based on the physiological knowledge of our collaborators. Specifically, the priors for the HRF parameters are the following:

$$\pi(d_1) \sim N_T(6,\ 1.5^2)\, I(d_3 + 3 < d_1 < d_2 - 2), \qquad (2.7)$$

$$\pi(d_2) \sim N_{LT}(12,\ 1.5^2)\, I(d_2 > d_1 + 2), \qquad (2.8)$$

$$\pi(d_3) \sim N_{RT}(2,\ 1^2)\, I(d_3 < d_1 - 3), \qquad (2.9)$$

$$\pi(b_1) \sim N(0.9,\ 0.15^2), \qquad (2.10)$$

$$\pi(b_2) \sim N(0.9,\ 0.15^2), \qquad (2.11)$$

$$\pi(b_3) \sim N(0.9,\ 0.1^2), \qquad (2.12)$$

$$\pi(c_1) \sim \mathrm{Exp}(5), \qquad (2.13)$$

$$\pi(c_2) \sim \mathrm{Exp}(5), \qquad (2.14)$$

where N_T stands for the truncated normal distribution, N_{LT} for the left truncated normal distribution, and N_{RT} for the right truncated normal distribution. We assign non-informative priors to σ² and γ, and a special mixture prior to β, which we explain in detail in the next section along with the model selection:

$$\pi(\sigma^2, \gamma) \propto \frac{1}{\sigma^2}. \qquad (2.15)$$
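For intuition, the Python sketch below draws one parameter vector from priors (2.7)-(2.14). Because the truncation regions of d1, d2, and d3 are mutually coupled, we sample them sequentially (d3, then d1, then d2), which enforces the ordering d3 + 3 < d1 < d2 − 2 but is only an approximation of the exact joint prior; all names are ours.

import numpy as np
from scipy.stats import truncnorm, expon

def tnorm(mu, sigma, lo=-np.inf, hi=np.inf):
    """One draw from N(mu, sigma^2) truncated to (lo, hi)."""
    a, b = (lo - mu) / sigma, (hi - mu) / sigma
    return truncnorm.rvs(a, b, loc=mu, scale=sigma)

def draw_hrf_parameters():
    """Sequential draw respecting d3 + 3 < d1 < d2 - 2 (a sketch of the
    priors in Equations (2.7)-(2.14), not the exact joint prior)."""
    d3 = tnorm(2, 1.0, hi=3.0)    # keeps d1 > d3 + 3 feasible near its mean of 6
    d1 = tnorm(6, 1.5, lo=d3 + 3.0)
    d2 = tnorm(12, 1.5, lo=d1 + 2.0)
    b1 = tnorm(0.9, 0.15)
    b2 = tnorm(0.9, 0.15)
    b3 = tnorm(0.9, 0.10)
    c1 = expon.rvs(scale=1 / 5)   # Exp(5), i.e. rate 5 and mean 0.2
    c2 = expon.rvs(scale=1 / 5)
    return dict(d1=d1, d2=d2, d3=d3, b1=b1, b2=b2, b3=b3, c1=c1, c2=c2)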

2.3 Inference

In order to jointly detect activation and estimate the parameters, we specify our prior probabilistic beliefs about the neural response β in the general linear model. There is no analytic method for estimating all the parameters, so we devise an MCMC algorithm for estimation and use the posterior probability of choosing the model with all non-zero βs as the measure to detect activation.

24 2.3.1 Mixture Priors for β

A voxel is defined as non-activated if no element of β differs from 0. There are two models to choose from: if all the elements of β are different from 0, we have the full model; conversely, if β = 0, we have the null model. Here, detection of activation is equivalent to model selection between the null model and the full model. In model selection, the choice of prior is very important, and a non-informative prior is not suitable in this situation. A number of theoretical works suggest that Zellner's g-prior is a reliable choice, and it seems to have the right scale. We introduce a two-component mixture prior based on Zellner's g-prior for the K-dimensional neural response vector β,

$$\pi(\beta \mid \sigma^2, \lambda) = \lambda\,\pi(\beta \mid M_F, \sigma^2) + (1 - \lambda)\,\pi(\beta \mid M_0, \sigma^2), \qquad (2.16)$$

where λ is the prior probability of activation, and π(β | M_0, σ²) = δ(β = 0) is the point mass prior for non-activated voxels. In addition, we assume

$$\pi(\beta \mid M_F, \sigma^2) = N\left(0,\ n\sigma^2\,\mathrm{diag}(X_0^T X_0)^{-1}\right). \qquad (2.17)$$

We assume that the different types of stimuli are independent of each other, and we scale by diag(X_0^T X_0), where X_0 is generated by the canonical HRF in SPM with the default parameter values and the same stimulus onset times.

2.3.2 Model Selection on β by Zellner's g-prior

Now, we use the following priors for the model selection:

$$\pi(\sigma^2, \gamma) \propto \frac{1}{\sigma^2}, \qquad (2.18)$$

and

$$\pi(\beta \mid \sigma^2, \gamma, \lambda) = (1 - \lambda)\,\delta(\beta = 0) + \lambda\, N\left(\beta \mid 0,\ n\sigma^2\,\mathrm{diag}(X_0^T X_0)^{-1}\right). \qquad (2.19)$$

In our experiment, λ can be estimated by empirical Bayes methods or set based on values from the literature. To simplify notation, we drop the voxel index j. The likelihood for a specific voxel is

$$L(y \mid \boldsymbol{p}, \gamma, \beta, \sigma^2) = N(y \mid Z\gamma + X(\boldsymbol{p})\beta,\ \sigma^2 I) = (2\pi\sigma^2)^{-N/2}\exp\left(-\frac{1}{2\sigma^2}\,(y - Z\gamma - X(\boldsymbol{p})\beta)^T(y - Z\gamma - X(\boldsymbol{p})\beta)\right). \qquad (2.20)$$

So the full conditional posterior of β under the full model is

$$\beta \mid \cdot \sim N\left(\left(X^TX + \frac{\mathrm{diag}(X_0^TX_0)}{n}\right)^{-1}X^T(y - Z\gamma),\ \ \sigma^2\left(X^TX + \frac{\mathrm{diag}(X_0^TX_0)}{n}\right)^{-1}\right), \qquad (2.21)$$

and the full conditional posterior of σ² under the full model is

$$\sigma^2 \mid \cdot \sim IG\left(\frac{n+k}{2},\ \ \frac{1}{2}\left((y - Z\gamma - X\beta)^T(y - Z\gamma - X\beta) + \beta^T\,\frac{\mathrm{diag}(X_0^TX_0)}{n}\,\beta\right)\right). \qquad (2.22)$$
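The two full conditionals (2.21)-(2.22) translate directly into one Gibbs cycle. Below is a minimal Python sketch under the stated model, with function and argument names of our own choosing.

import numpy as np

def gibbs_beta_sigma2(y, Z, gamma, X, X0, sigma2, rng):
    """One Gibbs cycle for (beta, sigma^2) under the full model with the
    Zellner-type g-prior, following Equations (2.21) and (2.22)."""
    n, k = len(y), X.shape[1]
    r = y - Z @ gamma                         # data with run effects removed
    D = np.diag((X0 ** 2).sum(axis=0)) / n    # diag(X0' X0) / n
    C_inv = np.linalg.inv(X.T @ X + D)
    beta = rng.multivariate_normal(C_inv @ (X.T @ r), sigma2 * C_inv)
    resid = r - X @ beta
    shape = 0.5 * (n + k)
    rate = 0.5 * (resid @ resid + beta @ D @ beta)
    sigma2 = 1.0 / rng.gamma(shape, 1.0 / rate)   # inverse-gamma draw
    return beta, sigma2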

The marginal likelihood for the null model M_0 conditional on p and γ is

$$L(M_0 \mid y, \boldsymbol{p}, \gamma) = \int_{\mathbb{R}^K}\int_0^\infty L(y \mid \boldsymbol{p}, \gamma, \sigma^2, \beta = 0)\,\pi(\beta \mid \sigma^2, \boldsymbol{p}, \gamma, M_0)\,\pi(\sigma^2, \gamma)\,d\sigma^2\,d\beta = \frac{\pi^{-N/2}\,\Gamma(N/2)}{\left\{(y - Z\gamma)^T(y - Z\gamma)\right\}^{N/2}}. \qquad (2.23)$$

The marginal likelihood for the full model M_F conditional on p and γ is

$$L(M_F \mid y, \boldsymbol{p}, \gamma) = \pi^{-N/2}\,|\mathrm{diag}(X_0^TX_0)|^{1/2}\,\Gamma\!\left(\frac{N}{2}\right)\left|\frac{S}{nC}\right|^{1/2} S^{-\frac{n+k}{2}}, \qquad (2.24)$$

where C = X^T X + diag(X_0^T X_0)/n and S = (y − Zγ)^T(I − XC^{-1}X^T)(y − Zγ). Moreover, the full conditional posterior of σ² under the null model M_0 is

$$p(\sigma^2 \mid M_0, \gamma, \beta, \boldsymbol{p}) \sim IG\left(\frac{N}{2},\ \frac{1}{2}(y - Z\gamma)^T(y - Z\gamma)\right), \qquad (2.25)$$

and it is the same as the full conditional posterior (2.31) when β = 0. In order to select the best model, we calculate the posterior probability of each model as follows:

$$Pr(M_0 \mid y, \boldsymbol{p}, \gamma) = \frac{L(M_0 \mid y, \boldsymbol{p}, \gamma)\,p(M_0)}{L(M_0 \mid y, \boldsymbol{p}, \gamma)\,p(M_0) + L(M_F \mid y, \boldsymbol{p}, \gamma)\,p(M_F)} \qquad (2.26)$$

and

$$Pr(M_F \mid y, \boldsymbol{p}, \gamma) = 1 - Pr(M_0 \mid y, \boldsymbol{p}, \gamma). \qquad (2.27)$$
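Since the marginal likelihoods enter only through the ratio in Equation (2.26), it is numerically safest to compute them on the log scale. A short sketch (the null-model formula follows Equation (2.23); function names are ours):

import numpy as np
from scipy.special import gammaln, logsumexp

def log_marginal_null(y, Z, gamma):
    """log L(M0 | y, p, gamma) from Equation (2.23)."""
    r = y - Z @ gamma
    N = len(y)
    return -0.5 * N * np.log(np.pi) + gammaln(N / 2) - 0.5 * N * np.log(r @ r)

def posterior_prob_full(log_m0, log_mF, prior_full=0.5):
    """Posterior probability of the full model, Equations (2.26)-(2.27),
    computed stably from log marginal likelihoods."""
    lF = np.log(prior_full) + log_mF
    l0 = np.log(1.0 - prior_full) + log_m0
    return np.exp(lF - logsumexp([lF, l0]))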

In the next section, we present the details of the MCMC scheme we have developed to estimate the HRF parameters.

2.3.3 The MCMC Algorithm for Estimating HRF Parameters

In the model, the underlying HRF is convolved to generate the design matrix of the general linear model under the LTI system. Exact analytic methods are not available for estimating the HRF, especially for all of its parameters, so we must sample these parameters from their full conditional posteriors using MCMC techniques. More specifically, we can sample all the HRF parameters from their full conditional posteriors conditioned on β, γ, and σ². We also incorporate the model selection between the null model and the full model in the MCMC; therefore β and σ² must jump between those two models. It is reasonable to use a block Gibbs sampler with random walk Metropolis-Hastings steps within the MCMC (Liu, 2008). The algorithm is as follows. Suppose that at the end of cycle k − 1 we have σ^{2(k−1)}, γ^{(k−1)}, β^{(k−1)}, and p^{(k−1)} = (d_1^{(k−1)}, d_2^{(k−1)}, d_3^{(k−1)}, b_1^{(k−1)}, b_2^{(k−1)}, b_3^{(k−1)}, c_1^{(k−1)}, c_2^{(k−1)}).

• Step 1. The full conditional posterior distribution of γ is multivariate normal, so we draw γ^{(k)} from

$$\gamma^{(k)} \mid y, \sigma^{2(k-1)}, \beta^{(k-1)}, \boldsymbol{p}^{(k-1)} \sim \mathrm{MVN}\left((Z^TZ)^{-1}Z^T\left(y - X(\boldsymbol{p}^{(k-1)})\beta^{(k-1)}\right),\ (Z^TZ)^{-1}\sigma^{2(k-1)}\right), \qquad (2.28)$$

where MVN stands for the multivariate normal distribution.

• Step 2. We calculate the conditional marginal likelihoods for the null model, L(M_0 | y, γ^{(k)}, p^{(k−1)}), and the full model, L(M_F | y, γ^{(k)}, p^{(k−1)}), given in Equations (2.23) and (2.24), respectively.

Generate a random number u from Unif(0, 1) and compare it with the conditional posterior probability of M_F:

$$Pr(M_F \mid y, \boldsymbol{p}, \gamma) = \frac{L(M_F \mid y, \boldsymbol{p}, \gamma)\,p(M_F)}{L(M_F \mid y, \boldsymbol{p}, \gamma)\,p(M_F) + L(M_0 \mid y, \boldsymbol{p}, \gamma)\,p(M_0)}. \qquad (2.29)$$

If u ≤ Pr(M_F | y, p, γ), then we choose the full model M_F, and we sample β^{(k)} from the full conditional posterior distribution of β,

$$\beta^{(k)} \mid \boldsymbol{p}^{(k-1)}, \sigma^{2(k-1)}, \gamma^{(k)} \sim \mathrm{MVN}\left(\left(X(\boldsymbol{p}^{(k-1)})^T X(\boldsymbol{p}^{(k-1)}) + \frac{\mathrm{diag}(X_0^TX_0)}{n}\right)^{-1} X(\boldsymbol{p}^{(k-1)})^T (y - Z\gamma^{(k)}),\ \ \sigma^{2(k-1)}\left(X(\boldsymbol{p}^{(k-1)})^T X(\boldsymbol{p}^{(k-1)}) + \frac{\mathrm{diag}(X_0^TX_0)}{n}\right)^{-1}\right). \qquad (2.30)$$

In addition, if we choose model M_F, we sample σ^{2(k)} from its full conditional posterior conditional on M_F,

$$\sigma^{2(k)} \mid \beta^{(k)}, \boldsymbol{p}^{(k-1)}, \gamma^{(k)} \sim IG\left(\frac{n+k}{2},\ \frac{1}{2}\left((y - Z\gamma^{(k)} - X(\boldsymbol{p}^{(k-1)})\beta^{(k)})^T (y - Z\gamma^{(k)} - X(\boldsymbol{p}^{(k-1)})\beta^{(k)}) + \beta^{(k)T}\,\frac{\mathrm{diag}(X_0^TX_0)}{n}\,\beta^{(k)}\right)\right). \qquad (2.31)$$

Conversely, if u > Pr(M_F | y, p, γ), then we select model M_0. In that case, we let β^{(k)} = 0 and sample σ^{2(k)} from IG(N/2, ½(y − Zγ^{(k)})^T(y − Zγ^{(k)})). More specifically, if model M_0 is selected, the voxel is flagged as non-activated, while if model M_F is selected, the voxel is identified as activated.

• Step 3. Last, we simulate all the HRF parameters p = (d1, d2, d3, b1, b2, b3, c1, c2) by using Gibbs sampling and random walk Metropolis-Hastings steps based on their full conditional posterior distributions.

29 Then, we compute and update the design matrix X(p(k)) with new samples of those HRF parameters.

We divide the Gibbs sampling of the 8 parameters into 3 blocks, where each block corresponds to one of the three components of the HRF.

– Draw log(d_1^*) from the proposal distribution

$$J_k(d_1^* \mid d_1^{(k-1)}) = \mathrm{logTN}_{(\log(d_3^{(k-1)}+3),\ \log(d_2^{(k-1)}-2))}\left(\log(d_1^{(k-1)}),\ \tau_0\right), \qquad (2.32)$$

where logTN stands for the log truncated normal distribution, and draw log(b_1^*) from the proposal distribution

$$J_k(b_1^* \mid b_1^{(k-1)}) = N(b_1^{(k-1)},\ \tau_0). \qquad (2.33)$$

Use d_1^* and b_1^* to compute a new design matrix X_1. Set d_1^{(k)} = d_1^* and b_1^{(k)} = b_1^* with acceptance probability r_1 and replace the design matrix X by X_1. Then we update d_1 and b_1 with the new samples.

– Draw log(d_2^*) from the proposal distribution

$$J_k(d_2^* \mid d_2^{(k-1)}) = \mathrm{logTN}_{(\log(d_1^{(k)}+2),\ +\infty)}\left(\log(d_2^{(k-1)}),\ \tau_1\right), \qquad (2.34)$$

draw log(b_2^*) from the proposal distribution

$$J_k(b_2^* \mid b_2^{(k-1)}) = N(b_2^{(k-1)},\ \tau_1), \qquad (2.35)$$

and draw log(c_1^*) from the proposal distribution

$$J_k(c_1^* \mid c_1^{(k-1)}) = N(c_1^{(k-1)},\ \tau_1). \qquad (2.36)$$

Use d_2^*, b_2^*, and c_1^* to compute a new design matrix X_2. Set d_2^{(k)} = d_2^*, b_2^{(k)} = b_2^*, and c_1^{(k)} = c_1^* with acceptance probability r_2 and replace the design matrix X by X_2 based on the new samples of d_2, b_2, c_1, d_1, and b_1.

– Draw log(d_3^*) from the proposal distribution

$$J_k(d_3^* \mid d_3^{(k-1)}) = \mathrm{logTN}_{(-\infty,\ \log(d_1^{(k)}-3))}\left(\log(d_3^{(k-1)}),\ \tau_2\right),$$

draw log(b_3^*) from the proposal distribution

$$J_k(b_3^* \mid b_3^{(k-1)}) = N(b_3^{(k-1)},\ \tau_2), \qquad (2.37)$$

and draw log(c_2^*) from the proposal distribution

$$J_k(c_2^* \mid c_2^{(k-1)}) = N(c_2^{(k-1)},\ \tau_2). \qquad (2.38)$$

Use d_3^*, b_3^*, and c_2^* to compute a new design matrix X_3. Set d_3^{(k)} = d_3^*, b_3^{(k)} = b_3^*, and c_2^{(k)} = c_2^* with acceptance probability r_3 and replace the design matrix X by X_3 based on all the accepted new samples.

Proceeding in this manner, after some burn-in we obtain estimates of all the parameters and of the posterior probability of choosing M_F over M_0. We apply this procedure to each voxel of the whole brain.
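The updates in Step 3 can be implemented with a generic random-walk move on the log scale. The Python sketch below is a simplification of the algorithm above: instead of the truncated proposals (2.32)-(2.38), it proposes untruncated log-scale steps and lets the target do the truncating, with log_post returning -inf whenever an ordering constraint such as d3 + 3 < d1 < d2 − 2 is violated (such proposals are simply rejected). The Jacobian term makes the log-scale walk a valid Metropolis-Hastings move; names and the simplification are ours.

import numpy as np

def mh_log_walk(log_post, state, block, step, rng):
    """One Metropolis-Hastings update of a block of positive HRF parameters
    (e.g. ('d1','b1'), ('d2','b2','c1') or ('d3','b3','c2')) using a
    symmetric random walk on the log scale."""
    proposal, log_jacobian = dict(state), 0.0
    for name in block:
        new = state[name] * np.exp(step * rng.standard_normal())
        log_jacobian += np.log(new / state[name])  # correction for the log scale
        proposal[name] = new
    log_ratio = log_post(proposal) - log_post(state) + log_jacobian
    if np.log(rng.uniform()) < log_ratio:
        return proposal, True
    return state, False

After every accepted block, the design matrix X(p) is recomputed before the next block is updated, exactly as described in Step 3 above.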

31 Chapter 3

Bayesian Model Selection

This chapter focuses on developing Bayesian model selection for our class of models for fMRI data. The chapter is divided into four sections. Section 3.1 introduces a new class of nonlocal priors used to perform Bayesian model selection. Section 3.2 develops a procedure to perform Bayesian model selection on the neuronal responses, i.e., the regression coefficients of the model of the previous chapter, in order to choose activated voxels. Section 3.3 introduces the Bayesian method to control the false discovery rate when testing for activated voxels, which can be seen as thousands of simultaneous tests.

3.1 Bayesian Model Selection and Nonlocal Priors

In this section, we introduce a new class of priors to perform Bayesian model selection. Our method considers only two models, the null model and the full model, and the prior reflects our knowledge about the parameters of these models. Based on our physiological knowledge of neuronal responses, we adopt a new class of non-local priors in our model selection (Johnson and Rossell, 2010). Nonlocal priors possess some attractive features for the neuronal responses in our model. A voxel is identified as non-activated if its neuronal responses are zero; this corresponds to the null model. If the neuronal responses are positive, the voxel has been activated, and the true model is then the full model. Because of this characteristic, we impose a nonlocal prior density on the neuronal response parameters for model selection. The nonlocal prior density is identically zero when the parameter equals the null value, and it is symmetric around the null value. Local prior densities, by contrast, are always positive at the null values. With local priors, as the sample size increases, the Bayes factor accumulates evidence in favor of a true null model much more slowly than in favor of a true alternative model; using a nonlocal prior remedies this imbalance. fMRI data are very noisy and the signal is hard to detect, so it is crucial to distinguish between the null model and the full model. Moreover, this model selection method also provides the posterior probability of choosing the full model, which approximates the probability that the full model is true. Next, we introduce a modified nonlocal prior.

3.2 Model Selection on β

3.2.1 Nonlocal Positive Truncated pMOM

Let us first consider the class of non-local prior densities for β proposed by Johnson and Rossell (2010, 2012) for Bayesian model selection. Specifically, we consider the first order product moment (pMOM) prior density defined as

$$\pi(\beta \mid \sigma^2, \tau, \gamma, \boldsymbol{p}) = d_K\,(2\pi)^{-K/2}\,(\tau\sigma^2)^{-rK-K/2}\,|A_K|^{1/2}\exp\left(-\frac{1}{2\tau\sigma^2}\,\beta^T A_K\,\beta\right)\left(\prod_{i=1}^K \beta_i^{2r}\right), \qquad (3.1)$$

where r = 1, τ > 0, A_K is a K × K nonsingular matrix, and d_K is a normalizing constant independent of σ² and τ. The parameter τ is a scale parameter that determines the dispersion of the prior density around 0. If the columns of the design matrix have been standardized, A_K = I_K, where I_K is the identity matrix. In that case, Johnson and Rossell (2010, 2012) suggest a common value of τ for each component of β, e.g. τ = 0.348. For illustration, Figure 3.1 shows a prior that is a mixture of a one-dimensional pMOM and a point mass at 0.

We propose a new pMOM prior implied by Johnson and Rossell (2012) and given in Equation (3.1), but adapted to our fMRI framework. First, in our model for fMRI data, we have developed in Chapter 2 a new HRF model with 3 components and 8 parameters. This new HRF implies a distinct design matrix X(p) for each set of parameters p. To account for the distinct amount of information for each element of the vector β, one could set the matrix A_K equal to X(p)^T X(p). However, that would imply that the prior in Equation (3.1) would appear in the expression for the full conditional distribution of p. Another solution that is computationally more attractive is to use the design matrix X_0 implied by the canonical HRF to define A_K = diag(X_0^T X_0). Here X_0 = X(p_SPM) is generated with the same vector of stimulus onset times.

Figure 3.1: Density plot of a mixture of a one-dimensional nonlocal pMOM and a point mass at 0.

Another issue is the multimodality of the posterior distribution that occurs for unrestricted β when we use our three-component HRF. To eliminate the multimodality, we constrain each element of β to be positive. Specifically, we propose to modify the standard pMOM prior into the following nonlocal positive truncated pMOM:

$$\pi(\beta \mid \sigma^2, \tau, \gamma) = d_K\,(2\pi)^{-K/2}\,(\tau\sigma^2)^{-3K/2}\,|\mathrm{diag}(X_0^T X_0)|^{1/2}\exp\left\{-\frac{\beta^T\,\mathrm{diag}(X_0^T X_0)\,\beta}{2\tau\sigma^2}\right\}\left\{\prod_{i=1}^K \beta_i^2\, I_{(\beta_i > 0)}\right\}, \qquad (3.2)$$

where

$$d_K = \left[\int_{\mathbb{R}_+^K}(2\pi)^{-K/2}\,|\mathrm{diag}(X_0^T X_0)|^{1/2}\exp\left(-\frac{1}{2}\,\xi^T\mathrm{diag}(X_0^T X_0)\,\xi\right)\left(\prod_{i=1}^K \xi_i^2\right)d\xi\right]^{-1}. \qquad (3.3)$$

35 3.2.2 Bayesian Model Selection for β

The regression coefficient vector β can be seen as the neural response to the K types of stimuli. We introduce a mixture prior model for β,

$$\pi(\beta \mid \sigma^2, \tau, \lambda) = \lambda\,\pi(\beta \mid M_F, \sigma^2, \tau) + (1 - \lambda)\,\pi(\beta \mid M_0). \qquad (3.4)$$

Here, λ is the prior probability of the full model, π(β|MF ) is the prior density for β under the full model MF , and π(β|M0) = δ(β = 0) indicates a point mass prior at zero that corresponds to neural responses β of the non-activated voxel. However, common local prior densities for model selection (Liang et al., 2008) may not be adequate to differentiate the noise from the true activation. In other words, common local prior densities are positive when parameters β are at null values and the Bayes Factor based on common local priors may not prefer to choose the null model over the alternative model even it is the true model. That’s the reason that we prefer the nonlocal prior in our method. Thus, the prior of σ2, γ and the mixture prior is presented as following:

\pi(\sigma^2, \gamma) \propto \frac{1}{\sigma^2},   (3.5)

and

\pi(\beta \mid \sigma^2, \gamma, \lambda) = (1 - \lambda) \, \delta(\beta = 0) + \lambda \, d_K (2\pi)^{-K/2} (\tau\sigma^2)^{-3K/2} |\mathrm{diag}(X_0^T X_0)|^{1/2} \exp\left\{ -\frac{1}{2\tau\sigma^2} \beta^T \mathrm{diag}(X_0^T X_0) \beta \right\} \prod_{i=1}^{K} \beta_i^2 I_{(\beta_i > 0)}.   (3.6)

Here the normalizing constant d_K does not depend on τ or σ², but it does depend on the dimension of the vector β. The dimension of the neural response β corresponds to the number of different types of stimuli. In particular,

"Z K #−1 1 1 Y −K/2 T 2 T T 2 dk = (2π) |X0 X0| exp(− ξ X0 X0ξ)( ξi )dξ K 2 R+ i=1 k −k Y 2 −1 = 2 ( E T −1 [ξ ]) (3.7) N+(0,(X0 X0)jj ) j j=1 k −k Y T = 2 [(X0 X0)jj)]. j=1

Here we use the fact that, for the positive truncated normal distribution, E_{N_+(0,\sigma^2)}[X] = \sigma\sqrt{2/\pi} and \mathrm{Var}_{N_+(0,\sigma^2)}(X) = \sigma^2(1 - 2/\pi), so that E_{N_+(0,\sigma^2)}[X^2] = \mathrm{Var}_{N_+(0,\sigma^2)}(X) + (E_{N_+(0,\sigma^2)}[X])^2 = \sigma^2.

Thus, the parameter τ and \mathrm{diag}(X_0^T X_0) together determine the dispersion of the prior density of β around 0. Note that by using \mathrm{diag}(X_0^T X_0) we are in fact standardizing β by the diagonal of the Fisher information under the canonical HRF. As a consequence, we may follow the recommendation of Johnson and Rossell (2010) and use the default value τ = 0.348. Under this Bayesian framework, we can detect whether a voxel is activated based on its neural response and estimate all the HRF parameters simultaneously. The likelihood for voxel j is

L(y_j \mid p_j, \gamma_j, \beta_j, \sigma_j^2) = N(y_j \mid Z_j \gamma_j + X(p_j)\beta_j, \sigma_j^2 I_N)   (3.8)

= (2\pi\sigma_j^2)^{-N/2} \exp\left\{ -\frac{1}{2\sigma_j^2} (y_j - Z_j\gamma_j - X(p_j)\beta_j)^T (y_j - Z_j\gamma_j - X(p_j)\beta_j) \right\}.

So, the posterior distribution is

p(p_j, \gamma_j, \beta_j, \sigma_j^2 \mid y_j) \propto L(y_j \mid p_j, \gamma_j, \beta_j, \sigma_j^2) \, \pi(p_j) \, \pi(\beta_j \mid \sigma_j^2, \gamma_j) \, \pi(\sigma_j^2, \gamma_j),   (3.9)

where p_j is the vector of all the HRF parameters of voxel j and \pi(p_j) is the prior on the HRF parameters of voxel j, which we set based on the physiological knowledge of our collaborators. The marginal likelihood under the null model, conditional on p and γ, is

m(y \mid M_0) = m(y \mid p, \gamma, M_0) = \int_{\mathbb{R}^K} \int_0^\infty L(y \mid p, \gamma, \sigma^2, \beta = 0) \, \pi(\beta \mid \sigma^2, p, \gamma, M_0) \, \pi(\sigma^2, \gamma) \, d\sigma^2 d\beta
= (2\pi)^{-N/2} \frac{\Gamma(N/2)}{\left\{ \frac{1}{2}(y - Z\gamma)^T(y - Z\gamma) \right\}^{N/2}}.   (3.10)

In addition, the full conditional distribution of σ² under the null model is

\sigma^2 \mid M_0, \gamma, \beta, p \sim IG\left( \frac{N}{2}, \; \frac{1}{2}(y - Z\gamma)^T(y - Z\gamma) \right).   (3.11)
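As a quick illustration, a few lines of Python suffice to evaluate the log of the null-model marginal likelihood in equation (3.10). This is a hedged sketch with variable names of my own choosing, not code from the dissertation.

```python
import numpy as np
from scipy.special import gammaln

def log_marginal_null(y, Z, gamma):
    """Log m(y | p, gamma, M0) from equation (3.10)."""
    N = y.size
    resid = y - Z @ gamma                       # residual after drift/motion terms
    rss = resid @ resid
    return -0.5 * N * np.log(2 * np.pi) + gammaln(N / 2.0) - 0.5 * N * np.log(0.5 * rss)
```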

The marginal likelihood under the full model, conditional on p, γ and τ, is

m(y \mid M_F) = m(y \mid p, \gamma, \tau, M_F) = \int_{\mathbb{R}_+^K} \int_0^\infty L(y \mid p, \gamma, \sigma^2, \beta) \, \pi(\beta \mid \sigma^2, p, \gamma, M_F) \, \pi(\sigma^2, \gamma) \, d\sigma^2 d\beta
= d_K (2\pi)^{-N/2} \, 2^{\nu/2} \, \tau^{-3K/2} \, |\mathrm{diag}(X_0^T X_0)|^{1/2} \, |C_K|^{-1/2} \, \Gamma\left( \frac{\nu}{2} \right) (\nu s_K^2)^{-\nu/2} \, P(\beta > 0) \, E_+\left( \prod_{i=1}^{K} \beta_i^2 \right),   (3.12)

where

d_K = \left[ \int_{\mathbb{R}_+^K} (2\pi)^{-K/2} |\mathrm{diag}(X_0^T X_0)|^{1/2} \exp\left( -\frac{1}{2} \xi^T \mathrm{diag}(X_0^T X_0) \xi \right) \left( \prod_{i=1}^{K} \xi_i^2 \right) d\xi \right]^{-1},   (3.13)

\nu = N + 2K,   (3.14)

C_K = X(p)^T X(p) + \frac{1}{\tau} I_K,   (3.15)

s_K^2 = \frac{R_K}{\nu},   (3.16)

and

R_K = (y - Z\gamma)^T \left( I_N - X(p) C_K^{-1} X(p)^T \right) (y - Z\gamma).   (3.17)

Here, E_+ denotes expectation with respect to a positive truncated multivariate noncentral t distribution with ν degrees of freedom, location \tilde{\beta}, and scale matrix s_K^2 C_K^{-1}. Since we have a relatively large sample size and large degrees of freedom, we can approximate this positive truncated multivariate t distribution by a positive truncated multivariate normal distribution. To approximate the second product moment of the limiting normal distribution of β, we find the maximizer of the integrand and its Hessian matrix using the Newton-Raphson optimization algorithm, and then apply a Laplace approximation to the second product moment of this normal distribution to obtain the conditional marginal likelihood under M_F. In this way, we also obtain the maximum a posteriori (MAP) estimate β^*, which is the mode of the conditional posterior distribution of β and can be computed as

\beta^* = \mathrm{argmax}_{\beta} \left\{ \mathrm{MVN}_+\left( \beta; \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 \right\},   (3.18)

where

\tilde{\beta} = C_K^{-1} X(p)^T (y - Z\gamma),   (3.19)

and \mathrm{MVN}_+ stands for the positive truncated multivariate normal distribution. The conditional posterior distribution of β under the full model is

\beta \mid y, p, \gamma, \sigma^2 \sim \mathrm{MVN}_+\left( \beta; \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2.   (3.20)
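The mode search in equation (3.18) can be sketched in a few lines of Python. This is an illustrative implementation under my own naming, not the dissertation's code; it applies Newton-Raphson to the log of the integrand in (3.18), whose gradient and Hessian follow directly from that expression.

```python
import numpy as np

def map_beta(beta_tilde, C_K, s2_K, nu, n_iter=50, tol=1e-10):
    """Newton-Raphson search for the mode beta* of equation (3.18).

    Maximizes  -0.5*c*(b - bt)' C_K (b - bt) + 2*sum(log b)  over b > 0,
    where c = (nu - 2) / (nu * s2_K) is the precision scale of the
    limiting truncated normal.
    """
    c = (nu - 2.0) / (nu * s2_K)
    beta = np.maximum(beta_tilde, 0.1)          # positive starting point
    for _ in range(n_iter):
        grad = -c * C_K @ (beta - beta_tilde) + 2.0 / beta
        hess = -c * C_K - np.diag(2.0 / beta**2)
        step = np.linalg.solve(hess, grad)
        new = beta - step
        while np.any(new <= 0):                 # halve the step to stay positive
            step *= 0.5
            new = beta - step
        if np.max(np.abs(new - beta)) < tol:
            beta = new
            break
        beta = new
    return beta, -c * C_K - np.diag(2.0 / beta**2)   # mode beta* and its Hessian H
```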

We approximate the positive truncated multivariate noncentral t density by its limiting normal distribution with the same location and scale matrix \frac{\nu}{\nu - 2} s_K^2 C_K^{-1}. Using the Newton-Raphson method, we find the mode of \mathrm{MVN}_+(\beta; \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1}) \prod_{i=1}^{K} \beta_i^2. By a standard Laplace approximation, we then obtain the conditional marginal likelihood under the full model, conditional on p, τ, and γ:

m(y \mid p, \tau, \gamma) = d_K (2\pi)^{-N/2} \, 2^{\nu/2} \, \tau^{-3K/2} \, \Gamma\left( \frac{\nu}{2} \right) \left\{ (y - Z\gamma)^T(y - Z\gamma) - \tilde{\beta}^T C_K \tilde{\beta} \right\}^{-\nu/2} P(\beta > 0)
\times |\mathrm{diag}(X_0^T X_0)|^{1/2} \, \frac{ \left\{ \prod_{i=1}^{K} (\beta_i^*)^2 \right\} \exp\left\{ -\frac{\nu - 2}{2\nu s_K^2} (\beta^* - \tilde{\beta})^T C_K (\beta^* - \tilde{\beta}) \right\} }{ \left| C_K + 2\frac{\nu s_K^2}{\nu - 2} D(\beta^*) \right|^{1/2} },   (3.21)

where D(\beta^*) is the diagonal matrix with (i, i) entry given by 1/(\beta_i^*)^2. Also, the posterior distribution of σ² conditional on M_F is

\sigma^2 \mid M_F, \gamma, \beta, p \sim IG\left( \frac{N}{2} + \frac{3K}{2}, \; \frac{1}{2}(y - Z\gamma - X(p)\beta)^T(y - Z\gamma - X(p)\beta) + \frac{1}{2\tau} \beta^T \mathrm{diag}(X_0^T X_0) \beta \right),   (3.22)

and the posterior distribution of β conditional on M_F does not have a closed form:

p(\beta \mid \sigma^2, \gamma, p, M_F) = \frac{ \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 }{ \int_{\mathbb{R}_+^K} \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 \, d\beta }.   (3.23)

For model selection, we calculate the posterior probability of each model as

Pr(M_0 \mid y, p, \gamma) = \frac{ m(y \mid p, \gamma, M_0) \, p(M_0) }{ m(y \mid p, \gamma, M_0) \, p(M_0) + m(y \mid p, \gamma, M_F) \, p(M_F) }   (3.24)

and

Pr(M_F \mid y, p, \gamma) = 1 - Pr(M_0 \mid y, p, \gamma).   (3.25)

We can thus obtain the posterior estimates \hat{\beta} as well as the posterior probability \hat{p} of M_F for each voxel.
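Because the two marginal likelihoods are computed on the log scale in practice, the posterior probability in equation (3.24) is best evaluated with a log-sum-exp step. The sketch below assumes equal prior model probabilities; the function name and the example log-marginal values are hypothetical.

```python
import numpy as np

def posterior_prob_full(log_m0, log_mF, prior_F=0.5):
    """Pr(M_F | y, p, gamma) from equation (3.24), computed stably on the log scale."""
    a = np.log(1.0 - prior_F) + log_m0
    b = np.log(prior_F) + log_mF
    m = max(a, b)                               # log-sum-exp trick
    return np.exp(b - m) / (np.exp(a - m) + np.exp(b - m))

print(posterior_prob_full(log_m0=-1250.3, log_mF=-1243.8))  # ~0.9985
```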

3.2.3 The MCMC Algorithm for Posterior Simulations

Inference for the unknown parameters of our model cannot be performed analytically. Part of the difficulty is that the design matrix depends nonlinearly on the HRF parameters. To overcome this difficulty, we develop an MCMC algorithm that uses block Gibbs samplers and Metropolis-Hastings within Gibbs steps (Liu, 2008). The algorithm proceeds as follows. Suppose that at the end of cycle k − 1 we have \sigma^{2(k-1)}, \gamma^{(k-1)}, \beta^{(k-1)}, p^{(k-1)}.

• Step 1. Apply the Newton-Raphson method to search for \beta^*, which maximizes the Laplace approximation of the conditional marginal likelihood m(y \mid M_F), and find its corresponding Hessian matrix H. Then the conditional posterior probability Pr(M_F \mid y, p^{(k-1)}, \gamma^{(k-1)}) can be computed. Generate a random number u from Unif(0, 1) and compare it with the conditional posterior probability of the full model:

Pr(M_F \mid y, p, \gamma) = \frac{ m(y \mid p, \gamma, M_F) \, p(M_F) }{ m(y \mid p, \gamma, M_0) \, p(M_0) + m(y \mid p, \gamma, M_F) \, p(M_F) }.   (3.26)

If u ≤ Pr(M_F \mid y, p, \gamma), then we decide to choose the full model M_F, and we sample \beta^{(k)} from

\frac{ \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 }{ \int_{\mathbb{R}_+^K} \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 \, d\beta },   (3.27)

where

\nu = N + 2K,   (3.28)

R_K = (y - Z\gamma^{(k-1)})^T \left( I_N - X(p^{(k-1)}) C_K^{-1} X(p^{(k-1)})^T \right) (y - Z\gamma^{(k-1)}),   (3.29)

C_K = X(p^{(k-1)})^T X(p^{(k-1)}) + \frac{1}{\tau} \mathrm{diag}(X_0^T X_0),   (3.30)

\tilde{\beta} = C_K^{-1} X(p^{(k-1)})^T (y - Z\gamma^{(k-1)}),   (3.31)

and

s_K^2 = \frac{R_K}{\nu}.   (3.32)

Here, we draw the candidate from N_+(\beta^*, 2.64 H^{-1}) and accept it with probability min(1, r), where

r = \frac{ |C_K/\sigma^2|^{1/2} \, P(\beta > 0) \, \exp\left\{ -\frac{1}{2\sigma^2} (\beta - \tilde{\beta})^T C_K (\beta - \tilde{\beta}) \right\} \beta_1^2 \beta_2^2 \Big/ \int_{\mathbb{R}_+^K} \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 \, d\beta }{ c \, \left| \frac{H}{2.64} \right|^{1/2} \exp\left\{ -\frac{1}{2 \times 2.64} (\beta - \beta^*)^T H (\beta - \beta^*) \right\} },   (3.33)

where

c = \frac{ |C_K/\sigma^2|^{1/2} \, P(\beta > 0) \, \exp\left\{ -\frac{1}{2\sigma^2} (\beta^* - \tilde{\beta})^T C_K (\beta^* - \tilde{\beta}) \right\} \beta_1^2 \beta_2^2 \Big/ \int_{\mathbb{R}_+^K} \mathrm{MVN}_+\left( \tilde{\beta}, \frac{\nu}{\nu - 2} s_K^2 C_K^{-1} \right) \prod_{i=1}^{K} \beta_i^2 \, d\beta }{ \left| \frac{H}{2.64} \right|^{1/2} }.   (3.34)

Then, draw (\sigma^{2(k)} \mid \gamma^{(k-1)}, \beta^{(k)}, p^{(k-1)}, y) from the full conditional distribution

IG\left( \frac{N}{2} + \frac{3K}{2}, \; \frac{1}{2}(y - Z\gamma^{(k-1)} - X(p^{(k-1)})\beta^{(k)})^T (y - Z\gamma^{(k-1)} - X(p^{(k-1)})\beta^{(k)}) + \frac{1}{2\tau} \beta^{(k)T} \mathrm{diag}(X_0^T X_0) \beta^{(k)} \right),   (3.35)

where IG stands for the inverse gamma distribution. If u > Pr(M_F \mid y, p, \gamma),

then we choose the null model M_0, i.e., \beta^{(k)} = 0, and sample \sigma^{2(k)} from IG\left( \frac{N}{2}, \frac{1}{2}(y - Z\gamma^{(k-1)})^T(y - Z\gamma^{(k-1)}) \right). In this way the selected model jumps between the full model and the null model, and as the iterations proceed, the posterior probability of choosing the full model can be estimated (a code sketch of this jump step appears after the list of steps below).

• Step 2. Draw (\gamma^{(k)} \mid \sigma^{2(k)}, \beta^{(k)}, p^{(k-1)}, y) from the full conditional distribution

\mathrm{MVN}\left( (Z^T Z)^{-1} Z^T (y - X(p^{(k-1)})\beta^{(k)}), \; (Z^T Z)^{-1} \sigma^{2(k)} \right).   (3.36)

• Step 3. Finally, we simulate all the HRF parameters p = (d_1, d_2, d_3, b_1, b_2, b_3, c_1, c_2) using block Gibbs samplers with random walk Metropolis-Hastings steps on their full conditional posterior distributions, and we then recompute and update the design matrix X(p) with the new samples of the HRF parameters. We sample the 8 parameters in 3 Gibbs blocks corresponding to the three components of the HRF (a sketch of one such proposal draw appears after this list).

– Draw \log(d_1^*) from the proposal distribution

J_k(d_1^* \mid d_1^{(k-1)}) = \mathrm{logTN}_{(\log(d_3^{(k-1)} + 3), \; \log(d_2^{(k-1)} - 2))}\left( \log(d_1^{(k-1)}), \tau_0 \right),   (3.37)

where logTN stands for the truncated log-normal distribution, and draw b_1^* from the proposal distribution

J_k(b_1^* \mid b_1^{(k-1)}) = N(b_1^{(k-1)}, \tau_0).   (3.38)

Use d_1^* and b_1^* to compute a new design matrix X_1. Set d_1^{(k)} = d_1^* and b_1^{(k)} = b_1^* with acceptance probability r_1 and replace the design matrix X by X_1. Then update d_1 and b_1 with the new samples.

– Draw \log(d_2^*) from the proposal distribution

J_k(d_2^* \mid d_2^{(k-1)}) = \mathrm{logTN}_{(\log(d_1^{(k)} + 2), \; +\infty)}\left( \log(d_2^{(k-1)}), \tau_1 \right),   (3.39)

draw b_2^* from the proposal distribution

J_k(b_2^* \mid b_2^{(k-1)}) = N(b_2^{(k-1)}, \tau_1),   (3.40)

and draw c_1^* from the proposal distribution

J_k(c_1^* \mid c_1^{(k-1)}) = N(c_1^{(k-1)}, \tau_1).   (3.41)

Use d_2^*, b_2^* and c_1^* to compute a new design matrix X_2. Set d_2^{(k)} = d_2^*, b_2^{(k)} = b_2^* and c_1^{(k)} = c_1^* with acceptance probability r_2 and replace the design matrix X by X_2, based on the new samples of d_2, b_2, c_1, d_1 and b_1.

– Draw \log(d_3^*) from the proposal distribution

J_k(d_3^* \mid d_3^{(k-1)}) = \mathrm{logTN}_{(-\infty, \; \log(d_1^{(k)} - 3))}\left( \log(d_3^{(k-1)}), \tau_2 \right),

draw b_3^* from the proposal distribution

J_k(b_3^* \mid b_3^{(k-1)}) = N(b_3^{(k-1)}, \tau_2),   (3.42)

and draw c_2^* from the proposal distribution

J_k(c_2^* \mid c_2^{(k-1)}) = N(c_2^{(k-1)}, \tau_2).   (3.43)

Use d_3^*, b_3^* and c_2^* to compute a new design matrix X_3. Set d_3^{(k)} = d_3^*, b_3^{(k)} = b_3^* and c_2^{(k)} = c_2^* with acceptance probability r_3 and replace the design matrix X by X_3, based on all the accepted new samples.
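As referenced above, here is a minimal sketch of one Step 3 proposal, the truncated log-normal draw of equation (3.37). It assumes scipy is available and treats \tau_0 as the proposal variance (the dissertation does not state whether \tau_0 is a variance or a standard deviation, so this is an assumption); the function name and toy values are mine.

```python
import numpy as np
from scipy.stats import truncnorm

def propose_d1(d1_prev, d2_prev, d3_prev, tau0, rng):
    """Draw d1* from logTN_{(log(d3+3), log(d2-2))}(log(d1_prev), tau0); see eq. (3.37)."""
    mu, sd = np.log(d1_prev), np.sqrt(tau0)      # assumption: tau0 is a variance
    lo, hi = np.log(d3_prev + 3.0), np.log(d2_prev - 2.0)
    a, b = (lo - mu) / sd, (hi - mu) / sd        # standardized truncation bounds
    return float(np.exp(truncnorm.rvs(a, b, loc=mu, scale=sd, random_state=rng)))

rng = np.random.default_rng(2)
print(propose_d1(d1_prev=7.0, d2_prev=12.0, d3_prev=2.0, tau0=0.01, rng=rng))
```

And a sketch of the Step 1 model jump, again with hypothetical helper names; sample_beta_full stands in for a sampler of the truncated density in equation (3.27).

```python
def step1_model_jump(pF, y, Z, gamma, X, sample_beta_full, N, K, tau, a_diag, rng):
    """Jump between M_0 and M_F with probability pF, then draw sigma^2.

    pF     : Pr(M_F | y, p, gamma) from equation (3.26)
    a_diag : diagonal of X0^T X0, entering the full-model rate of equation (3.35)
    """
    if rng.uniform() <= pF:                      # full model selected
        beta = sample_beta_full()                # hypothetical sampler for eq. (3.27)
        resid = y - Z @ gamma - X @ beta
        rate = 0.5 * resid @ resid + 0.5 / tau * np.sum(a_diag * beta**2)
        sigma2 = 1.0 / rng.gamma(N / 2.0 + 1.5 * K, 1.0 / rate)   # IG(N/2 + 3K/2, rate)
    else:                                        # null model: beta = 0
        beta = np.zeros(K)
        resid = y - Z @ gamma
        sigma2 = 1.0 / rng.gamma(N / 2.0, 2.0 / (resid @ resid))  # IG(N/2, rss/2)
    return beta, sigma2
```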

3.3 Multiple Tests and FDR Control

We obtain the estimates \hat{\beta} and HRF parameters \hat{p}, as well as the posterior probability \hat{p} of choosing M_F, for each voxel, and we use the posterior probability of selecting the full model as the measure of whether the voxel has been activated. There are thousands of voxels in the brain, so this test of activation is repeated thousands of times, and we have to account for the effects of multiple testing. It is not reasonable to set an arbitrary threshold for the posterior probabilities in the way one sets a significance level for a single hypothesis test. With many thousands of simultaneous tests of activation, power is critical, but the number of false discoveries can grow with the number of simultaneous tests, so we need a method for controlling false discoveries in multiple comparisons. We choose false discovery rate (FDR) control, a statistical method for multiple hypothesis testing problems. The FDR procedure is designed to control the expected proportion of incorrectly rejected null hypotheses (Benjamini and Hochberg, 1995). It exerts a less stringent control over false discoveries than familywise error rate (FWER) procedures such as the Bonferroni correction, which seek to reduce the probability of even one false discovery rather than the expected proportion of false discoveries; thus FDR procedures have greater power at the cost of more type I errors. Since our model jumps between the null model and the full model and yields a posterior probability of choosing the full model, we adopt the Bayesian FDR procedure (Muller et al., 2006) on these posterior probabilities. Consider an indicator variable r_i such that r_i = 1 if \beta_i \neq 0 (voxel i is truly activated) and r_i = 0 otherwise. In addition, let \delta_i indicate a decision such that

δi = 1 if we decide that voxel i is activated and δi = 0 otherwise. From the MCMC,

we get the posterior probability of choosing the full model, \hat{p}_i = Pr(r_i = 1 \mid \mathrm{data}), which is equivalent to the posterior probability of activation. We may declare activated all voxels whose \hat{p}_i is greater than or equal to a certain threshold p_0, which gives the FDR

FDR = \frac{ \sum_{i=1}^{N} (1 - r_i) I(\hat{p}_i \geq p_0) }{ \sum_{i=1}^{N} I(\hat{p}_i \geq p_0) }.   (3.44)

Thus, the estimated expected FDR is

\widehat{FDR} = \frac{ \sum_{i=1}^{N} (1 - \hat{p}_i) I(\hat{p}_i \geq p_0) }{ \sum_{i=1}^{N} I(\hat{p}_i \geq p_0) }.   (3.45)

Alternatively, when we compare all the voxels of the brain, we may use a nominal FDR level q_0 (Muller et al., 2006) to control the expected FDR. Specifically, we rank all the posterior probabilities in decreasing order and denote the ordered posterior probabilities by \hat{p}_{(1)} > \hat{p}_{(2)} > \cdots > \hat{p}_{(N)}. If, for each d = 1, \ldots, N, we declare as activated the set of voxels with \hat{p} \geq \hat{p}_{(d)}, then the corresponding posterior expected FDR is

\widehat{FDR}_d = \frac{ \sum_{i=1}^{N} (1 - \hat{p}_i) I(\hat{p}_i \geq \hat{p}_{(d)}) }{ \sum_{i=1}^{N} I(\hat{p}_i \geq \hat{p}_{(d)}) } = \frac{ \sum_{i=1}^{d} (1 - \hat{p}_{(i)}) }{ d }, \quad d = 1, \ldots, N.   (3.46)

Finally, we take the activated voxels to be those in the largest set with \widehat{FDR}_d < q_0.
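This Bayesian FDR rule is straightforward to implement. The sketch below is my own illustration, not code from the dissertation: it sorts the posterior probabilities, computes the running posterior expected FDR of equation (3.46), and returns the probability cutoff for a nominal level q_0.

```python
import numpy as np

def bayesian_fdr_threshold(p_hat, q0=0.05):
    """Largest set of voxels whose posterior expected FDR (eq. 3.46) stays below q0.

    p_hat : array of posterior probabilities of activation, one per voxel
    Returns the probability cutoff and the boolean activation map.
    """
    order = np.argsort(p_hat)[::-1]             # decreasing p_hat
    sorted_p = p_hat[order]
    fdr_d = np.cumsum(1.0 - sorted_p) / np.arange(1, p_hat.size + 1)
    keep = np.nonzero(fdr_d < q0)[0]
    if keep.size == 0:
        return 1.0, np.zeros_like(p_hat, dtype=bool)   # nothing passes
    cutoff = sorted_p[keep[-1]]
    return cutoff, p_hat >= cutoff

p_hat = np.array([0.99, 0.97, 0.90, 0.60, 0.10, 0.05])
print(bayesian_fdr_threshold(p_hat, q0=0.05))   # cutoff 0.90; first three voxels activated
```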

3.4 Simulation Study

Here, we perform a simulation study to evaluate the performance of our proposed inferential procedure. We add three synthetic activations to a slice of a null data matrix (50 × 60). Those three activated regions possess three distinct region-specific HRFs. We simulate data by convolving those HRFs with the stimulus onset time vectors of four different types of stimuli (visual and motor stimuli); the stimulus onset time vector is the same as in the real experiment that we describe in the next section. For the 58 selected voxels that make up the "activated" regions, we assign a different regression coefficient vector β to each voxel. Within each voxel, we simulate a time series of 964 time points, include run effects as intercepts in the general linear model, and add independent Gaussian noise with σ² = 64, which gives a relatively small signal-to-noise ratio. Figure 3.2 shows the average image of this synthetic data set. We fit both the traditional GLM with the fixed double-gamma HRF, as implemented in SPM, and our new three-component HRF to this data set in order to compare them. For both methods we estimate γ and β; for our method, we additionally estimate the HRF parameters p and compute the map of posterior probabilities of activation. Figure 3.3 presents the map of posterior probabilities of activation. As mentioned in the previous section, we control the FDR of activation to determine a threshold for activated regions. Figures 3.4, 3.5 and 3.6 present reconstructions of the three region-specific HRFs. We also compare reconstructions of the four types of regression coefficient maps; results are presented as standardized γ maps and standardized β maps for both methods.
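A minimal numpy sketch of this data-generating step follows, using the triple-gamma HRF given later in equation (4.1); the HRF parameter values, onset times, and effect sizes here are arbitrary illustrations, not the values used in the dissertation's simulation.

```python
import numpy as np

def triple_gamma_hrf(t, d1=7.0, d2=12.0, d3=2.0, b1=1.0, b2=0.9, b3=0.9, c1=0.3, c2=0.2):
    """Triple-gamma HRF of equation (4.1), with a_j = d_j / b_j."""
    def bump(d, b):
        return (t / d) ** (d / b) * np.exp(-(t - d) / b)
    return bump(d1, b1) - c1 * bump(d2, b2) - c2 * bump(d3, b3)

TR, n_scans = 2.0, 964
t = np.arange(0, 32, TR)                        # HRF support in seconds
stimulus = np.zeros(n_scans)                    # stimulus onset indicator per scan
stimulus[np.arange(10, n_scans, 40)] = 1.0      # illustrative onset times
x = np.convolve(stimulus, triple_gamma_hrf(t))[:n_scans]   # one design column

beta, sigma = 3.0, 8.0                          # sigma^2 = 64, as in the study
y = 100.0 + beta * x + np.random.default_rng(3).normal(0.0, sigma, n_scans)
```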

Figures 3.7 and 3.8 show the results of fitting the model with the fixed double-gamma HRF. These maps are quite noisy, and for brain areas that may possess significant initial dips, real activation can hardly be detected from the β coefficients alone, which correspond to the stimulus onset vectors convolved with the canonical HRF design matrix columns. Figures 3.9 and 3.10 show the results of fitting the model with the proposed three-gamma-component HRF. With our procedure, if the underlying HRF truly possesses an initial dip, the method accurately captures this characteristic, and it also adapts to different HRF shapes. From Table 3.1, the MSEs of γ are similar for both methods, but from Table 3.2 we can tell that our method is superior to the canonical HRF in terms of the MSE of β. Moreover, our goal is to find the time difference between visual area activation and motor area activation, which requires voxel-specific estimates of the HRF parameters; such differences cannot be obtained with a fixed canonical HRF. Overall, our method performs very well at identifying the activated regions and estimating their region-specific HRFs when the brain areas have the assumed HRF shape.

             MSE
Parameter   New HRF    Canonical HRF
γ1          0.274412   0.312371
γ2          0.283827   0.317333
γ3          0.266592   0.301243
γ4          0.270618   0.305965

Table 3.1: Comparison of the MSE of γ for the two methods.

             MSE
Parameter   New HRF    Canonical HRF
β1          0.024684   2.931094
β2          0.031074   3.357522
β3          0.022589   2.553101
β4          0.020669   2.740783

Table 3.2: Comparison of the MSE of β for the two methods.

3.5 Real Data Applications

We analyze an experiment with four different types of stimuli. Multiple subjects were scanned at the Brain Image Center (BIC) of the University of Missouri-Columbia, a Magnetic Resonance (MR) research facility located on campus and affiliated with the Department of Psychological Sciences. The centerpiece of the BIC is an 8-channel Siemens TIM Trio 3 Tesla MR system (Figure 3.11), together with the supporting hardware and software necessary for a wide variety of research applications. We model the time course of the brain using the new HRF proposed in the previous chapter, through picture-order and finger-tapping tasks. The task-induced activations are measured by the existence and the arrival time of the early consumption of oxygen. We have two types of experiments: a preliminary experiment with 1 subject, described in Section 3.5.1, and the main experiment with 4 subjects, described in Section 3.5.2.

Figure 3.11: Siemens TIM Trio 3 Tesla MR Scanner in the BIC

3.5.1 A Preliminary Experiment

One subject went through an experiment similar to, but slightly different from, the one we conducted later. In this experiment, the participant lies down in the MRI scanner and sees two boards, the cue board and the checkerboard. The cue board shows which side of the checkerboard to anticipate and can be seen as an early stimulus for the visual areas. After some delay, a checkerboard is shown on the indicated side with either the letter "L" or "R". If the letter is "L", the participant pushes the button on the left side; if the letter is "R", the participant pushes the button on the right side. Here, the letter can be seen as the stimulus for the motor areas. In this experiment, we have one session with 4 runs. The delay times between the cue

board and the checkerboard are jittered across runs. At this stage we were not yet sure how best to measure the time difference between visual and motor areas, so although we found something interesting, this experiment was not our primary goal; it nevertheless guided the later experiments, and I therefore present those results here. For this subject, we run our model selection with the nonlocal prior and the parameter estimation. The principal component analysis (PCA) of the estimates of the HRF parameters for the activated voxels reveals some interesting spatial structure. We apply FDR control to the posterior probabilities of all the voxels and flag as activated the voxels with posterior probabilities larger than the threshold given by FDR control at the 0.01 level. Then we pool the estimates of the 8 HRF parameters over all activated voxels and perform principal component analysis on these HRF estimates. We plot all the activated voxels in four colors according to their first two principal component scores. Figures 3.12, 3.13, 3.14 and 3.15 present the 3D results from different angles. From those plots, there is a natural spatial structure in the characteristics of the HRF parameters: different brain regions possess different HRF shapes for their underlying neuronal activity. Figure 3.16 shows the HRF shapes, with colors corresponding to the different brain regions in the 3D plots. From this plot, it is difficult to clearly distinguish HRF shapes by the initial dip alone, because there are too many voxels and most of them are not located in the areas of interest; we therefore need more sophisticated model selection methods. Nevertheless, those plots indicate that adding a spatially varying component to the HRF is the right direction.
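The PCA-and-color step can be sketched as follows. This is a hedged illustration with scikit-learn: the array of pooled HRF estimates is hypothetical, and the quadrant-based color grouping is one simple reading of "four colors according to their first two scores", not necessarily the exact rule used.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(4)
hrf_estimates = rng.standard_normal((500, 8))   # hypothetical: 500 activated voxels x 8 HRF parameters

# Standardize each parameter, then project onto the first two principal components
Zs = (hrf_estimates - hrf_estimates.mean(0)) / hrf_estimates.std(0)
scores = PCA(n_components=2).fit_transform(Zs)

# Assign one of four colors by the signs of the first two scores (cf. Figures 3.12-3.15)
color_group = (scores[:, 0] > 0).astype(int) * 2 + (scores[:, 1] > 0).astype(int)
print(np.bincount(color_group))
```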

52 3.5.2 Main Experiment

Now, we describe the modified experiment. The participant sees the cue board and the checkerboard. The cue board shows which side of the checkerboard to anticipate and can be seen as an early stimulus for the visual areas. After some delay, a checkerboard is shown on the indicated side with either the letter "L" or "R". If the letter is "L", the participant pushes the button on the left side; if the letter is "R", the participant pushes the button on the right side. Here, the letter can be seen as the stimulus for the motor areas.

In this experiment, we have two sessions, each with 4 runs. The difference between the first and second sessions is the delay time between the cue board and the checkerboard. If the participant's reaction time is too long, i.e., longer than 3 seconds, we consider that the stimulus failed to stimulate the visual or motor areas of interest. The whole brain is scanned every 2 seconds, for 246 consecutive time points in each run. Combining estimation and model selection in one procedure for all the voxels of the whole brain is very time-consuming, so we use a two-stage method. First, we fit a general linear regression with the traditional fixed canonical HRF, augmented with a moderate initial dip, to all the voxels, under the assumption that activated voxels and their surrounding voxels exhibit an initial dip. This provides estimates and standard errors of the regression coefficients, and we perform an F-test on each voxel. From the resulting p-values, we use FDR control to set a p-value threshold for all the voxels, and we then apply our full procedure only to the voxels that pass this threshold. This screening discards many unrelated voxels, but the drawback is that, by screening all the voxels with a single fixed HRF, we may lose some voxels with a significant initial dip or without an initial dip. Generally, we can still find the overall patterns of interest.

Since the most interesting characteristic is the initial dip, we mainly focus on the two HRF parameters related to it: the ratio of the initial dip amplitude to the positive peak amplitude, c_2, and the initial dip peak time, d_3. Voxels with a smaller initial dip would have a larger ratio of the initial dip amplitude to the positive peak amplitude and a shorter initial dip peak time; conversely, voxels with a larger initial dip would have a smaller ratio and a longer initial dip peak time. The following results are for 4 subjects. For each subject, we show histograms of the estimates of c_2 and d_3, which display how they are distributed and suggest the need to test for the existence of an initial dip in terms of testing whether c_2 = 0. Figures 3.17 to 3.40 present the results for each subject. From the histograms, it is hard to tell whether the second session has, on average, a longer delay time than the first. The 3D plots show some patterns across the different brain areas. We plot in red the voxels with relatively higher early oxygen consumption. Most of those activated voxels with larger initial dips are located at the lower part of the back of the brain, where neurons controlling voluntary muscle movements and fine motor skills have been activated, and some of them are at the top of the brain, in motor areas. We plot in blue the voxels with relatively lower or no early oxygen consumption. Most of them are located at the middle and sides of the back of the brain, which correspond to the occipital lobe and visual areas. The two groups are not very clearly separated due to the mixed stimulus presentations: when the subject is looking at the motor task order and performing it, both visual and motor areas could be stimulated simultaneously. In addition, motor areas control eye movements, and some motor area voxels could therefore be stimulated as early as the visual areas. In the next chapter, we focus only on the areas of interest and perform an ROI analysis in order to time the difference between visual and motor areas.

Figure 3.2: Noisy Synthetic Data

Figure 3.3: Map of Posterior Probability of Alternative Model

Figure 3.4: Left HRF

Figure 3.5: MiddleBack HRF

Figure 3.6: Right HRF

Figure 3.7: Standardized γ for canonical HRF GLM

Figure 3.8: Standardized β for canonical HRF GLM

Figure 3.9: Standardized γ for new method

Figure 3.10: Standardized β for new method

Figure 3.12: PCA result on HRF parameter estimates, angle 1

Figure 3.13: PCA result on HRF parameter estimates, angle 2

Figure 3.14: PCA result on HRF parameter estimates, angle 3

Figure 3.15: PCA result on HRF parameter estimates, angle 4

Figure 3.16: HRF shapes for voxels with posterior probabilities larger than the threshold defined by FDR control at the 0.01 level; colors correspond to the PCA results plots.

Figure 3.17: First session histograms of d3 and c2 of subject 7.

Figure 3.18: First session c2 of subject 7.

Figure 3.19: First session d3 of subject 7.

Figure 3.20: Second session histograms of d3 and c2 of subject 7.

Figure 3.21: Second session c2 of subject 7.

Figure 3.22: Second session d3 of subject 7.

Figure 3.23: First session histograms of d3 and c2 of subject 8.

Figure 3.24: First session c2 of subject 8.

Figure 3.25: First session d3 of subject 8.

Figure 3.26: Second session histograms of d3 and c2 of subject 8.

Figure 3.27: Second session c2 of subject 8.

Figure 3.28: Second session d3 of subject 8.

Figure 3.29: First session histograms of d3 and c2 of subject 9.

Figure 3.30: First session c2 of subject 9.

Figure 3.31: First session d3 of subject 9.

Figure 3.32: Second session histograms of d3 and c2 of subject 9.

Figure 3.33: Second session c2 of subject 9.

Figure 3.34: Second session d3 of subject 9.

Figure 3.35: First session histograms of d3 and c2 of subject 10.

Figure 3.36: First session c2 of subject 10.

Figure 3.37: First session d3 of subject 10.

Figure 3.38: Second session histograms of d3 and c2 of subject 10.

Figure 3.39: Second session c2 of subject 10.

Figure 3.40: Second session d3 of subject 10.

Chapter 4

Timing Brain Activation

In our specifically designed cognitive experiments, the participant pushes a button after seeing the cue board, which indicates the side on which the checkerboard will appear, and the checkerboard with a letter, which indicates the side of the button to push. The cue board shows up earlier than the checkerboard, so there is a delay time between the cue board and the checkerboard. Here, the cue board can be seen as the stimulus for the visual areas: it stimulates the visual areas to wait for the checkerboard to appear on the expected side. The letters on the checkerboard can be seen as the motor stimulus: the subject taps his left or right finger according to the letter on the checkerboard. Because of the delay between the visual stimulus and the motor stimulus, we expect to see the visual areas activated earlier than the motor areas, and we want to estimate the time difference more accurately. There are four types of combined visual and motor stimuli: left cue board and letter "L", left cue board and letter "R", right cue board and letter "L", and right cue board and letter "R". A fundamental problem with this experimental design, however, is that the imperative for motor activation is given in a visual manner. Unfortunately, that means some visual voxels will be activated by the imperative for motor activation. Similarly, because the eyes and the visual neurons are connected to motor area neurons, for example through eyeball movements, some motor area voxels may be activated by the visual imperative. Therefore, to deal with this possible confounding, we propose to use Bayesian model selection to isolate visual area voxels that are activated purely by the visual imperative and motor area voxels that are activated purely by the motor imperative.

4.1 Model Selection Strategy for Timing Brain Activation

In Chapter 3, we found some activation patterns in the brain from the previous experiments. Now, we want to explicitly time the difference between the activation of motor and visual areas. In this chapter, based on the experiment on timing differences of activation between visual and motor areas, we perform a region of interest (ROI) analysis (Etzel et al., 2009) on this real data set, and thus we focus only on the visual and motor areas identified, based on physiological knowledge, by our collaborator Dr. Jeffrey Johnson from the Department of Psychological Sciences.

4.1.1 Model Selection on β and c2

Based on our newly proposed triple-gamma Hemodynamic Response Function (HRF), we investigate whether we can use the initial dip component to time the difference in brain activation between visual and motor areas. This may improve the temporal resolution in the clarification of brain functions. When the design matrix is generated from the HRF, there are four columns corresponding to the four different types of stimuli.

In our Bayesian GLM framework of the previous chapters, \beta_i (i = 1, 2, 3, 4) indicates the neuronal response to each type of stimulus. In order to better distinguish between non-activated and activated voxels, we adopt the nonlocal prior (pMOM) for the neural response β. Furthermore, we want to know which type of stimulus activates the voxel, and we build this need into our model selection strategy: there are 2^4 possible models corresponding to the different combinations of stimuli activating each voxel. We also need to consider whether the underlying HRF of each voxel really has the initial dip; the HRF may or may not possess the initial dip component. Recall that the HRF is written as

h(t, d_1, d_2, d_3, b_1, b_2, b_3, c_1, c_2) = \left( \frac{t}{d_1} \right)^{a_1} \exp\left( -\frac{t - d_1}{b_1} \right) - c_1 \left( \frac{t}{d_2} \right)^{a_2} \exp\left( -\frac{t - d_2}{b_2} \right) - c_2 \left( \frac{t}{d_3} \right)^{a_3} \exp\left( -\frac{t - d_3}{b_3} \right),   (4.1)

where t is the time in seconds, d_j = a_j b_j for j = 1, 2, 3, d_1 is the time to the peak, d_2 is the time to the lowest point of the undershoot, d_3 is the time to the lowest point of the initial dip, b_1, b_2, b_3 are the reciprocals of the dispersions of the response, the undershoot and the initial dip, c_1 is the ratio of the undershoot to the response, and c_2 is the ratio of the initial dip to the response.

If the HRF does not have the initial dip, then c_2 = 0 and the third component disappears. Moreover, if we combine the selection of the initial dip with the selection of the β's, the total number of possible models is 31: each possible nonzero combination of β has two possible underlying HRFs, one with and one without the initial dip. The exception is the case in which all components of the vector β are zero; in that case there is no HRF and, therefore, there is only a single null model. Here, β_1 stands for left visual and left motor, β_2 stands for left visual and right motor, β_3 stands for right visual and left motor, and β_4 stands for right visual and right motor. All the models we consider are listed as follows (a sketch that enumerates them appears after Table 4.1):

Model 1: y = Zγ + ε,
Model 2: y = Zγ + X_1β_1 + ε,
Model 2*: y = Zγ + X_1^*β_1 + ε,
Model 3: y = Zγ + X_2β_2 + ε,
Model 3*: y = Zγ + X_2^*β_2 + ε,
Model 4: y = Zγ + X_3β_3 + ε,
Model 4*: y = Zγ + X_3^*β_3 + ε,
Model 5: y = Zγ + X_4β_4 + ε,
Model 5*: y = Zγ + X_4^*β_4 + ε,
Model 6: y = Zγ + X_1β_1 + X_2β_2 + ε,
Model 6*: y = Zγ + X_1^*β_1 + X_2^*β_2 + ε,
Model 7: y = Zγ + X_1β_1 + X_3β_3 + ε,
Model 7*: y = Zγ + X_1^*β_1 + X_3^*β_3 + ε,
Model 8: y = Zγ + X_1β_1 + X_4β_4 + ε,
Model 8*: y = Zγ + X_1^*β_1 + X_4^*β_4 + ε,
Model 9: y = Zγ + X_2β_2 + X_3β_3 + ε,
Model 9*: y = Zγ + X_2^*β_2 + X_3^*β_3 + ε,
Model 10: y = Zγ + X_2β_2 + X_4β_4 + ε,
Model 10*: y = Zγ + X_2^*β_2 + X_4^*β_4 + ε,
Model 11: y = Zγ + X_3β_3 + X_4β_4 + ε,
Model 11*: y = Zγ + X_3^*β_3 + X_4^*β_4 + ε,
Model 12: y = Zγ + X_1β_1 + X_2β_2 + X_3β_3 + ε,
Model 12*: y = Zγ + X_1^*β_1 + X_2^*β_2 + X_3^*β_3 + ε,
Model 13: y = Zγ + X_1β_1 + X_2β_2 + X_4β_4 + ε,
Model 13*: y = Zγ + X_1^*β_1 + X_2^*β_2 + X_4^*β_4 + ε,
Model 14: y = Zγ + X_1β_1 + X_3β_3 + X_4β_4 + ε,
Model 14*: y = Zγ + X_1^*β_1 + X_3^*β_3 + X_4^*β_4 + ε,
Model 15: y = Zγ + X_2β_2 + X_3β_3 + X_4β_4 + ε,
Model 15*: y = Zγ + X_2^*β_2 + X_3^*β_3 + X_4^*β_4 + ε,
Model 16: y = Zγ + X_1β_1 + X_2β_2 + X_3β_3 + X_4β_4 + ε,
Model 16*: y = Zγ + X_1^*β_1 + X_2^*β_2 + X_3^*β_3 + X_4^*β_4 + ε,

where X_i is the design matrix column generated from the HRF with c_2 ≠ 0 and X_i^* is the design matrix column generated from the HRF with c_2 = 0. We use the same priors for the parameters of the HRF as in Chapter 3, which are as follows:

\pi(d_1) \sim NT(7, 1.5^2) \, I(d_3 + 3, d_2 - 2),   (4.2)

\pi(d_2) \sim NLT(12, 1.5^2) \, I(d_2 > d_1 + 2),   (4.3)

\pi(d_3) \sim NRT(2, 1^2) \, I(d_3 < d_1 - 3),   (4.4)

\pi(b_1) \sim N(1, 0.1^2),   (4.5)

\pi(b_2) \sim N(0.9, 0.1^2),   (4.6)

\pi(b_3) \sim N(0.9, 0.1^2),   (4.7)

\pi(c_1) \sim \exp(5),   (4.8)

\pi(c_2) \sim \exp(5),   (4.9)

where NT stands for the truncated normal distribution, NLT stands for the left-truncated normal distribution, and NRT stands for the right-truncated normal distribution. We use the same MCMC procedure as in Chapter 3 to estimate all the HRF parameters and the different numbers of β's under each model, except that in this chapter the MCMC does not jump between models, which speeds up the algorithm for each model. Thus we have to devise different measures for model comparison. Here, we calculate both the deviance information criterion (DIC) and the Laplace-Metropolis approximation of the marginal likelihood for each model as model selection measures; using those measures, we may choose the best model for each voxel. In order to improve time efficiency, we did not run these procedures on all the voxels. Instead, we use a region of interest (ROI) analysis, for which our collaborator Dr. Jeffrey Johnson chose 490 voxels in each area of interest, the visual and motor areas. We find that the Laplace-Metropolis approximation performs better than the DIC; we investigate these two model selection criteria in more detail below. Based on the fitted models, a voxel can be activated by one stimulus or by multiple stimuli, depending on which β's are different from zero in the selected best model. We are most interested in those voxels that have been activated by only one isolated type of stimulus. In our model selection strategy, four models isolate a single side and area of stimulation. Table 4.1 explains how the different models correspond to different combinations of stimuli.

                     Stimulus Type
Model             β1 (L,L)   β2 (L,R)   β3 (R,L)   β4 (R,R)
1                 0          0          0          0
2                 1          0          0          0
3                 0          1          0          0
4                 0          0          1          0
5                 0          0          0          1
6 (VisualLeft)    1          1          0          0
7 (MotorLeft)     1          0          1          0
8                 1          0          0          1
9                 0          1          1          0
10 (MotorRight)   0          1          0          1
11 (VisualRight)  0          0          1          1
12                1          1          1          0
13                1          1          0          1
14                1          0          1          1
15                0          1          1          1
16                1          1          1          1

Table 4.1: Correspondence between each model and the combination of stimulus types, through the β's included in the model. Here, (L,L) stands for (Left visual, Left motor), (L,R) for (Left visual, Right motor), (R,L) for (Right visual, Left motor), and (R,R) for (Right visual, Right motor).
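To make the 31-model space concrete, the short Python sketch promised above (illustrative only) enumerates every subset of active β's and pairs each nonzero subset with the two HRF variants, with and without the initial dip:

```python
from itertools import product

# Enumerate the 31 models of Section 4.1.1: every subset of (beta1,...,beta4),
# and, for each nonzero subset, an HRF with (dip=True) or without (dip=False)
# the initial dip. The empty subset has no HRF, hence a single null model.
models = [(frozenset(), None)]                       # Model 1: y = Z*gamma + eps
for mask in product([0, 1], repeat=4):
    active = frozenset(i + 1 for i, m in enumerate(mask) if m)
    if active:
        models.append((active, True))                # Model k  (c2 != 0)
        models.append((active, False))               # Model k* (c2 == 0)

print(len(models))                                   # 31 = 1 + 2 * (2**4 - 1)
```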

After we identify the voxels that have been activated by a single type of stimulus, these voxels are better suited for comparing activation time differences between visual and motor areas. Here, we mainly focus on the voxels selected by models 6, 7, 10, and 11, according to the different types of stimuli: model 6 corresponds to activation of the left visual areas, model 7 to activation of the left motor areas, model 10 to activation of the right motor areas, and model 11 to activation of the right visual areas. We have noticed that activation time differences between visual and motor activation areas are more obvious and clearer for those voxels that have been activated by a single type of stimulus. Model selection procedures to identify such voxels are discussed in the next section.

4.1.2 Model Selection Criteria

Deviance Information Criterion (DIC)

The first model selection criterion we use is the deviance information criterion (DIC) (Spiegelhalter et al., 2002). Model fit can be summarized numerically with the deviance, which is defined as −2 times the log-likelihood:

D(y, \theta) = -2 \log f(y \mid \theta).   (4.10)

The deviance summarizes the discrepancy between the data and the model. It can be evaluated at a point estimate \hat{\theta}(y) of θ, which depends only on y, giving D_{\hat{\theta}}(y) = D(y, \hat{\theta}(y)). From a Bayesian perspective, we can instead average this discrepancy over the posterior distribution:

D_{avg}(y) = E(D(y, \theta) \mid y),   (4.11)

which may be estimated using the posterior simulations \theta^l at each iteration l:

\hat{D}_{avg}(y) = \frac{1}{L} \sum_{l=1}^{L} D(y, \theta^l).   (4.12)

Spiegelhalter et al. (2002) define the deviance information criterion (DIC) as

DIC = 2 \hat{D}_{avg}(y) - D_{\hat{\theta}}(y).   (4.13)

In our case, we compute D(y, \theta^l) using the simulated parameter values from the MCMC output. More specifically, we compute the design matrix X(p^l) using the simulated HRF parameters at the lth iteration. In addition, for D_{\hat{\theta}}(y), we compute the posterior means of all the parameters and use them to compute the design matrix X(\hat{p}). Specifically, in our framework the DIC for model i is

DIC_{model_i} = \frac{2}{L - a + 1} \sum_{l=a}^{L} 2\left\{ \frac{n}{2}\log(2\pi) + n \log \sigma_i^l + \frac{1}{2(\sigma_i^2)^l} (y - Z_i\gamma_i^l - X(p_i^l)\beta_i^l)^T (y - Z_i\gamma_i^l - X(p_i^l)\beta_i^l) \right\}
- 2\left\{ \frac{n}{2}\log(2\pi) + n \log \hat{\sigma}_i + \frac{1}{2\hat{\sigma}_i^2} (y - Z_i\hat{\gamma}_i - X(\hat{p}_i)\hat{\beta}_i)^T (y - Z_i\hat{\gamma}_i - X(\hat{p}_i)\hat{\beta}_i) \right\},   (4.14)

where a is the iteration number at which burn-in ends, X(p_i^l) is the design matrix generated from the HRF parameters p_i^l for model i at iteration l, and X(\hat{p}_i) is the design matrix generated from the posterior mean \hat{p}_i of the HRF parameters for model i.
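A compact sketch of this computation follows; design_matrix(p) is a hypothetical helper standing in for the HRF-to-design-matrix step, and the draw format is my own choice.

```python
import numpy as np

def dic_for_model(y, Z, draws, design_matrix):
    """DIC of equation (4.14) from post-burn-in MCMC draws.

    draws : list of dicts with keys 'gamma', 'beta', 'p', 'sigma2'
    design_matrix : callable mapping HRF parameters p to X(p) (hypothetical)
    """
    n = y.size

    def deviance(gamma, beta, p, sigma2):
        resid = y - Z @ gamma - design_matrix(p) @ beta
        return n * np.log(2 * np.pi) + n * np.log(sigma2) + resid @ resid / sigma2

    d_avg = np.mean([deviance(d['gamma'], d['beta'], d['p'], d['sigma2']) for d in draws])
    # Plug-in deviance at the posterior means of all parameters
    means = {k: np.mean([d[k] for d in draws], axis=0) for k in ('gamma', 'beta', 'p', 'sigma2')}
    d_hat = deviance(means['gamma'], means['beta'], means['p'], means['sigma2'])
    return 2 * d_avg - d_hat
```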

Laplace-Metropolis Marginal Likelihood

Another model selection criterion we use is the marginal likelihood of each model. To take advantage of the output from the MCMC algorithm, we approximate the marginal likelihood with the Laplace-Metropolis approximation (Lewis and Raftery, 1997); hence we call this criterion the Laplace-Metropolis marginal likelihood. We assume that the prior probabilities of all the models are equal, so we only need to compare models through their marginal likelihoods, which then act as Bayes factors. We can compute the posterior probability of each model for a specific voxel based on its Laplace-Metropolis approximation of the marginal likelihood. Specifically, the Laplace-Metropolis approximation of the marginal likelihood is

\hat{f}(y \mid model_j) = f(y \mid \hat{\theta}_j, model_j) \, p(\hat{\theta}_j \mid model_j) \, (2\pi)^{d/2} |\hat{V}_j|^{1/2},   (4.15)

where, if \theta_j^{(1)}, \theta_j^{(2)}, \ldots, \theta_j^{(G)} is a sample from the posterior of \theta_j, then

\hat{\theta}_j = \mathrm{argmax}_{g=b,\ldots,G} \left[ f(y \mid \theta_j^{(g)}, model_j) \, p(\theta_j^{(g)} \mid model_j) \right],   (4.16)

\hat{V}_j = \frac{1}{G - b + 1} \sum_{g=b}^{G} (\theta_j^{(g)} - \bar{\theta}_j)(\theta_j^{(g)} - \bar{\theta}_j)^T,   (4.17)

where b is the iteration number at which burn-in ends and \bar{\theta}_j = \frac{1}{G - b + 1} \sum_{g=b}^{G} \theta_j^{(g)}. In our case, we compute \log f(y \mid \theta_j^{(g)}, model_j) + \log p(\theta_j^{(g)} \mid model_j) at each iteration g and take the maximum as \log f(y \mid \hat{\theta}_j, model_j) + \log p(\hat{\theta}_j \mid model_j), since \hat{\theta}_j = \mathrm{argmax}_{g=b,\ldots,G} [f(y \mid \theta_j^{(g)}, model_j) p(\theta_j^{(g)} \mid model_j)]. All the required expressions are as follows. When c_2 \neq 0,

\log \hat{f}(y \mid M_j) = -\frac{n}{2}\log(2\pi) - \frac{n + 2}{2}\log(\sigma^2) - \frac{(y - Z\gamma - X(p)\beta)^T (y - Z\gamma - X(p)\beta)}{2\sigma^2}
- \frac{1}{2}\log(2\pi) - \log(1.5) - \frac{1}{2}\left( \frac{d_1 - 7}{1.5} \right)^2 - \log\left( \Phi\left( \frac{d_2 - 2 - 7}{1.5} \right) - \Phi\left( \frac{d_3 + 3 - 7}{1.5} \right) \right)
- \frac{1}{2}\log(2\pi) - \log(1.5) - \frac{1}{2}\left( \frac{d_2 - 12}{1.5} \right)^2 - \log\left( 1 - \Phi\left( \frac{d_1 + 2 - 12}{1.5} \right) \right)
- \frac{1}{2}\log(2\pi) - \frac{1}{2}(d_3 - 2)^2 - \log(\Phi(d_1 - 3 - 2))
- \frac{1}{2}\log(2\pi) - \log(0.1) - \frac{1}{2}\left( \frac{b_1 - 1}{0.1} \right)^2
- \frac{1}{2}\log(2\pi) - \log(0.1) - \frac{1}{2}\left( \frac{b_2 - 0.9}{0.1} \right)^2
- \frac{1}{2}\log(2\pi) - \log(0.1) - \frac{1}{2}\left( \frac{b_3 - 0.9}{0.1} \right)^2
+ \log 5 - 5c_1 + \log 5 - 5c_2
+ \log(d_K) - \frac{K}{2}\log(2\pi) - \frac{3K}{2}\log(\tau) - \frac{3K}{2}\log(\sigma^2) + \frac{1}{2}\log(|\mathrm{diag}(X_0^T X_0)|)
- \frac{\beta^T \mathrm{diag}(X_0^T X_0)\beta}{2\tau\sigma^2} + \sum_{i=1}^{K} 2\log(\beta_i) + \frac{d}{2}\log(2\pi) + \frac{1}{2}\log(|\hat{V}_j|).   (4.18)

When c_2 = 0,

\log \hat{f}(y \mid M_j) = -\frac{n}{2}\log(2\pi) - \frac{n + 2}{2}\log(\sigma^2) - \frac{(y - Z\gamma - X(p)\beta)^T (y - Z\gamma - X(p)\beta)}{2\sigma^2}
- \frac{1}{2}\log(2\pi) - \log(1.5) - \frac{1}{2}\left( \frac{d_1 - 7}{1.5} \right)^2 - \log\left( \Phi\left( \frac{d_2 - 2 - 7}{1.5} \right) - \Phi\left( \frac{d_3 + 3 - 7}{1.5} \right) \right)
- \frac{1}{2}\log(2\pi) - \log(1.5) - \frac{1}{2}\left( \frac{d_2 - 12}{1.5} \right)^2 - \log\left( 1 - \Phi\left( \frac{d_1 + 2 - 12}{1.5} \right) \right)
- \frac{1}{2}\log(2\pi) - \log(0.1) - \frac{1}{2}\left( \frac{b_1 - 1}{0.1} \right)^2 - \frac{1}{2}\log(2\pi) - \log(0.1) - \frac{1}{2}\left( \frac{b_2 - 0.9}{0.1} \right)^2
+ \log 5 - 5c_1
+ \log(d_K) - \frac{K}{2}\log(2\pi) - \frac{3K}{2}\log(\tau) - \frac{3K}{2}\log(\sigma^2) + \frac{1}{2}\log(|\mathrm{diag}(X_0^T X_0)|)
- \frac{\beta^T \mathrm{diag}(X_0^T X_0)\beta}{2\tau\sigma^2} + \sum_{i=1}^{K} 2\log(\beta_i) + \frac{d}{2}\log(2\pi) + \frac{1}{2}\log(|\hat{V}_j|),   (4.19)

where \hat{\theta}_j = \mathrm{argmax}_{g=1,\ldots,G}\left( \log f(y \mid \theta_j^{(g)}, M_j) + \log f(\theta_j^{(g)} \mid M_j) \right), \hat{V}_j = \frac{1}{G} \sum_{g=1}^{G} (\theta_j^{(g)} - \bar{\theta}_j)(\theta_j^{(g)} - \bar{\theta}_j)^T, and \bar{\theta}_j = \frac{1}{G} \sum_{g=1}^{G} \theta_j^{(g)}. Note that, since \mathrm{diag}(X_0^T X_0) is a diagonal matrix,

d_K = \left[ \int_{\mathbb{R}_+^K} (2\pi)^{-K/2} |\mathrm{diag}(X_0^T X_0)|^{1/2} \exp\left( -\frac{1}{2} \xi^T \mathrm{diag}(X_0^T X_0) \xi \right) \left( \prod_{i=1}^{K} \xi_i^2 \right) d\xi \right]^{-1}
= 2^{-K} \left( \prod_{j=1}^{K} E_{N_+(0, (\mathrm{diag}(X_0^T X_0))_{jj}^{-1})}[\xi_j^2] \right)^{-1}
= 2^{-K} \prod_{j=1}^{K} \mathrm{diag}(X_0^T X_0)_{jj}.   (4.20)

4.1.3 Comparing DIC and Marginal Likelihood

Now, we compare the model selection results of the DIC and of the Laplace-Metropolis marginal likelihood approximation of the Bayes factor.

Tables 4.2, 4.3 and 4.4 present averages of the posterior estimates of d_1, d_3 and c_2 for the voxels selected by the different criteria for subjects 7 to 11. We focus on the four models that represent isolated stimulation of the four different areas. LeftV stands for the left visual model, which contains only the left visual stimuli but both left and right motor stimuli. RightV stands for the right visual model, which contains only the right visual stimuli but both left and right motor stimuli. LeftM stands for the left motor model, which contains only the left motor stimuli but both left and right visual stimuli. RightM stands for the right motor model, which contains only the right motor stimuli but both left and right visual stimuli. These combinations are picked because they differentiate the stimulus between visual and motor, and all of them possess initial dips. In conclusion, Figures 4.1 to 4.20 show that the Laplace-Metropolis approximation of the Bayes factor performs better than the DIC for model selection. The DIC tends to select the models with more parameters rather than the best model, whereas the Laplace-Metropolis approximation of the Bayes factor chooses the best model over the one with more parameters; moreover, it works better with the nonlocal prior. The DIC selections include more strangely behaved HRF shapes that cannot correspond to the true models. However, these results alone are still not enough to establish the time difference between visual and motor areas. Tables 4.2, 4.3 and 4.4 show the time difference when the Bayes factor is used as the model selection criterion. However, brain function is very complicated, and voxels from different functional areas are not clearly separated. In the next section, we pool the voxels selected by the Bayes factor in a different model framework to improve the power to estimate time differences.

Average Posterior Estimates, Subjects 7-8

Subject/Session   Region   Criterion   Voxel Number   d1       d3       c2
Subject 7 A       LeftV    DIC         18             8.8218   3.0627   0.6290
                  LeftV    BF          29             7.8276   2.2292   0.1627
                  RightV   DIC         29             8.4549   2.6953   0.5123
                  RightV   BF          40             7.9336   2.1534   0.1957
                  LeftM    DIC         11             8.2711   2.3165   0.1934
                  LeftM    BF          10             7.8781   2.3852   0.1641
                  RightM   DIC         8              8.5586   2.1931   0.1806
                  RightM   BF          40             8.2579   2.3010   0.2369
Subject 7 B       LeftV    DIC         24             8.4878   2.7157   0.4762
                  LeftV    BF          31             8.0055   2.1766   0.2488
                  RightV   DIC         10             8.0528   2.2870   0.2194
                  RightV   BF          20             8.0515   2.1893   0.2015
                  LeftM    DIC         17             8.8809   2.2401   0.2079
                  LeftM    BF          23             8.6401   2.2166   0.2230
                  RightM   DIC         13             7.8246   2.4187   0.1868
                  RightM   BF          17             7.5777   2.3476   0.1641
Subject 8 A       LeftV    DIC         11             9.3485   2.1719   0.2783
                  LeftV    BF          27             9.6911   2.0845   0.2788
                  RightV   DIC         22             9.4420   2.6247   0.5064
                  RightV   BF          30             9.7629   2.0877   0.2841
                  LeftM    DIC         17             8.3669   2.3437   0.2003
                  LeftM    BF          15             8.6169   2.2712   0.2092
                  RightM   DIC         23             8.4077   2.2435   0.2094
                  RightM   BF          6              8.8088   2.2402   0.1908
Subject 8 B       LeftV    DIC         18             9.1951   2.3290   0.3618
                  LeftV    BF          30             9.6722   2.1977   0.3766
                  RightV   DIC         12             8.9294   2.1236   0.2142
                  RightV   BF          28             8.6649   2.1016   0.2155
                  LeftM    DIC         14             8.9652   2.2944   0.2022
                  LeftM    BF          13             8.9100   2.3245   0.2203
                  RightM   DIC         20             8.4116   2.2561   0.2330
                  RightM   BF          11             8.2076   2.2815   0.2061

Table 4.2: Averages of posterior estimates of HRF parameters for the isolated models selected under each criterion, subjects 7-8.

Average Posterior Estimates, Subjects 9-10

Subject/Session   Region   Criterion   Voxel Number   d1       d3       c2
Subject 9 A       LeftV    DIC         24             8.1455   2.5048   0.2989
                  LeftV    BF          27             7.6416   2.3020   0.1856
                  RightV   DIC         15             8.0211   2.2164   0.1899
                  RightV   BF          12             7.5269   2.2624   0.1568
                  LeftM    DIC         10             8.2758   2.3254   0.2280
                  LeftM    BF          19             8.5484   2.3433   0.2453
                  RightM   DIC         19             8.5950   2.3530   0.2234
                  RightM   BF          12             8.3413   2.3133   0.2202
Subject 9 B       LeftV    DIC         22             9.6067   2.1542   0.2646
                  LeftV    BF          10             8.5059   2.1093   0.2521
                  RightV   DIC         7              8.5364   2.2129   0.2808
                  RightV   BF          7              8.2802   2.2098   0.2180
                  LeftM    DIC         12             8.5538   2.3392   0.2335
                  LeftM    BF          14             8.0240   2.3373   0.2300
                  RightM   DIC         12             8.3107   2.1963   0.2016
                  RightM   BF          18             8.2381   2.2819   0.2062
Subject 10 A      LeftV    DIC         15             8.8301   2.3607   0.3875
                  LeftV    BF          26             8.6693   2.1934   0.3290
                  RightV   DIC         13             8.3239   2.2203   0.2135
                  RightV   BF          11             8.1314   2.1255   0.2311
                  LeftM    DIC         5              8.3508   2.2477   0.1814
                  LeftM    BF          2              7.3938   2.3738   0.2375
                  RightM   DIC         8              8.3125   2.2460   0.1782
                  RightM   BF          16             9.0381   2.2203   0.2233
Subject 10 B      LeftV    DIC         11             8.0112   2.2827   0.1958
                  LeftV    BF          20             8.2564   2.2478   0.1761
                  RightV   DIC         21             8.9716   2.5121   0.4194
                  RightV   BF          13             8.4546   2.2154   0.3474
                  LeftM    DIC         10             8.5392   2.3950   0.2968
                  LeftM    BF          4              7.8996   2.2880   0.2015
                  RightM   DIC         6              8.3459   2.2085   0.1570
                  RightM   BF          6              7.5017   2.3348   0.1467

Table 4.3: Averages of posterior estimates of HRF parameters for the isolated models selected under each criterion, subjects 9-10.

Average Posterior Estimates, Subject 11

Subject/Session   Region   Criterion   Voxel Number   d1       d3       c2
Subject 11 A      LeftV    DIC         17             8.5941   2.6479   0.4114
                  LeftV    BF          15             7.4366   2.2096   0.1865
                  RightV   DIC         15             7.6404   2.2667   0.2007
                  RightV   BF          20             7.2769   2.1639   0.1808
                  LeftM    DIC         7              7.5755   2.2175   0.1959
                  LeftM    BF          21             7.3537   2.2806   0.1708
                  RightM   DIC         12             7.8234   2.3094   0.1972
                  RightM   BF          6              7.0355   2.2986   0.1556
Subject 11 B      LeftV    DIC         18             7.9470   2.2541   0.2236
                  LeftV    BF          19             7.9426   2.2256   0.2123
                  RightV   DIC         14             8.1658   2.3378   0.2370
                  RightV   BF          11             7.7553   2.2349   0.1822
                  LeftM    DIC         16             7.9575   2.2878   0.2104
                  LeftM    BF          11             7.2415   2.2993   0.1774
                  RightM   DIC         15             7.9071   2.3345   0.2066
                  RightM   BF          34             7.4821   2.3070   0.1769

Table 4.4: Averages of posterior estimates of HRF parameters for the isolated models selected under each criterion, subject 11.

Figure 4.1: DIC Result for first 4 runs of subject 7
Figure 4.2: Bayes Factor Result for first 4 runs of subject 7

Figure 4.3: DIC Result for last 4 runs of subject 7
Figure 4.4: Bayes Factor Result for last 4 runs of subject 7

Figure 4.5: DIC Result for first 4 runs of subject 8
Figure 4.6: Bayes Factor Result for first 4 runs of subject 8

Figure 4.7: DIC Result for last 4 runs of subject 8
Figure 4.8: Bayes Factor Result for last 4 runs of subject 8

Figure 4.9: DIC Result for first 4 runs of subject 9
Figure 4.10: Bayes Factor Result for first 4 runs of subject 9

Figure 4.11: DIC Result for last 4 runs of subject 9
Figure 4.12: Bayes Factor Result for last 4 runs of subject 9

Figure 4.13: DIC Result for first 4 runs of subject 10
Figure 4.14: Bayes Factor Result for first 4 runs of subject 10

Figure 4.15: DIC Result for last 4 runs of subject 10
Figure 4.16: Bayes Factor Result for last 4 runs of subject 10

Figure 4.17: DIC Result for first 4 runs of subject 11
Figure 4.18: Bayes Factor Result for first 4 runs of subject 11

4.2 Testing the Model Selection Results

Because of the findings of Section 4.1.3, we use the Laplace-Metropolis marginal likelihood to perform the model selection. We use d_3, the peak time of the initial dip, to indicate the activation timing of a voxel in the different areas: d_3 is the time of the peak of the initial oxygen consumption, which may occur in highly active areas, so it is reasonable to measure and compare activated areas using d_3. The boxplots for each subject and each session (Figures 4.21, 4.24, 4.27, 4.30 and 4.33) provide clear evidence supporting our conjecture that, in our experiment, the visual areas activate faster than the motor areas: the average of d_3 for the motor area is larger than the average of d_3 for the visual area. But we are not yet sure about the accuracy of the time difference between these two areas or how significant it is.

Figure 4.19: DIC Result for last 4 runs of subject 11
Figure 4.20: Bayes Factor Result for last 4 runs of subject 11

Therefore, we decided to use two approaches to explore the selected voxels for each area. First, we fit a linear model on the estimated d_3 with fixed effects for subjects, runs and areas, as well as a linear mixed model with random effects for subjects, to check whether our conjecture is true and whether the difference in d_3 is due to the areas or to other factors. Second, we combine the selected voxels for one isolated stimulus type and assume that they share a common HRF and neural response levels β. We expect that combining voxels in a single analysis will increase the precision with which our procedure estimates the model parameters.

4.2.1 Linear Model Fitting Exploration

First, we fit a linear model on the estimated d_3 with fixed effects for subjects, areas and runs. We also fit a linear mixed model on the estimated d_3 with random effects for subjects and with fixed effects for an indicator of the type of area (visual = 1, motor = 0) and for the group of runs (first four runs = 1, last four runs = 0). The linear model is:

d_3 = b_0 + \mathrm{subject} \times b_1 + \mathrm{area} \times b_2 + \mathrm{runs} \times b_3 + \epsilon,   (4.21)

where \epsilon \sim N(0, \sigma_1^2 I). Table 4.5 shows the results of this linear model.

Coefficient   Estimate     Std. Error   t value   Pr(>|t|)
Intercept     2.310e+00    2.193e-02    105.332   < 2e-16 ***
Subject8      -5.124e-02   2.621e-02    -1.955    0.0591
Subject9      2.004e-02    2.621e-02    0.764     0.4501
Subject10     -1.375e-06   2.621e-02    0.000     1.0000
Subject11     -4.116e-02   2.621e-02    -1.570    0.1259
run2          1.864e-03    1.658e-02    0.112     0.9112
Visual        -1.221e-01   1.658e-02    -7.367    1.85e-08 ***

Table 4.5: Results of fitting a linear model on d3 with fixed effects for subjects, runs and areas.

The linear mixed model is:

d_3 = b_0 + \mathrm{subject} \times b_1 + \mathrm{area} \times b_2 + \mathrm{runs} \times b_3 + \epsilon,   (4.22)

where \epsilon \sim N(0, \sigma_2^2 I) and b_1 \sim N(0, \sigma_3^2 D). The results are shown in Table 4.6.

Random effects:         Variance    Std. Dev.
Subject (Intercept)     0.0005749   0.02398
Residual                0.0027483   0.05242

Fixed effects:          Estimate    Std. Error   t value
(Intercept)             2.295537    0.017919     128.10
run2                    0.001864    0.016578     0.11
Visual                  -0.122129   0.016578     -7.37

Table 4.6: Results of fitting a linear mixed model on d3 with random effects for subjects and with fixed effects for runs and areas.

We find that the only significant variable is the type of area, with the visual area having a coefficient close to −0.122 (p = 1.85e−08). That means that voxels that are purely activated by the visual imperative have an estimated d_3 that is, on average, 0.122 seconds smaller than that of voxels purely activated by the motor imperative. So the time difference between the visual and motor areas is quite significant, and this result is consistent across the linear and linear mixed models.
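For reference, fits of this kind can be sketched with statsmodels; the data frame below is entirely hypothetical, with column names that mirror equations (4.21) and (4.22), so it is illustrative of the workflow rather than a reproduction of the dissertation's analysis.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 400
df = pd.DataFrame({
    "d3": rng.normal(2.3, 0.06, n),
    "subject": rng.choice(["7", "8", "9", "10", "11"], n),
    "run2": rng.integers(0, 2, n),                 # first four runs = 1
    "visual": rng.integers(0, 2, n),               # visual area = 1
})

# Fixed-effects model of equation (4.21)
lm = smf.ols("d3 ~ subject + run2 + visual", data=df).fit()

# Mixed model of equation (4.22): random intercept per subject
lmm = smf.mixedlm("d3 ~ run2 + visual", data=df, groups=df["subject"]).fit()
print(lm.params["visual"], lmm.params["visual"])
```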

4.3 One Model Fitting Procedure of Selected Voxels

In this section, we combine the data of all the selected voxels in the same area of interest and estimate the HRF in one model. We use the marginal likelihood criterion to select voxels in each area of interest for the different types of stimuli. This procedure is still carried out in the Bayesian general linear model (GLM) framework, and to be consistent with previous results we use the nonlocal prior (pMOM) for the neuronal responses β. All the selected voxels belong to the full model. We assume that these selected voxels have the same intercepts and motion correction coefficients γ, but different neuronal responses β for different types of stimuli and different voxels. Let y_1, y_2, \ldots, y_m be the BOLD response data of the selected voxels in the same region of interest for separate blocks; each y is a vector of dimension 984 × 1. Let Z be the same for each corresponding y; Z has dimension 984 × 28 and contains the intercepts and motion correction coefficients. In addition, let X_1(p), X_2(p), \ldots, X_m(p) be design matrices generated from the same HRF with parameters p, each corresponding to the acquisition time vector of one of the selected voxels. All the selected voxels use the full model, so each design matrix X_i(p) has four columns.

\begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} Z \\ Z \\ \vdots \\ Z \end{pmatrix} \gamma + \begin{pmatrix} X_1(p) & & \\ & \ddots & \\ & & X_m(p) \end{pmatrix} \begin{pmatrix} \beta_{11} \\ \beta_{12} \\ \beta_{13} \\ \beta_{14} \\ \vdots \\ \beta_{m1} \\ \beta_{m2} \\ \beta_{m3} \\ \beta_{m4} \end{pmatrix} + \epsilon,

where the middle matrix is block diagonal with blocks X_1(p), \ldots, X_m(p) and \epsilon \sim N(0, \sigma^2 I_{(984m) \times (984m)}). Based on the same priors and the same procedure as before, we can sample γ, β and the HRF parameters p from their full conditional posterior distributions using MCMC. The following are the results of this one-model fitting procedure for each subject and each session. The posterior probabilities that the visual areas were stimulated earlier than the motor areas are large, and the boxplots also show significant differences in stimulation times between the visual and motor areas. However, there is large variability across subjects, which is expected: it is almost impossible to control differences between participants throughout the experiment, which is also the most difficult problem when we analyze the data.
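A numpy sketch of this stacked system follows, using scipy's block_diag; the dimensions match the text (984 scans, 28 drift terms, 4 stimuli), while the number of voxels and all matrix entries are illustrative.

```python
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(6)
m, n_time, n_drift, K = 5, 984, 28, 4           # voxels, scans, drift terms, stimuli

Z = rng.standard_normal((n_time, n_drift))      # shared intercept/motion regressors
X_list = [rng.standard_normal((n_time, K)) for _ in range(m)]   # X_i(p), one per voxel

Z_stack = np.vstack([Z] * m)                    # (984*m) x 28, common gamma
X_stack = block_diag(*X_list)                   # (984*m) x (4*m), voxel-specific betas

gamma = rng.standard_normal(n_drift)
beta = rng.standard_normal(m * K)
y = Z_stack @ gamma + X_stack @ beta + rng.normal(0, 1, m * n_time)
print(Z_stack.shape, X_stack.shape, y.shape)
```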

Figure 4.21: Boxplots of selected voxels' d3 for both runs of subject 7
Figure 4.22: Histograms of posterior distributions of d3 for subject 7

Subject 7
Regions and Sessions   Motor A    Visual A   Motor B    Visual B
Posterior mean d3      3.342721   1.774196   4.202979   2.423475
Posterior prob.        P(d3,motorA > d3,visualA) = 0.9998    P(d3,motorB > d3,visualB) = 0.997

Table 4.7: Results of one model fitting for subject 7.

Figure 4.23: Estimated HRF of Subject 7 pooled voxels for motor (red) and visual (blue) areas of the first (A) and second (B) sessions

Figure 4.24: Boxplots of selected voxels' d3 for both runs of subject 8
Figure 4.25: Histograms of posterior distributions of d3 for subject 8

Subject 8
Regions and Sessions   Motor A    Visual A   Motor B    Visual B
Posterior mean d3      4.431883   1.591715   3.199039   2.372007
Posterior prob.        P(d3,motorA > d3,visualA) = 0.9972    P(d3,motorB > d3,visualB) = 0.8931

Table 4.8: Results of one model fitting for subject 8.

Figure 4.26: Estimated HRF of Subject 8 pooled voxels for motor (red) and visual (blue) areas of the first (A) and second (B) sessions

Figure 4.27: Boxplots of selected voxels' d3 for both runs of subject 9
Figure 4.28: Histograms of posterior distributions of d3 for subject 9

Subject 9
Regions and Sessions   Motor A    Visual A   Motor B    Visual B
Posterior mean d3      4.096393   2.287836   2.929611   1.859327
Posterior prob.        P(d3,motorA > d3,visualA) = 0.9548    P(d3,motorB > d3,visualB) = 0.8927

Table 4.9: Results of one model fitting for subject 9.

Figure 4.29: Estimated HRF of Subject 9 pooled voxels for motor (red) and visual (blue) areas of the first (A) and second (B) sessions

Figure 4.30: Boxplots of selected voxels' d3 for both runs of subject 10
Figure 4.31: Histograms of posterior distributions of d3 for subject 10

Subject 10
Regions and Sessions   Motor A    Visual A   Motor B    Visual B
Posterior mean d3      3.041475   2.676088   2.701728   2.394421
Posterior prob.        P(d3,motorA > d3,visualA) = 0.711    P(d3,motorB > d3,visualB) = 0.606

Table 4.10: Results of one model fitting for subject 10.

Figure 4.32: Estimated HRF of Subject 10 pooled voxels for motor (red) and visual (blue) areas of the first (A) and second (B) sessions

Figure 4.33: Boxplots of selected voxels' d3 for both runs of subject 11
Figure 4.34: Histograms of posterior distributions of d3 for subject 11

Subject 11
Regions and Sessions   Motor A    Visual A   Motor B    Visual B
Posterior mean d3      2.385714   0.9385019  2.439306   1.649001
Posterior prob.        P(d3,motorA > d3,visualA) = 0.8615    P(d3,motorB > d3,visualB) = 0.8603

Table 4.11: Results of one model fitting for subject 11.

Figure 4.35: Estimated HRF of Subject 11 pooled voxels for motor (red) and visual (blue) areas of the first (A) and second (B) sessions

Chapter 5

Bayesian Model Selection by Using Nonlocal Prior in Time Course RNA-seq Experiments

5.1 Introduction

In this chapter, we develop a new method based on product inverse moment (piMOM) priors for the analysis of RNA-seq time course experiments, and to that end we develop a specialized MCMC algorithm. Note that while in the previous chapters we used the product moment (pMOM) prior, in this chapter we use the piMOM prior (Johnson and Rossell, 2012). Time course RNA-seq experiments are frequently used to study gene reactions to a biological process through time. For example, Cicatiello et al. (2004) studied time course changes in gene expression by cDNA analysis in response to estrogen, and Spellman et al. (1998) used DNA microarrays to study the periodic transcript variation with the cell cycle of yeast genes. The distinguishing advantage of time course RNA-seq experiments over traditional 2-treatment RNA-seq experiments is that researchers obtain an expression profile through time for each single gene and for different subjects that may be under different conditions. The main objective of such studies is to identify genes whose expression levels change over time. Thus, in time course RNA-seq studies, the differentially expressed genes can be defined as genes whose expression profiles are significantly distinct from flat lines with respect to time.

We introduce a novel empirical Bayesian approach to detect differentially expressed genes in RNA-seq time course experiments. The proposed Bayesian method identifies major variation in gene expression profiles by Bayesian principal component regression. The expression data are normalized for each gene, and the high dimensionality of the time course data is first reduced by principal component analysis. The proposed model assumes a mixture distribution of expression parameters for differentially and non-differentially expressed genes, borrows strength by sharing the same variance across multiple subjects for each single gene through time, and shares information across subjects by assuming gene-wise probabilities of being differentially expressed drawn from a common beta prior distribution.

The remainder of this chapter is organized as follows. Section 5.2 describes the details of the empirical Bayesian principal components regression model. Section 5.3 introduces the empirical Bayes procedure used to determine the priors for the unknown parameters of the model. Section 5.4 describes the posterior inference procedures based on Markov chain Monte Carlo (MCMC) techniques and the procedures for the identification of differentially expressed genes. To illustrate our approach, a vaccination dataset (Henn et al., 2013) is analyzed in Section 5.5.

5.2 Principal Component Regression Model

In a typical RNA-seq time course experiment, let Y_sgt denote the normalized observation at time point t of gene g for subject s, where g = 1, 2, ..., G, t = 1, 2, ..., T, and s = 1, 2, ..., S. The normalized observations (Y_sg1, ..., Y_sgT)' through time (t = 1, ..., T) for gene g and subject s are modeled as

$$\left(\begin{array}{c} Y_{sg1} \\ \vdots \\ Y_{sgT} \end{array}\right) = \mu_{sg}\mathbf{1}_T + \sum_{l=1}^{L} \mathbf{e}_l \beta_{sgl} + \boldsymbol{\epsilon}_{sg} = \mu_{sg}\mathbf{1}_T + \mathbf{e}\,\boldsymbol{\beta}_{sg} + \boldsymbol{\epsilon}_{sg}, \qquad (5.1)$$

where \mu_{sg} denotes the mean expression of gene g for subject s, and the matrix \mathbf{e} = (\mathbf{e}_1, ..., \mathbf{e}_L) consists of the first L principal components, in which \mathbf{e}_l represents the lth principal component. That is, the columns of \mathbf{e} reflect the directions of major variation from the mean gene expression \mu_{sg} through time. Moreover, each coefficient \beta_{sgl} represents the magnitude of deviation from a flat line in the direction of the specific principal component \mathbf{e}_l (l = 1, ..., L). Furthermore, the random error \boldsymbol{\epsilon}_{sg} = (\epsilon_{sg1}, ..., \epsilon_{sgT})' is assumed to be independent across genes and to have the same variance for all subjects, that is, \boldsymbol{\epsilon}_{sg} \sim N(\mathbf{0}, \sigma_g^2 I_T) for gene g across all subjects, where I_T denotes the T \times T identity matrix. In the representation of Equation (5.1), a coefficient vector \boldsymbol{\beta}_{sg} = (\beta_{sg1}, ..., \beta_{sgL})' \neq \mathbf{0} indicates a differential expression profile of gene g for subject s.
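To make the data-generating mechanism concrete, the following Python sketch simulates profiles for one gene from model (5.1). The principal component matrix e is a made-up orthonormal placeholder here; in the application it is computed from the data (Section 5.5.1).

```python
import numpy as np

# Simulate one gene's time course profiles for S subjects from model (5.1):
# Y_sg = mu_sg * 1_T + e @ beta_sg + eps_sg,  eps_sg ~ N(0, sigma2_g * I_T).
rng = np.random.default_rng(1)
T, S, L = 11, 5, 2
e = np.linalg.qr(rng.normal(size=(T, L)))[0]  # placeholder orthonormal columns
sigma2_g = 0.5

profiles = []
for s in range(S):
    mu_sg = rng.normal(0.0, 1.0)              # subject-specific mean expression
    beta_sg = rng.normal(0.0, 2.0, size=L)    # nonzero => differential expression
    eps_sg = rng.normal(0.0, np.sqrt(sigma2_g), size=T)
    profiles.append(mu_sg * np.ones(T) + e @ beta_sg + eps_sg)
```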

In order to model the differential expression of each gene under a Bayesian framework, the differential effects \boldsymbol{\beta}_{sg} are assumed to follow a mixture distribution. Specifically, with probability p_g, the coefficient vector \boldsymbol{\beta}_{sg} differs from \mathbf{0} (differentially expressed), and with probability (1 - p_g), \boldsymbol{\beta}_{sg} equals \mathbf{0} (non-differentially expressed).

Let h_{sg} denote a latent differential state for gene g and subject s that is equal to 1 in the case of differential expression (\boldsymbol{\beta}_{sg} \neq \mathbf{0}) and equal to 0 in the case of non-differential expression (\boldsymbol{\beta}_{sg} = \mathbf{0}). Thus, if at least one element of \boldsymbol{\beta}_{sg} is nonzero then h_{sg} = 1, and otherwise h_{sg} = 0. In addition, note that p(h_{sg} = 1) = p_g and p(h_{sg} = 0) = 1 - p_g. Here we use a single probability of differential expression p_g for each gene g in order to borrow strength across all subjects.

Given the state h_{sg}, the differential effect \boldsymbol{\beta}_{sg} is assumed to follow the mixture distribution

$$\pi(\boldsymbol{\beta}_{sg} \mid h_{sg}) = \begin{cases} \displaystyle\prod_{i=1}^{K} \left(\frac{\tau\sigma_g^2}{\pi}\right)^{1/2} \beta_{sgi}^{-2} \exp\left\{-\frac{\tau\sigma_g^2}{\beta_{sgi}^2}\right\}, & h_{sg} = 1 \text{ (differentially expressed)}, \\[1ex] \delta_{\{0\}}, & h_{sg} = 0 \text{ (non-differentially expressed)}, \end{cases} \qquad (5.2)$$

where we adopt a product inverse moment (piMOM) prior for the case of differential expression. We note that the piMOM prior possesses Cauchy-like tails and that K stands for the number of coefficients included in the model, which can be K = 1, ..., L. Also, \delta_{\{0\}} indicates a point mass prior at \boldsymbol{\beta}_{sg} = \mathbf{0} when the gene is non-differentially expressed. Figure 5.1 shows the density plot of a one-dimensional nonlocal piMOM prior together with a point mass prior at 0. The latent state h_{sg} is the parameter of interest: it indicates whether the expression profile \mathbf{Y}_{sg} = (Y_{sg1}, ..., Y_{sgT})' is a flat line (h_{sg} = 0) or not (h_{sg} = 1) for gene g of subject s. Hence, h_{sg} indicates the null model or a non-null model in the Bayesian model selection framework. Because the piMOM prior vanishes at the origin, it sharply separates null and non-null models, which is why we choose the piMOM prior for model selection on \boldsymbol{\beta}_{sg}.
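The one-dimensional piMOM density in (5.2) is simple to evaluate directly. The following minimal sketch, with illustrative parameter values, exhibits the two properties used above: the density vanishes at the origin (hence "nonlocal") and decays slowly in the tails.

```python
import numpy as np

def pimom_logpdf(beta, tau, sigma2):
    """Log density of the one-dimensional piMOM prior in Eq. (5.2):
    pi(b) = sqrt(tau*sigma2/pi) * b^(-2) * exp(-tau*sigma2 / b^2)."""
    s = tau * sigma2
    beta = np.asarray(beta, dtype=float)
    return 0.5 * np.log(s / np.pi) - 2.0 * np.log(np.abs(beta)) - s / beta**2

# Essentially zero near the origin, heavy (Cauchy-like) tails far from it.
print(np.exp(pimom_logpdf([0.01, 0.5, 2.0, 10.0], tau=0.3, sigma2=1.0)))
```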

5.3 Prior Specification

For the specification of the prior hyper-parameters, we follow the recommendations of Cui et al. (2015). Specifically, for \sigma_g^2, g = 1, ..., G, we assume a log Student-t distribution with hyper-parameters estimated by empirical Bayes. For p_g, g = 1, ..., G, we assume a beta prior distribution with hyper-parameters estimated by empirical Bayes. Finally, for the tuning parameter \tau in the piMOM prior, we assume the value suggested in Johnson and Rossell (2010).

5.4 Posterior Inference

We explore the posterior distribution of the model parameters using Markov chain Monte Carlo (MCMC) methods (Robert and Casella, 1999; Gamerman and Lopes, 2006). Since the posterior densities of the parameters h_{sg}, \boldsymbol{\beta}_{sg}, \sigma_g^2 and p_g are not available in closed form, we apply Gibbs sampling and Metropolis-Hastings steps to simulate them. To that end, in the next section we present the full conditional distributions of the model parameters. Finally, we note that we embed the model selection steps within the MCMC algorithm.

5.4.1 Full Conditionals and MCMC Algorithm

In this section, we present each step of the algorithm, simulating draws of each parameter from its full conditional distribution.

• Step 1. Simulate the latent differential states h_{sg}, s = 1, ..., S, g = 1, ..., G. In this step, we perform the model selection procedure using conditional marginal likelihoods, in a fashion similar to Bayes factors in model selection. To obtain the conditional marginals, given the latent differential state h_{sg}, we integrate over the distribution of the differential expression effect \boldsymbol{\beta}_{sg} in (5.2). With that, we obtain different conditional marginal likelihoods for the two states h_{sg} = 1 and h_{sg} = 0. Specifically,

$$M_0(\mathbf{Y}_{sg} \mid h_{sg} = 0) = \left(\frac{1}{\sqrt{2\pi\sigma_g^2}}\right)^{T} \exp\left\{-\frac{\mathbf{Y}_{sg}'\mathbf{Y}_{sg}}{2\sigma_g^2}\right\}. \qquad (5.3)$$

In addition, under the non-null state, there are multiple possible models depending on which principal components are included:

$$M_K(\mathbf{Y}_{sg} \mid h_{sg} = 1) = \int_{\mathbb{R}^K} \left(\frac{1}{\sqrt{2\pi\sigma_g^2}}\right)^{T} \exp\left\{-\frac{\left(\mathbf{Y}_{sg} - \sum_{j=1}^{K}\mathbf{e}_j\beta_{sgj}\right)'\left(\mathbf{Y}_{sg} - \sum_{j=1}^{K}\mathbf{e}_j\beta_{sgj}\right)}{2\sigma_g^2}\right\} \prod_{j=1}^{K}\left(\frac{\tau\sigma_g^2}{\pi}\right)^{1/2}\beta_{sgj}^{-2}\exp\left\{-\frac{\tau\sigma_g^2}{\beta_{sgj}^2}\right\}\, d\boldsymbol{\beta}, \qquad (5.4)$$

where K stands for the number of principal components included in the model, and there are 2^L - 1 possible combinations of principal components. None of these integrals can be calculated analytically, so we use the Newton-Raphson method to find the maximum of the integrand and the corresponding Hessian matrix, and then obtain a Laplace approximation of the integral for each possible combination. Together with the prior probability of the non-null state, p(h_{sg} = 1) = p_g, we can calculate the posterior probability of selecting the null model or a non-null model by model averaging:

$$p(h_{sg} = 0 \mid \mathbf{Y}_{sg}) = \frac{M_0(\mathbf{Y}_{sg} \mid h_{sg} = 0)\,(1 - p_g)}{M_0(\mathbf{Y}_{sg} \mid h_{sg} = 0)\,(1 - p_g) + \sum M_K(\mathbf{Y}_{sg} \mid h_{sg} = 1)\, p_g/(2^L - 1)}, \qquad (5.5)$$

$$p(h_{sg} = 1 \mid \mathbf{Y}_{sg}) = \sum_k p(h_{sg} = 1, \text{model}_k \mid \mathbf{Y}_{sg}) = \frac{\sum M_K(\mathbf{Y}_{sg} \mid h_{sg} = 1)\, p_g/(2^L - 1)}{M_0(\mathbf{Y}_{sg} \mid h_{sg} = 0)\,(1 - p_g) + \sum M_K(\mathbf{Y}_{sg} \mid h_{sg} = 1)\, p_g/(2^L - 1)}. \qquad (5.6)$$
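A compact Python sketch of this step follows, under stated assumptions: SciPy's BFGS optimizer stands in for the Newton-Raphson iteration described above, and all function and argument names are illustrative rather than the code actually used.

```python
import numpy as np
from scipy.optimize import minimize

def log_integrand(beta, y, E, sigma2, tau):
    """Log of likelihood x piMOM prior, the integrand of Eq. (5.4);
    E holds the included principal components as columns."""
    r = y - E @ beta
    s = tau * sigma2
    return (-0.5 * len(y) * np.log(2 * np.pi * sigma2) - r @ r / (2 * sigma2)
            + np.sum(0.5 * np.log(s / np.pi) - 2 * np.log(np.abs(beta)) - s / beta**2))

def laplace_log_marginal(y, E, sigma2, tau, beta0):
    """Laplace approximation to log M_K. The starting value beta0 must be
    away from 0, where the piMOM log density diverges to -infinity."""
    opt = minimize(lambda b: -log_integrand(b, y, E, sigma2, tau), beta0, method="BFGS")
    k = len(np.atleast_1d(beta0))
    # BFGS returns an estimate of the inverse Hessian of the objective,
    # i.e., the inverse of the negative Hessian of the log integrand.
    _, logdet_inv_hess = np.linalg.slogdet(opt.hess_inv)
    return -opt.fun + 0.5 * k * np.log(2 * np.pi) + 0.5 * logdet_inv_hess

def posterior_null_prob(log_m0, log_mks, p_g, L):
    """Posterior probability of the null model, Eq. (5.5), by model averaging
    over the 2^L - 1 non-null models (log_mks holds their log marginals)."""
    logs = np.array([log_m0 + np.log1p(-p_g)]
                    + [lm + np.log(p_g) - np.log(2**L - 1) for lm in log_mks])
    w = np.exp(logs - logs.max())   # stabilized exponentiation
    return w[0] / w.sum()
```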

• Step 2. Simulate the differential expression effects \boldsymbol{\beta}_{sg}, s = 1, ..., S and g = 1, ..., G. From the previous step we have the probability of each model, so we sample an index from \{0, 1, ..., 2^L - 1\} using the probability vector (p(h_{sg} = 0 \mid \mathbf{Y}_{sg}), p(h_{sg} = 1, \text{model}_1 \mid \mathbf{Y}_{sg}), ..., p(h_{sg} = 1, \text{model}_{2^L - 1} \mid \mathbf{Y}_{sg})).

– If the sampled index is 0, that is, h_{sg} = 0 (non-differential expression), then \boldsymbol{\beta}_{sg} = \mathbf{0}.

– Otherwise, h_{sg} = 1 (differential expression) and there are 2^L - 1 possible differentially expressed models; before simulating \boldsymbol{\beta}_{sg}, we have to decide which model to simulate from:

∗ If the sampled index is l, with l particular principal components included, which could be \mathbf{e}_l = (\mathbf{e}_1, ..., \mathbf{e}_l) or another combination, we simulate \boldsymbol{\beta}_{sg} from model l, whose full conditional distribution is given below.

$$\begin{aligned} p(\boldsymbol{\beta}_{sg} \mid \mathbf{Y}_{sg}, \sigma_g^2) &\propto \left(\frac{1}{\sqrt{2\pi\sigma_g^2}}\right)^{T} \exp\left\{-\frac{(\mathbf{Y}_{sg} - \mathbf{e}_l\boldsymbol{\beta}_{sg})'(\mathbf{Y}_{sg} - \mathbf{e}_l\boldsymbol{\beta}_{sg})}{2\sigma_g^2}\right\} \exp\left\{-\sum_j \frac{\tau\sigma_g^2}{\beta_{sgj}^2}\right\} \prod_j \left(\frac{\tau\sigma_g^2}{\pi}\right)^{1/2} \frac{1}{\beta_{sgj}^2} \\ &\propto N(\boldsymbol{\beta}_{sg} : \tilde{\boldsymbol{\beta}}_l, \tilde{\sigma}_l^2)\, \exp\left\{-\sum_j \frac{\tau\sigma_g^2}{\beta_{sgj}^2}\right\} \frac{1}{\prod_j \beta_{sgj}^2}, \end{aligned} \qquad (5.7)$$

where \tilde{\boldsymbol{\beta}}_l = (\mathbf{e}_l'\mathbf{e}_l)^{-1}\mathbf{e}_l'\mathbf{Y}_{sg} and \tilde{\sigma}^2 = (\mathbf{e}_l'\mathbf{e}_l)^{-1}\sigma_g^2.

These posteriors are not available in closed form, so we use a Metropolis-Hastings algorithm to simulate from the posterior of \boldsymbol{\beta}_{sg}: the maximum values and Fisher information matrices found in the previous step are used as the mean and covariance matrix of the normal distribution that generates the M-H proposal.
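One reading of this step is an independence-type proposal centered at the mode found in Step 1; the Python sketch below implements that reading, with illustrative names.

```python
import numpy as np

def log_target(beta, y, E, sigma2, tau):
    # Log of likelihood x piMOM prior, the unnormalized posterior in Eq. (5.7).
    r = y - E @ beta
    s = tau * sigma2
    return (-0.5 * len(y) * np.log(2 * np.pi * sigma2) - r @ r / (2 * sigma2)
            + np.sum(0.5 * np.log(s / np.pi) - 2 * np.log(np.abs(beta)) - s / beta**2))

def mh_update_beta(beta, y, E, sigma2, tau, mode, cov, rng):
    """One M-H update of beta_sg: the proposal is a normal centered at the
    Laplace mode with the inverse-Hessian covariance. The shared covariance
    makes the proposal's normalizing constants cancel in the ratio."""
    prop = rng.multivariate_normal(mode, cov)
    log_q = lambda b: -0.5 * (b - mode) @ np.linalg.solve(cov, b - mode)
    log_ratio = (log_target(prop, y, E, sigma2, tau) - log_q(prop)
                 - log_target(beta, y, E, sigma2, tau) + log_q(beta))
    return prop if np.log(rng.uniform()) < log_ratio else beta
```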

• Step 3. Simulate the gene-specific variances \sigma_g^2, g = 1, ..., G, shared across all subjects.

– The prior for \xi_g = \log(\sigma_g^2) is \xi_g \sim t_\nu(a, b^2),

– which implies the density

$$p(\sigma_g^2) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\Gamma\left(\frac{\nu}{2}\right)\sqrt{\nu\pi}\, b} \left(1 + \frac{1}{\nu}\left(\frac{\log(\sigma_g^2) - a}{b}\right)^2\right)^{-\frac{\nu+1}{2}} \frac{1}{\sigma_g^2}.$$

– Together with the likelihood function

$$p(\mathbf{Y}_g \mid \sigma_g^2) \propto (\sigma_g^2)^{-\frac{S\cdot T}{2}} \exp\left\{-\frac{\sum_s\sum_t (y_{sgt} - \mu_{sgt})^2}{2\sigma_g^2}\right\},$$

– the full conditional distribution of \sigma_g^2 is

$$p(\sigma_g^2 \mid \mathbf{Y}_g) \propto (\sigma_g^2)^{-\left(\frac{S\cdot T}{2} + 1\right)} \exp\left\{-\frac{\sum_s\sum_t (y_{sgt} - \mu_{sgt})^2}{2\sigma_g^2}\right\} \left(1 + \frac{1}{\nu}\left(\frac{\log(\sigma_g^2) - a}{b}\right)^2\right)^{-\frac{\nu+1}{2}}.$$
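Because this full conditional is nonstandard, it can be updated with a random-walk Metropolis step on \xi_g = \log(\sigma_g^2), where the t prior applies directly and no Jacobian term is needed. A minimal sketch, with illustrative names:

```python
import numpy as np

def mh_update_log_sigma2(sigma2, sse, n, a, b, nu, step, rng):
    """Random-walk Metropolis update of xi = log(sigma2_g) under the log
    Student-t prior t_nu(a, b^2); sse is the sum of squared residuals over
    all subjects and time points, and n = S * T."""
    def log_post(xi):
        loglik = -0.5 * n * xi - sse / (2 * np.exp(xi))
        logprior = -0.5 * (nu + 1) * np.log1p(((xi - a) / b) ** 2 / nu)
        return loglik + logprior   # the 1/sigma2 Jacobian is absorbed by working on xi
    xi = np.log(sigma2)
    prop = xi + step * rng.normal()
    if np.log(rng.uniform()) < log_post(prop) - log_post(xi):
        xi = prop
    return np.exp(xi)
```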

All three steps are repeated for the desired number of MCMC iterations.

5.5 Application to Vaccination Data

Henn et al. (2013) study temporal response patterns to human influenza vaccination using a high-frequency sampling technique. The data contain RNA-seq measurements from blood B cell samples for five subjects: three subjects had received all available vaccines in the three previous consecutive years, and the other two subjects had not been vaccinated in the previous three years. Samples were first drawn in the week prior to vaccination (day 0) and then on each of the 10 days after vaccination (days 1 through 10).

5.5.1 Preprocessing and Empirical Priors

We first process and normalize the Henn et al. (2013) dataset. The data contain 20,263 genes at T = 11 time points from S = 5 subjects; 2,660 genes are filtered out because they have zero counts at all time points across all subjects. For the remaining G = 17,603 genes, let X_sgt denote the read count of gene g (g = 1, ..., 17603) for subject s (s = 1, ..., 5) at time t (t = 1, ..., 11). We compute a normalizing constant for each library (the sample of all genes at one time point for one subject) to scale the data to a comparable level across all libraries, which may differ in experimental conditions such as sequencing depth. Here we use the trimmed mean of M-values (TMM) method of Robinson and Oshlack (2010), which compensates for different library sizes. We calculate the "effective library size" l_st for subject s ∈ {1, 2, ..., S} and time t ∈ {1, 2, ..., T} as the product of the library size and the TMM factor. A variance-stabilizing square-root transformation is further applied to stabilize the variance of the count data. These procedures are summarized in equation (5.8), which yields the counts-per-million (cpm) processed observations:

$$Z_{sgt} = \sqrt{\frac{X_{sgt}}{l_{st}} \cdot 1{,}000{,}000}. \qquad (5.8)$$

The normalized observation Y_sgt is then obtained by centering the cpm-processed observations of gene g for subject s, as in equation (5.9):

$$Y_{sgt} = Z_{sgt} - \bar{Z}_{sg}. \qquad (5.9)$$
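A short Python sketch of this preprocessing follows, assuming the TMM factors have already been computed (for example with edgeR's calcNormFactors in R); the function name and arguments are illustrative.

```python
import numpy as np

def normalize_counts(X, lib_size, tmm_factor):
    """Sketch of Eqs. (5.8)-(5.9) for one subject: X is a G x T matrix of read
    counts; lib_size and tmm_factor are length-T vectors for the T libraries."""
    eff_lib = lib_size * tmm_factor            # "effective library size" l_st
    Z = np.sqrt(X / eff_lib * 1_000_000)       # variance-stabilized cpm, Eq. (5.8)
    Y = Z - Z.mean(axis=1, keepdims=True)      # center each gene's profile, Eq. (5.9)
    return Y
```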

We assign priors to the unknown parameters in model (5.1) using the empirical Bayes methods discussed in Section 5.3. Specifically, we compute the eigenvalues and eigenvectors of the centered profiles and choose the first L = 2 eigenvectors, which explain 70% of the variation observed in the time course profiles across all subjects and all genes. Figure 5.2 shows that the second principal component resembles the derivative of the first principal component, so it is reasonable to use both the first and second principal components in the model to describe the temporal trend of mean gene expression. The sketch below illustrates this dimension-reduction step.
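In outline: stack all centered profiles as rows, eigendecompose their T x T covariance, and keep the fewest leading eigenvectors that reach the target proportion of variance (L = 2 in this application). A minimal Python sketch with illustrative names:

```python
import numpy as np

def leading_components(Y, var_explained=0.70):
    """Y has one row per subject-gene profile and T columns (time points).
    Returns the leading eigenvectors e_1, ..., e_L and the chosen L."""
    C = np.cov(Y, rowvar=False)                # T x T covariance over profiles
    vals, vecs = np.linalg.eigh(C)
    order = np.argsort(vals)[::-1]             # sort eigenvalues descending
    vals, vecs = vals[order], vecs[:, order]
    L = int(np.searchsorted(np.cumsum(vals) / vals.sum(), var_explained)) + 1
    return vecs[:, :L], L                      # columns form the matrix e
```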

As explained in Section 5.3, we follow the recommendations of Cui et al. (2015) to specify the priors for \sigma_g^2 and p_g, g = 1, ..., G. Specifically, for \log \sigma_g^2 we assign a t_{1.56}(-2.88, 0.18) prior, and for p_g we assume a Beta(0.2018, 0.2412) prior. The tuning parameter \tau is set to 0.3 to preserve the sharpness of the piMOM prior around 0. Since we fit model (5.1) using Markov chain Monte Carlo techniques, the full conditional distributions of h_{sg}, \boldsymbol{\beta}_{sg}, p_g and \sigma_g^2 are obtained and updated with the Metropolis-Hastings algorithm.

• To detect differentially expressed genes, we calculate the posterior probability p(h_{sg} ≠ 0 | data) for s = 1, ..., S and g = 1, ..., G. We control the false discovery rate with the Bayesian method of Newton et al. (2004) and Muller et al. (2007). The histogram of \widehat{FDR}_{sg} is shown in Figure 5.4.

• For each subject, we plot the scatter plot of the two-dimensional \boldsymbol{\beta}_{sg} for all differentially expressed genes. The first three subjects received vaccines in previous years; the last two did not.

5.5.2 Identification of Differentially Expressed Genes

We fit model (5.1) using Markov chain Monte Carlo techniques and then summarize the posterior samples to calculate the posterior probability p(h_{sg} ≠ 0 | data) for s = 1, ..., S and g = 1, ..., G. The false discovery rate is controlled by the Bayesian method introduced in Newton et al. (2004) and Muller et al. (2007). We set the nominal FDR at level 0.05 and thus identify the genes that are differentially expressed for each subject. Figure 5.7 displays the Venn diagram of DE genes for all five subjects and shows that there are 120 genes differentially expressed for all five subjects and 356 for four out of five subjects.
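The Bayesian FDR rule of Newton et al. (2004) flags the genes with the largest posterior probabilities of differential expression such that the estimated FDR of the flagged list stays below the nominal level. A minimal sketch of this thresholding, with illustrative names:

```python
import numpy as np

def bayesian_fdr_threshold(post_prob, alpha=0.05):
    """post_prob holds P(h_sg = 1 | data) for each gene. Returns a boolean
    mask of the largest list of genes whose estimated FDR is <= alpha."""
    p = np.sort(post_prob)[::-1]               # posterior probabilities, descending
    fdr = np.cumsum(1.0 - p) / np.arange(1, len(p) + 1)   # est. FDR of top-k lists
    n_sig = int(np.sum(fdr <= alpha))          # largest list meeting the target
    cutoff = p[n_sig - 1] if n_sig > 0 else 1.0
    return post_prob >= cutoff
```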

We select the top 10 genes with the largest \hat{p}_g (probability of being differentially expressed across subjects); these 10 genes are in fact differentially expressed for all five subjects. We plot their estimated temporal trends in Figure 5.8, where the estimated temporal values are calculated from equation (5.10):

$$\left(\begin{array}{c} \hat{Y}_{sg1} \\ \vdots \\ \hat{Y}_{sgT} \end{array}\right) = \mu_{sg}\mathbf{1}_T + \sum_{l=1}^{L} \mathbf{e}_l \hat{\beta}_{sgl}, \qquad (5.10)$$

where \hat{\beta}_{sgl} is taken to be the posterior mean of \beta_{sgl}. The temporal peak usually appears around day 6 or 7, and the three vaccinated subjects appear to have stronger reactions than the two non-vaccinated subjects. We searched several resources for biological evidence that any of these top 10 genes are related to vaccination or the immune system. We list information from GeneCards (www.genecards.org) and from the literature as follows:

• CCR2: chemokine (C-C motif) receptor 2 encodes a receptor related to inflammatory diseases and responses. It is potentially involved in the pathogenesis of both immunologic and cardiovascular diseases (Zhao, 2010), and it has recently been found to be related to the amplification of vaccine immunity (Mitchell et al., 2013).

• XBP1: X-box binding protein 1 encodes a transcription factor related to viral protein binding in the T cell. It has a decisive role in antitumor immunity (Zhang et al., 2015).

• CAV1: Caveolin 1, Caveolae Protein, 22kDa is an integral membrane protein expressed highly in lymphoid/immune systems. It has been studied in connection with different kinds of vaccines (Phillips et al., 1989).

• CD38: a multifunctional ectoenzyme widely expressed in cells and tissues, especially in leukocytes. The loss of CD38 function is associated with impaired immune responses (Malavasi et al., 2008).

• HSP90B1: Heat Shock Protein 90kDa Beta (Grp94), Member 1. It plays critical roles in folding proteins related to the immune system.

• IGF1: Insulin-Like Growth Factor 1 (Somatomedin C). It is a hormone similar in molecular structure to insulin. It plays an important role in childhood growth and continues to have anabolic effects in adults.

• RPN2: RPN2 (Ribophorin II) encodes a type I integral membrane protein found only in the rough endoplasmic reticulum.

• SGK1: Serum/Glucocorticoid Regulated Kinase 1 encodes a serine/threonine protein kinase that plays an important role in cellular stress response.

• SELK: Selenoprotein K is required for Ca2+ flux in immune cells and plays a role in T-cell proliferation and in T-cell and neutrophil migration.

• TBC1D1: TBC1 (Tre-2/USP6, BUB2, Cdc16) Domain Family, Member 1 may play a role in the cell cycle and differentiation of various tissues.

According to published vaccine information, we find that some of these genes are indeed involved:

• Fluarix for flu: from Nakaya et al. (2011), CAV1 was positively correlated with the HAI antibody response at day 7 after vaccination with Fluarix; CD38 is associated with antibody-secreting cell (ASC) differentiation, which was positively correlated with the HAI antibody response at day 7 after vaccination with Fluarix; HSP90B1 is associated with the unfolded protein response, which was positively correlated with the HAI antibody response at day 7 after vaccination with Fluarix; and RPN2 was positively correlated with the HAI antibody response at day 7 after vaccination with Fluarix.

• APSV Wetvax: from Scherer (2007), Wetvax induced down-regulation of CCR2 in PBMCs extracted from human subjects: 0 fold change 2-4 days after vaccination, 0 fold change 5-7 days after vaccination, and -0.75 fold change 50-60 days after vaccination.

• YF-Vax for yellow fever: from Gaucher et al. (2008), YF-Vax induced up-regulation (and down-regulation) of XBP1 in PBMCs extracted from human subjects: -1.00 fold change at day 3, 1.09 fold change at day 7, and 1.08 fold change at day 10 after vaccination; YF-Vax induced up-regulation of CD38: 1.21 fold change at day 3, 1.83 fold change at day 7, and 1.67 fold change at day 10 after vaccination; YF-Vax induced up-regulation (and down-regulation) of HSP90B1: -1.07 fold change at day 3, 1.05 fold change at day 7, and 1.13 fold change at day 10 after vaccination; and YF-Vax induced down-regulation of SGK1: -1.13 fold change at day 3, -1.43 fold change at day 7, and -1.11 fold change at day 10 after vaccination.

Except for IGF1 and TBC1D1, which are directly related to general cell functions, all of the other eight genes are more or less related to immune response.

Figure 5.9 shows the linear relationship between the averages of \beta_1 and \beta_2, the posterior estimates of the coefficients of the first and second principal components, for DE genes across all five subjects. Figures 5.10 and 5.11 show the predicted temporal patterns of genes that are DE for all vaccinated subjects but not for the non-vaccinated subjects; some of these genes are related to immune responses or to networks of cell functions, and more details about them are given in Tables 5.1, 5.2 and 5.3. In contrast, Figure 5.12 shows the predicted temporal patterns of genes that are DE for all non-vaccinated subjects but not for the vaccinated subjects; details about those genes are given in Table 5.4.

Figure 5.1: Density plot of a mixture of a one-dimensional nonlocal piMOM and a point mass at 0.

Figure 5.2: First and second principal components plotted over all time points. The black line is the first principal component and the red line is the second principal component.

Figure 5.3: Empirical pairwise plot of \hat{\beta}_g for the first (PC_1) and second (PC_2) principal component coefficients.

Figure 5.4: Histogram of \widehat{FDR}_{sg}.

Figure 5.5: Empirical density plot of the posterior FDR.

Figure 5.6: Scatter plots of \beta_1 vs \beta_2 for the DE genes of each subject (panels: Subject 1 with vaccine, Subject 2 with vaccine, Subject 3 with vaccine, Subject 4 no vaccine, Subject 5 no vaccine).

Figure 5.7: Venn diagram of DE genes for the five subjects, where the three vaccinated subjects are represented by purple, blue and green, and the two non-vaccinated subjects are represented by red and orange.

Figure 5.8: Estimated temporal trends of the top 10 genes with the largest \hat{p}_g (panels: CCR2, SELK, CD38, TBC1D1, SGK1, CAV1, IGF1, HSP90B1, RPN2, XBP1); these genes are in fact differentially expressed for all five subjects. Red and orange lines correspond to the two non-vaccinated subjects (subject 1: purple, subject 2: blue, subject 3: green, subjects 4 and 5: red and orange).

Figure 5.9: Scatter plot of the average \beta_1 vs the average \beta_2 over all five subjects for DE genes.

Figure 5.10: Estimated temporal trends of genes that are DE for the vaccinated subjects (black) but not for the non-vaccinated subjects (red) (panels: IER5, PFKFB2, HADHB, EPHA4, FAM174A, AGPAT1, ZDHHC4, RNF122, PIGO, IARS, NEK6, LMO7).

Table 5.1: DE genes for vaccinated subjects but not for non-vaccinated subjects

IER5
  Summary: encodes an immediate early response protein that plays an important role in mediating the cellular response to mitogenic signals.
  Disease: unknown. Pathway: unknown.
PFKFB2
  Summary: the enzyme encoded by this gene is involved in both the synthesis and degradation of fructose-2,6-bisphosphate, a regulatory molecule that controls glycolysis in eukaryotes.
  Disease: unknown. Pathway: Akt Signaling, Metabolism.
HADHB
  Summary: encodes the beta subunit of the mitochondrial trifunctional protein, which catalyzes the oxidation of long-chain fatty acids; can bind RNA and decreases the stability of some mRNAs.
  Disease: trifunctional protein deficiency, mitochondrial trifunctional protein deficiency. Pathway: Akt Signaling, Metabolism.
EPHA4
  Summary: member of the EPH receptor subfamily of the protein-tyrosine kinase family, implicated in mediating development of the nervous system.
  Disease: staphyloenterotoxemia. Pathway: GPCR Pathway, Akt Signaling.
FAM174A
  Summary: Family With Sequence Similarity 174, Member A, a protein-coding gene.
  Disease: hepatitis C, hepatitis C virus. Pathway: unknown.
AGPAT1
  Summary: encodes an enzyme that converts lysophosphatidic acid (LPA) into phosphatidic acid (PA).
  Disease: unknown. Pathway: Metabolism.
ZDHHC4
  Summary: Zinc Finger, DHHC-Type Containing 4; encodes an enzyme.
  Disease: unknown. Pathway: unknown.
RNF122
  Summary: encodes a motif present in functionally distinct proteins and involved in protein-protein and protein-DNA interactions.
  Disease: unknown. Pathway: unknown.
PIGO
  Summary: encodes a protein involved in glycosylphosphatidylinositol (GPI)-anchor biosynthesis.
  Disease: hyperphosphatasia with mental retardation syndrome 2, hyperphosphatasia-intellectual disability syndrome. Pathway: Biosynthesis of the N-glycan precursor and transfer to a nascent protein.

Table 5.2: DE genes for vaccinated subjects but not for non-vaccinated subjects (continued)

IARS
  Summary: aminoacyl-tRNA synthetases catalyze the aminoacylation of tRNA; they were among the first proteins to appear in evolution.
  Disease: polymyositis, dermatomyositis. Pathway: Gene Expression, tRNA Aminoacylation.
NEK6
  Summary: encodes a kinase required for progression through the metaphase portion of mitosis; its inhibition can lead to apoptosis, and it may enhance tumorigenesis by suppressing tumor cell senescence.
  Disease: unknown. Pathway: Cell Cycle, Mitotic.
LMO7
  Summary: encodes a protein containing a calponin homology (CH) domain, a PDZ domain, and a LIM domain; may be involved in protein-protein interactions.
  Disease: Townes-Brocks syndrome. Pathway: PAK Pathway, Adherens junction.
IFI27L2
  Summary: Interferon, Alpha-Inducible Protein 27-Like 2; encodes a protein.
  Disease: unknown. Pathway: unknown.
UBN1
  Summary: Ubinuclein 1; encodes a protein that is a novel regulator of senescence.
  Disease: unknown. Pathway: Cellular Senescence.
MRPL45
  Summary: Mitochondrial Ribosomal Protein L45; encodes a protein.
  Disease: unknown. Pathway: unknown.
SUPT6H
  Summary: Suppressor Of Ty 6 Homolog (S. cerevisiae); encodes a protein.
  Disease: unknown. Pathway: unknown.
MFSD11
  Summary: Major Facilitator Superfamily Domain Containing 11.
  Disease: unknown. Pathway: unknown.
NARF
  Summary: Nuclear Prelamin A Recognition Factor.
  Disease: unknown. Pathway: unknown.
S1PR2
  Summary: encodes a member of the G protein-coupled receptors and the EDG family of proteins; participates in sphingosine 1-phosphate-induced cell proliferation, survival, and transcriptional activation.
  Disease: unknown. Pathway: Signaling by GPCR, Nanog in Mammalian ESC Pluripotency.

Table 5.3: DE genes for vaccinated subjects but not for non-vaccinated subjects (continued)

BST2
  Summary: bone marrow stromal cells are involved in the growth and development of B-cells; related to YF-Vax.
  Disease: West Nile encephalitis, rheumatoid arthritis. Pathway: unknown.
CRLS1
  Summary: Cardiolipin Synthase 1; encodes a protein.
  Disease: catastrophic antiphospholipid syndrome. Pathway: Metabolism.
IL2RG
  Summary: an important signaling component of many interleukin receptors, as the common gamma chain; related to YF-Vax.
  Disease: severe combined immunodeficiency, X-linked. Pathway: PI3K-Akt signaling pathway, Akt Signaling.
PHKA1
  Summary: the alpha subunit of phosphorylase kinase includes skeletal muscle and hepatic isoforms; this gene encodes the skeletal muscle isoform.
  Disease: muscle glycogenosis, muscular phosphorylase kinase deficiency. Pathway: Activation of cAMP-Dependent PKA.

Figure 5.11: Estimated temporal trends of genes that are DE for the vaccinated subjects (black) but not for the non-vaccinated subjects (red) (panels: IFI27L2, UBN1, SUPT6H, MRPL45, MFSD11, NARF, S1PR2, BST2, CRLS1, IL2RG, PHKA1).

Figure 5.12: Estimated temporal trends of genes that are DE for the non-vaccinated subjects (red) but not for the vaccinated subjects (black) (panels: GPR87, SULT1E1, TWIST1, RERG, BCAR4, PRKY).

Table 5.4: DE genes for non-vaccinated subjects but not for vaccinated subjects

GPR87
  Summary: encodes a G protein-coupled receptor.
  Disease: lung squamous cell carcinoma, oral squamous cell carcinoma. Pathway: Peptide ligand-binding receptors.
SULT1E1
  Summary: sulfotransferase family enzymes catalyze the sulfate conjugation of hormones, neurotransmitters, drugs, and xenobiotic compounds.
  Disease: endometrial adenocarcinoma, endometrial stromal sarcoma. Pathway: Metabolism, Estrogen metabolism.
TWIST1
  Summary: Twist BHLH transcription factor; affects cell lineage determination and differentiation.
  Disease: Saethre-Chotzen syndrome, Robinow-Sorauf syndrome. Pathway: Proteoglycans in cancer, Neural Crest Differentiation.
RERG
  Summary: member of the RAS superfamily of GTPases; inhibits cell proliferation and tumor formation.
  Disease: pituitary carcinoma. Pathway: unknown.
BCAR4
  Summary: a non-coding RNA gene affiliated with the lncRNA class.
  Disease: breast cancer. Pathway: unknown.
PRKY
  Summary: pseudogene for the protein kinase, X-linked gene, in the pseudoautosomal region.
  Disease: XX males and XY females. Pathway: unknown.

Chapter 6

Summary

The chapters in this dissertation can be separated into two parts. Throughout this work, we develop Bayesian model selection methods that use nonlocal priors: pMOM and piMOM. In the first part, we develop a Bayesian method for jointly detecting and estimating neuronal activations and for timing the difference in neuronal activation between areas of interest in functional MRI data. In the second part, we develop a Bayesian method for time course RNA-seq data that identifies differentially expressed genes by borrowing information across all subjects. For both parts, we develop a Bayesian model selection scheme with a nonlocal prior: the pMOM prior in the first part and the piMOM prior in the second.

The first four chapters focus on fMRI data. Specifically, we propose a new triple-gamma hemodynamic response function that includes the initial dip to describe voxel-wise neuron activity. Based on that, we apply two different Bayesian model selection schemes with the pMOM prior to our data to jointly detect and estimate neuronal activations. Our model selection method also allows us to estimate the time difference of activations between the visual and motor areas, which are the regions of interest. We pool information across the selected voxels in the same area of interest to estimate the start of neuronal activation and compare the time difference between the motor and visual areas. More specifically, we implement Bayesian model selection in two different ways: in Chapter 3, we approximate Bayes factors inside the MCMC by using conditional marginal likelihoods, and in Chapter 4, we approximate Bayes factors outside the MCMC by using Laplace-Metropolis marginal likelihoods.

Chapter 5 focuses on time course RNA-seq experiments. Specifically, we propose a Bayesian principal component regression model and use empirical Bayes methods to specify the priors for the unknown parameters in the model. We also apply the Bayesian model selection scheme inside the MCMC with a piMOM prior for the model coefficients. After applying the whole framework to the vaccination data, we use the posterior probability of selecting the non-null model to determine the differentially expressed genes across all subjects. Also, using the probabilities of differential expression, we identify the top genes differentially expressed for all subjects and find that several of them have close connections to immune system responses.

In conclusion, Bayesian model selection with nonlocal priors performs well at selecting correct models and clearly separates the null model from alternative models. In the future, Bayesian model selection with nonlocal priors can be applied to other complex and high-dimensional data and can be extended to more complex hierarchical model structures.

Bibliography

Aguirre, G., Zarahn, E., and D’esposito, M. “The variability of human, BOLD hemodynamic responses.” Neuroimage, 8(4):360–369 (1998).

Anders, S. and Huber, W. “Differential expression analysis for sequence count data.” Genome biol, 11(10):R106 (2010).

Benjamini, Y. and Hochberg, Y. “Controlling the false discovery rate: a practical and powerful approach to multiple testing.” Journal of the Royal Statistical Society. Series B (Methodological), 289–300 (1995).

Buckner, R. L., Bandettini, P. A., O'Craven, K. M., Savoy, R. L., Petersen, S. E., Raichle, M. E., and Rosen, B. R. “Detection of cortical activation during averaged single trials of a cognitive task using functional magnetic resonance imaging.” Proceedings of the National Academy of Sciences, 93(25):14878–14883 (1996).

Chaari, L., Forbes, F., Vincent, T., and Ciuciu, P. “Hemodynamic-informed parcellation of fMRI data in a joint detection estimation framework.” In Medical Image Computing and Computer-Assisted Intervention–MICCAI 2012, 180–188. Springer (2012).

Chee, M. W., Venkatraman, V., Westphal, C., and Siong, S. C. “Comparison of block and event-related fMRI designs in evaluating the word-frequency effect.” Human Brain Mapping, 18(3):186–193 (2003).

Cicatiello, L., Scafoglio, C., Altucci, L., Cancemi, M., Natoli, G., Facchiano, A., Iazzetti, G., Calogero, R., Biglia, N., De Bortoli, M., et al. “A genomic view of estrogen actions in human breast cancer cells by expression profiling of the hormone-responsive transcriptome.” Journal of Molecular Endocrinology, 32(3):719–775 (2004).

Cui, S., Guha, S., Ferreira, M. A., Tegge, A. N., et al. “hmmSeq: A hidden Markov model for detecting differentially expressed genes from RNA-seq data.” The Annals of Applied Statistics, 9(2):901–925 (2015).

de Magalhães, J. P., Finch, C. E., and Janssens, G. “Next-generation sequencing in aging research: emerging applications, problems, pitfalls and possible solutions.” Ageing Research Reviews, 9(3):315–323 (2010).

Donaldson, D. I. and Buckner, R. L. “Effective paradigm design.” In P. Jezzard (ed.), Functional MRI. Citeseer (2001).

Duong, T. Q., Kim, D.-S., Uğurbil, K., and Kim, S.-G. “Spatiotemporal dynamics of the BOLD fMRI signals: toward mapping submillimeter cortical columns using the early negative response.” Magnetic Resonance in Medicine, 44(2):231–242 (2000).

Etzel, J. A., Gazzola, V., and Keysers, C. “An introduction to anatomical ROI-based fMRI classification analysis.” Brain research, 1282:114–125 (2009).

Fox, M. D. and Raichle, M. E. “Spontaneous fluctuations in brain activity observed with functional magnetic resonance imaging.” Nature Reviews Neuroscience, 8(9):700–711 (2007).

Friston, K. J. “Statistical parametric mapping.” In Neuroscience Databases, 237–250. Springer (2003).

Friston, K. J., Fletcher, P., Josephs, O., Holmes, A., Rugg, M., and Turner, R. “Event-related fMRI: characterizing differential responses.” Neuroimage, 7(1):30–40 (1998a).

Friston, K. J., Glaser, D. E., Henson, R. N., Kiebel, S., Phillips, C., and Ashburner, J. “Classical and Bayesian inference in neuroimaging: applications.” Neuroimage, 16(2):484–512 (2002).

Friston, K. J., Holmes, A. P., Worsley, K. J., Poline, J.-P., Frith, C. D., and Frackowiak, R. S. “Statistical parametric maps in functional imaging: a general linear approach.” Human Brain Mapping, 2(4):189–210 (1994).

Friston, K. J., Josephs, O., Rees, G., and Turner, R. “Nonlinear event-related responses in fMRI.” Magnetic Resonance in Medicine, 39(1):41–52 (1998b).

Friston, K. J., Zarahn, E., Josephs, O., Henson, R., and Dale, A. M. “Stochastic designs in event-related fMRI.” Neuroimage, 10(5):607–619 (1999).

Gamerman, D. and Lopes, H. F. Markov chain Monte Carlo: stochastic simulation for Bayesian inference, volume 68. Boca Raton, FL: Chapman & Hall (2006).

Gaucher, D., Therrien, R., Kettaf, N., Angermann, B. R., Boucher, G., Filali-Mouhim, A., Moser, J. M., Mehta, R. S., Drake, D. R., Castro, E., et al. “Yellow fever vaccine induces integrated multilineage and polyfunctional immune responses.” The Journal of Experimental Medicine, 205(13):3119–3131 (2008).

Glover, G. H. “Deconvolution of Impulse Response in Event-Related BOLD fMRI.” Neuroimage, 9(4):416–429 (1999).

Goutte, C., Nielsen, F. A., and Hansen, L. K. “Modeling the hemodynamic response in fMRI using smooth FIR filters.” Medical Imaging, IEEE Transactions on, 19(12):1188–1201 (2000).

Hall, N. “Advanced sequencing technologies and their wider impact in microbiology.” Journal of Experimental Biology, 210(9):1518–1525 (2007).

Henn, A. D., Wu, S., Qiu, X., Ruda, M., Stover, M., Yang, H., Liu, Z., Welle, S. L., Holden-Wiltse, J., Wu, H., and Zand, M. S. “High-resolution temporal response patterns to influenza vaccine reveal a distinct human plasma cell gene signature.” Scientific Reports, 3 (2013).

Johnson, V. E. and Rossell, D. “On the use of non-local prior densities in Bayesian hypothesis tests.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 72(2):143–170 (2010).

—. “Bayesian model selection in high-dimensional settings.” Journal of the American Statistical Association, 107(498):649–660 (2012).

Kershaw, J., Ardekani, B. A., and Kanno, I. “Application of Bayesian inference to fMRI data analysis.” Medical Imaging, IEEE Transactions on, 18(12):1138–1153 (1999).

Kruggel, F., Herrmann, C., Wiggins, C., and Von Cramon, D. “Hemodynamic and electroencephalographic responses to illusory figures: recording of the evoked potentials during functional MRI.” Neuroimage, 14(6):1327–1336 (2001).

Kruggel, F. and Von Cramon, D. “Temporal properties of the hemodynamic response in functional MRI.” Human brain mapping, 8(4):259–271 (1999).

Lewis, S. M. and Raftery, A. E. “Estimating Bayes factors via posterior simulation with the Laplace-Metropolis estimator.” Journal of the American Statistical Association, 92(438):648–655 (1997).

Liang, F., Paulo, R., Molina, G., Clyde, M. A., and Berger, J. O. “Mixtures of g priors for Bayesian variable selection.” Journal of the American Statistical Association, 103(481) (2008).

Lindquist, M. A., Meng Loh, J., Atlas, L. Y., and Wager, T. D. “Modeling the hemodynamic response function in fMRI: efficiency, bias and mis-modeling.” Neuroimage, 45(1):S187–S198 (2009).

Lindquist, M. A. and Wager, T. D. “Validity and power in hemodynamic response modeling: a comparison study and a new approach.” Human brain mapping, 28(8):764–784 (2007).

Lindquist, M. A. et al. “The statistical analysis of fMRI data.” Statistical Science, 23(4):439–464 (2008).

Liu, J. S. Monte Carlo strategies in scientific computing. Springer (2008).

Maher, C. A., Kumar-Sinha, C., Cao, X., Kalyana-Sundaram, S., Han, B., Jing, X., Sam, L., Barrette, T., Palanisamy, N., and Chinnaiyan, A. M. “Transcriptome sequencing to detect gene fusions in cancer.” Nature, 458(7234):97–101 (2009).

Malavasi, F., Deaglio, S., Funaro, A., Ferrero, E., Horenstein, A. L., Ortolan, E., Vaisitti, T., and Aydin, S. “Evolution and function of the ADP ribosyl cyclase/CD38 gene family in physiology and pathology.” Physiological Reviews, 88(3):841–886 (2008).

Marrelec, G., Benali, H., Ciuciu, P., Pélégrini-Issac, M., and Poline, J.-B. “Robust Bayesian estimation of the hemodynamic response function in event-related BOLD fMRI using basic physiological information.” Human Brain Mapping, 19(1):1–17 (2003).

Miezin, F. M., Maccotta, L., Ollinger, J., Petersen, S., and Buckner, R. “Characterizing the hemodynamic response: effects of presentation rate, sampling procedure, and the possibility of ordering brain activity based on relative timing.” Neuroimage, 11(6):735–759 (2000).

Mitchell, L. A., Hansen, R. J., Beaupre, A. J., Gustafson, D. L., and Dow, S. W. “Optimized dosing of a CCR2 antagonist for amplification of vaccine immunity.” International immunopharmacology, 15(2):357–363 (2013).


Muller, P., Parmigiani, G., and Rice, K. “FDR and Bayesian multiple comparisons rules.” In Bernardo, J. M., Bayarri, S., Berger, J. O., Dawid, A., Heckerman, D., Smith, A. F. M., and West, M. (eds.), Bayesian Statistics 8. Oxford University Press (2007).

Nagalakshmi, U., Wang, Z., Waern, K., Shou, C., Raha, D., Gerstein, M., and Snyder, M. “The transcriptional landscape of the yeast genome defined by RNA sequencing.” Science, 320(5881):1344–1349 (2008).

Nakaya, H. I., Wrammert, J., Lee, E. K., Racioppi, L., Marie-Kunze, S., Haining, W. N., Means, A. R., Kasturi, S. P., Khan, N., Li, G.-M., et al. “Systems biology of vaccination for seasonal influenza in humans.” Nature Immunology, 12(8):786–795 (2011).

Newton, M. A., Noueiry, A., Sarkar, D., and Ahlquist, P. “Detecting differential gene expression with a semiparametric hierarchical mixture method.” Biostatistics, 5(2):155–176 (2004).

Ogawa, S., Tank, D. W., Menon, R., Ellermann, J. M., Kim, S. G., Merkle, H., and Ugurbil, K. “Intrinsic signal changes accompanying sensory stimulation: functional brain mapping with magnetic resonance imaging.” Proceedings of the National Academy of Sciences, 89(13):5951–5955 (1992).

Oh, S., Song, S., Dasgupta, N., and Grabowski, G. “The analytical landscape of static and temporal dynamics in transcriptome data.” Frontiers in genetics, 5 (2014).

Oh, S., Song, S., Grabowski, G., Zhao, H., and Noonan, J. P. “Time series expression analyses using RNA-seq: a statistical approach.” BioMed research international, 2013 (2013).

Oshlack, A., Robinson, M. D., and Young, M. D. “From RNA-seq reads to differential expression results.” Genome Biology, 11(12):220 (2010).

Phillips, T. R., Jensen, J. L., Rubino, M. J., Yang, W. C., and Schultz, R. D. “Effects of vaccines on the canine immune system.” Canadian Journal of Veterinary Research, 53(2):154 (1989).

Robert, C. and Casella, G. “Monte Carlo statistical methods.” Springer texts in statistics (1999).

Robinson, M. D. and Oshlack, A. “A scaling normalization method for differential expression analysis of RNA-seq data.” Genome Biol, 11(3):R25 (2010).

Scherer, K. “Component models of emotion can inform the quest for emotional competence.” The science of emotional intelligence: Knowns and unknowns, 101– 126 (2007).

Spellman, P. T., Sherlock, G., Zhang, M. Q., Iyer, V. R., Anders, K., Eisen, M. B., Brown, P. O., Botstein, D., and Futcher, B. “Comprehensive identification of cell cycle–regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization.” Molecular Biology of the Cell, 9(12):3273–3297 (1998).

Spiegelhalter, D. J., Best, N. G., Carlin, B. P., and Van Der Linde, A. “Bayesian measures of model complexity and fit.” Journal of the Royal Statistical Society: Series B (Statistical Methodology), 64(4):583–639 (2002).

Thompson, J. K., Peterson, M. R., and Freeman, R. D. “High-resolution neurometabolic coupling revealed by focal activation of visual neurons.” Nature Neuroscience, 7(9):919–920 (2004).

Woolrich, M. W., Ripley, B. D., Brady, M., and Smith, S. M. “Temporal autocorrelation in univariate linear modeling of FMRI data.” Neuroimage, 14(6):1370–1386 (2001).

Worsley, K. J., Liao, C., Aston, J., Petre, V., Duncan, G., Morales, F., and Evans, A. “A general statistical analysis for fMRI data.” Neuroimage, 15(1):1–15 (2002).

Yacoub, E. and Hu, X. “Detection of the early negative response in fMRI at 1.5 Tesla.” Magnetic resonance in medicine, 41(6):1088–1092 (1999).

Zarahn, E., Aguirre, G. K., and D’Esposito, M. “Empirical analyses of BOLD fMRI statistics.” NeuroImage, 5(3):179–197 (1997).

Zhang, Y., Chen, G., Liu, Z., Tian, S., Zhang, J., Carey, C. D., Murphy, K. M., Storkus, W. J., Falo, L. D., and You, Z. “Genetic Vaccines To Potentiate the Effective CD103+ Dendritic Cell–Mediated Cross-Priming of Antitumor Immunity.” The Journal of Immunology, 1500089 (2015).

Zhao, Q. “Dual targeting of CCR2 and CCR5: therapeutic potential for immunologic and cardiovascular diseases.” Journal of leukocyte biology, 88(1):41–55 (2010).

VITA

Yuan Cheng was born in Hangzhou, Zhejiang, China, on April 17, 1985. After graduating with a Bachelor of Computational Science degree from Xidian University in 2008, she attended the University of Missouri-Kansas City, where she graduated with a Master of Science degree specializing in Statistics in 2010. She then entered the University of Missouri-Columbia and began research on this project with Professor Marco Ferreira in May 2012. She married Yuelei Sui in 2011, and their daughter Anna Sui was born in December 2011. After graduation, she chose to start her career at Novartis Oncology, part of one of the largest pharmaceutical companies in the world.
