Applying Markov Chain Monte Carlo Model Composition to a Restricted Model Space

Tim D. Brown

A Thesis Submitted to the University of North Carolina Wilmington in Partial Fulfillment of the Requirements for the Degree of Master of Science

Department of Mathematics and Statistics
University of North Carolina Wilmington
2011

Approved by Advisory Committee: John Karlof, Dargan Frierson, Susan Simmons (Chair)
Accepted by: Dean, Graduate School

TABLE OF CONTENTS

ABSTRACT
INTRODUCTION
BAYESIAN HIERARCHICAL MODEL
    MODEL
    POSTERIOR AND LIKELIHOOD
    GIBBS SAMPLER
SEARCH ALGORITHM
    STOCHASTIC SEARCH
    INITIAL RESTRICTION
    ACTIVATION PROBABILITY
SIMULATION AND RESULTS
CONCLUSION
REFERENCES
APPENDIX

LIST OF FIGURES

1. Algorithm
2. Number of parameters for the first half of the run
3. Number of parameters for the second half of the run
4. Number of loci from Parent A and Parent B from their respective markers
5. Comparison of loci from markers 37 and 92 from their respective lines

LIST OF TABLES

1. Top Five Activation Probabilities by Marker
2. Bottom Five Activation Probabilities by Marker
3. Top Five Visits by Marker
4. Bottom Five Visits by Marker
5. Number of Times Markers 37 and 92 Entered and Exited the Model Search
6. Conditional Expected Mean of the Quantitative Trait

ABSTRACT

Model selection is an essential component of research in a variety of fields; it is the process of finding which features or variables are related to a response. Many model search algorithms, such as forward selection and backward elimination, require more observations than features or variables. However, many current datasets have more features than observations. This dilemma is called the P > N problem. The following research proposes a solution to this dilemma in a complex Bayesian hierarchical setting. The search algorithm is a Markov chain Monte Carlo model composition, or (MC)^3, algorithm with a restricted model space.

INTRODUCTION

The classical approaches to model selection are forward selection, backward elimination, and the stepwise procedure [?]. Forward selection starts from a null model (no parameters) and then individually adds each feature to find the best fit. The process of adding one feature at a time continues until all important features are included in the model. Backward elimination selects from the opposite direction, starting with a full model of features and individually eliminating them to find the best fit. The stepwise procedure combines the previous two approaches by adding and eliminating features one at a time with a systematic search. Using these three algorithms on the same dataset may yield different final results for the "best model", and at times this might not be the true best model for the dataset. Due to this inconsistency, more recent algorithms have been proposed for model selection. An example is Broman and Speed's testing of the MCMC algorithm. The MCMC algorithm is a simulation that starts with an approximate distribution called the prior and, through sequential samples, ends at a more accurate model [?]. From their test, they concluded that using the Bayesian information criterion (BIC) with MCMC was the "best" at selecting the correct features. This was because the algorithm eliminated extraneous correlated features, a problem that has occurred with many other model selection procedures [?].
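As a concrete illustration of the classical procedure described above, the sketch below performs forward selection scored by BIC. This is not code from this thesis; the use of Python with NumPy, and all function and variable names, are assumptions made here purely for illustration.

# Hypothetical sketch of forward selection scored by BIC (illustration only).
import numpy as np

def ols_bic(X, y):
    # BIC of an ordinary least squares fit; X already contains an intercept column.
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def forward_selection(X, y):
    # Greedily add the single feature that most lowers BIC; stop when none helps.
    n, m = X.shape
    selected, remaining = [], list(range(m))
    best_bic = ols_bic(np.ones((n, 1)), y)        # null model: intercept only
    while remaining:
        scores = {j: ols_bic(np.column_stack([np.ones(n), X[:, selected + [j]]]), y)
                  for j in remaining}
        j_best = min(scores, key=scores.get)
        if scores[j_best] >= best_bic:            # no feature improves the criterion
            break
        best_bic = scores[j_best]
        selected.append(j_best)
        remaining.remove(j_best)
    return selected

Because each step fits an ordinary least squares model, this procedure needs more observations than selected features, which is precisely the limitation that motivates the restricted (MC)^3 search developed in this thesis.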
The BIC is based upon using a logarithmic transformation and comparing likelihood functions [?].

This research investigates using the (MC)^3 algorithm in a complex hierarchical setting. The Bayesian framework provides an efficient structure to incorporate a hierarchical setting for complex datasets. In the next section, we develop the Bayesian hierarchical model and discuss how the Gibbs sampler can be used to create posterior samples. In the search algorithm section, we discuss the stochastic search and how restrictions can be placed on the number of parameters. The activation probability section describes how activation probabilities are calculated, and we finish with the conclusion and a discussion of future ideas.

BAYESIAN HIERARCHICAL MODEL

MODEL

The hierarchical model can accommodate various complex structures in a data set, such as information across various sites or laboratories. For example, in plant QTL experiments, plants within lines contain identical genetic information and can be considered clusters. The hierarchical model can accommodate these clusters and combine this information in a model. Using a Bayesian framework is advantageous due to its flexibility.

The first level of the model assumes that the average mean θ_i of each cluster is normally distributed with mean X_i^T β and variance τ^2. The feature matrix X has L by M dimensions, where L represents the number of clusters and M represents the number of features. The coefficient vector β provides information about the effect size of each feature. If a coefficient is equal to 0, the feature is not important to the model. This can be represented by:

\theta_i \mid \beta, \tau^2, X \sim N(X_i^T \beta, \tau^2).    (1)

For the coefficient vector β and the variance τ^2, we use the following hyperpriors:

\tau^2 \sim \text{Inv-}\chi^2(1), \qquad \beta \sim N(0, 100).

Here, i represents a particular cluster and j (j = 1, ..., n_i) indexes the observations within cluster i.

Our response matrix Y has L rows, one for each cluster of the X matrix. The rows, however, can vary in length. For example, if one cluster has 5 observations and the next cluster has 6 observations, the corresponding rows of the matrix have 5 and 6 values, respectively. We assume that y_{i,j} is normally distributed with mean θ_i and variance σ_i^2, or in other words [?]:

y_{i,j} \mid \theta_i, \sigma_i^2 \sim N(\theta_i, \sigma_i^2).    (2)

The parameter σ_i^2 is assumed to follow an inverse chi-square distribution, or:

\sigma_i^2 \sim \text{Inv-}\chi^2(1).

POSTERIOR AND LIKELIHOOD

The posterior distribution for a single unknown parameter θ is given by:

p(\theta \mid y) = \frac{p(\theta, y)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{p(y)} = \frac{p(y \mid \theta)\, p(\theta)}{\int_{\theta} p(y \mid \theta)\, p(\theta)\, d\theta}.    (3)

Here, p(θ) represents the prior distribution and p(y | θ) is the sampling distribution of the data. Using this method, we can find the joint posterior given our feature matrix [?]. However, in this model, we have L σ_i^2 parameters, L θ_i parameters, one τ^2 parameter, and up to M β parameters. The joint posterior distribution can be found by:

p(\theta, \sigma^2, \tau^2, \beta \mid Y) = \frac{p(\theta, \sigma^2, \tau^2, \beta, Y)}{p(Y)}.    (4)

Under certain assumptions of independence and disregarding the normalizing constant, we have that [?]:

p(\theta, \tau^2, \beta, \sigma^2 \mid y) \propto p(y \mid \theta, \sigma^2)\, p(\beta)\, p(\sigma^2)\, p(\tau^2)\, p(\theta \mid X, \beta, \tau^2)
\propto \left( \tau^{\tau_0 + 2 + L} \prod_i \sigma_i^{\, n_i + \sigma_{0i} + 2} \right)^{-1} \exp\left( -\sum_i \frac{1}{2\sigma_i^2} - \frac{1}{2\tau^2} - \frac{1}{200}\beta^T \beta - \frac{1}{2\tau^2}(\theta - X\beta)^T(\theta - X\beta) - \sum_i \frac{1}{2\sigma_i^2} \sum_j (y_{ij} - \theta_i)^2 \right).    (5)
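To make (5) concrete, the following is a minimal sketch that evaluates the unnormalized log joint posterior under the priors stated above (β ∼ N(0, 100 I), an Inv-χ²(1) prior on each σ_i^2 and on τ^2). It is not code from this thesis; the use of Python with NumPy and the function name are assumptions for illustration only.

# Hypothetical sketch: unnormalized log of the joint posterior in (5).
import numpy as np

def log_joint_posterior(theta, beta, sigma2, tau2, X, y_clusters):
    # theta, sigma2: length-L arrays; tau2: scalar; X: L x M feature matrix;
    # y_clusters: list of L arrays holding the observations within each cluster.
    log_p = 0.0
    # Data level, equation (2): y_ij | theta_i, sigma_i^2 ~ N(theta_i, sigma_i^2)
    for i, y_i in enumerate(y_clusters):
        log_p += (-0.5 * len(y_i) * np.log(sigma2[i])
                  - np.sum((y_i - theta[i]) ** 2) / (2.0 * sigma2[i]))
    # First level, equation (1): theta_i | beta, tau^2 ~ N(X_i^T beta, tau^2)
    resid = theta - X @ beta
    log_p += -0.5 * len(theta) * np.log(tau2) - (resid @ resid) / (2.0 * tau2)
    # Priors: Inv-chi^2(1) on each sigma_i^2 and on tau^2, beta ~ N(0, 100 I)
    log_p += np.sum(-1.5 * np.log(sigma2) - 1.0 / (2.0 * sigma2))
    log_p += -1.5 * np.log(tau2) - 1.0 / (2.0 * tau2)
    log_p += -(beta @ beta) / 200.0
    return log_p

Exponentiating log_p recovers the right-hand side of (5) up to the normalizing constant.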
Because conjugate priors are used, the full conditionals of the parameters simplify to known parametric forms. The following equations give the full conditional distribution of each parameter [?]:

p(\theta \mid \tau^2, \beta, \sigma^2, y) = \frac{p(\theta, \tau^2, \beta, \sigma^2, y)}{p(\tau^2, \beta, \sigma^2, y)} = \frac{p(\tau^2, \beta, \sigma^2, y, \theta)}{\int p(\tau^2, \beta, \sigma^2, y, \theta)\, d\theta}
\propto \prod_{i=1}^{L} \exp\left( -\frac{1}{2\left(\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}\right)^{-1}} \left( \theta_i - \frac{\frac{X_i^T \beta}{\tau^2} + \frac{\sum_{j=1}^{n_i} y_{ij}}{\sigma_i^2}}{\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}} \right)^{2} \right)    (6)

\theta_i \mid \tau^2, \beta, \sigma^2, y \sim N\left( \frac{\frac{X_i^T \beta}{\tau^2} + \frac{\sum_{j=1}^{n_i} y_{ij}}{\sigma_i^2}}{\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}},\ \frac{1}{\frac{1}{\tau^2} + \frac{n_i}{\sigma_i^2}} \right)

p(\sigma^2 \mid \tau^2, \beta, \theta, y) = \frac{p(\sigma^2, \tau^2, \beta, \theta, y)}{p(\tau^2, \beta, \theta, y)}
\propto \prod_{i=1}^{L} (\sigma_i^2)^{-\left(\frac{1 + n_i}{2} + 1\right)} \exp\left( -\sum_{i=1}^{L} \frac{1}{2\sigma_i^2} - \sum_{i=1}^{L} \sum_{j=1}^{n_i} \frac{(y_{ij} - \theta_i)^2}{2\sigma_i^2} \right)    (7)

\sigma_i^2 \mid \tau^2, \theta, \beta, y \sim \text{Inv-}\Gamma\left( \frac{1 + n_i}{2},\ \frac{\sum_{j=1}^{n_i} (y_{ij} - \theta_i)^2 + 1}{2} \right)

p(\beta \mid \tau^2, \sigma^2, \theta, y) = \frac{p(\beta, \tau^2, \sigma^2, \theta, y)}{p(\tau^2, \sigma^2, \theta, y)}
\propto \exp\left( -\frac{1}{200}\beta^T \beta - \frac{1}{2\tau^2}(\theta - X\beta)^T(\theta - X\beta) \right)
\propto \exp\left( -\frac{1}{2}\left( \beta - \left(\frac{I}{100} + \frac{X^T X}{\tau^2}\right)^{-1} \frac{X^T \theta}{\tau^2} \right)^{T} \left(\frac{I}{100} + \frac{X^T X}{\tau^2}\right) \left( \beta - \left(\frac{I}{100} + \frac{X^T X}{\tau^2}\right)^{-1} \frac{X^T \theta}{\tau^2} \right) \right)

\beta \mid \sigma^2, \tau^2, \theta, y \sim N\left( \left(\frac{I}{100} + \frac{X^T X}{\tau^2}\right)^{-1} \frac{X^T \theta}{\tau^2},\ \left(\frac{I}{100} + \frac{X^T X}{\tau^2}\right)^{-1} \right)

In the equations above, I represents an M by M identity matrix, conformable with X^T X.

p(\tau^2 \mid \beta, \sigma^2, \theta, y) = \frac{p(\tau^2, \beta, \sigma^2, \theta, y)}{p(\beta, \sigma^2, \theta, y)}
\propto (\tau^2)^{-\left(\frac{1 + L}{2} + 1\right)} \exp\left( -\frac{(\theta - X\beta)^T(\theta - X\beta) + 1}{2\tau^2} \right)    (8)

\tau^2 \mid \beta, \sigma^2, \theta, y \sim \text{Inv-}\Gamma\left( \frac{1 + L}{2},\ \frac{(\theta - X\beta)^T(\theta - X\beta) + 1}{2} \right)

GIBBS SAMPLER

We use the full conditionals from the previous subsection to execute the Gibbs sampler. The Gibbs sampler is a type of MCMC algorithm that approaches its stationary distribution given enough samples [?]. In this case, the stationary distribution is the joint posterior distribution. The Gibbs sampler starts from initial values of the parameters:

θ_{0i} is the mean of the i-th observed data cluster,
σ_{0i}^2 is the variance of the i-th observed data cluster,
τ_0^2 is the variance of the observed sample means, and
β_0 is the initial set of parameters for the regression model based on X.

The parameters for the Gibbs sampler are then updated as follows [?]:

\beta_{q+1} \sim N(\beta \mid \theta_q, \sigma_q^2, \tau_q^2, y)    (9)
\tau_{q+1}^2 \sim \text{Inv-}\Gamma(\tau^2 \mid \beta_{q+1}, \theta_q, \sigma_q^2, y)    (10)
\theta_{q+1} \sim N(\theta \mid \beta_{q+1}, \sigma_q^2, \tau_{q+1}^2, y)    (11)
\sigma_{q+1}^2 \sim \text{Inv-}\Gamma(\sigma^2 \mid \beta_{q+1}, \theta_{q+1}, \tau_{q+1}^2, y)    (12)

After posterior samples from the joint posterior distribution have been simulated, we use them to calculate P(Y | M_r), where M_r is the model from which the posterior samples are generated.
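The cycle in (9)-(12) can be sketched as follows, drawing each parameter from its full conditional in (6)-(8). This is a minimal illustration only, not the thesis's implementation; the use of Python with NumPy and all names are assumptions, and inverse-gamma draws are obtained by inverting gamma draws.

# Hypothetical sketch of one Gibbs sweep, updates (9)-(12).
import numpy as np

rng = np.random.default_rng(2011)

def inv_gamma(shape, scale):
    # If G ~ Gamma(shape, rate = 1), then scale / G ~ Inv-Gamma(shape, scale).
    return scale / rng.gamma(shape)

def gibbs_sweep(beta, tau2, theta, sigma2, X, y_clusters):
    # One pass over the parameters; the arguments are the current draws.
    L, M = X.shape
    n = np.array([len(y_i) for y_i in y_clusters])
    y_sum = np.array([np.sum(y_i) for y_i in y_clusters])

    # (9) beta: normal with precision matrix A = I/100 + X'X / tau^2
    A = np.eye(M) / 100.0 + X.T @ X / tau2
    cov = np.linalg.inv(A)
    beta = rng.multivariate_normal(cov @ (X.T @ theta) / tau2, cov)

    # (10) tau^2: Inv-Gamma((1 + L)/2, ((theta - X beta)'(theta - X beta) + 1)/2)
    r = theta - X @ beta
    tau2 = inv_gamma((1 + L) / 2.0, (r @ r + 1.0) / 2.0)

    # (11) theta_i: normal with precision 1/tau^2 + n_i/sigma_i^2
    prec = 1.0 / tau2 + n / sigma2
    mean = (X @ beta / tau2 + y_sum / sigma2) / prec
    theta = rng.normal(mean, np.sqrt(1.0 / prec))

    # (12) sigma_i^2: Inv-Gamma((1 + n_i)/2, (sum_j (y_ij - theta_i)^2 + 1)/2)
    sigma2 = np.array([inv_gamma((1 + n_i) / 2.0,
                                 (np.sum((y_i - t_i) ** 2) + 1.0) / 2.0)
                       for n_i, y_i, t_i in zip(n, y_clusters, theta)])

    return beta, tau2, theta, sigma2

Iterating gibbs_sweep and keeping the draws after a burn-in period gives the posterior samples that are then used to approximate P(Y | M_r) for the current model M_r.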
