A HMM Approach to Identifying Distinct DNA Methylation Patterns

A HMM Approach to Identifying Distinct DNA Methylation Patterns for Subtypes of Breast Cancers

Thesis Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University

By Maoxiong Xu, B.S.

Graduate Program in Computer Science and Engineering
The Ohio State University
2011

Thesis Committee:
Victor X. Jin, Advisor
Raghu Machiraju

Copyright by Maoxiong Xu 2011

Abstract

The United States has the highest annual incidence rates of breast cancer in the world: 128.6 per 100,000 among whites and 112.6 per 100,000 among African Americans.[1,2] Breast cancer is the second-most common cancer (after skin cancer) and the second-most common cause of cancer death (after lung cancer).[1] Recent studies have shown that hypermethylation of CpG islands may be implicated in tumorigenesis, acting as a mechanism that inactivates the expression of a diverse array of genes (Baylin et al., 2001). Genes reported to be regulated by CpG hypermethylation include tumor suppressor genes, cell cycle-related genes, DNA mismatch repair genes, hormone receptors, and tissue or cell adhesion molecules (Yan et al., 2001). Breast cancer cells may or may not express three important receptors: estrogen receptor (ER), progesterone receptor (PR), and HER2, so we take ER, PR, and HER2 status into account when analyzing the data.

In this thesis, we first train a hidden Markov model (HMM) on methylation data from both breast cancer cells and other cancer cells. We then perform hierarchical clustering of the gene expression data for the breast cancer cells and, based on the clustering results, obtain the methylation distribution within each cluster. Finally, we correlate the HMM training results with these methylation distributions to infer the biological meaning of each HMM state.

Dedicated to my father, mother, and wife, for all of their love and support.
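Training the HMM described in the abstract (the table of contents lists the Baum-Welch algorithm in Section 2.2) and comparing candidate models by BIC (Table 4.1) both depend on the likelihood P(O | λ) of an observation sequence O under a model λ = (A, B, π). The sketch below computes log P(O | λ) with the scaled forward algorithm. It is a minimal illustration, not the thesis's implementation: it assumes the methylation signal has already been discretized into a finite symbol alphabet, and the function name `forward_log_likelihood` and the dictionary-based parameter layout are hypothetical choices.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical parameter layout: pi[state], A[from][to], B[state][symbol].
# Missing dictionary entries are treated as probability 0.
Probs = Dict[str, float]


def forward_log_likelihood(
    obs: Sequence[str],
    states: List[str],
    pi: Probs,
    A: Dict[str, Probs],
    B: Dict[str, Probs],
) -> float:
    """Return log P(obs | lambda) for a discrete HMM lambda = (A, B, pi).

    Scaled forward recursion: alpha is renormalized at each position and the
    log of each normalizer is summed, which avoids numerical underflow on long
    sequences such as genome-wide bins.
    """
    if not obs:
        return 0.0  # the empty sequence has probability 1
    log_lik = 0.0
    alpha: Probs = {}
    for t, symbol in enumerate(obs):
        if t == 0:
            # Initialization: alpha_1(i) = pi_i * b_i(o_1)
            alpha = {s: pi.get(s, 0.0) * B[s].get(symbol, 0.0) for s in states}
        else:
            # Induction: alpha_t(j) = [sum_i alpha_{t-1}(i) * a_ij] * b_j(o_t)
            alpha = {
                j: sum(alpha[i] * A[i].get(j, 0.0) for i in states)
                * B[j].get(symbol, 0.0)
                for j in states
            }
        scale = sum(alpha.values())
        if scale == 0.0:
            return float("-inf")  # obs is impossible under this model
        log_lik += math.log(scale)
        alpha = {s: a / scale for s, a in alpha.items()}
    return log_lik


if __name__ == "__main__":
    # Toy HMM of Figure 1.6: N = 3 states, M = 3 symbols (a, t, g).
    # Assumes every transition other than a12, a23, a32 is zero, so each
    # row sums to 1 only if a12 = a23 = a32 = 1.
    states = ["1", "2", "3"]
    pi = {"1": 1.0, "2": 0.0, "3": 0.0}
    A = {"1": {"2": 1.0}, "2": {"3": 1.0}, "3": {"2": 1.0}}
    B = {"1": {"a": 1.0}, "2": {"t": 1.0}, "3": {"g": 1.0}}
    print(math.exp(forward_log_likelihood("atgtg", states, pi, A, B)))  # 1.0
    print(math.exp(forward_log_likelihood("aag", states, pi, A, B)))  # 0.0
```

Under that reading, the Figure 1.6 model is deterministic: state 1 emits a and moves to state 2, after which the chain alternates between states 2 and 3, emitting t and g. It therefore assigns probability 1 to atgtg and probability 0 to aag. Per-position scaling is one standard way to keep the recursion numerically stable; working in log space with log-sum-exp is a common alternative.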
Acknowledgments

I have many people to thank for helping me get this far: my advisor, Dr. Victor Jin, for everything he has done; Dr. Raghu Machiraju, for his help and support; all of my lab mates, for their knowledge, assistance, and encouragement; and the incredible Biomedical Informatics Department staff for everything they do.

Vita

2005: Mudu Central High School
2009: B.S. Computer Science, Southeast University
2009 to present: M.S. Computer Science & Engineering, The Ohio State University
Sep. 2010 to present: Graduate Teaching Associate, Department of Bioinformatics, The Ohio State University

Publications

Cao AR, Rabinovich R, Xu M, Xu X, Jin VX, Farnham PJ: Genome-wide analysis of transcription factor E2F1 mutant proteins reveals that N- and C-terminal protein interaction domains do not participate in targeting E2F1 to the human genome. J Biol Chem. 2011 Apr 8; 286(14):11985-96. Epub 2011 Feb 10.

Fields of Study

Major: Computer Science & Engineering
Machine Learning applied in Bioinformatics

Table of Contents

Abstract
Dedication
Acknowledgments
Vita
Table of Contents
List of Tables
List of Figures
Chapter 1: Introduction
  1.1 Methylation
    1.1.1 What Is Methylation?
    1.1.2 DNA Methylation
    1.1.3 DNA Methylation Mechanism
    1.1.4 DNA Methylation in Cancer
  1.2 Gene Expression
    1.2.1 Gene Expression Measurement
    1.2.2 mRNA Quantification
    1.2.3 Regulation of Gene Expression
  1.3 Hidden Markov Model
    1.3.1 Introduction to Hidden Markov Model
    1.3.2 Hidden Markov Model
    1.3.3 Model Architecture
    1.3.4 HMM Training and Decoding
    1.3.5 HMMs in Computational Biology
    1.3.6 Application of HMMs to Specific Problems
Chapter 2: Methods and Algorithms
  2.1 The Probabilistic Model
  2.2 Baum-Welch Algorithm
  2.3 Work Flow
Chapter 3: Data Process
  3.1 Data Sets
  3.2 MBD-seq Protocol
  3.3 Data Preprocess
  3.4 Input for HMM
  3.5 Methylation Distribution Overview
  3.6 Gene Expression Data
Chapter 4: Results and Discussion
  4.1 Results from HMM
  4.2 Biology Meanings
    4.2.1 Gene Expression Results for 33 Breast Cancer Cell Lines
    4.2.2 Results Based on Different Clusters
    4.2.3 States Meanings and Group Patterns
Chapter 5: Data Visualization
Chapter 6: Conclusions and Suggestions for Further Work
  6.1 Conclusion
  6.2 Future Work
References
Appendix: Formats
  A. BAM format
  B. SAM format
  C. Export format
  D. BED format
  E. FASTQ format
  F. Bowtie output format

List of Tables

Table 3.1 Data summary for 36 cell lines
Table 3.2 12 groups for 36 cell lines
Table 4.1 BIC results for HMM results
Table 4.2 Transition matrix
Table 4.3 Emission probabilities for each mark in each state
Table 4.4 Ordered emission probabilities for each mark in each state (marks)
Table 4.5 Ordered emission probabilities for each mark in each state (probabilities)
Table 4.6 Filtered ordered emission probabilities for each mark in each state (marks)
Table 4.7 Number of genes in each cluster
Table 4.8 First 3 marks for each state
Table 4.9 States and interval correlation results
Table 4.10 States meanings
Table 4.11 Patterns for subtypes of breast cancers

List of Figures

Fig 1.1 Methylation
Fig 1.2 DNA methylation
Fig 1.3 DNA methylation mechanism
Fig 1.4 DNA methylation in cancer
Fig 1.5 Gene expression
Fig 1.6 A simple HMM $\lambda = (A, B, \pi)$, where $N = 3$, $M = 3$, $a_{12}$, $a_{23}$, $a_{32}$ are non-zero, $b_1(a) = b_2(t) = b_3(g) = 1$, and $\pi = (1, 0, 0)$.
Fig 2.1 A broad overview of the HMM workflow, highlighting the most significant inputs, transformations, and outputs at each step from start to end.
Fig 3.1 Bar chart for 36 cell lines
Fig 3.2 Methylation distribution for 33 breast cancer cell lines
Fig 4.1 Heatmap of the transition matrix
Fig 4.2 One-way hierarchical clustering of gene expression in 33 breast cancer cell lines
Fig 4.3 Grouped one-way hierarchical clustering of gene expression in 33 breast cancer cell lines
Fig 4.4 Methylation distribution based on cluster 1 genes
Fig 4.5 Methylation distribution based on cluster 2 genes
Fig 4.6 Methylation distribution based on cluster 3 genes
Fig 4.7 Methylation distribution based on cluster 4 genes
Fig 4.8 Methylation distribution based on cluster 5 genes
Fig 4.9 Methylation distribution based on cluster 6 genes
Fig 4.10 Methylation distribution based on cluster 7 genes
Fig 4.11 Methylation distribution based on cluster 8 genes
Fig 4.12 Methylation distribution based on cluster 9 genes
Fig 5.1 Database web tool

Chapter 1: Introduction

1.1 Methylation

1.1.1 What Is Methylation?

In chemistry, methylation is the addition of a methyl group to a substrate, or the substitution of an atom or group by a methyl group. Methylation is a specific form of alkylation in which a methyl group, rather than a larger carbon chain, replaces a hydrogen atom. In biological systems, methylation is catalyzed by enzymes; such methylation can be involved in modification of heavy metals, regulation of gene expression, regulation