Prediction-Constrained Hidden Markov Models for Semi-Supervised Classification


Gabriel Hope (1), Michael C. Hughes (2), Finale Doshi-Velez (3), Erik B. Sudderth (1)

(1) UC Irvine, (2) Tufts University, (3) Harvard University. Correspondence to: Gabriel Hope <[email protected]>.

Accepted paper at the Time Series Workshop at ICML 2021. Copyright 2021 by the authors.

Abstract

We develop a new framework for training hidden Markov models that balances generative and discriminative goals. Our approach requires likelihood-based or Bayesian learning to meet task-specific prediction quality constraints, preventing model misspecification from leading to poor subsequent predictions. When users specify appropriate loss functions to constrain predictions, our approach can enhance semi-supervised learning when labeled sequences are rare and boost accuracy when data has unbalanced labels. Via automatic differentiation we backpropagate gradients through the dynamic programming computation of the marginal likelihood, making training feasible without auxiliary bounds or approximations. Our approach is effective for human activity modeling and healthcare intervention forecasting, delivering accuracies competitive with well-tuned neural networks for fully labeled data, and substantially better for partially labeled data. Simultaneously, our learned generative model illuminates the dynamical states driving predictions.

1 Introduction

We develop broadly applicable methods for learning models of data sequences x, some of which are annotated with task-specific labels y. For example, in human activity tasks [29], x might be body motion data captured by an accelerometer, and y might be activity labels (walking, swimming, etc.). We seek good predictions of labels y from data x, while simultaneously learning a good model of x itself that is informed by task labels y. Strong generative models are valuable because they allow analysts to visualize recovered patterns and query for missing data. Moreover, our approach enables semi-supervised time-series learning from data where only a small subset of data sequences have labels.

We consider the broad family of hidden Markov models (HMMs), for which a wide range of training methods have been previously proposed. Unsupervised learning methods for HMMs, such as the EM algorithm [27], do not use task labels y to inform the learned state structure, and thus may have poor predictive power. In contrast, discriminative methods for training HMMs [5; 32; 12] have been widely used for problems like speech recognition to learn states informed by labels y. These methods have several shortcomings, including restrictions on the loss function used for label prediction, and a failure to allow users to select a task-specific tradeoff between generative and discriminative performance. See App. B for a detailed discussion of related work.

This paper develops a framework for training prediction-constrained hidden Markov models (PC-HMMs) that minimize application-motivated loss functions in the prediction of task labels y, while simultaneously learning good-quality generative models of the raw sequence data x. We demonstrate that PC-HMM training leads to prediction performance competitive with modern recurrent network architectures on fully-labeled corpora, and excels over these baselines on semi-supervised corpora where labels are rare. At the same time, PC-HMM training also learns latent states that enable exploratory interpretation, visualization, and simulation.

2 Hidden Markov Models for Prediction

In this section, we review HMMs. We consider a dataset of $N$ sequences $x_n$, some of which may have labels $y_n$. Each sequence $x_n$ has $T_n$ timesteps: $x_n = [x_{n1}, x_{n2}, \ldots, x_{n,T_n}]$. At each timestep $t$ we observe a vector $x_{nt} \in \mathbb{R}^D$.

2.1 Hidden Markov Models

Standard unsupervised HMMs [27] assume that the $N$ observed sequences are generated by a common model with $K$ hidden, discrete states. The sequences are generated by drawing a sequence of per-timestep state assignments $z_n = [z_{n1}, z_{n2}, \ldots, z_{n,T_n}]$ from a Markov process, and then drawing observed data $x_n$ conditioned on these assignments. Specifically, we generate the sequence $z_n$ as

    $z_{n1} \sim \mathrm{Cat}(\pi_0), \qquad z_{nt} \sim \mathrm{Cat}(\pi_{z_{n,t-1}}),$

where $\pi_0$ is the initial state distribution, and $\pi = \{\pi_k\}_{k=0}^{K}$ denotes the probabilities of transitioning from state $j$ to state $k$: $\pi_{jk} = p(z_t = k \mid z_{t-1} = j)$. Next the observations $x_n$ are sampled such that the distribution of $x_{nt}$ depends only on $z_{nt}$. In this work we consider Gaussian emissions with a state-specific mean and covariance,

    $p(x_{nt} \mid z_{nt} = k, \phi_k = \{\mu_k, \Sigma_k\}) = \mathcal{N}(x_{nt} \mid \mu_k, \Sigma_k),$

as well as first-order autoregressive Gaussian emissions:

    $p(x_{nt} \mid x_{n,t-1}, z_{nt} = k, \phi_k = \{A_k, \mu_k, \Sigma_k\}) = \mathcal{N}(x_{nt} \mid A_k x_{n,t-1} + \mu_k, \Sigma_k).$

To simplify notation, we assume that there exists an unmodeled observation $x_{n0}$ at time zero.

Given the observed data $x_n$, we can efficiently compute posterior marginal probabilities, or beliefs, for the latent states $z_n$ via the belief propagation or forward-backward algorithm [27]. We denote these probabilities by $b_{tk}(x_n, \pi, \phi) \triangleq p(z_{nt} = k \mid x_n, \pi, \phi)$, and note that these beliefs are a deterministic function of $x_n$ (which will be important for our end-to-end optimization) with computational cost $O(T_n K^2)$. The probabilities $b_{tk}(x_n, \pi, \phi)$ take into account the full sequence $x_n$, including future timesteps $x_{nt'}$ with $t' > t$. In some applications, predictions must be made using only the data up until time $t$. These beliefs $\vec{b}_{tk}(x_n, \pi, \phi) \triangleq p(z_{nt} = k \mid x_{n1}, \ldots, x_{nt}, \pi, \phi)$ are computed by the forward pass of belief propagation.

2.2 Predicting Labels from Beliefs

Now we consider the prediction of labels y given data x. Because they capture uncertainty in the hidden states $z_n$, the beliefs $b_{tk}$ are succinct (and computationally efficient) summary statistics for the data. We use beliefs as features for the prediction of labels $y_n$ from data $x_n$ in two scenarios: per-sequence classification, where the entire sequence $n$ has a single binary or categorical label $y_n$, and per-timestep classification, where each timestep has its own event label.

Sequence classification. In the sequence classification scenario, we seek to assign a scalar label $y_n$ to the entire sequence. Below, we provide two classification functions given the beliefs. In the first, we use the average amount of time spent in each state as our feature:

    $\bar{b}(x_n, \pi, \phi) \triangleq \frac{1}{T_n} \sum_{t=1}^{T_n} b_t(x_n, \pi, \phi),$   (1)
    $\hat{y}_n \triangleq \hat{y}(x_n, \pi, \phi, \eta) = f(\eta^T \bar{b}(x_n, \pi, \phi)),$   (2)

where $b_t(x_n, \pi, \phi) = p(z_{nt} \mid x_n, \pi, \phi)$, $\eta$ is a vector of regression coefficients, and $f(\cdot)$ is an appropriate link function (e.g., the logistic function $f(w) = 1/(1 + e^{-w})$ for binary labels, or a softmax for categorical labels).

For some tasks it may be beneficial to use a more flexible, non-linear model to predict labels from belief states. In these cases, we can replace the linear model based on averaged belief states in Eq. (2) with a general function that takes in the entire belief sequence. We find that a prediction function incorporating a convolutional transformation of the belief sequence, followed by local max-pooling, leads to improved accuracy. The convolutional structure allows predictions to depend on belief patterns that span several timesteps, while localized pooling preserves high-level temporal structure while removing some sensitivity to the temporal alignment of sequences.

Event detection. In other applications, we seek to densely label the events occurring at each timestep of a sequence, such as the prediction of medical events from hourly observations of patients in a hospital. To predict the label $y_{nt}$ at time $t$, we use the beliefs $b_{tk}$ at times $t_{w_{st}}, \ldots, t_{w_{end}}$ in a window around $t$ as features for a regression or classification model with parameters $\eta$:

    $\hat{y}_{nt} \triangleq \hat{y}_t(x_n, \pi, \phi, \eta) = f(b_{t_{w_{st}}:t_{w_{end}}}(x_n, \pi, \phi); \eta).$   (4)

Here $f(\cdot; \eta)$ could either be a generalized linear model based on the average state frequencies in the local time window, or a more complicated non-linear model as discussed for sequence classification.

Finally, we note that many prediction tasks are offline: the prediction is needed for post-hoc analysis and thus can incorporate information from the full data sequence. When a prediction must happen online, for example in forecasting applications, we instead use only the forward-belief representation $\vec{b}_{tk}$ as regression features.

3 Learning via Prediction Constraints

We now develop an overall optimization objective and algorithm to estimate parameters $\pi, \phi, \eta$ given (possibly partially) labeled training data. Our goal is to jointly learn how to model the sequence data $x_n$ (via estimated transition parameters $\pi$ and emission parameters $\phi$) and how to predict the labels $y_n$ (via estimated regression parameters $\eta$), so that predictions may be high quality even when the model is misspecified.

Suppose we observe a set $L$ of labeled sequences $\{x_n, y_n\}_{n=1}^{N_L}$ together with a set $U$ of unlabeled sequences $\{x_n\}_{n=1}^{N_U}$. Our goal is both to explain the sequences $x_n$, by achieving a high $p(x \mid \pi, \phi)$, and to make predictions $\hat{y}$ that minimize some task-specific loss (e.g., hinge loss or logistic loss). Our prediction-constrained (PC) training objective for both sequence classification scenarios is:

    $\min_{\pi, \phi, \eta} \sum_{n \in L \cup U} -\log p(x_n \mid \pi, \phi) - \log p(\pi, \phi),$   (5)
    subject to: $\sum_{n \in L} \mathrm{loss}(y_n, \hat{y}(x_n, \pi, \phi, \eta)) \leq \epsilon.$

Here, $p(\pi, \phi)$ is an optional prior density to regularize parameters.
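The generative process of Sec. 2.1 (Markov state sequence, then Gaussian emissions) can be sketched in a few lines of NumPy. The state count, dimension, and sequence length here are illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
K, D, T = 3, 2, 50  # hidden states, observation dimension, timesteps (arbitrary)

pi0 = np.full(K, 1.0 / K)               # initial state distribution pi_0
pi = rng.dirichlet(np.ones(K), size=K)  # transition matrix; row j is pi_j
mu = rng.normal(size=(K, D))            # state-specific means mu_k
Sigma = np.stack([np.eye(D)] * K)       # state-specific covariances Sigma_k

z = np.empty(T, dtype=int)
x = np.empty((T, D))
z[0] = rng.choice(K, p=pi0)             # z_1 ~ Cat(pi_0)
x[0] = rng.multivariate_normal(mu[z[0]], Sigma[z[0]])
for t in range(1, T):
    z[t] = rng.choice(K, p=pi[z[t - 1]])                   # z_t ~ Cat(pi_{z_{t-1}})
    x[t] = rng.multivariate_normal(mu[z[t]], Sigma[z[t]])  # Gaussian emission
```

The autoregressive variant would replace the emission line with a draw centered at `A[z[t]] @ x[t - 1] + mu[z[t]]`.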
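The beliefs $b_{tk}$ of Sec. 2.1 come from the standard scaled forward-backward recursions. A minimal NumPy sketch (function names are ours, not the paper's) makes the $O(T K^2)$ cost visible; the per-step normalized forward messages `alpha` are exactly the forward-only beliefs used for online prediction:

```python
import numpy as np

def gauss_pdf(x, mu, Sigma):
    """Evaluate N(x_t | mu, Sigma) for every row x_t of x."""
    D = mu.shape[0]
    diff = x - mu
    quad = np.einsum('td,de,te->t', diff, np.linalg.inv(Sigma), diff)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

def beliefs(x, pi0, pi, mu, Sigma):
    """Forward-backward beliefs b[t, k] = p(z_t = k | x_{1:T}); cost O(T K^2)."""
    T, K = x.shape[0], pi0.shape[0]
    lik = np.stack([gauss_pdf(x, mu[k], Sigma[k]) for k in range(K)], axis=1)
    alpha = np.empty((T, K))            # normalized forward messages p(z_t | x_{1:t})
    alpha[0] = pi0 * lik[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):               # forward pass
        alpha[t] = (alpha[t - 1] @ pi) * lik[t]
        alpha[t] /= alpha[t].sum()      # rescale for numerical stability
    beta = np.ones((T, K))              # backward messages
    for t in range(T - 2, -1, -1):      # backward pass
        beta[t] = pi @ (lik[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()
    b = alpha * beta
    return b / b.sum(axis=1, keepdims=True)
```

Because every operation above is differentiable in `pi`, `mu`, and `Sigma`, the same recursion written in an autodiff framework supports the end-to-end training the paper describes.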
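For sequence classification, Eqs. (1)-(2) reduce to a two-liner once the belief matrix is in hand; this sketch assumes a binary label with the logistic link named in the text:

```python
import numpy as np

def predict_label(b, eta):
    """Eqs. (1)-(2): average beliefs over time, then apply a logistic link.

    b   : (T, K) belief matrix b[t, k] = p(z_t = k | x)
    eta : (K,) regression coefficients
    """
    b_bar = b.mean(axis=0)  # bar{b}: fraction of time spent in each state
    return 1.0 / (1.0 + np.exp(-eta @ b_bar))  # f(eta^T bar{b})
```

A softmax over a coefficient matrix replaces the logistic for categorical labels; the convolutional alternative mentioned in the text would consume the full `(T, K)` matrix instead of `b_bar`.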
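For event detection, Eq. (4) predicts $y_{nt}$ from beliefs in a local window around $t$. One simple feature map, matching the "average state frequencies in the local time window" option in the text (the clipping at sequence boundaries is our assumption):

```python
import numpy as np

def window_features(b, t, w):
    """Average beliefs in the window [t - w, t + w] around timestep t.

    b : (T, K) belief matrix; the window is clipped at the sequence ends.
    """
    lo, hi = max(0, t - w), min(b.shape[0], t + w + 1)
    return b[lo:hi].mean(axis=0)  # average state frequencies near t
```

These features then feed a per-timestep generalized linear model, exactly as in the sequence-classification case.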
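The abstract notes that gradients are backpropagated through the dynamic-programming computation of the marginal likelihood. The quantity in question is the forward-algorithm log-marginal $\log p(x \mid \pi, \phi)$ appearing in Eq. (5); the recursion below computes it with per-step rescaling. The penalty form shown afterward is a common Lagrangian-style relaxation of the constraint in Eq. (5), with a hypothetical multiplier `lam`; the paper's exact training procedure may differ:

```python
import numpy as np

def log_marginal(lik, pi0, pi):
    """Forward algorithm: log p(x | pi, phi) from per-timestep likelihoods lik[t, k]."""
    alpha = pi0 * lik[0]
    logZ = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for t in range(1, lik.shape[0]):
        alpha = (alpha @ pi) * lik[t]
        logZ += np.log(alpha.sum())     # accumulate log normalizers
        alpha = alpha / alpha.sum()
    return logZ

def pc_penalty_objective(log_marginals, labeled_losses, lam):
    """Penalized relaxation of Eq. (5): NLL over all sequences + lam * prediction loss."""
    return -sum(log_marginals) + lam * sum(labeled_losses)
```

Written with autodiff-friendly operations (e.g., in JAX or PyTorch), `log_marginal` can be differentiated directly with respect to the HMM parameters, with no auxiliary bounds or approximations.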