Emily B. Fox, Erik B. Sudderth, Michael I. Jordan, and Alan S. Willsky

Bayesian Nonparametric Methods for Learning Markov Switching Processes

[Priors for dynamical systems]

Digital Object Identifier 10.1109/MSP.2010.937999

Markov switching processes, such as the hidden Markov model (HMM) and the switching linear dynamical system (SLDS), are often used to describe rich dynamical phenomena. They describe complex behavior via repeated returns to a set of simpler models: imagine a person alternating between walking, running, and jumping behaviors, or a stock index switching between regimes of high and low volatility. Classical approaches to identification and estimation of these models assume a fixed, prespecified number of dynamical models. We instead examine Bayesian nonparametric approaches that define a prior on Markov switching processes with an unbounded number of potential model parameters (i.e., Markov modes). By leveraging stochastic processes such as the beta process and the Dirichlet process (DP), these methods allow the data to drive the complexity of the learned model, while still permitting efficient inference algorithms. They also lead to generalizations that discover and model dynamical behaviors shared among multiple related time series.

INTRODUCTION
A core problem in statistical signal processing is the partitioning of temporal data into segments, each of which permits a relatively simple statistical description. This segmentation problem arises in a variety of applications, in areas as diverse as speech recognition, computational biology, natural language processing, finance, and cryptanalysis. While in some cases the problem is merely that of detecting temporal changes (in which case the problem can be viewed as one of changepoint detection), in many other cases the temporal segments have a natural meaning in the domain, and the problem is to recognize recurrence of a meaningful entity (e.g., a particular speaker, a gene, a part of speech, or a market condition). This leads naturally to state-space models, where the entities that are to be detected are encoded in the state.

The classical example of a state-space model for segmentation is the HMM [1]. The HMM is based on a discrete state variable and on the probabilistic assertions that the state transitions are Markovian and that the observations are conditionally independent given the state. In this model, temporal segments are equated with states, a natural assumption in some problems but a limiting assumption in many others. Consider, for example, the dance of a honey bee as it switches between a set of turn right, turn left, and waggle dances.

It is natural to view these dances as the temporal segments, as they "permit a relatively simple statistical description." But each dance is not a set of conditionally independent draws from a fixed distribution as required by the HMM—there are additional dynamics to account for. Moreover, it is natural to model these dynamics using continuous variables. These desiderata can be accommodated within the richer framework of Markov switching processes, specific examples of which include the switching vector autoregressive (VAR) process and the SLDS. These models distinguish between the discrete component of the state, which is referred to as a "mode," and the continuous component of the state, which captures the continuous dynamics associated with each mode. These models have become increasingly prominent in applications in recent years [2]–[8].

While Markov switching processes address one key limitation of the HMM, they inherit various other limitations of the HMM. In particular, the discrete component of the state in the Markov switching process has no topological structure (beyond the trivial discrete topology). Thus it is not easy to compare state spaces of different cardinality, and it is not possible to use the state space to encode a notion of similarity between modes. More broadly, many problems involve a collection of state-space models (either HMMs or Markov switching processes), and within the classical framework there is no natural way to talk about overlap between models. A particular instance of this problem arises when there are multiple time series, and where we wish to use overlapping subsets of the modes to describe the different time series. In the section "Multiple Related Time Series," we discuss a concrete example of this problem where the time series are motion-capture videos of humans engaging in exercise routines, and where the modes are specific exercises, such as "jumping jacks" or "squats." We aim to capture the notion that two different people can engage in the same exercise (e.g., jumping jacks) during their routine.

To address these problems, we need to move beyond the simple discrete Markov chain as a description of temporal segmentation. In this article, we describe a richer class of stochastic processes known as combinatorial stochastic processes that provide a useful foundation for the design of flexible models for temporal segmentation. Combinatorial stochastic processes have been studied for several decades in probability theory (see, e.g., [9]), and they have begun to play a role in statistics as well, most notably in the area of Bayesian nonparametric statistics, where they yield Bayesian approaches to clustering and survival analysis (see, e.g., [10]). The work that we present here extends these efforts into the time-series domain. As we aim to show, there is a natural interplay between combinatorial stochastic processes and state-space descriptions of dynamical systems.

Our primary focus is on two specific stochastic processes—the DP and the beta process—and their role in describing modes in dynamical systems. The DP provides a simple description of a clustering process where the number of clusters is not fixed a priori.
Suitably extended to a hierarchical DP (HDP), this stochastic process provides a foundation for the design of state-space models in which the number of modes is random and inferred from the data. In the section "Hidden Markov Models," we discuss the HDP and its connection to the HMM. Building on this connection, the section "Markov Jump Linear Systems" shows how the HDP can be used in the context of Markov switching processes with conditionally linear dynamical modes. Finally, in the section "Multiple Related Time Series," we discuss the beta process and show how it can be used to capture notions of similarity among sets of modes in modeling multiple time series.

[FIG1] Graphical representations of three Markov switching processes: (a) HMM, (b) order 2 switching VAR process, and (c) SLDS. For all models, a discrete-valued Markov process $z_t$ evolves as $z_{t+1} \mid \{\pi_k\}_{k=1}^{K}, z_t \sim \pi_{z_t}$. For the HMM, observations are generated as $y_t \mid \{\theta_k\}_{k=1}^{K}, z_t \sim F(\theta_{z_t})$, whereas the switching VAR(2) process assumes $y_t \sim \mathcal{N}(A_1^{(z_t)} y_{t-1} + A_2^{(z_t)} y_{t-2}, \Sigma)$. The SLDS instead relies on a latent, continuous-valued Markov state $x_t$ to capture the history of the dynamical process, as specified in (15).

HIDDEN MARKOV MODELS
The HMM generates a sequence of latent modes via a discrete-valued Markov chain [1]. Conditioned on this mode sequence, the model assumes that the observations, which may be discrete or continuous valued, are independent.

The HMM is the most basic example of a Markov switching process, and forms the building block for the more complicated processes examined later.

FINITE HMM
Let $z_t$ denote the mode of the Markov chain at time $t$, and $\pi_j$ the mode-specific transition distribution for mode $j$. Given the mode $z_t$, the observation $y_t$ is conditionally independent of the observations and modes at other time steps. The generative process can be described as
$$z_t \mid z_{t-1} \sim \pi_{z_{t-1}}, \qquad y_t \mid z_t \sim F(\theta_{z_t}) \tag{1}$$
for an indexed family of distributions $F(\cdot)$ (e.g., multinomial for discrete data or multivariate Gaussian for real, vector-valued data), where $\theta_i$ are the emission parameters for mode $i$. The notation $x \sim F$ indicates that the random variable $x$ is drawn from a distribution $F$. We use bar notation $x \mid F \sim F$ to specify conditioned-upon random elements, such as a random distribution. The directed graphical model associated with the HMM is shown in Figure 1(a).

One can equivalently represent the HMM via a set of transition probability measures $G_j = \sum_{k=1}^{K} \pi_{jk}\,\delta_{\theta_k}$, where $\delta_\theta$ is a unit mass concentrated at $\theta$. Instead of employing transition distributions on the set of integers (i.e., modes) that index into the collection of emission parameters, we operate directly in the parameter space $\Theta$, and transition between emission parameters with probabilities $\{G_j\}$. Specifically, let $j_{t-1}$ be the unique emission parameter index $j$ such that $\theta'_{t-1} = \theta_j$. Then,
$$\theta'_t \mid \theta'_{t-1} \sim G_{j_{t-1}}, \qquad y_t \mid \theta'_t \sim F(\theta'_t). \tag{2}$$
Here, $\theta'_t \in \{\theta_1, \ldots, \theta_K\}$ takes the place of $\theta_{z_t}$ in (1). A visualization of this process is shown by the trellis diagram of Figure 2.

One can consider a Bayesian HMM by treating the transition probability measures $G_j$ as random, and endowing them with a prior. Formally, a random measure on a measurable space $\Theta$, with sigma algebra $\mathcal{A}$, is defined as a stochastic process whose index set is $\mathcal{A}$. That is, $G(A)$ is a nonnegative random variable for each $A \in \mathcal{A}$. Since the probability measures are solely distinguished by their weights on the shared set of emission parameters $\{\theta_1, \ldots, \theta_K\}$, we consider a prior that independently handles these components. Specifically, take the weights $\pi_j = [\pi_{j1} \cdots \pi_{jK}]$ (i.e., transition distributions) to be independent draws from a $K$-dimensional Dirichlet distribution,
$$\pi_j \sim \mathrm{Dir}(\alpha_1, \ldots, \alpha_K), \qquad j = 1, \ldots, K, \tag{3}$$
implying that $\sum_k \pi_{jk} = 1$, as desired. Then, assume that the atoms are drawn as $\theta_j \sim H$ for some base measure $H$ on the parameter space $\Theta$. Depending on the form of the emission distribution, various choices of $H$ lead to computational efficiencies via conjugate analysis.

[FIG2] (a) Trellis representation of an HMM. Each circle represents one of the $K$ possible HMM emission parameters at various time steps. The highlighted circles indicate the selected emission parameter $\theta'_t$ at time $t$, and the arrows represent the set of possible transitions from that HMM mode to each of the $K$ possible next modes. The weights of these arrows indicate the relative probability of the transitions encoded by the mode-specific transition probability measures $G_j$. (b) Transition probability measures $G_1$, $G_2$, and $G_3$ corresponding to the example trellis diagram. (c) A representation of the HMM observations $y_t$, which are drawn from emission distributions parameterized by the highlighted nodes.
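To make the generative process in (1) concrete, the following is a minimal Python sketch (our own illustration, not from the article; the function name and the univariate Gaussian choice for $F(\cdot)$ are assumptions) that draws a mode sequence and observations from a finite HMM.

```python
import numpy as np

def sample_finite_hmm(pi0, Pi, theta, T, seed=None):
    """Draw (z_{1:T}, y_{1:T}) from the finite HMM in (1).

    pi0   : (K,) initial mode distribution
    Pi    : (K, K) transition matrix; row j is the distribution pi_j
    theta : length-K list of (mean, std) Gaussian emission parameters
    """
    rng = np.random.default_rng(seed)
    K = len(pi0)
    z = np.zeros(T, dtype=int)
    y = np.zeros(T)
    for t in range(T):
        # z_t | z_{t-1} ~ pi_{z_{t-1}}  (initial distribution at t = 0)
        p = pi0 if t == 0 else Pi[z[t - 1]]
        z[t] = rng.choice(K, p=p)
        # y_t | z_t ~ F(theta_{z_t}), here a univariate Gaussian
        mu, sigma = theta[z[t]]
        y[t] = rng.normal(mu, sigma)
    return z, y
```

For example, `sample_finite_hmm(np.array([0.5, 0.5]), np.array([[0.9, 0.1], [0.1, 0.9]]), [(0.0, 1.0), (5.0, 1.0)], 200)` yields persistent segments alternating between two emission regimes.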

STICKY HDP-HMM
In the Bayesian HMM of the previous section, we assumed that the number of HMM modes $K$ is known. But what if this is not the case? For example, in the speaker diarization task considered later, determining the number of speakers involved in the meeting is one of the primary inference goals. Moreover, even when a model adequately describes previous observations, it can be desirable to allow new modes to be added as more data are observed. For example, what if more speakers enter the meeting? To avoid restrictions on the size of the mode space, such scenarios naturally lead to priors on probability measures $G_j$ that have an unbounded collection of support points $\theta_k$.

The DP, denoted by $\mathrm{DP}(\gamma, H)$, provides a distribution over countably infinite probability measures
$$G_0 = \sum_{k=1}^{\infty} \beta_k\,\delta_{\theta_k}, \qquad \theta_k \sim H, \tag{4}$$
on a parameter space $\Theta$. The weights are sampled via a stick-breaking construction [11]:
$$\beta_k = \nu_k \prod_{\ell=1}^{k-1} (1 - \nu_\ell), \qquad \nu_k \sim \mathrm{Beta}(1, \gamma). \tag{5}$$
In effect, we have divided a unit-length stick into lengths given by the weights $\beta_k$: the $k$th weight is a random proportion $\nu_k$ of the remaining stick after the first $(k-1)$ weights have been chosen. We denote this distribution by $\beta \sim \mathrm{GEM}(\gamma)$. See Figure 3 for a pictorial representation of this process.

[FIG3] Pictorial representation of the stick-breaking construction of the DP.

The DP has proven useful in many applications due to its clustering properties, which are clearly seen by examining the predictive distribution of draws $\theta'_i \sim G_0$. Because probability measures drawn from a DP are discrete, there is a strictly positive probability of multiple observations $\theta'_i$ taking identical values within the set $\{\theta_k\}$, with $\theta_k$ defined as in (4). For each value $\theta'_i$, let $z_i$ be an indicator random variable that picks out the unique value $\theta_k$ such that $\theta'_i = \theta_{z_i}$. Blackwell and MacQueen [12] derived a Pólya urn representation of the $\theta'_i$:
$$\theta'_i \mid \theta'_1, \ldots, \theta'_{i-1} \sim \frac{\gamma}{\gamma + i - 1} H + \sum_{j=1}^{i-1} \frac{1}{\gamma + i - 1}\,\delta_{\theta'_j} = \frac{\gamma}{\gamma + i - 1} H + \sum_{k=1}^{K} \frac{n_k}{\gamma + i - 1}\,\delta_{\theta_k}. \tag{6}$$
This implies the following predictive distribution on the indicator assignment variables:
$$p(z_{N+1} = z \mid z_1, \ldots, z_N, \gamma) = \frac{\gamma}{N + \gamma}\,\delta(z, K+1) + \frac{1}{N + \gamma} \sum_{k=1}^{K} n_k\,\delta(z, k). \tag{7}$$
Here, $n_k = \sum_{i=1}^{N} \delta(z_i, k)$ is the number of indicator random variables taking the value $k$, and $K+1$ is a previously unseen value. The discrete Kronecker delta $\delta(z, k) = 1$ if $z = k$, and 0 otherwise. The distribution on partitions induced by the sequence of conditional distributions in (7) is commonly referred to as the Chinese restaurant process. Take $i$ to be a customer entering a restaurant with infinitely many tables, each serving a unique dish $\theta_k$. Each arriving customer chooses a table, indicated by $z_i$, in proportion to how many customers are currently sitting at that table. With some positive probability proportional to $\gamma$, the customer starts a new, previously unoccupied table $K+1$. From the Chinese restaurant process, we see that the DP has a reinforcement property that leads to a clustering at the values $\theta_k$. This representation also provides a means of sampling observations from a DP without explicitly constructing the infinite probability measure $G_0 \sim \mathrm{DP}(\gamma, H)$.

One could imagine using the DP to define a prior on the set of HMM transition probability measures $G_j$. Taking each transition measure $G_j$ as an independent draw from $\mathrm{DP}(\gamma, H)$ implies that these probability measures are of the form $\sum_{k=1}^{\infty} \pi_{jk}\,\delta_{\theta_{jk}}$, with weights $\pi_j \sim \mathrm{GEM}(\gamma)$ and atoms $\theta_{jk} \sim H$. Assuming $H$ is absolutely continuous (e.g., a Gaussian distribution), this construction leads to transition measures with nonoverlapping support (i.e., $\theta_{jk} \neq \theta_{\ell k'}$ with probability one). Based on such a construction, we would move from one infinite collection of HMM modes to an entirely new collection at each transition step, implying that previously visited modes would never be revisited. This is clearly not what we intended. Instead, consider the HDP [13], which defines a collection of probability measures $\{G_j\}$ on the same support points $\{\theta_1, \theta_2, \ldots\}$ by assuming that each discrete measure $G_j$ is a variation on a global discrete measure $G_0$. Specifically, the Bayesian hierarchical specification takes $G_j \sim \mathrm{DP}(\alpha, G_0)$, with $G_0$ itself a draw from a $\mathrm{DP}(\gamma, H)$. Through this construction, one can show that the probability measures are described as
$$G_0 = \sum_{k=1}^{\infty} \beta_k\,\delta_{\theta_k}, \qquad \beta \mid \gamma \sim \mathrm{GEM}(\gamma), \qquad \theta_k \mid H \sim H,$$
$$G_j = \sum_{k=1}^{\infty} \pi_{jk}\,\delta_{\theta_k}, \qquad \pi_j \mid \alpha, \beta \sim \mathrm{DP}(\alpha, \beta). \tag{8}$$
Applying the HDP prior to the HMM, we obtain the HDP-HMM of Teh et al. [13].
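Both the stick-breaking construction (5) and the Chinese restaurant predictive rule (7) are straightforward to simulate. Below is a minimal sketch (ours; the truncation level and function names are assumptions, not from the article).

```python
import numpy as np

def gem_weights(gamma, trunc, seed=None):
    """Truncated stick-breaking draw of beta ~ GEM(gamma), as in (5).
    Returns the first `trunc` weights (they sum to slightly less than 1)."""
    rng = np.random.default_rng(seed)
    nu = rng.beta(1.0, gamma, size=trunc)                 # nu_k ~ Beta(1, gamma)
    stick = np.concatenate(([1.0], np.cumprod(1.0 - nu)[:-1]))
    return nu * stick                                     # beta_k = nu_k * remaining stick

def crp_assignments(N, gamma, seed=None):
    """Sequential draws z_1, ..., z_N from the CRP predictive in (7)."""
    rng = np.random.default_rng(seed)
    counts, z = [], []
    for i in range(N):
        weights = np.array(counts + [gamma], dtype=float)
        k = rng.choice(len(weights), p=weights / weights.sum())
        if k == len(counts):
            counts.append(1)          # open a new table K + 1
        else:
            counts[k] += 1            # join occupied table k
        z.append(k)
    return np.array(z)
```

Running `crp_assignments(1000, 5.0)` exhibits the reinforcement property described above: a few tables accumulate most customers, while new tables appear at a logarithmic rate.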

Extending the Chinese restaurant process analogy of the DP, one can examine a Chinese restaurant franchise that describes the partitions induced by the HDP. In the context of the HDP-HMM, the infinite collection of emission parameters $\theta_k$ determines a global menu of shared dishes. Each of these emission parameters is also associated with a single, unique restaurant in the franchise. The HDP-HMM assigns a customer $\theta'_t$ to a restaurant based on the previous customer $\theta'_{t-1}$, since it is this parameter that determines the distribution of $\theta'_t$ [see (2)]. Upon entering the restaurant determined by $\theta'_{t-1}$, customer $\theta'_t$ chooses a table with probability proportional to the current occupancy, just as in the DP. The dishes for the tables are then chosen from the global menu $G_0$ based on their popularity throughout the entire franchise, and it is through this pooling of dish selections that the HDP induces a shared sparse subset of model parameters.

By defining $\pi_j \sim \mathrm{DP}(\alpha, \beta)$, the HDP prior encourages modes to have similar transition distributions. In particular, the mode-specific transition distributions are identical in expectation:
$$\mathbb{E}[\pi_{jk} \mid \beta] = \beta_k. \tag{9}$$
Although it is through this construction that a shared sparse mode space is induced, we see from (9) that the HDP-HMM does not differentiate self-transitions from moves between different modes. When modeling data with mode persistence, the flexible nature of the HDP-HMM prior allows mode sequences with unrealistically fast dynamics to have large posterior probability, thus impeding identification of a compact dynamical model which best explains the observations; see the example mode sequence in Figure 4(a). The HDP-HMM places only a small prior penalty on creating an extra mode, but no penalty on that mode having a similar emission parameter to another mode, nor on the time series rapidly switching between two modes. The increase in likelihood obtained by fine-tuning the parameters to the data can counteract the prior penalty on creating this extra mode, leading to significant posterior uncertainty. These rapid dynamics are unrealistic for many real data sets. For example, in the speaker diarization task, it is very unlikely that two speakers rapidly switch who is speaking. Such fast-switching dynamics can harm the predictive performance of the learned model, since parameters are informed by fewer data points. Additionally, in some applications one cares about the accuracy of the inferred label sequence instead of just doing model averaging. In such applications, one would like to be able to incorporate prior knowledge that slow, smoothly varying dynamics are more likely.

To address these issues, Fox et al. [14] proposed to instead sample transition distributions $\pi_j$ as
$$\beta \mid \gamma \sim \mathrm{GEM}(\gamma), \qquad \pi_j \mid \alpha, \kappa, \beta \sim \mathrm{DP}\!\left(\alpha + \kappa,\; \frac{\alpha\beta + \kappa\,\delta_j}{\alpha + \kappa}\right). \tag{10}$$


[FIG4] In (a) and (b), mode sequences are drawn from the HDP-HMM and sticky HDP-HMM priors, respectively. In (c)–(e), observation sequences correspond to draws from a sticky HDP-HMM, an order two HDP-AR-HMM, and an order one BP-AR-HMM with five time series offset for clarity. The observation sequences are colored by the underlying mode sequences.

Here, $\alpha\beta + \kappa\,\delta_j$ indicates that an amount $\kappa > 0$ is added to the $j$th component of $\alpha\beta$. This construction increases the expected probability of self-transition by an amount proportional to $\kappa$. Specifically, the expected set of weights for transition distribution $\pi_j$ is a convex combination of those defined by $\beta$ and a mode-specific weight defined by $\kappa$:
$$\mathbb{E}[\pi_{jk} \mid \beta, \alpha, \kappa] = \frac{\alpha}{\alpha + \kappa}\,\beta_k + \frac{\kappa}{\alpha + \kappa}\,\delta(j, k). \tag{11}$$
When $\kappa = 0$, the original HDP-HMM of Teh et al. [13] is recovered. Because positive $\kappa$ values increase the prior probability $\mathbb{E}[\pi_{jj} \mid \beta, \alpha, \kappa]$ of self-transitions, this model is referred to as the sticky HDP-HMM. See the graphical model of Figure 5(a) and the mode sequence sample path of Figure 4(b). The $\kappa$ parameter is reminiscent of the self-transition bias parameter of the infinite HMM [15], an urn model obtained by integrating out the random measures in the HDP-HMM. However, that paper relied on heuristic, approximate inference methods. The full connection between the infinite HMM and the HDP, as well as the development of a globally consistent inference algorithm, was made in [13], but without a treatment of a self-transition parameter. Note that in the formulation of Fox et al. [14], the HDP concentration parameters $\alpha$ and $\gamma$, and the sticky parameter $\kappa$, are given priors and are thus learned from the data as well. The flexibility garnered by incorporating learning of the sticky parameter within a cohesive Bayesian framework allows the model to additionally capture fast mode-switching if such dynamics are present in the data.

In [14], the sticky HDP-HMM was applied to the speaker diarization task, which involves segmenting an audio recording into speaker-homogeneous regions while simultaneously identifying the number of speakers. The data for the experiments consisted of a standard benchmark data set distributed by NIST as part of the Rich Transcription 2004–2007 meeting recognition evaluations [16], with the observations taken to be the first 19 Mel frequency cepstral coefficients (MFCCs) computed over short, overlapping windows. Because the speaker-specific emissions are not well approximated by a single Gaussian, a DP mixture of Gaussians extension of the sticky HDP-HMM was considered. The sticky parameter proved pivotal in learning such multimodal emission distributions. By combining the mode persistence captured by the sticky HDP-HMM with a model allowing multimodal emissions, state-of-the-art speaker diarization performance was achieved.

[FIG5] Graphical representations of three Bayesian nonparametric variants of the Markov switching processes shown in Figure 1. The plate notation is used to compactly represent replication of nodes in the graph [13], with the number of replicates indicated by the number in the corner of the plate. (a) The sticky HDP-HMM. The mode evolves as $z_{t+1} \mid \{\pi_k\}_{k=1}^{\infty}, z_t \sim \pi_{z_t}$, where $\pi_k \mid \alpha, \kappa, \beta \sim \mathrm{DP}(\alpha + \kappa, (\alpha\beta + \kappa\,\delta_k)/(\alpha + \kappa))$ and $\beta \mid \gamma \sim \mathrm{GEM}(\gamma)$, and observations are generated as $y_t \mid \{\theta_k\}_{k=1}^{\infty}, z_t \sim F(\theta_{z_t})$. The HDP-HMM of [13] has $\kappa = 0$. (b) A switching VAR(2) process and (c) SLDS, each with a sticky HDP-HMM prior on the Markov switching process. The dynamical equations are as in (17).
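A common way to instantiate the sticky prior (10) in practice is a weak-limit (finite, order-$L$) approximation, in which $\mathrm{GEM}(\gamma)$ is replaced by a symmetric $L$-dimensional Dirichlet and each DP row draw becomes a finite Dirichlet. The sketch below is our own rendering of that standard approximation (names and the choice of $L$ are ours), not the exact samplers of [13], [14].

```python
import numpy as np

def sticky_transitions(gamma, alpha, kappa, L, seed=None):
    """Order-L weak-limit approximation to the sticky HDP-HMM prior (10):
    beta ~ Dir(gamma/L, ..., gamma/L) stands in for GEM(gamma), and
    row j is pi_j ~ Dir(alpha * beta + kappa * e_j)."""
    rng = np.random.default_rng(seed)
    beta = rng.dirichlet(np.full(L, gamma / L))
    Pi = np.empty((L, L))
    for j in range(L):
        conc = alpha * beta + kappa * np.eye(L)[j]   # extra mass on self-transition
        Pi[j] = rng.dirichlet(conc)
    return beta, Pi
```

Increasing `kappa` visibly lengthens the mode durations of sequences sampled with these transition rows, mirroring the contrast between Figure 4(a) and Figure 4(b).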

MARKOV JUMP LINEAR SYSTEMS
The HMM's assumption of conditionally independent observations is often insufficient for capturing the temporal dependencies present in many real data sets. Recall the example of the dancing honey bees from the "Introduction" section. In such cases, one can consider more complex Markov switching processes, namely the class of Markov jump linear systems (MJLSs), in which each dynamical mode is modeled via a linear dynamical process. Just as with the HMM, the switching mechanism is based on an underlying discrete-valued Markov mode sequence. Switched affine and piecewise affine models, which we do not consider in this article, instead allow mode transitions to depend on the continuous state of the dynamical system [17].

STATE-SPACE MODELS, VAR PROCESSES, AND FINITE MARKOV JUMP LINEAR SYSTEMS
A state-space model provides a general framework for analyzing many dynamical phenomena. The model consists of an underlying state, $x_t \in \mathbb{R}^n$, with linear dynamics observed via $y_t \in \mathbb{R}^d$. A linear time-invariant state-space model, in which the dynamics do not depend on time, is given by
$$x_t = A x_{t-1} + e_t, \qquad y_t = C x_t + w_t, \tag{12}$$
where $e_t$ and $w_t$ are independent, zero-mean Gaussian noise processes with covariances $\Sigma$ and $R$, respectively. The graphical model for this process is equivalent to that of the HMM depicted in Figure 1(a), replacing $z_t$ with $x_t$.

An order $r$ VAR process, denoted by VAR($r$), with observations $y_t \in \mathbb{R}^d$, can be defined as
$$y_t = \sum_{i=1}^{r} A_i y_{t-i} + e_t, \qquad e_t \sim \mathcal{N}(0, \Sigma). \tag{13}$$
Here, the observations depend linearly on the previous $r$ observation vectors. Every VAR($r$) process can be described in state-space form by, for example, the following transformation:
$$x_t = \begin{bmatrix} A_1 & A_2 & \cdots & A_r \\ I & 0 & \cdots & 0 \\ & \ddots & & \vdots \\ 0 & \cdots & I & 0 \end{bmatrix} x_{t-1} + \begin{bmatrix} I \\ 0 \\ \vdots \\ 0 \end{bmatrix} e_t, \qquad y_t = \begin{bmatrix} I & 0 & \cdots & 0 \end{bmatrix} x_t. \tag{14}$$
On the other hand, not every state-space model may be expressed as a VAR($r$) process for finite $r$ [18].
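The transformation (14) is purely mechanical, as the following helper illustrates (a sketch of ours, not code from the article): given the lag matrices $A_1, \ldots, A_r$, it assembles the companion-form matrices so that the state stacks the $r$ most recent observations.

```python
import numpy as np

def var_to_state_space(A_list):
    """Companion-form state-space realization (14) of a VAR(r) process.

    A_list: list of r (d, d) lag matrices [A_1, ..., A_r].
    Returns (A, B, C) with x_t = A x_{t-1} + B e_t and y_t = C x_t.
    """
    r = len(A_list)
    d = A_list[0].shape[0]
    A = np.zeros((r * d, r * d))
    A[:d, :] = np.hstack(A_list)                  # top block row [A_1 ... A_r]
    if r > 1:
        A[d:, :-d] = np.eye((r - 1) * d)          # shift past observations down
    B = np.vstack([np.eye(d), np.zeros(((r - 1) * d, d))])  # noise drives top block
    C = np.hstack([np.eye(d), np.zeros((d, (r - 1) * d))])  # y_t reads top block
    return A, B, C
```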

Building on the HMM of the section "Finite HMM," we define an SLDS by
$$z_t \sim \pi_{z_{t-1}}, \qquad x_t = A^{(z_t)} x_{t-1} + e_t(z_t), \qquad y_t = C x_t + w_t. \tag{15}$$
Here, we assume the process noise $e_t(z_t) \sim \mathcal{N}(0, \Sigma^{(z_t)})$ is mode-specific, while the measurement mechanism is not. This assumption could be modified to allow for both a mode-specific measurement matrix $C^{(z_t)}$ and noise $w_t(z_t) \sim \mathcal{N}(0, R^{(z_t)})$. However, such a choice is not always necessary nor appropriate for certain applications, and can have implications on the identifiability of the model. We similarly define a switching VAR($r$) process by
$$z_t \sim \pi_{z_{t-1}}, \qquad y_t = \sum_{i=1}^{r} A_i^{(z_t)} y_{t-i} + e_t(z_t). \tag{16}$$
Both the SLDS and the switching VAR process are contained within the class of MJLSs, with graphical model representations shown in Figure 1(b) and (c); compare to that of the HMM in Figure 1(a).

HDP-AR-HMM AND HDP-SLDS
In the formulations of the MJLS mentioned in the section "Markov Jump Linear Systems," it was assumed that the number of dynamical modes was known. However, it is often desirable to relax this assumption to provide more modeling flexibility. It has been shown that in such cases, the sticky HDP-HMM can be extended as a Bayesian nonparametric approach to learning both the SLDS and switching VAR processes [19], [20]. Specifically, the transition distributions are defined just as in the sticky HDP-HMM; however, instead of independent observations, each mode now has conditionally linear dynamics. The generative processes for the resulting HDP-AR-HMM and HDP-SLDS are summarized in (17), with an example HDP-AR-HMM observation sequence depicted in Figure 4(d):
$$\begin{array}{lll}
 & \text{HDP-AR-HMM} & \text{HDP-SLDS} \\
\text{Mode dynamics} & z_t \sim \pi_{z_{t-1}} & z_t \sim \pi_{z_{t-1}} \\
\text{Observation dynamics} & y_t = \displaystyle\sum_{i=1}^{r} A_i^{(z_t)} y_{t-i} + e_t(z_t) & x_t = A^{(z_t)} x_{t-1} + e_t(z_t) \\
 & & y_t = C x_t + w_t.
\end{array} \tag{17}$$
Here, $\pi_j$ is as defined in (10). The issue, then, is in determining an appropriate prior on the dynamic parameters. In [19], a conjugate matrix-normal inverse-Wishart (MNIW) prior [21] was proposed for the dynamic parameters $\{A^{(k)}, \Sigma^{(k)}\}$ in the case of the HDP-SLDS, and $\{A_1^{(k)}, \ldots, A_r^{(k)}, \Sigma^{(k)}\}$ for the HDP-AR-HMM. The HDP-SLDS additionally assumes an inverse-Wishart prior on the measurement noise $R$; however, the measurement matrix, $C$, is fixed for reasons of identifiability. The MNIW prior assumes knowledge of either the autoregressive order $r$ or the underlying state dimension $n$ of the switching VAR process or SLDS, respectively. Alternatively, Fox et al. [20] explore an automatic relevance determination (ARD) sparsity-inducing prior [22]–[24] as a means of learning an MJLS with variable order structure. The ARD prior penalizes nonzero components of the model through a zero-mean Gaussian prior with gamma-distributed precision. In the context of the HDP-AR-HMM and HDP-SLDS, a maximal autoregressive order or SLDS state dimension is assumed. Then, a structured version of the ARD prior is employed so as to drive entire VAR lag blocks $A_i^{(k)}$, or columns of the SLDS matrix $A^{(k)}$, to zero, allowing one to infer nondynamical components of a given dynamical mode.
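To make the switching VAR dynamics of (16)–(17) concrete, here is a short forward-simulation sketch (ours, with fixed rather than HDP-sampled parameters); substituting transition rows drawn as in (10) would produce forward samples from the HDP-AR-HMM prior, as in Figure 4(d).

```python
import numpy as np

def sample_switching_var(Pi, A, Sigma, T, seed=None):
    """Simulate the switching VAR(r) process of (16).

    Pi    : (K, K) mode transition matrix
    A     : (K, r, d, d) lag matrices A_i^{(k)}
    Sigma : (K, d, d) mode-specific process noise covariances
    """
    rng = np.random.default_rng(seed)
    K, r, d, _ = A.shape
    z = np.zeros(T, dtype=int)
    y = np.zeros((T, d))
    for t in range(T):
        z[t] = rng.integers(K) if t == 0 else rng.choice(K, p=Pi[z[t - 1]])
        mean = np.zeros(d)
        for i in range(min(t, r)):              # only the available history
            mean += A[z[t], i] @ y[t - 1 - i]   # A_{i+1}^{(z_t)} y_{t-1-i}
        y[t] = rng.multivariate_normal(mean, Sigma[z[t]])
    return z, y
```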

The previous work of Fox et al. [8] considered a related, yet simpler, formulation for modeling a maneuvering target as a fixed LDS driven by a switching exogenous input. Since the number of maneuver modes was assumed unknown, the exogenous input was taken to be the emissions of an HDP-HMM. This work can be viewed as an extension of the work by Caron et al. [25], in which the exogenous input was an independent noise process generated from a DP mixture model. The HDP-SLDS of [19] is a departure from these works, since the dynamic parameters themselves change with the mode, providing a much more expressive model.

In [19], the utility of the HDP-SLDS and HDP-AR-HMM was demonstrated on two different problems: 1) detecting changes in the volatility of the IBOVESPA stock index and 2) segmenting sequences of honey bee dances. The dynamics underlying both of these data sets appear to be quite complex, yet can be described by repeated returns to simpler dynamical models, and as such have been modeled with Markov switching processes [26], [27]. Without prespecifying domain-specific knowledge, and instead simply relying on a set of observations along with weakly informative hyperprior settings, the HDP-SLDS and HDP-AR-HMM were able to discover the underlying structure of the data with performance competitive with these alternative methods, and consistent with domain expert analysis.

MULTIPLE RELATED TIME SERIES
In many applications, one would like to discover and model dynamical behaviors that are shared among several related time series. By jointly modeling such time series, one can improve parameter estimates, especially in the case of limited data, and find interesting structure in the relationships between the time series. Assuming that each of these time series is modeled via a Markov switching process, our Bayesian nonparametric approach envisions a large library of behaviors, with each time series or object exhibiting a subset of these behaviors. We aim to allow flexibility in the number of total and sequence-specific behaviors, while encouraging objects to share similar subsets of the behavior library. Additionally, a key aspect of a flexible model for relating time series is to allow the objects to switch between behaviors in different manners (e.g., even if two people both exhibit running and walking behaviors, they might alternate between these dynamical modes at different frequencies).

One could imagine a Bayesian nonparametric approach based on tying together multiple time series under the HDP prior outlined in the section "Sticky HDP-HMM." However, such a formulation assumes that all time series share the same set of behaviors and switch among them in exactly the same manner. Alternatively, Fox et al. [28] consider a featural representation, and show the utility of an alternative family of priors based on the beta process [29], [30].

FINITE FEATURE MODELS OF MARKOV SWITCHING PROCESSES
Assume we have a finite collection of behaviors $\{\theta_1, \ldots, \theta_K\}$ that are shared in an unknown manner among $N$ objects. One can represent the set of behaviors each object exhibits via an associated list of features. A standard featural representation for describing the $N$ objects employs an $N \times K$ binary matrix $F = \{f_{ik}\}$. Setting $f_{ik} = 1$ implies that object $i$ exhibits feature $k$ for some $t \in \{1, \ldots, T_i\}$, where $T_i$ is the length of the $i$th time series. To discover the structure of behavior sharing (i.e., the feature matrix), one takes the feature vector $f_i = [f_{i1}, \ldots, f_{iK}]$ to be random. Assuming each feature is treated independently, this necessitates defining a feature inclusion probability $\nu_k$ for each feature $k$. Within a Bayesian framework, these probabilities are given a prior that is then informed by the data to provide a posterior distribution on feature inclusion probabilities. For example, one could consider the finite Bayesian feature model of [31], which assumes
$$\nu_k \sim \mathrm{Beta}\!\left(\frac{\alpha}{K}, 1\right), \qquad f_{ik} \mid \nu_k \sim \mathrm{Bernoulli}(\nu_k). \tag{18}$$

TIME SERIES is then tossed to define whether fik is 0 or 1 (i.e., the of In many applications, one would like to discover and model a ). Because each feature is generated indepen- dynamical behaviors which are shared among several related dently, and a Beta1a, b2 random variable has mean a/1a 1 b2, time series. By jointly modeling such time series, one can the expected number of active features in an N 3 K matrix is improve parameter estimates, especially in the case of limited Na /1a/K 1 12 , Na. data, and find interesting structure in the relationships between A hierarchical Bayesian featural model also requires priors the time series. Assuming that each of these time series is mod- for behavior parameters uk, and the process by which each eled via a Markov switching process, our Bayesian nonparamet- object switches among its selected behaviors. In the case of ric approach envisions a large library of behaviors, with each Markov switching processes, this switching mechanism is gov- 1i2 time series or object exhibiting a subset of these behaviors. We erned by the transition distributions of object i, pj . As an aim to allow flexibility in the number of total and sequence- example of such a model, imagine that each of the N objects is specific behaviors, while encouraging objects to share similar sub- described by a switching VAR process (see the section “Markov sets of the behavior library. Additionally, a key aspect of a flexible Jump Linear Systems”) that moves among some subset of K model for relating time series is to allow the objects to switch possible dynamical modes. Each of the VAR process parameters 5A 6 f between behaviors in different manners (e.g., even if two people uk 5 k, Sk describes a unique behavior. The feature vector i both exhibit running and walking behaviors, they might alternate constrains the transitions of object i to solely be between the between these dynamical modes at different frequencies). selected subset of the K possible VAR processes by forcing 1i2 One could imagine a Bayesian nonparametric approach pjk 5 0 for all k such that fik 5 0. One natural construction based on tying together multiple time series under the HDP places Dirichlet priors on the transition distributions, and some prior outlined in the section “Sticky HDP-HMM.” However, prior H (e.g., an MNIW) on the behavior parameters. Then, such a formulation assumes that all time series share the same 1i2 f | 1 3 c c 4 # f 2 | set of behaviors, and switch among them in exactly the same pj | i, g, k Dir g, , g, g1k, g, , g i uk H, manner. Alternatively, Fox et al. [28] consider a featural repre- (19)

Let $y_t^{(i)}$ represent the observation vector of the $i$th object at time $t$, and $z_t^{(i)}$ the latent behavior mode. Assuming an order $r$ switching VAR process, the dynamics of the $i$th object are described by the following generative process:
$$z_t^{(i)} \mid z_{t-1}^{(i)} \sim \pi^{(i)}_{z_{t-1}^{(i)}}, \tag{20}$$
$$y_t^{(i)} = \sum_{j=1}^{r} A_{j, z_t^{(i)}}\, y_{t-j}^{(i)} + e_t^{(i)}(z_t^{(i)}) \triangleq A_{z_t^{(i)}}\, \tilde{y}_t^{(i)} + e_t^{(i)}(z_t^{(i)}), \tag{21}$$
where $e_t^{(i)}(k) \sim \mathcal{N}(0, \Sigma_k)$, $A_k = [A_{1,k} \cdots A_{r,k}]$, and $\tilde{y}_t^{(i)} = [y_{t-1}^{(i)T} \cdots y_{t-r}^{(i)T}]^T$. The standard HMM with Gaussian emissions arises as a special case of this model when $A_k = 0$ for all $k$.

A BAYESIAN NONPARAMETRIC FEATURAL MODEL UTILIZING BETA AND BERNOULLI PROCESSES
Following the theme of the sections "Sticky HDP-HMM" and "HDP-AR-HMM and HDP-SLDS," it is often desirable to consider a Bayesian nonparametric featural model that relaxes the assumption that the number of features is known or bounded. Such a featural model seeks to allow for infinitely many features, while encouraging a sparse, finite representation. Just as the DP provides a useful Bayesian nonparametric prior in clustering applications (i.e., when each observation is associated with a single parameter $\theta_k$), it has been shown that a stochastic process known as the beta process is useful in Bayesian nonparametric featural models (i.e., when each observation is associated with a subset of parameters) [30].

The beta process is a special case of a general class of stochastic processes known as completely random measures [32]. A completely random measure $G$ is defined such that for any disjoint sets $A_1$ and $A_2$, the corresponding random measures $G(A_1)$ and $G(A_2)$ are independent. This idea generalizes the family of independent increments processes on the real line. All completely random measures (up to a deterministic component) can be constructed from realizations of a nonhomogeneous Poisson process [32]. Specifically, a Poisson rate measure $\eta$ is defined on a product space $\Theta \otimes \mathbb{R}$, and a draw from the specified Poisson process yields a collection of points $\{\theta_j, \nu_j\}$ that can be used to define a completely random measure
$$G = \sum_{j=1}^{\infty} \nu_j\,\delta_{\theta_j}. \tag{22}$$
This construction assumes $\eta$ has infinite mass, yielding the countably infinite collection of points from the Poisson process. From (22), we see that completely random measures are discrete. Letting the rate measure be defined as a product of a base measure $G_0$ and an improper gamma distribution,
$$\eta(d\theta, d\nu) = c\,\nu^{-1} e^{-c\nu}\,d\nu\,G_0(d\theta), \tag{23}$$
with $c > 0$, gives rise to completely random measures $G \sim \mathrm{GP}(c, G_0)$, where GP denotes a gamma process. Normalizing $G$ yields draws from a $\mathrm{DP}(\alpha, G_0/\alpha)$, with $\alpha = G_0(\Theta)$. Random probability measures are necessarily not completely random, since the random variables $G(A_1)$ and $G(A_2)$ for disjoint sets $A_1$ and $A_2$ are dependent due to the normalization constraint.

Now consider a rate measure defined as the product of a base measure $B_0$, with total mass $B_0(\Theta) = \alpha$, and an improper beta distribution on the product space $\Theta \otimes [0, 1]$:
$$\eta(d\theta, d\nu) = c\,\nu^{-1} (1 - \nu)^{c-1}\,d\nu\,B_0(d\theta), \tag{24}$$
where, once again, $c > 0$. The resulting completely random measure is known as the beta process, with draws denoted by $B \sim \mathrm{BP}(c, B_0)$. Note that using this construction, the weights $\nu_k$ of the atoms in $B$ lie in the interval $(0, 1)$. Since $\eta$ is $\sigma$-finite, Campbell's theorem [33] guarantees that for $\alpha$ finite, $B$ has finite expected measure. The characteristics of this process define desirable traits for a Bayesian nonparametric featural model: we have a countably infinite collection of coin-tossing probabilities (one for each of our infinite number of features), but only a sparse, finite subset are active in any realization.

The beta process is conjugate to a class of Bernoulli processes [30], denoted by $\mathrm{BeP}(B)$, which provide our sought-for featural representation. A realization $X_i \sim \mathrm{BeP}(B)$, with $B$ an atomic measure, is a collection of unit-mass atoms on $\Theta$ located at some subset of the atoms in $B$. In particular,
$$f_{ik} \sim \mathrm{Bernoulli}(\nu_k) \tag{25}$$
is sampled independently for each atom $\theta_k$ in $B$, and then $X_i = \sum_k f_{ik}\,\delta_{\theta_k}$. In many applications, we interpret the atom locations $\theta_k$ as a shared set of global features. A Bernoulli process realization $X_i$ then determines the subset of features allocated to object $i$:
$$B \mid B_0, c \sim \mathrm{BP}(c, B_0), \qquad X_i \mid B \sim \mathrm{BeP}(B), \quad i = 1, \ldots, N. \tag{26}$$
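The infinite measure $B$ cannot be instantiated directly, but a finite-$K$ approximation is simple and, for $c = 1$, coincides with the finite beta-Bernoulli model (18): as $K \to \infty$, drawing $\nu_k \sim \mathrm{Beta}(\alpha/K, 1)$ is understood to recover a draw from $\mathrm{BP}(1, B_0)$ with $B_0(\Theta) = \alpha$. The sketch below is our own illustration under that assumption; the Gaussian atom locations merely stand in for draws from a normalized base measure.

```python
import numpy as np

def sample_bp_bernoulli(N, alpha, K, seed=None):
    """Finite-K approximation to B ~ BP(1, B_0) with total mass alpha,
    followed by N Bernoulli process draws X_i ~ BeP(B), as in (25)-(26)."""
    rng = np.random.default_rng(seed)
    nu = rng.beta(alpha / K, 1.0, size=K)       # atom weights of (approximate) B
    theta = rng.normal(size=K)                  # stand-in atom locations theta_k
    F = (rng.random((N, K)) < nu).astype(int)   # f_ik ~ Bernoulli(nu_k), rows = X_i
    return theta, nu, F
```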

Computationally, Bernoulli process realizations $X_i$ are often summarized by an infinite vector of binary indicator variables $f_i = [f_{i1}, f_{i2}, \ldots]$, where $f_{ik} = 1$ if and only if object $i$ exhibits feature $k$. Using the beta process measure $B$ to tie together the feature vectors encourages them to share similar features while allowing object-specific variability.

As shown by Thibaux and Jordan [30], marginalizing over the latent beta process measure $B$, and taking $c = 1$, induces a predictive distribution on feature indicators known as the Indian buffet process (IBP) [31]. The IBP is a culinary metaphor inspired by the Chinese restaurant process of (7), which is itself the predictive distribution on partitions induced by the DP. The Indian buffet consists of an infinitely long buffet line of dishes, or features. The first arriving customer, or object, chooses $\mathrm{Poisson}(\alpha)$ dishes. Each subsequent customer $i$ selects a previously tasted dish $k$ with probability $m_k/i$, proportional to the number of previous customers $m_k$ to sample it, and also samples $\mathrm{Poisson}(\alpha/i)$ new dishes. The feature matrix associated with a realization from an IBP is shown in Figure 6(b).

[FIG6] (a) Graphical model of the BP-AR-HMM. The beta process distributed measure $B \mid B_0 \sim \mathrm{BP}(1, B_0)$ is represented by its masses $\omega_k$ and locations $\theta_k$, as in (22). The features are then conditionally independent draws $f_{ik} \mid \omega_k \sim \mathrm{Bernoulli}(\omega_k)$, and are used to define feature-constrained transition distributions $\pi_j^{(i)} \mid f_i, \gamma, \kappa \sim \mathrm{Dir}([\gamma, \ldots, \gamma, \gamma + \kappa, \gamma, \ldots] \otimes f_i)$. The switching VAR dynamics are as in (21). (b) An example feature matrix $F$ with elements $f_{ik}$ for 100 objects, drawn from the Indian buffet predictive distribution using $B_0(\Theta) = \alpha = 10$.

BP-AR-HMM
Recall the model of the section "Finite Feature Models of Markov Switching Processes," in which the binary feature indicator variables $f_{ik}$ denote whether object $i$ exhibits dynamical behavior $k$ for some $t \in \{1, \ldots, T_i\}$. Now, however, take $f_i$ to be an infinite-dimensional vector of feature indicators realized from the beta process featural model of the section "A Bayesian Nonparametric Featural Model Utilizing Beta and Bernoulli Processes." Continuing our focus on switching VAR processes, we define a beta process autoregressive HMM (BP-AR-HMM) [28], in which the features indicate which behaviors are utilized by each object or sequence. Considering the feature space (i.e., the set of autoregressive parameters) and the temporal dynamics (i.e., the set of transition distributions) as separate dimensions, one can think of the BP-AR-HMM as a spatiotemporal process comprised of a (continuous) beta process in space and discrete-time Markovian dynamics in time.

Given $f_i$, the $i$th object's Markov transitions among its set of dynamic behaviors are governed by a set of feature-constrained transition distributions $\pi^{(i)} = \{\pi_k^{(i)}\}$. In particular, motivated by the fact that Dirichlet-distributed probability mass functions can be generated via normalized gamma random variables, for each object $i$ we define a doubly infinite collection of random variables
$$\eta_{jk}^{(i)} \mid \gamma, \kappa \sim \mathrm{Gamma}(\gamma + \kappa\,\delta(j, k), 1). \tag{27}$$
Using this collection of transition variables, denoted by $\eta^{(i)}$, one can define object-specific, feature-constrained transition distributions:
$$\pi_j^{(i)} = \frac{\big[\eta_{j1}^{(i)}\;\;\eta_{j2}^{(i)}\;\;\cdots\big] \otimes f_i}{\sum_{k \mid f_{ik} = 1} \eta_{jk}^{(i)}}. \tag{28}$$
This construction defines $\pi_j^{(i)}$ over the full set of positive integers, but assigns positive mass only at indices $k$ where $f_{ik} = 1$. The preceding generative process can be equivalently represented via a sample $\tilde{\pi}_j^{(i)}$ from a finite Dirichlet distribution of dimension $K_i = \sum_k f_{ik}$, containing the nonzero entries of $\pi_j^{(i)}$:
$$\tilde{\pi}_j^{(i)} \mid f_i, \gamma, \kappa \sim \mathrm{Dir}([\gamma, \ldots, \gamma, \gamma + \kappa, \gamma, \ldots, \gamma]). \tag{29}$$
The $\kappa$ hyperparameter places extra expected mass on the component of $\tilde{\pi}_j^{(i)}$ corresponding to a self-transition $\pi_{jj}^{(i)}$, analogously to the sticky hyperparameter of the section "Sticky HDP-HMM." To complete the Bayesian model specification, a conjugate MNIW prior is placed on the shared collection of dynamic parameters $\theta_k = \{A_k, \Sigma_k\}$. Since the dynamic parameters are shared by all time series, posterior inference of each parameter set $\theta_k$ relies on pooling data among the time series that have $f_{ik} = 1$. It is through this pooling of data that one may achieve more robust parameter estimates than from considering each time series individually. The resulting model is depicted in Figure 6(a), and a collection of observation sequences is shown in Figure 4(e).
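The normalized-gamma construction (27)–(28) is equally direct to implement; a sketch of ours follows, truncating the "doubly infinite" collection at the number of instantiated features and assuming $f_i$ has at least one active entry.

```python
import numpy as np

def bp_ar_hmm_transitions(f_i, gamma, kappa, seed=None):
    """Feature-constrained transition matrix for one object via (27)-(28).

    f_i : (K,) binary feature vector (a finite truncation of the infinite f_i)
    """
    rng = np.random.default_rng(seed)
    K = len(f_i)
    # eta_jk ~ Gamma(gamma + kappa * delta(j, k), 1), as in (27)
    eta = rng.gamma(gamma + kappa * np.eye(K), 1.0)
    masked = eta * f_i                                  # Hadamard product with f_i
    Pi = masked / masked.sum(axis=1, keepdims=True)     # normalize as in (28)
    return Pi
```

For example, `bp_ar_hmm_transitions(np.array([1, 0, 1, 1]), gamma=1.0, kappa=2.0)` yields rows supported only on features $\{0, 2, 3\}$, with extra expected mass on self-transitions.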

The ability of the BP-AR-HMM to find common behaviors among a collection of time series was demonstrated on data from the CMU motion capture database [34]. As an illustrative example, a set of six exercise routines was examined, where each of these routines used some combination of the following motion categories: running in place, jumping jacks, arm circles, side twists, knee raises, squats, punching, up and down, two variants of toe touches, arch over, and a reach-out stretch. The overall performance of the BP-AR-HMM showed a clear ability to find common motions, and provided more accurate movie frame labels than previously considered approaches [35]. Most significantly, the BP-AR-HMM provided a superior ability to discover the shared feature structure, while allowing objects to exhibit unique features.

CONCLUSIONS
In this article, we explored a Bayesian nonparametric approach to learning Markov switching processes. This framework requires one to make fewer assumptions about the underlying dynamics, and thereby allows the data to drive the complexity of the inferred model. We began by examining a Bayesian nonparametric HMM, the sticky HDP-HMM, that uses a hierarchical DP prior to regularize an unbounded mode space. We then considered extensions to Markov switching processes with richer, conditionally linear dynamics, including the HDP-AR-HMM and HDP-SLDS. We concluded by considering methods for transferring knowledge among multiple related time series. We argued that a featural representation is more appropriate than a rigid global clustering, as it encourages sharing of behaviors among objects while still allowing sequence-specific variability. In this context, the beta process provides an appealing alternative to the DP.

The models presented herein, while representing a flexible alternative to their parametric counterparts in terms of defining the set of dynamical modes, still maintain a number of limitations. First, the models assume Markovian dynamics with observations on a discrete, evenly spaced temporal grid. Extensions to semi-Markov formulations and nonuniform grids are interesting directions for future research. Second, there is still the question of which dynamical model is appropriate for a given data set: HMM, AR-HMM, or SLDS? The fact that the models are nested (i.e., HMM ⊂ AR-HMM ⊂ SLDS) aids in this decision process: choose the simplest formulation that does not egregiously break the model assumptions. For example, the honey bee observations are clearly not independent given the dance mode, so choosing an HMM is likely not going to provide desirable performance. Typically, it is useful to have domain-specific knowledge of at least one example of a time-series segment that can be used to design the structure of individual modes in a model. Overall, however, this issue of model selection in the Bayesian nonparametric setting is an open area of research. Finally, given the Bayesian framework, the models that we have presented necessitate a choice of prior. We have found in practice that the models are relatively robust to the hyperprior settings for the concentration parameters. On the other hand, the choice of base measure tends to affect results significantly, which is typical of simpler Bayesian nonparametric models such as DP mixtures. We have found that quasi-empirical Bayes approaches for setting the base measure tend to help push the mass of the distribution into reasonable ranges (see [36] for details).

Our focus in this article has been the advantages of various hierarchical, nonparametric Bayesian models; detailed algorithms for learning and inference were omitted. One major advantage of the particular Bayesian nonparametric approaches explored in this article is that they lead to computationally efficient methods for learning Markov switching models of unknown order. We point the interested reader to [13], [14], [19], and [28] for detailed presentations of Markov chain Monte Carlo algorithms for inference and learning.

AUTHORS
Emily B. Fox ([email protected]) received the S.B., M.Eng., E.E., and Ph.D. degrees from the Department of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology in 2004, 2005, 2008, and 2009, respectively. She is currently a postdoctoral scholar in the Department of Statistical Science at Duke University. She received both the National Defense Science and Engineering Graduate (NDSEG) and National Science Foundation (NSF) Graduate Research fellowships and currently holds an NSF Mathematical Sciences Postdoctoral Research Fellowship. She was awarded the 2009 Leonard J. Savage Thesis Award in Applied Methodology, the 2009 MIT EECS Jin-Au Kong Outstanding Doctoral Thesis Prize, the 2005 Chorafas Award for superior contributions in research, and the 2005 MIT EECS David Adler Memorial Second Place Master's Thesis Prize. Her research interests are in Bayesian and Bayesian nonparametric approaches to multivariate time series analysis.

Erik B. Sudderth ([email protected]) is an assistant professor in the Department of Computer Science at Brown University. He received the bachelor's degree (summa cum laude) in electrical engineering from the University of California, San Diego, and the master's and Ph.D. degrees in electrical engineering and computer science from the Massachusetts Institute of Technology. His research interests include probabilistic graphical models; nonparametric Bayesian analysis; and applications of statistical machine learning in computer vision, signal processing, artificial intelligence, and geoscience. He was awarded a National Defense Science and Engineering Graduate Fellowship (1999) and an Intel Foundation Doctoral Fellowship (2004), and in 2008 was named one of "AI's 10 to Watch" by IEEE Intelligent Systems Magazine.

Michael I. Jordan ([email protected]) is the Pehong Chen Distinguished Professor in the Department of Electrical Engineering and Computer Science and the Department of Statistics at the University of California, Berkeley. His research in recent years has focused on Bayesian nonparametric analysis, probabilistic graphical models, spectral methods, kernel machines, and applications to problems in computational biology, information retrieval, signal processing, and speech recognition. He received the ACM/AAAI Allen Newell Award in 2009, the SIAM Activity Group on Optimization Prize in 2008, and the IEEE Neural Networks Pioneer Award in 2006. He was named as a Neyman Lecturer and a Medallion Lecturer by the Institute of Mathematical Statistics. He is a Fellow of the American Association for the Advancement of Science, the IMS, the IEEE, the AAAI, and the ASA. In 2010, he was named to the National Academy of Engineering and the National Academy of Sciences.

Alan S. Willsky ([email protected]) joined the Massachusetts Institute of Technology in 1973 and is the Edwin Sibley Webster Professor of Electrical Engineering and Computer Science and director of the Laboratory for Information and Decision Systems. He was a founder of Alphatech, Inc. and chief scientific consultant. From 1998 to 2002, he served on the U.S. Air Force Scientific Advisory Board. He has received numerous awards, including the 1975 American Automatic Control Council Donald P. Eckman Award, the 1980 IEEE Browder J. Thompson Memorial Award, the 2004 IEEE Donald G. Fink Prize Paper Award, a Doctorat Honoris Causa from Université de Rennes in 2005, and a 2010 IEEE Signal Processing Society Technical Achievement Award. In 2010, he was elected to the National Academy of Engineering. His research interests are in the development and application of advanced methods of estimation, machine learning, and statistical signal and image processing.

REFERENCES
[1] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257–286, 1989.
[2] V. Pavlović, J. Rehg, and J. MacCormick, "Learning switching linear models of human motion," in Proc. Advances in Neural Information Processing Systems, 2001, vol. 13, pp. 981–987.
[3] L. Ren, A. Patrick, A. Efros, J. Hodgins, and J. Rehg, "A data-driven approach to quantifying natural human motion," in Proc. Special Interest Group on Computer Graphics and Interactive Techniques, Aug. 2005, vol. 24, no. 3, pp. 1090–1097.
[4] C.-J. Kim, "Dynamic linear models with Markov-switching," J. Econometrics, vol. 60, no. 1–2, pp. 1–22, 1994.
[5] M. So, K. Lam, and W. Li, "A stochastic volatility model with Markov switching," J. Bus. Econ. Statist., vol. 16, no. 2, pp. 244–253, 1998.
[6] C. Carvalho and H. Lopes, "Simulation-based sequential analysis of Markov switching stochastic volatility models," Comput. Statist. Data Anal., vol. 51, no. 9, pp. 4526–4542, 2007.
[7] X. R. Li and V. Jilkov, "Survey of maneuvering target tracking—Part V: Multiple-model methods," IEEE Trans. Aerosp. Electron. Syst., vol. 41, no. 4, pp. 1255–1321, 2005.
[8] E. Fox, E. Sudderth, and A. Willsky, "Hierarchical Dirichlet processes for tracking maneuvering targets," in Proc. Int. Conf. Information Fusion, July 2007.
[9] J. Pitman, "Combinatorial stochastic processes," Dept. Statist., Univ. California, Berkeley, Tech. Rep. 621, 2002.
[10] N. Hjort, C. Holmes, P. Müller, and S. Walker, Eds., Bayesian Nonparametrics: Principles and Practice. Cambridge, U.K.: Cambridge Univ. Press, 2010.
[11] J. Sethuraman, "A constructive definition of Dirichlet priors," Statist. Sin., vol. 4, pp. 639–650, 1994.
[12] D. Blackwell and J. MacQueen, "Ferguson distributions via Pólya urn schemes," Ann. Statist., vol. 1, no. 2, pp. 353–355, 1973.
[13] Y. Teh, M. Jordan, M. Beal, and D. Blei, "Hierarchical Dirichlet processes," J. Amer. Statist. Assoc., vol. 101, no. 476, pp. 1566–1581, 2006.
[14] E. Fox, E. Sudderth, M. Jordan, and A. Willsky, "An HDP-HMM for systems with state persistence," in Proc. Int. Conf. Machine Learning, July 2008, pp. 312–319.
[15] M. Beal, Z. Ghahramani, and C. Rasmussen, "The infinite hidden Markov model," in Proc. Advances in Neural Information Processing Systems, 2002, vol. 14, pp. 577–584.
[16] NIST. (2007). Rich transcriptions database [Online]. Available: http://www.nist.gov/speech/tests/rt/
[17] S. Paoletti, A. Juloski, G. Ferrari-Trecate, and R. Vidal, "Identification of hybrid systems: A tutorial," Eur. J. Contr., vol. 13, no. 2–3, pp. 242–260, 2007.
[18] M. Aoki and A. Havenner, "State space modeling of multiple time series," Econom. Rev., vol. 10, no. 1, pp. 1–59, 1991.
[19] E. Fox, E. Sudderth, M. Jordan, and A. Willsky, "Nonparametric Bayesian learning of switching dynamical systems," in Proc. Advances in Neural Information Processing Systems, 2009, vol. 21, pp. 457–464.
[20] E. Fox, E. Sudderth, M. I. Jordan, and A. Willsky, "Nonparametric Bayesian identification of jump systems with sparse dependencies," in Proc. 15th IFAC Symp. System Identification, July 2009.
[21] M. West and J. Harrison, Bayesian Forecasting and Dynamic Models. New York: Springer-Verlag, 1997.
[22] D. MacKay, Bayesian Methods for Backprop Networks (Series Models of Neural Networks, III). New York: Springer-Verlag, 1994, ch. 6, pp. 211–254.
[23] R. Neal, Ed., Bayesian Learning for Neural Networks (Lecture Notes in Statistics, vol. 118). New York: Springer-Verlag, 1996.
[24] M. Beal, "Variational algorithms for approximate Bayesian inference," Ph.D. dissertation, Univ. College London, London, U.K., 2003.
[25] F. Caron, M. Davy, A. Doucet, E. Duflos, and P. Vanheeghe, "Bayesian inference for dynamic models with Dirichlet process mixtures," in Proc. Int. Conf. Information Fusion, July 2006.
[26] C. Carvalho and M. West, "Dynamic matrix-variate graphical models," Bayesian Anal., vol. 2, no. 1, pp. 69–98, 2007.
[27] S. Oh, J. Rehg, T. Balch, and F. Dellaert, "Learning and inferring motion patterns using parametric segmental switching linear dynamic systems," Int. J. Comput. Vis., vol. 77, no. 1–3, pp. 103–124, 2008.
[28] E. Fox, E. Sudderth, M. Jordan, and A. Willsky, "Sharing features among dynamical systems with beta processes," in Proc. Advances in Neural Information Processing Systems, 2010, vol. 22, pp. 549–557.
[29] N. Hjort, "Nonparametric Bayes estimators based on beta processes in models for life history data," Ann. Statist., vol. 18, no. 3, pp. 1259–1294, 1990.
[30] R. Thibaux and M. Jordan, "Hierarchical beta processes and the Indian buffet process," in Proc. Int. Conf. Artificial Intelligence and Statistics, 2007, vol. 11.
[31] T. Griffiths and Z. Ghahramani, "Infinite latent feature models and the Indian buffet process," Gatsby Comput. Neurosci. Unit, Tech. Rep. 2005-001, 2005.
[32] J. F. C. Kingman, "Completely random measures," Pacific J. Math., vol. 21, no. 1, pp. 59–78, 1967.
[33] J. Kingman, Poisson Processes. London, U.K.: Oxford Univ. Press, 1993.
[34] Carnegie Mellon University. Graphics lab motion capture database [Online]. Available: http://mocap.cs.cmu.edu/
[35] J. Barbič, A. Safonova, J.-Y. Pan, C. Faloutsos, J. Hodgins, and N. Pollard, "Segmenting motion capture data into distinct behaviors," in Proc. Graphics Interface, 2004, pp. 185–194.
[36] E. Fox, "Bayesian nonparametric learning of complex dynamical phenomena," Ph.D. dissertation, Massachusetts Inst. Technol., Cambridge, MA, July 2009.

[SP]
