Reduced Complexity HMM Filtering With Stochastic Dominance Bounds: A Convex Optimization Approach

Vikram Krishnamurthy, Fellow, IEEE, and Cristian R. Rojas, Member, IEEE

IEEE Transactions on Signal Processing, vol. 62, no. 23, pp. 6309-6322, December 1, 2014

Abstract—This paper uses stochastic dominance principles to construct upper and lower sample path bounds for Hidden Markov Model (HMM) filters. We consider an HMM consisting of an $X$-state Markov chain with transition matrix $P$. By using convex optimization methods for nuclear norm minimization with copositive constraints, we construct low rank stochastic matrices $\underline{P}$ and $\bar{P}$ so that the optimal filters using $\underline{P}$ and $\bar{P}$ provably lower and upper bound (with respect to a partially ordered set) the true filtered distribution at each time instant. Since $\underline{P}$ and $\bar{P}$ are low rank (say $r$), the computational cost of evaluating the filtering bounds is $O(Xr)$ instead of $O(X^2)$. A Monte-Carlo importance sampling filter is presented that exploits these upper and lower bounds to estimate the optimal posterior. Finally, explicit bounds are given on the variational norm between the true posterior and the upper and lower bounds in terms of the Dobrushin coefficient.

Index Terms—Hidden Markov model filter, stochastic dominance, copositive matrix, nuclear norm minimization, importance sampling filter, Dobrushin coefficient.

I. INTRODUCTION

This paper is motivated by the filtering problem of estimating a large dimensional finite state Markov chain given noisy observations. With $k$ denoting discrete time, consider an $X$-state discrete time Markov chain $x_k$ observed via a noisy process $y_k$. Here $x_k \in \{1, 2, \ldots, X\}$, where $X$ denotes the dimension of the state space. Let $P$ denote the $X \times X$ transition matrix and $B$ denote the observation likelihood probabilities. With $y_{1:k} = (y_1, \ldots, y_k)$ denoting the sequence of observations from time 1 to $k$, define the posterior state probability mass function (pmf)

$$\pi_k(i) = \mathbb{P}(x_k = i \mid y_{1:k}), \quad i = 1, \ldots, X. \qquad (1)$$

It is well known [1], [2] that the optimal Bayesian filter (Hidden Markov Model filter) for computing the $X$-dimensional posterior vector $\pi_k$ at each time $k$ is of the form

$$\pi_{k+1} = T(\pi_k, y_{k+1}; P), \quad T(\pi, y; P) = \frac{B_y P' \pi}{\mathbf{1}_X' B_y P' \pi}. \qquad (2)$$

Here $B_y = \operatorname{diag}(B_{1y}, \ldots, B_{Xy})$ is a diagonal $X \times X$ matrix of observation likelihoods and $\mathbf{1}_X$ denotes the $X$-dimensional column vector of ones.

Due to the matrix-vector multiplication $P'\pi$ in (2), the computational cost of evaluating the posterior at each time is $O(X^2)$. This quadratic computational cost can be excessive for large state space dimension $X$.

Manuscript received June 22, 2014; revised September 16, 2014; accepted October 02, 2014. Date of publication October 13, 2014; date of current version November 06, 2014. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Petr Tichavsky. This work was supported in part by NSERC and SSHRC, Canada. Part of this work was completed when V. Krishnamurthy was visiting KTH in 2013. Digital Object Identifier 10.1109/TSP.2014.2362886
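For concreteness, the filter update (2) is a few lines of Python. This is our own illustrative sketch, not code from the paper, and the variable names are ours:

```python
import numpy as np

def hmm_filter_update(pi, y, P, B):
    """One step of the HMM filter (2): T(pi, y; P) = B_y P' pi / (1' B_y P' pi).

    pi : (X,) current posterior pi_k
    y  : observed symbol at time k+1 (column index into B)
    P  : (X, X) transition matrix, P[i, j] = P(x_{k+1} = j | x_k = i)
    B  : (X, Y) observation likelihoods, B[i, y] = p(y | x = i)
    """
    unnormalized = B[:, y] * (P.T @ pi)   # B_y P' pi; the O(X^2) bottleneck
    return unnormalized / unnormalized.sum()
```

The product $P'\pi$ is the $O(X^2)$ bottleneck that the low rank bounds developed in this paper are designed to avoid.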
V. Krishnamurthy is with the Department of Electrical and Computer Engineering, University of British Columbia, Vancouver, BC V6T 1Z4 Canada (e-mail: [email protected]). C. R. Rojas is with the ACCESS Linnaeus Centre, Department of Automatic Control, KTH Royal Institute of Technology, SE 100 44 Stockholm, Sweden (e-mail: [email protected]).

A. Main Results

This paper addresses the question: Can the optimal filter be approximated by reduced complexity filters with provable sample path bounds? We derive reduced-complexity filters with computational cost $O(Xr)$, where $r \ll X$. There are four main results in this paper.

1) Stochastic Dominance Bounds: Theorem 1, presented in Section II, asserts that for any transition matrix $P$ one can construct two new transition matrices $\underline{P}$ and $\bar{P}$ such that $\underline{P} \preceq P \preceq \bar{P}$. Here $\preceq$ denotes a copositive ordering defined in Section II. The Bayesian filters using $\underline{P}$ and $\bar{P}$ are guaranteed to sandwich the true posterior distribution at any time $k$ as

$$T(\pi, y; \underline{P}) \leq_r T(\pi, y; P) \leq_r T(\pi, y; \bar{P}), \qquad (3)$$

where $T$ denotes the filtering recursion (2) and $\leq_r$ denotes monotone likelihood ratio (MLR) stochastic dominance defined in Section II. What (3) says is that at any time $k$, the true posterior can be sandwiched in the partially ordered set specified by the above stochastic dominance constraints. Moreover, if $P$ is a TP2 matrix,¹ then (3) can be globalized to say that if $\underline{\pi}_0 \leq_r \pi_0 \leq_r \bar{\pi}_0$, then

$$\underline{\pi}_k \leq_r \pi_k \leq_r \bar{\pi}_k \quad \text{for all } k, \qquad (4)$$

where $\underline{\pi}_k$ and $\bar{\pi}_k$ denote posteriors computed using $\underline{P}$ and $\bar{P}$. The MLR stochastic order $\leq_r$ used in (3) and (4) is a partial order on the set of distributions. A crucial property of the MLR order is that it is closed under conditional expectations.

¹TP2 matrices are defined in Definition 3.

This makes it very useful in Bayesian estimation [3]-[6]. An important consequence of the sandwich result (4) is that the conditional mean state estimates are sandwiched as $\underline{x}_k \leq \hat{x}_k \leq \bar{x}_k$ for all time $k$. Indeed, the second and all higher moments are also sandwiched.

Finally, in Section II-D we generalize the above result to multivariate HMMs by using the multivariate TP2 stochastic order. Such multivariate HMMs provide a useful example of large scale HMMs. The TP2 order was pioneered by Karlin [7]; see also Whitt's classic paper [8].

2) Construction of Low Rank Transition Matrices via Nuclear Norm Minimization: Section III uses state-of-the-art convex optimization methods to construct low rank transition matrices $\underline{P}$ and $\bar{P}$. A low rank ensures that the lower and upper bounds to the posterior can be computed with $O(Xr)$ rather than $O(X^2)$ computational cost. The transition matrices $\underline{P}$ and $\bar{P}$ are constructed as low rank matrices by minimizing their nuclear norms. Matrices with small nuclear norm exhibit sparseness in the set of singular values, or equivalently, low rank. The nuclear norm is the sum of the singular values of a matrix and serves as a convex surrogate of the rank of a matrix [9]. The construction of the low rank transition matrices $\underline{P}$ and $\bar{P}$ is formulated as a convex optimization problem on the cone of copositive matrices.² These computations are performed offline without affecting the computational cost of the real time filtering algorithm.

²A symmetric matrix $M$ is copositive if $\pi' M \pi \geq 0$ for all positive vectors $\pi$. (Thus the set of positive definite matrices is a subset of the set of copositive matrices. In this paper, the vectors $\pi$ are probability mass function vectors.)

3) Stochastic Dominance Constrained Monte-Carlo Importance Sampling Filter: In monitoring systems, it is of interest to detect when the underlying Markov chain is close to a target state. Using the reduced complexity filtering bounds outlined above, a monitoring system would want to switch to the full complexity filter when the filtering bounds approach the target state. A natural question is: How can the reduced complexity filtering bounds (3) or (4) be exploited to estimate the true posterior? Section IV presents an importance sampling Monte-Carlo method for matrix-vector multiplication that is inspired by recent results in stochastic linear solvers. The algorithm uses Gibbs sampling to ensure that the estimated posterior lies in the partially ordered set $[\underline{\pi}_k, \bar{\pi}_k]$ at each time $k$. Numerical experiments show that this stochastic dominance constrained algorithm yields estimates with substantially reduced mean square errors compared to the unconstrained algorithm; in addition, by construction the estimates are provably sandwiched between $\underline{\pi}_k$ and $\bar{\pi}_k$.

4) Analytical Bounds on Variational Distance: Given the low complexity bounds $\underline{\pi}_k$ and $\bar{\pi}_k$ such that $\underline{\pi}_k \leq_r \pi_k \leq_r \bar{\pi}_k$, a natural question is: How tight are the bounds? Theorem 3 presents explicit analytical bounds on the deviation of the true posterior $\pi_k$ (which is expensive to compute) from the lower and upper bounds $\underline{\pi}_k$ and $\bar{\pi}_k$ in terms of the Dobrushin coefficient of the transition matrix. This yields useful analytical bounds (that can be computed without evaluating the posterior $\pi_k$) for quantifying how the stochastic dominance constraints sandwich the true posterior as time evolves.

B. Related Work

The area of constructing approximate filters for estimating the state of large scale Markov chains has been well studied, both in discrete and continuous time. Most works [10], [11] assume that the Markov chain has two-time scale dynamics (e.g., the Markov chain is nearly completely decomposable). This two-time scale feature is then exploited to construct suitable filtering approximations on the slower time scale. In comparison, the framework in the current paper does not assume a two-time scale Markov chain. Indeed, our contributions are finite sample results that do not rely on asymptotics.

The main tools used in this paper are based on monotone likelihood ratio (MLR) stochastic dominance and associated monotone structural results of the Bayesian filtering update. Such results have been developed in the context of stochastic control and Bayesian games in [5], [12], [3], [13], [4], but have so far not been exploited to devise efficient filtering approximations. To the best of our knowledge, constructing upper and lower sample path bounds to the optimal filter in terms of stochastic orders is new, and the copositivity constraints presented in this paper yield a constructive realization of these bounds. Recently, [3] used similar copositive characterizations to derive structural results in stochastic control.

Optimizing the nuclear norm as a surrogate for rank has been studied as a convex optimization problem in several papers; see for example [9]. Inspired by the seminal work of Candès and Tao [14], there has been much recent interest in minimizing nuclear norms in the context of matrix completion problems.
Algorithms for testing copositivity of matrices and for copositive programming have been studied recently in [15], [16].

There has been extensive work in signal processing on posterior Cramér-Rao bounds for nonlinear filtering [17]; see also [18] for a textbook treatment. These yield lower bounds on the achievable variance of the conditional mean estimate of the optimal filter. However, unlike the current paper, such posterior Cramér-Rao bounds do not give constructive algorithms for computing upper and lower bounds for the sample path of the filtered distribution. The sample path bounds proposed in this paper have the attractive feature that they are guaranteed to yield lower and upper bounds to both hard and soft estimates of the optimal filter.

II. STOCHASTIC DOMINANCE OF FILTERS AND COPOSITIVITY CONDITIONS

Theorem 1 below is the main result of this section. It shows that if stochastic matrices $\underline{P}$ and $\bar{P}$ are constructed such that $\underline{P} \preceq P \preceq \bar{P}$ (in terms of a copositive ordering), the filtered estimates computed using $\underline{P}$ and $\bar{P}$ are guaranteed to sandwich the optimal filtered estimate in terms of the monotone likelihood ratio order. This section sets the stage for Section III, where the construction of low rank matrices $\underline{P}$ and $\bar{P}$ is formulated as a convex optimization problem on a copositive cone, and also for Section IV, where algorithms that exploit this result are presented.

A. Signal Model and Optimal Filter

Consider an $X$-state discrete time Markov chain $x_k$ on the state space $\{1, 2, \ldots, X\}$. Suppose $x_0$ has a prior distribution $\pi_0$. The $X \times X$-dimensional transition probability matrix $P$ comprises elements $P_{ij} = \mathbb{P}(x_{k+1} = j \mid x_k = i)$.

The Markov process is observed via a noisy process $y_k$, where at each time $k$, $y_k \in \{1, \ldots, Y\}$ or $y_k \in \mathbb{R}$. As is widely assumed in optimal filtering, we make the conditional independence assumption that, given $x_k$, $y_k$ is statistically independent of the process at all other times. For the case $y_k \in \{1, \ldots, Y\}$, denote the observation likelihood probabilities as $B_{xy} = \mathbb{P}(y_k = y \mid x_k = x)$. For the case $y_k \in \mathbb{R}$, $B_{xy}$ is the conditional probability density. (For readability to an engineering audience, unified notation with respect to the Lebesgue and counting measures is avoided.)

With $\pi_k$ denoting the posterior defined in (1), the optimal filter is given by (2). Note that the posterior $\pi_k$ lives in an $X-1$ dimensional unit simplex comprising $X$-dimensional probability vectors. That is,

$$\Pi(X) = \{\pi \in \mathbb{R}^X : \mathbf{1}_X' \pi = 1,\ 0 \leq \pi(i) \leq 1 \text{ for } i = 1, \ldots, X\}. \qquad (5)$$

Finally, since the state space is $\{1, \ldots, X\}$, the conditional mean estimate of the state computed using the observations $y_{1:k}$ is (we avoid using the notation of sigma algebras)

$$\hat{x}_k = \mathbb{E}\{x_k \mid y_{1:k}\} = g' \pi_k, \quad g = (1, 2, \ldots, X)'. \qquad (6)$$

In some applications, rather than the "soft" state estimate provided by the conditional mean, one is interested in the "hard" valued maximum a posteriori (MAP) estimate defined as

$$\hat{x}_k^{\mathrm{MAP}} = \operatorname*{argmax}_{i \in \{1, \ldots, X\}} \pi_k(i). \qquad (7)$$

We refer to the posterior in (2) and the state estimates (6), (7) computed using transition matrix $P$ as the "optimal filtered estimates", to distinguish them from the lower and upper bound filters.

B. Some Preliminary Definitions

We introduce here some key definitions that will be used in the rest of the paper.

1) Stochastic Dominance: We start with the following standard definitions involving stochastic dominance [19]. Recall that $\Pi(X)$ is the unit simplex defined in (5).

Definition 1 (Monotone Likelihood Ratio (MLR) Dominance): Let $\pi_1, \pi_2 \in \Pi(X)$ be any two probability vectors. Then $\pi_1$ is greater than $\pi_2$ with respect to the MLR ordering, denoted as $\pi_1 \geq_r \pi_2$, if

$$\pi_1(i)\,\pi_2(j) \leq \pi_2(i)\,\pi_1(j), \quad i < j, \ i, j \in \{1, \ldots, X\}. \qquad (8)$$

Similarly, $\pi_1 \leq_r \pi_2$ if $\leq$ in (8) is replaced by $\geq$.

The MLR stochastic order is useful since it is closed under conditional expectations. That is, $X \geq_r Y$ implies $\mathbb{E}\{X \mid \mathcal{F}\} \geq_r \mathbb{E}\{Y \mid \mathcal{F}\}$ for any two random variables $X$, $Y$ and sigma-algebra $\mathcal{F}$ [12], [7], [8], [19].

Definition 2 (First Order Stochastic Dominance [19]): Let $\pi_1, \pi_2 \in \Pi(X)$. Then $\pi_1$ first order stochastically dominates $\pi_2$, denoted as $\pi_1 \geq_s \pi_2$, if $\sum_{i=j}^X \pi_1(i) \geq \sum_{i=j}^X \pi_2(i)$ for $j = 1, \ldots, X$.

The following result is well known [19]. It says that MLR dominance implies first order stochastic dominance, and it gives a necessary and sufficient condition for first order stochastic dominance.

Result 1 ([19]): (i) Let $\pi_1, \pi_2 \in \Pi(X)$. Then $\pi_1 \geq_r \pi_2$ implies $\pi_1 \geq_s \pi_2$.
(ii) Let $\mathcal{V}$ denote the set of all $X$-dimensional vectors $v$ with nondecreasing components, i.e., $v_1 \leq v_2 \leq \cdots \leq v_X$. Then $\pi_1 \geq_s \pi_2$ iff $v'\pi_1 \geq v'\pi_2$ for all $v \in \mathcal{V}$.

Definition 3 (Total Positivity of Order 2): A transition matrix $P$ is totally positive of order 2 (TP2) if every second order minor of $P$ is non-negative. Equivalently, every row is dominated by every subsequent row with respect to the MLR order.

2) Copositivity: The following definitions of copositive ordering of stochastic matrices will be used extensively in the paper. Let $\mathbb{R}_+^X$ denote the subset of $\mathbb{R}^X$ comprised of non-negative vectors; this is termed the positive orthant.

Definition 4 (Copositivity on Simplex): An arbitrary matrix $M$ is copositive if $\pi' M \pi \geq 0$ for all $\pi \in \Pi(X)$, or equivalently, if $\pi' M \pi \geq 0$ for all $\pi \in \mathbb{R}_+^X$. The definition says that copositivity on the unit simplex and positive orthant are equivalent. Clearly, positive semidefinite matrices and non-negative matrices are copositive.

Given two $X$-dimensional transition matrices $\underline{P}$ and $P$, we now define a sequence of matrices $\Gamma^j$, indexed by $j = 1, \ldots, X-1$, as follows. Each $\Gamma^j$ is a symmetric $X \times X$ matrix of the form:

$$\Gamma^j = \tfrac{1}{2}\left(\underline{P}_j P_{j+1}' + P_{j+1} \underline{P}_j' - \underline{P}_{j+1} P_j' - P_j \underline{P}_{j+1}'\right), \quad j = 1, \ldots, X-1. \qquad (9)$$

Here $\underline{P}_j$ and $P_j$, respectively, denote the $j$-th column of matrix $\underline{P}$ and $P$.

Definition 5 (Copositive Ordering of Stochastic Matrices): Given two transition matrices $\underline{P}$ and $P$, we say that $\underline{P} \preceq P$ (equivalently, $P \succeq \underline{P}$) if all the matrices $\Gamma^j$, $j = 1, \ldots, X-1$, defined in (9), are copositive on the simplex $\Pi(X)$.

Intuition: It will be proved in Theorem 1 below that

$$\underline{P} \preceq P \iff \underline{P}'\pi \leq_r P'\pi \quad \text{for all } \pi \in \Pi(X). \qquad (10)$$

That is, the $\preceq$ ordering of transition matrices is equivalent to the MLR ordering of the optimal predictor updates for all $\pi \in \Pi(X)$. Moreover, (10) is equivalent to the optimal filter updates satisfying $T(\pi, y; \underline{P}) \leq_r T(\pi, y; P)$ for any observation $y$ and posterior $\pi \in \Pi(X)$. In other words, the $\preceq$ ordering of transition matrices preserves the MLR ordering of posterior distributions computed via the optimal filter. This is a crucial property that will be used subsequently in deriving lower and upper bounds to the optimal filtered posterior. It is easily verified that $\preceq$ is a partial order over the set of stochastic matrices, i.e., it satisfies reflexivity, antisymmetry and transitivity.
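Exact copositivity checking is co-NP hard in general, but Definition 5 can be falsified numerically by sampling the simplex. The following sketch is our own illustration (not from the paper; the tolerance and sample count are arbitrary): it builds the matrices $\Gamma^j$ of (9) and tests the quadratic forms on random points of $\Pi(X)$.

```python
import numpy as np

def gamma_matrices(P_low, P):
    """Symmetric matrices Gamma^j of (9), j = 1, ..., X-1."""
    X = P.shape[0]
    G = []
    for j in range(X - 1):
        a, b = P_low[:, j], P_low[:, j + 1]   # j-th, (j+1)-th columns of P_low
        c, d = P[:, j], P[:, j + 1]           # j-th, (j+1)-th columns of P
        G.append(0.5 * (np.outer(a, d) + np.outer(d, a)
                        - np.outer(b, c) - np.outer(c, b)))
    return G

def copositive_dominance_check(P_low, P, n_samples=10_000, seed=0):
    """Monte-Carlo falsification test of P_low <= P (Definition 5).
    A failure certifies the ordering does NOT hold; a pass is only
    evidence in favor, since sampling cannot prove copositivity."""
    rng = np.random.default_rng(seed)
    pis = rng.dirichlet(np.ones(P.shape[0]), size=n_samples)
    return all((np.einsum('ni,ij,nj->n', pis, G, pis) >= -1e-12).all()
               for G in gamma_matrices(P_low, P))
```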
C. Upper and Lower Sample Path Stochastic Dominance Bounds to Posterior

With the above definitions, we are now ready to state the main result of this section. Recall that the original filtering problem seeks to compute $\pi_k = T(\pi_{k-1}, y_k; P)$ using the filtering update (2) with transition matrix $P$, and involves $O(X^2)$ multiplications. This can be excessive for large $X$. Our goal is to construct low rank transition matrices $\underline{P}$ and $\bar{P}$ such that the filtering recursions using these matrices form lower and upper bounds to $\pi_k$ in the MLR stochastic dominance sense. Due to the low rank of $\underline{P}$ and $\bar{P}$, the cost involved in computing these lower and upper bounds to $\pi_k$ at each time $k$ will be $O(Xr)$, where $r \ll X$ (for example, $r = \max\{\operatorname{rank}\underline{P}, \operatorname{rank}\bar{P}\}$).

Since we plan to compute filtered estimates using $\underline{P}$ and $\bar{P}$ instead of the original transition matrix $P$, we need further notation to distinguish between the posteriors and estimates computed using $\underline{P}$, $P$ and $\bar{P}$. Let

$$\underline{\pi}_k = T(\underline{\pi}_{k-1}, y_k; \underline{P}), \quad \bar{\pi}_k = T(\bar{\pi}_{k-1}, y_k; \bar{P})$$

denote the posteriors updated using the optimal filter (2) with transition matrices $\underline{P}$ and $\bar{P}$, respectively. Also, as in (6), with $g = (1, \ldots, X)'$, the conditional mean estimates of the underlying state computed using $\underline{P}$ and $\bar{P}$, respectively, will be denoted as

$$\underline{x}_k = g' \underline{\pi}_k, \quad \bar{x}_k = g' \bar{\pi}_k. \qquad (11)$$

In analogy to (7), denote the "hard" MAP state estimates computed using $\underline{P}$ and $\bar{P}$ as

$$\underline{x}_k^{\mathrm{MAP}} = \operatorname*{argmax}_i \underline{\pi}_k(i), \quad \bar{x}_k^{\mathrm{MAP}} = \operatorname*{argmax}_i \bar{\pi}_k(i). \qquad (12)$$

The following is the main result of this section. Recall the definitions of the copositivity ordering $\preceq$, MLR dominance and TP2 in Section II-B.

Theorem 1 (Stochastic Dominance Sample-Path Bounds): Consider the filtering updates $T(\pi, y; \underline{P})$, $T(\pi, y; P)$ and $T(\pi, y; \bar{P})$, where $T$ is defined in (2) and $P$ denotes the transition matrix of the original filtering problem.
1) For any transition matrix $P$, there exist transition matrices $\underline{P}$ and $\bar{P}$ such that $\underline{P} \preceq P \preceq \bar{P}$ (recall $\preceq$ is defined in Definition 5).
2) Suppose transition matrices $\underline{P}$ and $\bar{P}$ are constructed such that $\underline{P} \preceq P \preceq \bar{P}$. Then for any $\pi \in \Pi(X)$ and $y$, the filtering updates satisfy the sandwich result

$$T(\pi, y; \underline{P}) \leq_r T(\pi, y; P) \leq_r T(\pi, y; \bar{P}).$$

3) Suppose $P$ is TP2 (Definition 3). Assume the filters using $\underline{P}$, $P$ and $\bar{P}$ are initialized with a common prior $\pi_0$. Then the posteriors satisfy

$$\underline{\pi}_k \leq_r \pi_k \leq_r \bar{\pi}_k \quad \text{for all } k.$$

As a consequence, for all time $k$:
a) The "soft" conditional mean state estimates defined in (6), (11) satisfy $\underline{x}_k \leq \hat{x}_k \leq \bar{x}_k$.
b) The "hard" MAP state estimates defined in (7), (12) satisfy $\underline{x}_k^{\mathrm{MAP}} \leq \hat{x}_k^{\mathrm{MAP}} \leq \bar{x}_k^{\mathrm{MAP}}$.

Statement 1 says that for any transition matrix $P$, there always exist transition matrices $\underline{P}$ and $\bar{P}$ such that $\underline{P} \preceq P \preceq \bar{P}$ (copositivity dominance). Actually, if $P$ is TP2, then one can trivially construct the tightest rank 1 bounds $\underline{P}$ and $\bar{P}$, as shown in Section III-A. Given existence of $\underline{P}$ and $\bar{P}$, the next step is to optimize the choice of $\underline{P}$ and $\bar{P}$; that is the subject of Section III, where nuclear norm minimization is used to construct matrices $\underline{P}$ and $\bar{P}$ with sparse singular value sets.

Statement 2 says that for any prior $\pi$ and observation $y$, the one step filtering updates using $\underline{P}$ and $\bar{P}$ constitute lower and upper bounds to the original filtering problem.

Statement 3 globalizes Statement 2 and asserts that, under the additional assumption that the transition matrix $P$ of the original filtering problem is TP2, the upper and lower bounds hold for all time. Since MLR dominance implies first order stochastic dominance (see Result 1), the conditional mean estimates satisfy $\underline{x}_k \leq \hat{x}_k \leq \bar{x}_k$.

Why MLR Dominance?: The proof of Theorem 1 in the Appendix uses the result that $\pi_1 \leq_r \pi_2$ implies that the filtered updates satisfy $T(\pi_1, y; P) \leq_r T(\pi_2, y; P)$. Such a result does not hold with first order stochastic dominance $\leq_s$; that is, $\pi_1 \leq_s \pi_2$ does not imply $T(\pi_1, y; P) \leq_s T(\pi_2, y; P)$. In other words, the MLR order is closed with respect to conditional expectations. This is the reason why we use the MLR order (and its multivariate generalization, the TP2 order defined below) in this paper.

Examples of TP2 Matrices: Several classes of transition matrices satisfy the TP2 property; see [20], [21], [7], ([22], pp. 99-100) and Karlin's classic book [23]. They are widely used in structural results in stochastic control [5], [12]. The left-to-right Bakis HMM used in speech recognition [24] has an upper triangular transition matrix which has a TP2 structure under mild conditions on its elements. In the numerical examples section, we use the fact that the matrix exponential of any tridiagonal generator matrix is TP2 ([25], pp. 154). TP2 matrices can be constructed systematically, starting with the first row and then generating each subsequent row to MLR dominate the previous row. Each new row can be constructed via an LP feasibility test since, given the previous row, the elements of the next row lie in a convex polytope. A numerical illustration of the one-step sandwich property of Statement 2 is sketched below.
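The following is a minimal numerical check of Statement 2, assuming the `hmm_filter_update` sketch given after (2); the MLR test implements (8) directly and is our own illustration, not the paper's code.

```python
import numpy as np

def mlr_leq(p, q, tol=1e-12):
    """True if p <=_r q in the MLR order (8): p[i]q[j] >= p[j]q[i], i < j."""
    M = np.outer(p, q)                              # M[i, j] = p[i] * q[j]
    return bool((np.triu(M, 1) >= np.triu(M.T, 1) - tol).all())

def one_step_sandwich_holds(pi, y, P_low, P, P_up, B):
    """Checks T(pi,y;P_low) <=_r T(pi,y;P) <=_r T(pi,y;P_up) (Theorem 1, 2)."""
    low = hmm_filter_update(pi, y, P_low, B)
    mid = hmm_filter_update(pi, y, P, B)
    up = hmm_filter_update(pi, y, P_up, B)
    return mlr_leq(low, mid) and mlr_leq(mid, up)
```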

D. Stochastic Dominance Bounds for Multivariate HMMs

We conclude this section by showing how the above bounds can be generalized to multivariate HMMs. The main idea is that MLR dominance is replaced by the multivariate TP2 (totally positive of order 2) stochastic dominance [19], [8], [7]. We consider a highly stylized example which will serve as a reproducible way of constructing large scale HMMs in the numerical studies of Section VI.

Consider $L$ independent Markov chains $x_k^{(1)}, \ldots, x_k^{(L)}$, with transition matrices $P^{(1)}, \ldots, P^{(L)}$. Define the joint process $x_k = (x_k^{(1)}, \ldots, x_k^{(L)})$. Let $i = (i_1, \ldots, i_L)$ and $j = (j_1, \ldots, j_L)$ denote the vector indices, where each index $i_l \in \{1, \ldots, X\}$. Suppose the observation process $y_k$ recorded at a sensor has the conditional probabilities $B_{iy} = \mathbb{P}(y_k = y \mid x_k = i)$. Even though the individual Markov chains are independent of each other, since the observation process involves all $L$ Markov chains, computing the filtered estimate of $x_k$ requires computing and propagating the joint posterior. This is equivalent to HMM filtering of the process $x_k$ with transition matrix $P = P^{(1)} \otimes \cdots \otimes P^{(L)}$, where $\otimes$ denotes Kronecker product. If each process $x_k^{(l)}$ has $X$ states, then $P$ is an $X^L \times X^L$ matrix and the computational cost of the HMM filter at each time is $O(X^{2L})$, which is excessive for large $L$.

A naive application of the results of the previous sections will not work, since the MLR ordering does not apply to the multivariate case (in general, mapping the vector index into a scalar index does not yield univariate distributions that are MLR orderable). We use the totally positive (TP2) stochastic order, which is a natural multivariate generalization of the MLR order [19], [8], [7]. Denote the element-wise minimum and maximum vectors

$$i \wedge j = (\min(i_1, j_1), \ldots, \min(i_L, j_L)), \quad i \vee j = (\max(i_1, j_1), \ldots, \max(i_L, j_L)). \qquad (13)$$

Denote the $L$-variate posterior at time $k$ as $\pi_k(i) = \mathbb{P}(x_k = i \mid y_{1:k})$, and let $\Pi$ denote the space of all such $L$-variate posteriors.

Definition 6 (TP2 Ordering³): Let $\pi$ and $\hat{\pi}$ denote any two $L$-variate probability mass functions. Then $\pi \geq_{\mathrm{TP2}} \hat{\pi}$ if $\pi(i)\,\hat{\pi}(j) \leq \pi(i \vee j)\,\hat{\pi}(i \wedge j)$ for all vector indices $i, j$.

If $\pi$ and $\hat{\pi}$ are univariate, then this definition is equivalent to the MLR ordering of Definition 1. Indeed, just like the MLR order, the TP2 order is closed under conditional expectations [7]. In analogy to Definition 5 and (10), given two transition matrices $\underline{P}$ and $P$, we say that $\underline{P} \preceq P$ if

$$\underline{P}'\pi \leq_{\mathrm{TP2}} P'\pi \quad \text{for all } \pi \in \Pi. \qquad (14)$$

³In general the TP2 order is not reflexive. A multivariate distribution $\pi$ is said to be multivariate TP2 (MTP2) if $\pi \geq_{\mathrm{TP2}} \pi$ holds, i.e., $\pi(i)\pi(j) \leq \pi(i \vee j)\pi(i \wedge j)$. Actually this definition of reflexivity also applies to stochastic matrices. That is, if $i, j$ are scalar indices, then MTP2 and TP2 (Definition 3) are identical for a stochastic matrix; see [7].

The main result regarding filtering of multivariate HMMs is as follows:

Theorem 2: Consider an $L$-variate HMM where each transition matrix satisfies $\underline{P}^{(l)} \preceq P^{(l)} \preceq \bar{P}^{(l)}$ for $l = 1, \ldots, L$ (where $\preceq$ is interpreted as in Definition 5). Then
(i) $\underline{P}^{(1)} \otimes \cdots \otimes \underline{P}^{(L)} \preceq P^{(1)} \otimes \cdots \otimes P^{(L)} \preceq \bar{P}^{(1)} \otimes \cdots \otimes \bar{P}^{(L)}$, where $\preceq$ for the joint transition matrices is understood in the TP2 sense of (14).
(ii) Theorem 1 holds for the posterior and state estimates with $\leq_r$ replaced by $\leq_{\mathrm{TP2}}$.

We need to qualify statement (ii) of Theorem 2, since for multivariate HMMs the conditional mean and MAP estimates are $L$-dimensional vectors. The inequality $\underline{x}_k \leq \hat{x}_k$ of statement (ii) is interpreted as the component-wise partial order on $\mathbb{R}^L$, namely $\underline{x}_k(l) \leq \hat{x}_k(l)$ for all $l = 1, \ldots, L$. (A similar result applies for the upper bounds.)

III. CONVEX OPTIMIZATION TO COMPUTE LOW RANK TRANSITION MATRICES

It only remains to give algorithms for constructing low rank transition matrices $\underline{P}$ and $\bar{P}$ that yield the lower and upper bounds $\underline{\pi}_k$ and $\bar{\pi}_k$. These involve convex optimization [26], [27] for minimizing the nuclear norm. The computation of $\underline{P}$ and $\bar{P}$ is independent of the observation sample path, and so the associated computational cost is irrelevant to the real time filtering. Recall that the motivation is as follows: if $\underline{P}$ and $\bar{P}$ have rank $r$, then the computational cost of the filtering recursion is $O(Xr)$ instead of $O(X^2)$ at each time $k$.

A. Construction of $\underline{P}$ and $\bar{P}$ Without Rank Constraint

Given a TP2 matrix $P$, transition matrices $\underline{P}$ and $\bar{P}$ such that $\underline{P} \preceq P \preceq \bar{P}$ can be constructed straightforwardly via an LP solver. With $\underline{P}_{i\cdot}$ denoting the rows of $\underline{P}$, a sufficient condition for $\underline{P} \preceq P$ is that every row satisfies $\underline{P}_{i\cdot} \leq_r P_{1\cdot}$, the first row of $P$. Hence the rows of $\underline{P}$ satisfy linear constraints with respect to $P$, and $\underline{P}$ can be straightforwardly constructed via an LP solver. A similar construction holds for the upper bound $\bar{P}$, where it is sufficient to construct rows satisfying $\bar{P}_{i\cdot} \geq_r P_{X\cdot}$.

Rank 1 Bounds: If $P$ is TP2, an obvious construction is to choose rows $\underline{P}_{i\cdot} = P_{1\cdot}$ and $\bar{P}_{i\cdot} = P_{X\cdot}$ for $i = 1, \ldots, X$. These yield rank 1 matrices $\underline{P}$ and $\bar{P}$. It is clear from Theorem 1 that $\underline{P}$ and $\bar{P}$ constructed in this manner are the tightest rank 1 lower and upper bounds.
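The rank 1 construction is immediate in code; a sketch (our own, following Section III-A directly):

```python
import numpy as np

def rank1_bounds(P):
    """Tightest rank 1 bounds for a TP2 matrix P (Section III-A):
    every row of P_low is the first row of P, every row of P_up is the
    last row. Since all rows are equal, P_low' pi equals the first row
    of P regardless of pi, so the bound filters cost O(X) per step."""
    X = P.shape[0]
    P_low = np.tile(P[0], (X, 1))
    P_up = np.tile(P[-1], (X, 1))
    return P_low, P_up
```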

B. Nuclear Norm Minimization Algorithms to Compute Low Rank Transition Matrices

In this subsection we construct $\underline{P}$ and $\bar{P}$ as low rank transition matrices subject to the condition $\underline{P} \preceq P \preceq \bar{P}$. To save space, we consider the lower bound transition matrix $\underline{P}$; the construction of $\bar{P}$ is similar. Consider the following optimization problem for $\underline{P}$:

$$\min_{\underline{P} \in \mathbb{R}^{X \times X}} \operatorname{rank}(\underline{P}) \qquad (15)$$

subject to the constraints, for $j = 1, \ldots, X-1$:

$$\pi' \Gamma^j \pi \geq 0 \quad \text{for all } \pi \in \Pi(X), \qquad (16a)$$
$$\|(\underline{P} - P)'\|_1 \leq \varepsilon, \qquad (16b)$$
$$\underline{P}\, \mathbf{1}_X = \mathbf{1}_X, \quad \underline{P}_{ij} \geq 0. \qquad (16c)$$

Recall $\Gamma^j$ is defined in (9), and (16a) is equivalent to $\underline{P} \preceq P$. The constraints are convex in the matrix $\underline{P}$, since (16a) and (16c) are linear in the elements of $\underline{P}$, and (16b) is convex (because norms are convex). The constraints (16a), (16c) are exactly the conditions of Theorem 1, namely that $\underline{P}$ is a stochastic matrix satisfying $\underline{P} \preceq P$.

The convex constraint (16b) is equivalent to $\max_i \sum_j |\underline{P}_{ij} - P_{ij}| \leq \varepsilon$; here $\|\cdot\|_1$ in (16b) denotes the induced 1-norm for matrices.⁴ To solve the above problem, we proceed in two steps:
1) The objective (15) is replaced with the reweighted nuclear norm (see Section III-B1 below).
2) Optimization over the copositive cone (16a) is achieved via a sequence of simplicial decompositions (Section III-B2 below).

⁴The three statements $\|(\underline{P} - P)'\|_1 \leq \varepsilon$, $\max_i \sum_j |\underline{P}_{ij} - P_{ij}| \leq \varepsilon$ and $\max_{\pi \in \Pi(X)} \|(\underline{P} - P)'\pi\|_1 \leq \varepsilon$ are all equivalent, since $\pi$ is a probability vector (pmf).

1) Reweighted Nuclear Norm: Since the rank is a non-convex function of a matrix, direct minimization of the rank (15) is computationally intractable. Instead, we follow the approach developed by Boyd and coworkers [26], [27] to minimize the iteratively reweighted nuclear norm. As mentioned earlier, inspired by Candès and Tao [14], there has been much recent interest in minimizing nuclear norms for constructing matrices with sparse singular value sets, or equivalently, low rank. Here we compute $\underline{P}$ by minimizing its nuclear norm subject to copositivity conditions that ensure $\underline{P} \preceq P$.

Let $\|\cdot\|_*$ denote the nuclear norm, which corresponds to the sum of the singular values of a matrix. The reweighted nuclear norm minimization proceeds as a sequence of convex optimization problems indexed by $n = 1, 2, \ldots$. Initialize $W_1^1 = W_2^1 = I$. For iterations $n = 1, 2, \ldots$, compute the matrix

$$\underline{P}^{n+1} = \operatorname*{argmin}_{\underline{P}} \|W_1^n\, \underline{P}\, W_2^n\|_* \quad \text{subject to (16a)-(16c)}. \qquad (17)$$

Notice that at iteration $n+1$, the previous estimate $\underline{P}^n$ appears in the cost function of (17) through the weighting matrices $W_1^n$ and $W_2^n$. These weighting matrices are evaluated iteratively as

$$W_1^{n+1} = \left(U^n \Sigma^n U^{n\prime} + \delta I\right)^{-1/2}, \quad W_2^{n+1} = \left(V^n \Sigma^n V^{n\prime} + \delta I\right)^{-1/2}. \qquad (18)$$

Here $U^n \Sigma^n V^{n\prime}$ is a reduced singular value decomposition of $W_1^n \underline{P}^{n+1} W_2^n$, starting with $W_1^1 = I$ and $W_2^1 = I$. Also, $\delta$ is a small positive constant in the regularization term $\delta I$. In the numerical examples of Section VI, we used YALMIP with MOSEK and CVX to solve the above convex optimization problem.

The intuition behind the reweighting iterations is that as the estimates $\underline{P}^n$ converge to a limit, the cost function becomes approximately equal to the rank of the limit matrix.
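A sketch of one subproblem of the form (17) in CVXPY follows (our own illustration; the paper used YALMIP/MOSEK/CVX). For brevity, the copositive constraint (16a) is replaced here by the sufficient condition of Proposition 1 below applied to the unit simplex itself, i.e., entrywise non-negativity of each $\Gamma^j$; the simplicial algorithm of Section III-B2 refines exactly this kind of constraint.

```python
import cvxpy as cp
import numpy as np

def reweighted_nuclear_step(P, eps, W1, W2):
    """One convex subproblem (17): min ||W1 P_low W2||_* s.t. (16a)-(16c),
    with (16a) strengthened to Gamma^j >= 0 entrywise (Proposition 1 with
    the simplex vertices e_1, ..., e_X)."""
    X = P.shape[0]
    Pl = cp.Variable((X, X), nonneg=True)
    cons = [cp.sum(Pl, axis=1) == 1,                      # (16c) stochastic
            cp.sum(cp.abs(Pl - P), axis=1) <= eps]        # (16b), row-wise
    for j in range(X - 1):
        a = cp.reshape(Pl[:, j], (X, 1))                  # column j of P_low
        b = cp.reshape(Pl[:, j + 1], (X, 1))
        c, d = P[:, [j]], P[:, [j + 1]]                   # columns of P
        cons.append(a @ d.T + d @ a.T - b @ c.T - c @ b.T >= 0)  # Gamma^j >= 0
    prob = cp.Problem(cp.Minimize(cp.normNuc(W1 @ Pl @ W2)), cons)
    prob.solve()
    return Pl.value
```

The weights `W1`, `W2` start at the identity and are updated via (18) between calls.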

2) Simplicial Decomposition for Copositive Programming: Problem (17) is a convex optimization problem in $\underline{P}$. However, one additional issue needs to be resolved: the constraints (16a) involve a copositive cone and cannot be handled directly by standard interior point methods. To deal with the copositive constraints (16a), we use the state-of-the-art simplicial decomposition method detailed in [16]. The nice key idea used in [16] is summarized in the following proposition.

Proposition 1 ([16]): Let $\Delta$ denote any sub-simplex of the belief space $\Pi(X)$, with vertices $v_1, \ldots, v_X$. Then a sufficient condition for the copositive condition (16a) to hold on $\Delta$ is that $v_s' \Gamma^j v_t \geq 0$ for all pairs of vertices $v_s, v_t$.

Proposition 1 allows us to replace the constraints (16a)-(16c) with constraints of the type

$$v_s' \Gamma^j v_t \geq 0, \quad s, t \in \{1, \ldots, X\},\ j = 1, \ldots, X-1.$$

Let $\mathcal{D}^n$ denote the set of subsimplices at iteration $n$ that constitute a partition of $\Pi(X)$, i.e., each subsimplex is the convex hull of its vertices. Proposition 1 along with the nuclear norm minimization leads to a finite dimensional convex optimization problem that can be solved via the following 2 step algorithm:

1) Solve the sequence of convex optimization problems (17), with the constraints $v_s' \Gamma^j v_t \geq 0$ for all vertex pairs of every subsimplex in the current partition $\mathcal{D}^n$, together with (16b), (16c).
2) If the nuclear norm decreases compared to that in iteration $n-1$ by more than a pre-defined tolerance, partition $\mathcal{D}^n$ into $\mathcal{D}^{n+1}$ as detailed in [16]: among all subsimplices and pairs of vertices, choose the pair $(v_s, v_t)$ with the longest distance between them and pick the two subsimplices that share the edge formed by these vertices. Then subdivide these subsimplices along the midpoint of this edge, i.e., replace each of them in $\mathcal{D}^{n+1}$ by the two new subsimplices obtained by substituting $v_s$ (respectively $v_t$) with the midpoint $(v_s + v_t)/2$. Set $n \leftarrow n+1$ and go to Step 1. Else stop.

The iterations of the above simplicial algorithm lead to a sequence of decreasing costs, hence the algorithm can be terminated as soon as the decrease in the cost becomes smaller than a pre-defined value (set by the user); please see [16] for details on simplex partitioning. We emphasize again that the algorithms in this section for computing $\underline{P}$ and $\bar{P}$ are off-line and do not affect the real time filtering computations.

IV. STOCHASTIC DOMINANCE CONSTRAINED IMPORTANCE SAMPLING FILTER

So far we have constructed reduced complexity lower and upper stochastic dominance bounds that confine the posterior sample path of the optimal filter to the partially ordered set $[\underline{\pi}_k, \bar{\pi}_k]$ at each time $k$. The next question is: given these estimates $\underline{\pi}_k$ and $\bar{\pi}_k$, how can an algorithm be constructed to estimate $\pi_k$? That is, how can the reduced complexity bounds be exploited to estimate the posterior $\pi_k$? We present a filtering algorithm that is inspired by recent results in stochastic linear solvers [28], [29]. The algorithm uses importance sampling for matrix-vector multiplication together with Gibbs sampling to ensure that the estimated posterior $\hat{\pi}_k$ lies in the partially ordered set $[\underline{\pi}_k, \bar{\pi}_k]$. The resulting estimate that exploits the upper and lower bounds has a lower variance than the unconditional estimator.

Why? Running a reduced complexity estimator and then switching to a high resolution estimator when an event of interest occurs arises in monitoring systems, cued sensing in adaptive target tracking systems [30] and body area networks [31], [32]. In these examples, it is of interest to detect when the underlying Markov chain is close to a target state. A sensor monitoring the state of a noisy Markov chain can compute the reduced complexity filtering bounds cheaply. Since the reduced complexity bounds provably sandwich the true posterior, as soon as these bounds get close to a target state, the sensor switches to a higher resolution (complexity) estimator. In cued target tracking, when a target's state approaches a high threat level, the reduced complexity tracker can cue (deploy) a higher resolution (complexity) tracker.

A. Stochastic Dominance Constrained Importance Sampling Filtering Algorithm

Suppose at time $k-1$ we have an estimate $\hat{\pi}_{k-1}$ of the optimal filtered posterior, together with the reduced complexity bounds $\underline{\pi}_{k-1}$ and $\bar{\pi}_{k-1}$ such that $\underline{\pi}_{k-1} \leq_r \hat{\pi}_{k-1} \leq_r \bar{\pi}_{k-1}$. Algorithm 1 computes an estimate $\hat{\pi}_k$ of the optimal filtered posterior $\pi_k$. Steps 0 and 1 were detailed in Sections II and III. Step 2 computes an estimate $\hat{\pi}_{k|k-1}$ of the optimal predictor $P'\hat{\pi}_{k-1}$ using Monte-Carlo importance sampling for matrix-vector multiplication, so that the constraint $\underline{\pi}_{k|k-1} \leq_r \hat{\pi}_{k|k-1} \leq_r \bar{\pi}_{k|k-1}$ holds. Step 3 then computes the filtered posterior at time $k$ with $O(X)$ computations (Bayes rule). By Theorem 1, this updated posterior is guaranteed to satisfy $\underline{\pi}_k \leq_r \hat{\pi}_k \leq_r \bar{\pi}_k$.

Algorithm 1: Stochastic Dominance Constrained Importance Sampling Filter at time $k$

Aim: Given posterior estimate $\hat{\pi}_{k-1}$, lower bound $\underline{\pi}_{k-1}$ and upper bound $\bar{\pi}_{k-1}$, evaluate $\hat{\pi}_k$.

Step 0 (offline): Given TP2 transition matrix $P$, compute low rank $\underline{P}$ and $\bar{P}$ with $\underline{P} \preceq P \preceq \bar{P}$ by minimizing the nuclear norm (Section III-B).

Step 1: Evaluate predicted and filtered upper/lower bounds

$$\underline{\pi}_{k|k-1} = \underline{P}'\underline{\pi}_{k-1}, \quad \bar{\pi}_{k|k-1} = \bar{P}'\bar{\pi}_{k-1}, \quad \underline{\pi}_k = T(\underline{\pi}_{k-1}, y_k; \underline{P}), \quad \bar{\pi}_k = T(\bar{\pi}_{k-1}, y_k; \bar{P}). \qquad (19)$$

Step 2: Compute the predicted estimate $\hat{\pi}_{k|k-1}$ using $P$.
for $i = 1$ to $X$ do
  Evaluate the stochastic dominance path bounds for component $i$, namely the interval of values of $\hat{\pi}_{k|k-1}(i)$ that is MLR-consistent with $\underline{\pi}_{k|k-1}$, $\bar{\pi}_{k|k-1}$ and the already-computed component $i-1$. (20)
  for iterations $s = 1$ to $S$ do
    Sample $m_s$ from importance pmf $q$.
    if the resulting sample satisfies the bounds (20), then accept it and set
    $$\hat{h}_s = \frac{P_{m_s i}\, \hat{\pi}_{k-1}(m_s)}{q(m_s)}. \qquad (21)$$
  end for
  Compute the estimated predictor component as the average of the accepted samples,
  $$\hat{\pi}_{k|k-1}(i) = \frac{1}{|\mathcal{S}_i|}\sum_{s \in \mathcal{S}_i} \hat{h}_s, \qquad (22)$$
  where the set $\mathcal{S}_i$ consists of the indices of accepted samples that satisfy the constraints. (If $\mathcal{S}_i$ is empty, set $\hat{\pi}_{k|k-1}(i) = \underline{\pi}_{k|k-1}(i)$.)
end for

Step 3: Compute the filtered estimate $\hat{\pi}_k = B_{y_k}\hat{\pi}_{k|k-1} / (\mathbf{1}_X' B_{y_k}\hat{\pi}_{k|k-1})$.

Discussion of Step 2 in Algorithm 1: Step 2 computes the $X$ components of the predicted estimate $\hat{\pi}_{k|k-1}$. At the $i$-th iteration of Step 2, the aim is to simulate samples from the $i$-th column of $P$ so that these samples satisfy $\underline{\pi}_{k|k-1} \leq_r \hat{\pi}_{k|k-1} \leq_r \bar{\pi}_{k|k-1}$. It is easily seen that this is equivalent to the MLR constraint for the $i$-th component in terms of the $(i-1)$-th component. Building up these samples sequentially for $i = 1, \ldots, X$ constitutes a Gibbs sampling approach to ensure that the constraints hold, and is a special case of adaptive importance sampling.⁵ Equation (20), together with the if-then statement preceding (21), achieves this objective. Next, the update in (21), namely $\hat{h}_s = P_{m_s i}\,\hat{\pi}_{k-1}(m_s)/q(m_s)$, is simply an importance sampling Monte-Carlo estimator for the $i$-th component of the matrix-vector product $P'\hat{\pi}_{k-1}$. For example, if the importance pmf is chosen as $q = \hat{\pi}_{k-1}$, then $m_s$ is simulated from $\hat{\pi}_{k-1}$ and $\hat{h}_s = P_{m_s i}$.

The key point in Algorithm 1 is the reduced variance compared to the unconstrained estimator, since conditioning the samples on the stochastic dominance constraints can only reduce the variance.

⁵We thank Eric Moulines and Olivier Cappe of ENST for mentioning this.
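The core of Step 2, importance sampling for a matrix-vector product as in [28], can be sketched as follows. This is our own simplified illustration: the MLR acceptance tests of (20) are omitted, so it corresponds to the unconstrained estimator (23) discussed next.

```python
import numpy as np

def is_predictor_estimate(pi_hat, P, q, S, seed=1):
    """Importance sampling estimate of the predictor P' pi_hat, one
    component at a time: E_q[ P[m, i] * pi_hat[m] / q[m] ] = (P' pi_hat)(i).
    With S samples per component, the cost is O(X S) instead of O(X^2)."""
    rng = np.random.default_rng(seed)
    X = P.shape[0]
    est = np.empty(X)
    for i in range(X):
        m = rng.choice(X, size=S, p=q)        # m_s ~ importance pmf q
        est[i] = np.mean(P[m, i] * pi_hat[m] / q[m])
    return est
```

With the choice `q = pi_hat`, the weights collapse to $P_{m_s i}$, matching item 1) in the discussion below.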

If the stochastic dominance constraints are not exploited, then instead of (20), (21), every sample is accepted and the estimator is simply

$$\hat{\pi}_{k|k-1}(i) = \frac{1}{S}\sum_{s=1}^S \frac{P_{m_s i}\,\hat{\pi}_{k-1}(m_s)}{q(m_s)}. \qquad (23)$$

Choice of Importance Distribution: In Algorithm 1, the importance distribution $q$ is an $X$-dimensional probability vector. There are several choices for the importance distribution $q$:
1) An obvious choice is $q = \hat{\pi}_{k-1}$, in which case (21) becomes: if the sample is accepted, then $\hat{h}_s = P_{m_s i}$.
2) The optimal importance function, which minimizes the variance of the estimator, is proportional to the integrand $P_{\cdot i}\,\hat{\pi}_{k-1}$. This is not useful, since evaluating it requires $O(X)$ multiplications for each $i$, and therefore $O(X^2)$ multiplications in total.
3) A near optimal choice is $q = \underline{\pi}_{k|k-1}$ or $q = \bar{\pi}_{k|k-1}$. These have already been computed, and therefore no extra computations are required. They are particularly useful when $\underline{P}$ and $\bar{P}$ are constructed to minimize the distance between the bounds and the actual posterior (as discussed in Section III-B).

One can add an optional step below (21) to increase the sampling efficiency: if a particular index does not satisfy the constraint, then there is no need to simulate it again; simulation of this index can be eliminated by setting its probability in $q$ to zero and renormalizing.

Note that Algorithm 1 is not a particle filter. In particular, (21) is simply a Monte-Carlo evaluation of the matrix-vector product $P'\hat{\pi}_{k-1}$ and is motivated by techniques in [29], [28]. Without the sample path constraints, (21) reduces to Algorithm 1 of [28]. Degeneracy issues that plague particle filtering do not arise. For $S$ iterations at each time instant $k$, Algorithm 1 has $O(XS + Xr)$ computational cost, where $r$ is the rank of $\underline{P}$, $\bar{P}$. In comparison, a particle filter with $N$ particles involves $O(NX)$ computational cost.

B. Importance Sampling Filter for Computing Lower Bound

Given the lower bound matrix $\underline{P}$ of rank $\underline{r}$, the lower bound $\underline{\pi}_k$ can be computed exactly using (19) with $O(X\underline{r})$ computations. An alternative method is to exploit the rank and estimate $\underline{\pi}_k$ by using Monte-Carlo importance sampling methods similar to Algorithm 1. Consider the singular value decomposition of $\underline{P}$:

$$\underline{P} = U \Sigma V' = \sum_{l=1}^{\underline{r}} \sigma_l\, u_l\, v_l', \qquad (24)$$

where we have minimized the rank $\underline{r}$ via the nuclear norm minimization algorithm of Section III-B. Algorithm 2 presents the importance sampling filter for the lower bound (computing the upper bound is similar). The choice of importance sampling distributions is similar to that for Algorithm 1.

Algorithm 2: Importance Sampling Filter for estimating lower bound $\underline{\pi}_k$ at time $k$

Aim: Given lower bound estimate $\underline{\hat{\pi}}_{k-1}$, evaluate the lower bound $\underline{\hat{\pi}}_k$.

for $l = 1$ to $\underline{r}$ do
  for iterations $s = 1$ to $S$ do
    Sample $m_s$ from importance pmf $q$. Set $\hat{h}_s = u_l(m_s)\,\underline{\hat{\pi}}_{k-1}(m_s)/q(m_s)$.
  end for
  Set $\hat{c}_l = \frac{1}{S}\sum_{s=1}^S \hat{h}_s$.
end for

Set $\underline{\hat{\pi}}_{k|k-1} = \sum_{l=1}^{\underline{r}} \hat{c}_l\, \sigma_l\, v_l$ (where the vectors $\sigma_l v_l$ are precomputed).

Compute the filtered posterior $\underline{\hat{\pi}}_k = B_{y_k}\underline{\hat{\pi}}_{k|k-1} / (\mathbf{1}_X' B_{y_k}\underline{\hat{\pi}}_{k|k-1})$.
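When $\underline{\pi}_k$ is computed exactly rather than estimated, the rank-$\underline{r}$ factorization (24) gives the $O(X\underline{r})$ filter update directly; a sketch (ours), with the factors precomputed offline:

```python
import numpy as np

def low_rank_filter_update(pi, y, U, s, Vt, B):
    """Bound-filter update using P_low ~ U @ diag(s) @ Vt of rank r:
    P_low' pi = Vt' diag(s) U' pi, computed in O(X r) flops."""
    pred = Vt.T @ (s * (U.T @ pi))          # predictor P_low' pi
    unnormalized = B[:, y] * pred
    return unnormalized / unnormalized.sum()
```

The factors can come from `np.linalg.svd(P_low)` truncated to the numerical rank.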
C. Stochastic Dominance Constrained Particle Filter—A Non-Result

Given the abundance of publications in particle filtering, it is of interest to obtain a particle filtering algorithm that exploits the upper and lower bound constraints to estimate the posterior. Unfortunately, since particle filters propagate trajectories and not marginals, we were unable to find a computationally efficient way of enforcing the MLR constraints in the computation of $\hat{\pi}_k$. (If we propagated the marginals, then the algorithm becomes identical to Algorithm 1.) Also, since MLR comparison of two $X$-dimensional posteriors involves $O(X^2)$ multiplications, projecting $N$ particles to the polytope $[\underline{\pi}_k, \bar{\pi}_k]$ involves $O(NX^2)$ computations. Finally, in the particle filtering folklore, the so called 'optimal' choice for the importance density is the one-step conditional density, with the corresponding particle weight update; for each particle, this requires $O(X)$ computations and hence $O(NX)$ for $N$ particles.

V. ANALYSIS OF BOUNDS

The main result, namely Theorem 1 above, was an ordinal bound: it said that we can compute reduced complexity filters $\underline{\pi}_k$ and $\bar{\pi}_k$ such that the posterior $\pi_k$ of the original filtering problem is lower and upper bounded on the partially ordered set $[\underline{\pi}_k, \bar{\pi}_k]$ for all time $k$. Moreover, by minimizing (17), we computed transition matrices $\underline{P}$ and $\bar{P}$ so that $\underline{P} \preceq P \preceq \bar{P}$ and $\underline{P}$, $\bar{P}$ have low rank.

In this section we construct cardinal bounds; that is, an explicit analytical bound is developed for $\|\pi_k - \underline{\pi}_k\|_1$ in terms of the design parameter $\varepsilon$. These bounds together with Theorem 1 give a complete characterization of the reduced complexity filters.

In order to present the main result, we first define the Dobrushin coefficient:

Definition 7 (Dobrushin Coefficient): For a transition matrix $P$, the Dobrushin coefficient of ergodicity is

$$\rho(P) = \max_{i,j} \frac{1}{2}\sum_{l=1}^X |P_{il} - P_{jl}|. \qquad (25)$$

Note that $\rho(P)$ lies in the interval $[0, 1]$. Also, $\rho(P) = 0$ implies that the process is independent and identically distributed (iid). In words: the Dobrushin coefficient of ergodicity is the maximum variational norm⁶ between two rows of the transition matrix $P$.

⁶It is conventional to use the variational norm to measure the distance between two probability distributions. Recall that, given probability mass functions $p$ and $q$ on $\{1, \ldots, X\}$, the variational norm is $\frac{1}{2}\sum_i |p(i) - q(i)|$. So the variational norm is just half the $\ell_1$ norm between two probability mass functions.
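Computing (25) is a one-liner; a sketch (ours):

```python
import numpy as np

def dobrushin(P):
    """Dobrushin coefficient (25): max over row pairs of half the l1
    distance. rho = 0 iff all rows are identical (iid chain)."""
    return 0.5 * np.abs(P[:, None, :] - P[None, :, :]).sum(axis=2).max()
```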

The following is the main result of this section:

Theorem 3: Consider an HMM with transition matrix $P$ and state levels $g = (1, \ldots, X)'$. Let $\varepsilon$ denote the user defined parameter in constraint (16b) of the convex optimization problem (17), and let $\underline{P}$ denote the solution. Then:
1) The expected absolute deviation between one step of filtering using $P$ versus $\underline{P}$ is upper bounded as

$$\mathbb{E}\left\{\left\|T(\pi, y; P) - T(\pi, y; \underline{P})\right\|_1\right\} \leq 2\varepsilon \sum_y \max_i B_{iy}. \qquad (26)$$

Here $\mathbb{E}$ is with respect to the measure $\sigma(\pi, y) = \mathbf{1}_X' B_y P'\pi$, which is the denominator of the Bayesian filtering update (2).
2) The sample paths of the filtered posteriors have the following explicit bounds at each time $k$:

$$\|\pi_k - \underline{\pi}_k\|_1 \leq 2\beta_k\left(\varepsilon + \rho(\underline{P})\,\|\pi_{k-1} - \underline{\pi}_{k-1}\|_1\right). \qquad (27)$$

Here $\rho(\underline{P})$ denotes the Dobrushin coefficient of the transition matrix $\underline{P}$, $\underline{\pi}_k$ is the posterior computed using the HMM filter with $\underline{P}$, and

$$\beta_k = \frac{\max_i B_{i y_k}}{\mathbf{1}_X' B_{y_k} \underline{P}'\,\underline{\pi}_{k-1}}. \qquad (28)$$

Theorem 3 gives explicit upper bounds between the filtered distributions using transition matrices $P$ and $\underline{P}$. Similar bounds hold for $\bar{P}$ and are omitted. The approach used here in terms of the Dobrushin coefficient is similar to that in [1]. We also refer to [33] for stability results on filters involving mismatch between the true system and the approximating filter model. In [34], stability results are presented for filters involving mismatch between the true system and the averaged system for two-time scale systems.

The bounds are useful since their computation involves the reduced complexity filter with transition matrix $\underline{P}$; the original transition matrix $P$ is not used. In the numerical examples below, we illustrate (26).

VI. NUMERICAL EXAMPLES

In this section we present numerical examples to illustrate the behavior of the reduced complexity filtering algorithms proposed in this paper. To give the reader an easily reproducible numerical example of large dimension, we construct a 3125 state Markov chain according to the multivariate HMM construction detailed in Section II-D. Consider $L = 5$ independent Markov chains $x_k^{(1)}, \ldots, x_k^{(5)}$, each with 5 states. The observation process is

$$y_k = \sum_{l=1}^5 x_k^{(l)} + v_k,$$

where the observation noise $v_k$ is zero mean iid Gaussian with variance $\sigma^2$. Since the observation process involves all 5 Markov chains, computing the filtered estimate requires propagating the joint posterior. This is equivalent to defining a $5^5 = 3125$ state Markov chain with transition matrix $P = P^{(1)} \otimes \cdots \otimes P^{(5)}$, where $\otimes$ denotes Kronecker product. The optimal HMM filter then incurs $3125^2 \approx 9.8$ million computations at each time step $k$.

1) Generating the TP2 Transition Matrix: To illustrate the reduced complexity global sample path bounds developed in Theorem 1, we consider the case where $P$ is TP2. We used the following approach to generate $P$: first construct $A = e^{Q}$, where $Q$ is a tridiagonal generator matrix (non-negative off-diagonal entries and each row adds to 0). Karlin's classic book ([25], pp. 154) shows that $A$ is then TP2. Second, as shown in [7], Kronecker products preserve the TP2 property, implying that $P$ is TP2. Using the above procedure, we constructed the 3125 × 3125 TP2 transition matrix

$$P = A \otimes A \otimes A \otimes A \otimes A. \qquad (29)$$

2) Off-Line Optimization of Lower Bound via Convex Optimization: We used the semidefinite optimization solver solvesdp from MOSEK with YALMIP and CVX to solve⁷ the convex optimization problem (17) for computing the upper and lower bound transition matrices $\underline{P}$ and $\bar{P}$. To estimate the rank of the resulting transition matrices, we consider the costs (17), which correspond approximately to the number of singular values larger than $\delta$ (defined in (18)). The reweighted nuclear norm algorithm is run for 5 iterations, and the simplicial algorithm is stopped as soon as the cost decreased by less than 0.01.

⁷See http://web.cvxr.com/cvx/doc/ for a complete documentation of CVX.
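The construction of Section VI-1 is easy to reproduce; the sketch below is our own, and the generator values are illustrative rather than the ones behind (29):

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

def tp2_kronecker_chain(Q, L=5):
    """A = expm(Q) is TP2 when Q is a tridiagonal generator ([25], p. 154),
    and Kronecker products preserve TP2 [7], so the result is TP2."""
    A = expm(Q)                       # 5x5 TP2 stochastic matrix
    return reduce(np.kron, [A] * L)   # 3125 x 3125 for L = 5

# Illustrative 5x5 tridiagonal generator: non-negative off-diagonals,
# rows summing to zero.
Q = np.diag(np.full(4, 0.3), 1) + np.diag(np.full(4, 0.2), -1)
Q -= np.diag(Q.sum(axis=1))
P = tp2_kronecker_chain(Q)            # rows of P sum to 1
```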

TABLE I
RANKS OF LOWER BOUND TRANSITION MATRICES $\underline{P}$, EACH OF DIMENSION 3125 × 3125, OBTAINED AS SOLUTIONS OF THE NUCLEAR NORM MINIMIZATION PROBLEM (17) FOR SIX DIFFERENT CHOICES OF $\varepsilon$ APPEARING IN CONSTRAINT (16b). NOTE THAT $\varepsilon = 0$ CORRESPONDS TO $\underline{P} = P$ AND $\varepsilon = 2$ CORRESPONDS TO THE IID CASE.

Fig. 2. Mean square error of lower bound reduced complexity filters computed using five different transition matrices $\underline{P}$ summarized in Table I. The transition matrix $P$ of dimension 3125 × 3125 is specified in (29). The four solid lines (lowest to highest curve) correspond to the four intermediate values of $\varepsilon$. The optimal filter corresponds to $\varepsilon = 0$, while the iid approximation corresponds to $\varepsilon = 2$.

To save space, we present results only for the lower bounds. We computed⁸ 5 different lower bound transition matrices $\underline{P}$ by solving the nuclear norm minimization problem (17) for 5 different choices of $\varepsilon$ defined in constraint (16b).

Table I displays the ranks of these 5 transition matrices $\underline{P}$, and also the rank of $P$, which corresponds to the case $\varepsilon = 0$. The low rank property of $\underline{P}$ can be visualized by displaying the singular values. Fig. 1 displays the singular values of $P$ and $\underline{P}$. When $\varepsilon = 2$, the rank of $\underline{P}$ is 1 and $\underline{P}$ models an iid chain; $\underline{P}$ then simply comprises repetitions of the first row of $P$. As $\varepsilon$ is made smaller, the number of non-negligible singular values increases. For $\varepsilon = 0$, $\underline{P}$ coincides with $P$.

Fig. 1. Plot of the 3125 singular values of $P$ and the singular values of five different transition matrices $\underline{P}$ parametrized by $\varepsilon$ in Table I. The transition matrix $P$ (corresponding to $\varepsilon = 0$) of dimension 3125 × 3125 is specified in (29).

3) Performance of Lower Complexity Filters: At each time $k$, the reduced complexity filter incurs a computational cost of $O(X\underline{r})$, where $X = 3125$ and $\underline{r}$ is specified in Table I. For each matrix $\underline{P}(\varepsilon)$ and for a range of noise variances $\sigma^2$, we ran the reduced complexity HMM filter for a million iterations and computed the average mean square error of the state estimate. These average mean square error values are displayed in Fig. 2. As might be intuitively expected, Fig. 2 shows that the reduced complexity filters yield a mean square error that lies between the iid approximation ($\varepsilon = 2$) and the optimal filter ($\varepsilon = 0$). In all cases, as asserted by Theorem 1, the estimate provably lower bounds the true posterior, i.e., $\underline{\pi}_k \leq_r \pi_k$ for all time $k$. Therefore the conditional mean estimates satisfy $\underline{x}_k \leq \hat{x}_k$ for all $k$.

4) Stochastic Dominance Constrained Importance Sampling Algorithm 1: Here we illustrate the performance of three different suboptimal state predictors compared to the optimal predictor, namely:
1) The stochastic dominance constrained predictor based on the upper/lower bounds, using (22) of Algorithm 1. We ran this for 5 different values of $S$, namely 2, 4, 6, 8, 10 iterations at each time.
2) The stochastic dominance predictor without exploiting the bounds, namely (22), (23), again for the 5 values of $S$.
3) The reduced complexity lower bound predictor $\underline{P}'\underline{\hat{\pi}}_{k-1}$, computed using the lower bound transition matrix $\underline{P}$.

With $\hat{x}_k$ denoting any of the above three suboptimal predictors, we computed the mean square error averaged over a million simulations as follows:

$$\mathrm{MSE} = \frac{1}{10^6}\sum_{n=1}^{10^6}\left(\hat{x}_k^{(n)} - \hat{x}_k^{o,(n)}\right)^2. \qquad (30)$$

Here $\hat{x}_k^o$ denotes the optimal predictor, and the prior $\pi_0$ was sampled uniformly from the $X$-dimensional unit simplex.

Fig. 3(a) and (b) display the MSE of all 3 predictors listed above. Fig. 3(a) corresponds to $\varepsilon = 2$, resulting in $\underline{P}$ of rank 1. Fig. 3(b) corresponds to a smaller value of $\varepsilon$, resulting in $\underline{P}$ of rank 40. For the constrained and unconstrained algorithms, naturally, more iterations per time step yield more accurate estimates and a smaller MSE. The MSE of the reduced complexity lower bound predictor is displayed with a dashed line. (Recall that the performance of the lower bound estimates with these transition matrices was reported in Section VI-3.) The figures show that substantial reductions in the mean square error occur by exploiting the stochastic dominance constraints, even for the iid lower bound case $\varepsilon = 2$.

Fig. 3. Average mean square error (MSE) defined in (30) between the optimal predictor and the three suboptimal predictors described in Section VI-4, namely the constrained importance sampling predictor (denoted "constrained"), the unconstrained importance sampling predictor (denoted "unconstrained") and the reduced complexity lower bound predictor (denoted "lower bound"). The transition matrix $P$ of dimension 3125 × 3125 is specified in (29). Recall from Table I that $\varepsilon = 2$ corresponds to the iid transition matrix of rank 1, while the smaller choice of $\varepsilon$ corresponds to $\underline{P}$ of rank 40. (a) $\varepsilon = 2$; (b) the smaller choice of $\varepsilon$.

⁸In practice the following preprocessing is required: $\underline{P}$ computed via the semidefinite optimization solver has several singular values close to zero; this is the consequence of nuclear norm minimization. These small singular values need to be truncated exactly to zero, thereby realizing the computational savings associated with the low rank. How to choose the truncation threshold? We truncated those singular values of $\underline{P}$ to zero so that the resulting matrix (after rescaling to a stochastic matrix) satisfies a negligible normalized error bound. The rescaling to a stochastic matrix involved subtracting the minimum element of the matrix (so every element is non-negative) and then normalizing the rows. This transformation does not affect the rank of the matrix, thereby maintaining the low rank. For notational convenience, we continue to use $\underline{P}$ for the truncated matrix.
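A simplified stand-in for the experiment behind (30), comparing only the exact lower bound predictor with the optimal predictor (our own sketch; the paper's experiment additionally runs Algorithm 1):

```python
import numpy as np

def predictor_mse(P, P_low, g, n=100_000, seed=2):
    """Monte-Carlo MSE in the spirit of (30): squared deviation between
    the optimal one-step predicted estimate g' P' pi and the lower bound
    estimate g' P_low' pi, with pi drawn uniformly from the simplex."""
    rng = np.random.default_rng(seed)
    Pi = rng.dirichlet(np.ones(P.shape[0]), size=n)   # uniform on simplex
    return np.mean((Pi @ (P @ g) - Pi @ (P_low @ g)) ** 2)
```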

5) Explicit Bounds: We now illustrate the explicit bound (26). We chose the same 3125 state Markov chain with $P$ as above, and a tridiagonal observation matrix $B$ parametrized by a scalar $p$,

$$B_{ii} = p, \quad B_{i,i-1} = B_{i,i+1} = \frac{1-p}{2}, \qquad (31)$$

with the obvious modification at the boundary states. We evaluated the (normalized) right hand side of the bound (26) for the 5 different choices of $\varepsilon$ defined in constraint (16b). (Recall from Table I that these correspond to 5 different choices of $\underline{P}$.) Fig. 4 displays these bounds for three different observation matrices, namely three values of the parameter $p$ in (31).

Fig. 4. The "upper bound" in the figure denotes the (normalized) right hand side of (26). The values displayed are for five different values of $\varepsilon$, corresponding to five different transition matrices $\underline{P}$ whose ranks are given in Table I. The observation matrix $B$ parametrized by $p$ is specified in (31).

The figure shows that the bounds have two properties that are intuitive. First, as $\varepsilon$ gets smaller, the approximation gets tighter, and so one would expect the deviation $\mathbb{E}\{\|T(\pi, y; P) - T(\pi, y; \underline{P})\|_1\}$ to be smaller; this is reflected in the upper bound displayed in the figure. Second, for larger values of $p$, the "smaller" the noise and so the higher the estimation accuracy. Again the bounds reflect this.

VII. DISCUSSION

Reduced Complexity Predictors: If one were interested in constructing reduced complexity HMM predictors (instead of filters), the results in this paper can be straightforwardly relaxed using first order dominance $\leq_s$ instead of MLR dominance $\leq_r$, as follows. Construct $\underline{P}$ by nuclear norm minimization as in (17), where (16a) is replaced by the linear constraints $\underline{P}_{i\cdot} \leq_s P_{i\cdot}$ on the rows $i = 1, \ldots, X$, and (16b), (16c) hold. Thus the construction of $\underline{P}$ is a standard convex optimization problem, and the bound $\underline{P}'\pi \leq_s P'\pi$ holds for the optimal predictor for all $\pi \in \Pi(X)$.

Further, if $\underline{P}$ is chosen so that its rows also satisfy the linear constraints $\underline{P}_{i\cdot} \leq_s \underline{P}_{i+1,\cdot}$, then the following global bound holds for the optimal predictor: $(\underline{P}')^k \pi_0 \leq_s (P')^k \pi_0$ for all time $k$ and priors $\pi_0$. A similar result holds for the upper bounds in terms of $\bar{P}$. A sketch of these two linear conditions is given after this section.

It is instructive to compare this with the filtering case, where we imposed a TP2 condition on $P$ for the global bounds (4) to hold with respect to $\leq_r$. We could have equivalently imposed a TP2 constraint on $\underline{P}$ and allowed $P$ to be arbitrary for the global filtering bounds (4) to hold; however, the TP2 constraint is non-convex. Finally, keep in mind that the predictor bounds in terms of $\leq_s$ do not hold once a filtering update is performed, since $\leq_s$ is not closed with respect to conditional expectations (unlike $\leq_r$).

Summary: The main idea of the paper was to develop reduced complexity HMM filtering algorithms with provable sample path bounds. At each iteration, the optimal HMM filter requires $O(X^2)$ computations, and our aim was to derive reduced complexity upper and lower bounds with complexity $O(Xr)$, where $r \ll X$. The paper consisted of 4 main results. Theorem 1 showed that one can construct transition matrices $\underline{P} \preceq P \preceq \bar{P}$ and lower and upper bound beliefs $\underline{\pi}_k$ and $\bar{\pi}_k$ that sandwich the true posterior as $\underline{\pi}_k \leq_r \pi_k \leq_r \bar{\pi}_k$ for all time $k$. Theorem 2 generalized this to multivariate TP2 orders. Section III used copositive programming methods to construct low rank transition matrices $\underline{P}$ and $\bar{P}$ of rank $r$ by minimizing the nuclear norm, so as to guarantee $\underline{\pi}_k \leq_r \pi_k \leq_r \bar{\pi}_k$ over the space of all posteriors. Finally, Theorem 3 derived explicit bounds between the optimal estimates and the reduced complexity estimates.

It is interesting that the derivation of MLR stochastic dominance bounds in this paper involves copositivity conditions. There is a rich literature on copositivity, including computational aspects [15], [16]. In future work it is worthwhile extending the bounds in this paper to copositive kernels for continuous state filtering problems. Such results could yield guaranteed sample path bounds for general nonlinear filtering problems.
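The first order dominance conditions above are simple cumulative-sum inequalities; a sketch of checking them (our own illustration):

```python
import numpy as np

def fosd_leq(p, q, tol=1e-12):
    """First order stochastic dominance p <=_s q (Definition 2):
    every tail sum of p is at most the corresponding tail sum of q."""
    return bool((np.cumsum(p[::-1]) <= np.cumsum(q[::-1]) + tol).all())

def global_predictor_bound_valid(P_low, P):
    """Checks the two linear conditions of Section VII: each row of P_low
    is <=_s the matching row of P, and the rows of P_low are monotone
    nondecreasing in <=_s."""
    X = P.shape[0]
    rows_ok = all(fosd_leq(P_low[i], P[i]) for i in range(X))
    mono_ok = all(fosd_leq(P_low[i], P_low[i + 1]) for i in range(X - 1))
    return rows_ok and mono_ok
```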

APPENDIX

A. Proof of Theorem 1

1. Choose $\underline{P} = \mathbf{1}_X e_1'$ and $\bar{P} = \mathbf{1}_X e_X'$, where $e_i$ is the unit $X$-dimensional vector with 1 in the $i$-th position. Then clearly $\underline{P} \preceq P \preceq \bar{P}$. These correspond to extreme points on the space of stochastic matrices with respect to copositive dominance.

2. We show this in 2 steps. In the first step, we show that $\underline{P} \preceq P$ implies $\underline{P}'\pi \leq_r P'\pi$. By definition, $\underline{P}'\pi \leq_r P'\pi$ is equivalent to

$$(\pi'\underline{P}_j)(\pi'P_{j+1}) - (\pi'\underline{P}_{j+1})(\pi'P_j) \geq 0$$

for each $j = 1, \ldots, X-1$. This is equivalent to the above condition holding for the symmetrized matrices $\Gamma^j$ of (9) for all $\pi \in \Pi(X)$, which is equivalent to the copositivity ordering of Definition 5.

In the second step, we show that the filtered updates satisfy $T(\pi, y; \underline{P}) \leq_r T(\pi, y; P)$. Denote $p = \underline{P}'\pi$ and $q = P'\pi$; from the first step, $p \leq_r q$. It is straightforwardly verified that $p \leq_r q$ implies $B_y p / (\mathbf{1}'B_y p) \leq_r B_y q / (\mathbf{1}'B_y q)$ for any observation likelihood $B_y$. (This crucial property of closure under Bayes' rule makes the MLR stochastic order ideal for this paper.)

3. Suppose $\underline{\pi}_{k-1} \leq_r \pi_{k-1}$. Then by Statement 2, $T(\underline{\pi}_{k-1}, y; \underline{P}) \leq_r T(\underline{\pi}_{k-1}, y; P)$. Next, since $P$ is TP2, it follows that $\underline{\pi}_{k-1} \leq_r \pi_{k-1}$ implies $T(\underline{\pi}_{k-1}, y; P) \leq_r T(\pi_{k-1}, y; P)$. Combining the two inequalities yields $T(\underline{\pi}_{k-1}, y; \underline{P}) \leq_r T(\pi_{k-1}, y; P)$, or equivalently, $\underline{\pi}_k \leq_r \pi_k$. Finally, MLR dominance implies first order dominance, which by Result 1 implies dominance of means, thereby proving 3(a).

To prove 3(b), we need to show that $\pi_1 \leq_r \pi_2$ implies $\operatorname{argmax}_i \pi_1(i) \leq \operatorname{argmax}_i \pi_2(i)$. This is shown by contradiction. Let $i^* = \operatorname{argmax}_i \pi_1(i)$ and $j^* = \operatorname{argmax}_j \pi_2(j)$, and suppose $j^* < i^*$. Then $\pi_1 \leq_r \pi_2$ and (8) applied to the index pair $(j^*, i^*)$ imply $\pi_1(j^*)\pi_2(i^*) \geq \pi_1(i^*)\pi_2(j^*)$. Since $\pi_1(i^*) \geq \pi_1(j^*)$, we have $\pi_1(j^*)\pi_2(i^*) \geq \pi_1(j^*)\pi_2(j^*)$, hence $\pi_2(i^*) \geq \pi_2(j^*)$, which is a contradiction since $j^*$ is the argmax for $\pi_2$.

B. Proof of Theorem 2

It suffices to show that $\underline{P}^{(1)} \otimes \underline{P}^{(2)} \preceq P^{(1)} \otimes P^{(2)}$. (The proof for repeated Kronecker products then follows by induction.) Consider the TP2 ordering in Definition 6. The indices $i = (i_1, i_2)$ and $j = (j_1, j_2)$ are each two dimensional. There are four cases: $i_1 \geq j_1, i_2 \geq j_2$; $i_1 \leq j_1, i_2 \leq j_2$; $i_1 \geq j_1, i_2 \leq j_2$; and $i_1 \leq j_1, i_2 \geq j_2$. TP2 dominance for the first and last cases is trivial to establish, since there $i \vee j$ and $i \wedge j$ coincide with $i$ and $j$ up to relabeling. We now show TP2 dominance for the third case (the second case follows similarly). Choosing indices with $i_1 \geq j_1$ and $i_2 \leq j_2$, the inequality of Definition 6 for the Kronecker predictors factorizes over the two chains, and a sufficient condition is that the corresponding quadratic form in non-negative numbers is non-negative for each chain separately, which is equivalent to $\underline{P}^{(l)} \preceq P^{(l)}$ by Definition 5.

C. Proof of Theorem 3 and Auxiliary Results

We start with the following theorem, which characterizes the $\ell_1$ (equivalently, variational) distance in the classical Bayes' rule. Recall that $\sigma(\pi, y) = \mathbf{1}'B_yP'\pi$ denotes the normalization term of the filtering update (2).

Theorem 4: Consider any two posterior probability mass functions $\pi_1, \pi_2 \in \Pi(X)$. Then:
1) The variational distance in the Bayesian update satisfies

$$\|T(\pi_1, y; P) - T(\pi_2, y; P)\|_1 \leq 2\rho(P)\,\frac{\max_i B_{iy}}{\sigma(\pi_1, y)}\,\|\pi_1 - \pi_2\|_1.$$

(Recall that the variational distance is half the $\ell_1$ norm.)
2) The normalization term in Bayes' rule satisfies

$$|\sigma(\pi_1, y) - \sigma(\pi_2, y)| \leq \rho(P)\,\max_i B_{iy}\,\|\pi_1 - \pi_2\|_1.$$

Proof: We refer to [1] for a textbook treatment of similar proofs on more general spaces.

1) Statement 1: For any $y$,

$$T(\pi_1, y; P) - T(\pi_2, y; P) = \frac{B_y P'(\pi_1 - \pi_2)}{\sigma(\pi_1, y)} + T(\pi_2, y; P)\,\frac{\sigma(\pi_2, y) - \sigma(\pi_1, y)}{\sigma(\pi_1, y)}. \qquad (32)$$

Applying the result⁹ (see Theorem 5 below for proof) that for any vector $c$ and pmfs $\pi_1, \pi_2$,

$$|c'(\pi_1 - \pi_2)| \leq \frac{1}{2}\left(\max_i c_i - \min_i c_i\right)\|\pi_1 - \pi_2\|_1, \qquad (33)$$

to the right hand side of (32) column-wise bounds the first term: since the spread of each column of $P$ is at most $2\rho(P)$, the Dobrushin coefficient is the $\ell_1$ contraction coefficient of $P'$ on vectors summing to zero, so $\|P'(\pi_1 - \pi_2)\|_1 \leq \rho(P)\|\pi_1 - \pi_2\|_1$, and $\|B_yP'(\pi_1 - \pi_2)\|_1 \leq \max_i B_{iy}\,\rho(P)\,\|\pi_1 - \pi_2\|_1$. Since $T(\pi_2, y; P)$ is a probability vector, $\|T(\pi_2, y; P)\|_1 = 1$; this together with Statement 2 bounds the second term of (32). Combining the two bounds proves Statement 1.

2) Statement 2: Note that $\sigma(\pi, y) = c'\pi$ with $c = PB_y\mathbf{1}$, i.e., $c_i = \sum_j P_{ij}B_{jy}$. Applying (33) yields

$$|\sigma(\pi_1, y) - \sigma(\pi_2, y)| \leq \frac{1}{2}\left(\max_i c_i - \min_i c_i\right)\|\pi_1 - \pi_2\|_1. \qquad (34)$$

Since $\max_i c_i - \min_i c_i = \max_{i,i'}\sum_j (P_{ij} - P_{i'j})B_{jy} \leq 2\rho(P)\max_j B_{jy}$, combining this with (34) proves the result.

⁹This inequality is tighter than Hölder's inequality, which gives $|c'(\pi_1 - \pi_2)| \leq \max_i |c_i|\,\|\pi_1 - \pi_2\|_1$.

3) Proof of Theorem 3: With the above results we are now ready to prove the theorem.

Part 1: Since the vector of state levels $g$ has increasing elements, (33) gives $|g'(T(\pi, y; P) - T(\pi, y; \underline{P}))| \leq \frac{X-1}{2}\|T(\pi, y; P) - T(\pi, y; \underline{P})\|_1$, so it suffices to bound the $\ell_1$ deviation. Applying the decomposition (32) with the pair of transition matrices $P$, $\underline{P}$ (and common prior $\pi$) yields

$$T(\pi, y; P) - T(\pi, y; \underline{P}) = \frac{B_y(P - \underline{P})'\pi}{\sigma(\pi, y)} + T(\pi, y; \underline{P})\,\frac{\underline{\sigma}(\pi, y) - \sigma(\pi, y)}{\sigma(\pi, y)},$$

where $\underline{\sigma}(\pi, y) = \mathbf{1}'B_y\underline{P}'\pi$. Since $T(\pi, y; \underline{P})$ is a probability vector and the entries of $B_y$ are non-negative, both terms are bounded by $\max_i B_{iy}\,\|(P - \underline{P})'\pi\|_1/\sigma(\pi, y)$, so

$$\|T(\pi, y; P) - T(\pi, y; \underline{P})\|_1 \leq \frac{2\max_i B_{iy}}{\sigma(\pi, y)}\,\|(P - \underline{P})'\pi\|_1 \leq \frac{2\varepsilon\max_i B_{iy}}{\sigma(\pi, y)},$$

where the second inequality follows from the construction of $\underline{P}$ satisfying (16b) (recall footnote 4, and that the variational norm is half the $\ell_1$ norm). Since $\sum_y \sigma(\pi, y) = 1$, taking expectations with respect to the measure $\sigma(\pi, \cdot)$ completes the proof of (26).

Part 2: The triangle inequality for norms yields

$$\|\pi_k - \underline{\pi}_k\|_1 \leq \|T(\pi_{k-1}, y_k; P) - T(\pi_{k-1}, y_k; \underline{P})\|_1 + \|T(\pi_{k-1}, y_k; \underline{P}) - T(\underline{\pi}_{k-1}, y_k; \underline{P})\|_1. \qquad (35)$$

Consider the first normed term on the right hand side of (35). Applying the argument of Part 1, and bounding the normalization term via Theorem 4(2), yields, with $\beta_k$ defined in (28),

$$\|T(\pi_{k-1}, y_k; P) - T(\pi_{k-1}, y_k; \underline{P})\|_1 \leq 2\varepsilon\,\beta_k. \qquad (36)$$

The second last inequality in this chain follows from the construction of $\underline{P}$ satisfying (16b); the last follows from Theorem 4(2).

Consider the second normed term on the right hand side of (35). Applying Theorem 4(1) with the notation $\pi_1 = \pi_{k-1}$, $\pi_2 = \underline{\pi}_{k-1}$ and transition matrix $\underline{P}$ yields

$$\|T(\pi_{k-1}, y_k; \underline{P}) - T(\underline{\pi}_{k-1}, y_k; \underline{P})\|_1 \leq 2\rho(\underline{P})\,\beta_k\,\|\pi_{k-1} - \underline{\pi}_{k-1}\|_1, \qquad (37)$$

where the last inequality uses the submultiplicative property of the Dobrushin coefficient. Substituting (36) and (37) into the right hand side of the triangle inequality (35) proves the result.

Proof of (33): This is given in the following theorem. See ([1], pp. 93) for a more general setting.

Theorem 5: For any vector $c \in \mathbb{R}^X$ and pmfs $\pi_1, \pi_2 \in \Pi(X)$,

$$|c'(\pi_1 - \pi_2)| \leq \frac{1}{2}\left(\max_i c_i - \min_i c_i\right)\|\pi_1 - \pi_2\|_1.$$

Proof: Let $z = \pi_1 - \pi_2$. Let $\mathcal{I}_+$ denote those elements of $z$ that are non-negative, and let $\mathcal{I}_-$ denote the (magnitudes of the) elements of $z$ that are negative. Then

$$c'z \leq \max_i c_i \sum_{i \in \mathcal{I}_+} z_i - \min_i c_i \sum_{i \in \mathcal{I}_-} |z_i|.$$

But $\mathbf{1}'z = 0$, so $\sum_{\mathcal{I}_+} z_i = \sum_{\mathcal{I}_-} |z_i| = \frac{1}{2}\|z\|_1$. Hence $c'z \leq \frac{1}{2}(\max_i c_i - \min_i c_i)\|z\|_1$, and applying the same bound to $-z$ completes the proof.

ACKNOWLEDGMENT

The authors would like to thank B. Wahlberg at KTH for facilitating the visit of the first author.

REFERENCES

[1] O. Cappe, E. Moulines, and T. Ryden, Inference in Hidden Markov Models. Berlin, Germany: Springer-Verlag, 2005.
[2] R. Elliott, L. Aggoun, and J. Moore, Hidden Markov Models—Estimation and Control. Berlin, Germany: Springer-Verlag, 1995.
[3] V. Krishnamurthy, "Bayesian sequential detection with phase-distributed change time and nonlinear penalty—A lattice programming POMDP approach," IEEE Trans. Inf. Theory, vol. 57, no. 10, pp. 7096-7124, Oct. 2011.
[4] V. Krishnamurthy, "How to schedule measurements of a noisy Markov chain in decision making?," IEEE Trans. Inf. Theory, vol. 59, no. 9, pp. 4440-4461, Sep. 2013.
[5] W. Lovejoy, "Some monotonicity results for partially observed Markov decision processes," Oper. Res., vol. 35, no. 5, pp. 736-743, Sep.-Oct. 1987.
[6] W. Whitt, "A note on the influence of the sample on the posterior distribution," J. Amer. Statist. Assoc., vol. 74, pp. 424-426, 1979.
[7] S. Karlin and Y. Rinott, "Classes of orderings of measures and related correlation inequalities. I. Multivariate totally positive distributions," J. Multivar. Anal., vol. 10, no. 4, pp. 467-498, Dec. 1980.
[8] W. Whitt, "Multivariate monotone likelihood ratio and uniform conditional stochastic order," J. Appl. Probab., vol. 19, pp. 695-701, 1982.
[9] Z. Liu and L. Vandenberghe, "Interior-point method for nuclear norm approximation with application to system identification," SIAM J. Matrix Anal. Appl., vol. 31, no. 3, pp. 1235-1256, 2009.
[10] Q. Zhang, G. Yin, and J. Moore, "Two-time-scale approximation for Wonham filters," IEEE Trans. Inf. Theory, vol. 53, no. 5, pp. 1706-1715, May 2007.
[11] G. Yin, Q. Zhang, J. Moore, and Y. Liu, "Continuous-time tracking algorithms involving two-time-scale Markov chains," IEEE Trans. Signal Process., vol. 53, no. 12, pp. 4442-4452, 2005.
[12] U. Rieder, "Structural results for partially observed control models," Methods Models Oper. Res., vol. 35, no. 6, pp. 473-490, 1991.
[13] V. Krishnamurthy, "Quickest detection POMDPs with social learning: Interaction of local and global decision makers," IEEE Trans. Inf. Theory, vol. 58, no. 8, pp. 5563-5587, Aug. 2012.
[14] E. J. Candès and T. Tao, "The power of convex relaxation: Near-optimal matrix completion," IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2053-2080, May 2009.
[15] S. Bundfuss and M. Dür, "Algorithmic copositivity detection by simplicial partition," Linear Algebra Appl., vol. 428, no. 7, pp. 1511-1523, 2008.
[16] S. Bundfuss and M. Dür, "An adaptive linear approximation algorithm for copositive programs," SIAM J. Optimiz., vol. 20, no. 1, pp. 30-53, 2009.
[17] P. Tichavsky, C. Muravchik, and A. Nehorai, "Posterior Cramer-Rao bounds for discrete-time nonlinear filtering," IEEE Trans. Signal Process., vol. 46, no. 5, pp. 1386-1396, May 1998.
[18] B. Ristic, S. Arulampalam, and N. Gordon, Beyond the Kalman Filter: Particle Filters for Tracking Applications. Cedar Knolls, NJ, USA: Artech, 2004.
[19] A. Muller and D. Stoyan, Comparison Methods for Stochastic Models and Risk. New York, NY, USA: Wiley, 2002.
[20] M. Kijima, Markov Processes for Stochastic Modelling. Boca Raton, FL, USA: Chapman and Hall, 1997.
[21] J. Keilson and A. Kester, "Monotone matrices and monotone Markov processes," Stochastic Processes Appl., vol. 5, no. 3, pp. 231-241, 1977.
[22] F. Gantmacher, Matrix Theory, vol. 2. New York, NY, USA: Chelsea Pub. Co., 1960.
[23] S. Karlin, Total Positivity. Stanford, CA, USA: Stanford Univ. Press, 1968.
[24] L. Rabiner, "A tutorial on hidden Markov models and selected applications in speech recognition," Proc. IEEE, vol. 77, no. 2, pp. 257-285, Feb. 1989.
[25] S. Karlin and H. M. Taylor, A Second Course in Stochastic Processes. New York, NY, USA: Academic, 1981.
[26] M. Fazel, H. Hindi, and S. P. Boyd, "A rank minimization heuristic with application to minimum order system approximation," in Proc. Amer. Control Conf. (ACC'01), 2001, vol. 6, pp. 4734-4739.
[27] M. Fazel, H. Hindi, and S. P. Boyd, "Log-det heuristic for matrix rank minimization with applications to Hankel and Euclidean distance matrices," presented at the 2003 Amer. Control Conf., 2003.
[28] S. Eriksson-Bique, M. Solbrig, M. Stefanelli, S. Warkentin, R. Abbey, and I. Ipsen, "Importance sampling for a Monte Carlo matrix multiplication algorithm, with application to information retrieval," SIAM J. Sci. Comput., vol. 33, no. 4, pp. 1689-1706, 2011.
[29] P. Drineas, R. Kannan, and M. W. Mahoney, "Fast Monte Carlo algorithms for matrices I: Approximating matrix multiplication," SIAM J. Comput., vol. 36, no. 1, pp. 132-157, 2006.
[30] S. Blackman and R. Popoli, Design and Analysis of Modern Tracking Systems. Cedar Knolls, NJ, USA: Artech House, 1999.
[31] J. Boger, P. Poupart, and J. Hoey, "A decision-theoretic approach to task assistance for persons with dementia," in Proc. Int. Joint Conf. Artif. Intell., 2005, pp. 1293-1299.
[32] M. Pollack, L. Brown, and D. Colbry, "Autominder: An intelligent cognitive orthotic system for people with memory impairment," Robot. Autonom. Syst., vol. 44, pp. 273-282, 2003.
[33] O. Techakesari, J. J. Ford, and D. Nešić, "Practical stability of approximating discrete-time filters with respect to model mismatch," Automatica, vol. 48, no. 11, pp. 2965-2970, 2012.
[34] H. Kushner, Weak Convergence and Singularly Perturbed Stochastic Control and Filtering Problems. Boston, MA, USA: Birkhauser, 1990.
[35] R. Horn and C. Johnson, Matrix Analysis, 2nd ed. Cambridge, U.K.: Cambridge Univ. Press, 2012.

Vikram Krishnamurthy (S'90-M'91-SM'99-F'05) received his Ph.D. from the Australian National University in 1992. Currently he is a professor and Canada Research Chair at the University of British Columbia, Vancouver, Canada. His research interests include statistical signal processing, biomolecular simulation, and the dynamics and control of social networks. He served as distinguished lecturer for the IEEE Signal Processing Society and Editor-in-Chief of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He received an honorary doctorate from KTH (Royal Institute of Technology), Sweden, in 2013.

Cristian R. Rojas (M'13) was born in 1980. He received the M.S. degree in electronics engineering from the Universidad Técnica Federico Santa María, Valparaíso, Chile, in 2004, and the Ph.D. degree in electrical engineering from The University of Newcastle, NSW, Australia, in 2008. Since October 2008, he has been with the Royal Institute of Technology, Stockholm, Sweden, where he is currently an Assistant Professor at the Automatic Control Lab, School of Electrical Engineering. His research interests lie in system identification and signal processing.