Learning Hidden Quantum Markov Models
Siddarth Srinivasan (Georgia Institute of Technology), Geoff Gordon (Carnegie Mellon University), Byron Boots (Georgia Institute of Technology)

Abstract

Hidden Quantum Markov Models (HQMMs) can be thought of as quantum probabilistic graphical models that can model sequential data. We extend previous work on HQMMs with three contributions: (1) we show how classical hidden Markov models (HMMs) can be simulated on a quantum circuit, (2) we reformulate HQMMs by relaxing the constraints for modeling HMMs on quantum circuits, and (3) we present a learning algorithm to estimate the parameters of an HQMM from data. While our algorithm requires further optimization to handle larger datasets, we are able to evaluate our algorithm using several synthetic datasets generated by valid HQMMs. We show that our algorithm learns HQMMs with the same number of hidden states and predictive accuracy as the HQMMs that generated the data, while HMMs learned with the Baum-Welch algorithm require more states to match the predictive accuracy.

1 INTRODUCTION

We extend previous work on Hidden Quantum Markov Models (HQMMs), and propose a novel approach to learning these models from data. HQMMs can be thought of as a new, expressive class of graphical models that have adopted the mathematical formalism for reasoning about uncertainty from quantum mechanics. We stress that while HQMMs could naturally be implemented on quantum computers, we do not need such a machine for these models to be of value. Instead, HQMMs can be viewed as novel models inspired by quantum mechanics that can be run on classical computers.

In considering these models, we are interested in answering three questions: (1) how can we construct quantum circuits to simulate classical Hidden Markov Models (HMMs); (2) what happens if we take full advantage of this quantum circuit instead of enforcing the classical probabilistic constraints; and (3) how do we learn the parameters for quantum models from data?

The paper is structured as follows: first we describe related work and provide background on quantum information theory as it relates to our work. Next, we describe the hidden quantum Markov model, compare our approach to previous work in detail, and give a scheme for writing any hidden Markov model as an HQMM. Finally, our main contribution is the introduction of a maximum-likelihood-based unsupervised learning algorithm that can estimate the parameters of an HQMM from data. Our implementation is slow to train HQMMs on large datasets and will require further optimization; instead, we evaluate our learning algorithm for HQMMs on several simple synthetic datasets by learning a quantum model from data and filtering and predicting with the learned model. We also compare our model and learning algorithm to maximum likelihood for learning hidden Markov models, and show that the more expressive HQMM can match HMMs' predictive capability with fewer hidden states on data generated by HQMMs.

2 BACKGROUND

2.1 Related Work

Hidden Quantum Markov Models were introduced by Monras et al. [2010], who discussed their relationship to classical HMMs and parameterized these HQMMs using a set of Kraus operators. Clark et al. [2015] further investigated HQMMs, and showed that they could be viewed as open quantum systems with instantaneous feedback. We arrive at the same Kraus operator representation by building a quantum circuit to simulate a classical HMM and then relaxing some constraints.
Our work can be viewed as extending previous work by Zhao and Jaeger [2010] on norm-observable operator models (NOOMs) and Jaeger [2000] on observable-operator models (OOMs). We show that HQMMs can be viewed as complex-valued extensions of NOOMs, formulated in the language of quantum mechanics. We use this connection to adapt the learning algorithm for NOOMs in Zhao and Jaeger [2007] into the first known learning algorithm for HQMMs, and demonstrate that the theoretical advantages of HQMMs also hold in practice.

Schuld et al. [2015a] and Biamonte et al. [2016] provide general overviews of quantum machine learning, and describe relevant work on HQMMs. They suggest that developing algorithms that can learn HQMMs from data is an important open problem; we provide just such a learning algorithm in Section 4.

Other work at the intersection of machine learning and quantum mechanics includes Wiebe et al. [2016] on quantum perceptron models and learning algorithms, and Schuld et al. [2015b], who discuss simulating a perceptron on a quantum computer.

2.2 Belief States and Quantum States

Classical discrete latent variable models represent uncertainty with a probability distribution using a vector $\vec{x}$ whose entries describe the probability of being in the corresponding system state. Each entry is real and non-negative, and the entries sum to 1. In general, we refer to the run-time system component that maintains a state estimate of the latent variable as an 'observer', and we refer to the observer's state as a 'belief state.'

In quantum mechanics, the quantum state of a particle $A$ can be written using Dirac notation as $|\psi\rangle_A$, a column vector in some orthonormal basis (the row vector is the complex-conjugate transpose $\langle\psi|_A = (|\psi\rangle_A)^\dagger$), with each entry being the 'probability amplitude' corresponding to that system state. The squared norm of the probability amplitude for a system state is the probability of observing that state, so the sum of squared norms of the probability amplitudes over all system states must be 1 to conserve probability. For example, $|\psi\rangle = \left[\frac{1}{\sqrt{2}} \;\; \frac{-i}{\sqrt{2}}\right]^T$ is a valid quantum state, with basis states $|0\rangle$ and $|1\rangle$ having equal probability $\left|\frac{1}{\sqrt{2}}\right|^2 = \left|\frac{-i}{\sqrt{2}}\right|^2 = \frac{1}{2}$. However, unlike classical belief states such as $\vec{x} = \left[\frac{1}{2} \;\; \frac{1}{2}\right]^T$, where the probability of different states reflects ignorance about the underlying system, a pure quantum state like the one described above is the true description of the system.
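As a quick numerical check of the amplitude-to-probability rule above (a minimal sketch added here for illustration, not code from the paper), NumPy confirms that the squared amplitude norms of $|\psi\rangle$ form a valid probability distribution:

```python
import numpy as np

# The pure state |psi> = [1/sqrt(2), -i/sqrt(2)]^T from the text.
psi = np.array([1.0, -1.0j]) / np.sqrt(2)

# Squared norms of the probability amplitudes give outcome probabilities.
probs = np.abs(psi) ** 2
print(probs)                          # [0.5 0.5]: |0> and |1> equally likely
assert np.isclose(probs.sum(), 1.0)   # amplitudes conserve total probability

# A classical belief state with the same outcome probabilities; unlike |psi>,
# it encodes the observer's ignorance rather than the system's true state.
x = np.array([0.5, 0.5])
assert np.isclose(x.sum(), 1.0)
```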
But how can we describe classical mixtures of quantum systems ('mixed states'), where we also have classical uncertainty about the underlying quantum states? Such information can be captured by a 'density matrix.' Given a mixture of $N$ quantum systems, each with probability $p_i$, the density matrix for this ensemble is defined as follows:

$$\hat{\rho} = \sum_{i=1}^{N} p_i |\psi_i\rangle\langle\psi_i| \qquad (1)$$

The density matrix is the general quantum equivalent of the classical belief state $\vec{x}$, with diagonal elements representing the probabilities of being in each system state. Consequently, the normalization condition is $\mathrm{tr}(\hat{\rho}) = 1$. The off-diagonal elements represent quantum coherences, which have no classical interpretation. The density matrix $\hat{\rho}$ is Hermitian and can be used to describe the state of any quantum system.

The density matrix can also be extended to represent the joint state of multiple variables, or that of 'multi-particle' systems, to use the physical interpretation. If we have density matrices $\hat{\rho}_A$ and $\hat{\rho}_B$ for two qudits ($d$-state quantum systems, akin to qubits or 'quantum bits', which are 2-state quantum systems) $A$ and $B$, we can take the tensor product to arrive at the density matrix for the joint state of the particles: $\hat{\rho}_{AB} = \hat{\rho}_A \otimes \hat{\rho}_B$. As in any valid density matrix, the diagonal elements of this joint density matrix represent probabilities; $\mathrm{tr}(\hat{\rho}_{AB}) = 1$, and the probabilities correspond to the states in the Cartesian product of the basis states of the composite particles. In this paper, the joint density matrix will serve as the analogue of the classical joint probability distribution, with the off-diagonal terms encoding extra 'quantum' information.

Given the joint state of a multi-particle system, we can examine the state of just one or a few of the particles using the 'partial trace' operation, where we 'trace over' the particles we wish to disregard. This lets us recover a 'reduced density matrix' for a subsystem of interest. The partial trace for a two-particle system $\hat{\rho}_{AB}$, where we trace over the second particle to obtain the state of the first particle, is:

$$\hat{\rho}_A = \mathrm{tr}_B(\hat{\rho}_{AB}) = \sum_j \langle j|_B \, \hat{\rho}_{AB} \, |j\rangle_B \qquad (2)$$

For our purposes, this operation will serve as the quantum analogue of classical marginalization.

Finally, we discuss the quantum analogue of 'conditioning' on an observation. In quantum mechanics, the act of measuring a quantum system can change the underlying distribution, i.e., collapse it to the observed state in the measurement basis; this is represented mathematically by applying von Neumann projection operators (denoted $\hat{P}_y$ in this paper) to the density matrices describing the system. One can think of the projection operator as having ones in the diagonal entries corresponding to the observed system states and zeros elsewhere. If we only observe part of a larger system, the system collapses to the states where that subsystem had the observed result. For example, suppose we have the following density matrix for a two-state two-particle system with basis $\{|0\rangle_A|0\rangle_B, |0\rangle_A|1\rangle_B, |1\rangle_A|0\rangle_B, |1\rangle_A|1\rangle_B\}$:

$$\hat{\rho}_{AB} = \begin{bmatrix} 0.25 & 0 & 0 & 0 \\ 0 & 0.25 & -0.5 & 0 \\ 0 & -0.5 & 0.25 & 0 \\ 0 & 0 & 0 & 0.25 \end{bmatrix} \qquad (3)$$

Suppose we measure the state of particle $B$ and find it to be in state $|1\rangle_B$.

Table 1: Comparison between classical and quantum representations

| Classical Description | Classical Representation | Quantum Representation | Quantum Description |
|---|---|---|---|
| Belief state | $\vec{x}$ | $\hat{\rho}$ | Density matrix |
| Joint distribution | $\vec{x}_1 \otimes \vec{x}_2$ | $\hat{\rho}_{X_1} \otimes \hat{\rho}_{X_2}$ | Multi-particle density matrix |
| Marginalization | $\vec{x} = \sum_y (\vec{x} \otimes \vec{y})$ | $\hat{\rho}_X = \mathrm{tr}_Y(\hat{\rho}_X \otimes \hat{\rho}_Y)$ | Partial trace |
| Conditional probability | $P(\vec{x} \mid y) = \frac{P(y, \vec{x})}{P(y)}$ | $P(\text{states} \mid y) \propto \mathrm{tr}_Y(\hat{P}_y \, \hat{\rho}_{XY} \, \hat{P}_y^\dagger)$ | Projection + partial trace |
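To make the correspondences in Table 1 concrete, the following NumPy sketch builds a density matrix as in Eq. (1), forms a joint state with the tensor product, marginalizes via the partial trace of Eq. (2), and conditions on a measurement with a projection operator. This is our own illustration under assumed example states: the particular mixture, the `partial_trace_B` helper, and the measured outcome are choices made here, not constructs from the paper.

```python
import numpy as np

# Eq. (1): density matrix for a classical mixture of pure states.
# Illustrative 50/50 mixture (our choice of states, not the paper's):
ket0 = np.array([[1.0], [0.0]], dtype=complex)                 # |0>
psi = np.array([[1.0], [-1.0j]], dtype=complex) / np.sqrt(2)   # (|0> - i|1>)/sqrt(2)
rho_A = 0.5 * ket0 @ ket0.conj().T + 0.5 * psi @ psi.conj().T
rho_B = psi @ psi.conj().T                    # a pure state as a density matrix
assert np.isclose(np.trace(rho_A).real, 1.0)  # normalization: tr(rho) = 1

# Joint state of two particles: tensor (Kronecker) product.
rho_AB = np.kron(rho_A, rho_B)

# Eq. (2): partial trace over B, the quantum analogue of marginalization.
def partial_trace_B(rho, d_A, d_B):
    """rho_A[i, k] = sum_j <i, j| rho |k, j>: trace out the second subsystem."""
    return rho.reshape(d_A, d_B, d_A, d_B).trace(axis1=1, axis2=3)

assert np.allclose(partial_trace_B(rho_AB, 2, 2), rho_A)

# Conditioning: project onto observing particle B in |1>_B, then renormalize.
P_1B = np.kron(np.eye(2), np.diag([0.0, 1.0]))          # I_A (x) |1><1|_B
collapsed = P_1B @ rho_AB @ P_1B.conj().T
p_y = np.trace(collapsed).real                          # probability of the outcome
rho_A_given_y = partial_trace_B(collapsed, 2, 2) / p_y  # posterior state of A
print(np.diag(rho_A_given_y).real)                      # posterior probabilities
```

Projecting, tracing out the measured particle, and renormalizing by the outcome probability mirrors the last row of Table 1: the quantum analogue of Bayesian conditioning.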