Quantum Inference on Bayesian Networks
The MIT Faculty has made this article openly available.

Citation: Low, Guang Hao, Theodore J. Yoder, and Isaac L. Chuang. "Quantum Inference on Bayesian Networks." Phys. Rev. A 89, no. 6 (June 2014). © 2014 American Physical Society
As Published: http://dx.doi.org/10.1103/PhysRevA.89.062315
Publisher: American Physical Society
Version: Final published version
Citable link: http://hdl.handle.net/1721.1/88648
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.

PHYSICAL REVIEW A 89, 062315 (2014)

Quantum inference on Bayesian networks

Guang Hao Low, Theodore J. Yoder, and Isaac L. Chuang
Massachusetts Institute of Technology, 77 Massachusetts Avenue, Cambridge, Massachusetts 02139, USA
(Received 28 February 2014; published 13 June 2014)

Performing exact inference on Bayesian networks is known to be #P-hard. Typically approximate inference techniques are used instead to sample from the distribution on query variables given the values e of evidence variables. Classically, a single unbiased sample is obtained from a Bayesian network on n variables with at most m parents per node in time O(nm P(e)^{-1}), depending critically on P(e), the probability that the evidence might occur in the first place. By implementing a quantum version of rejection sampling, we obtain a square-root speedup, taking O(n 2^m P(e)^{-1/2}) time per sample. We exploit the Bayesian network's graph structure to efficiently construct a quantum state, a q-sample, representing the intended classical distribution, and also to efficiently apply amplitude amplification, the source of our speedup. Thus, our speedup is notable as it is unrelativized: we count primitive operations and require no blackbox oracle queries.

DOI: 10.1103/PhysRevA.89.062315    PACS number(s): 03.67.Ac, 02.50.Tt

I. INTRODUCTION

How are rational decisions made? Given a set of possible actions, the logical answer is the one with the largest corresponding utility. However, estimating these utilities accurately is the problem. A rational agent endowed with a model and partial information of the world must be able to evaluate the probabilities of various outcomes, and such is often done through inference on a Bayesian network [1], which efficiently encodes joint probability distributions in a directed acyclic graph of conditional probability tables. In fact, the standard model of a decision-making agent in a probabilistic time-discretized world, known as a partially observable Markov decision process, is a special case of a Bayesian network. Furthermore, Bayesian inference finds application in processes as diverse as system modeling [2], model learning [3,4], data analysis [5], and decision making [6], all falling under the umbrella of machine learning [1].

Unfortunately, despite the vast space of applications, Bayesian inference is difficult. To begin with, exact inference is #P-hard in general [1]. It is often far more feasible to perform approximate inference by sampling, such as with the Metropolis-Hastings algorithm [7] and its innumerable specializations [8], but doing so is still NP-hard in general [9]. This can be understood by considering rejection sampling, a primitive operation common to many approximate algorithms that generates unbiased samples from a target distribution P(Q|E) for some set of query variables Q conditional on some assignment of evidence variables E = e. In the general case, rejection sampling requires sampling from the full joint distribution P(Q,E) and throwing away samples with incorrect evidence. In the specific case in which the joint distribution is described by a Bayesian network with n nodes each with no more than m parents, it takes time O(nm) to generate a sample from the joint distribution, and so a sample from the conditional distribution P(Q|E) takes average time O(nm P(e)^{-1}). Much of the computational difficulty is related to how the marginal P(e) = P(E = e) becomes exponentially small as the number of evidence variables increases, since only samples with the correct evidence assignments are recorded.

One very intriguing direction for speeding up approximate inference is in developing hardware implementations of sampling algorithms, for which promising results such as natively probabilistic computing with stochastic logic gates have been reported [10]. In this same vein, we could also consider physical systems that already describe probabilities and their evolution in a natural fashion to discover whether such systems would offer similar benefits.

Quantum mechanics can in fact describe such naturally probabilistic systems. Consider an analogy: If a quantum state is like a classical probability distribution, then measuring it should be analogous to sampling, and unitary operators should be analogous to stochastic updates. Though this analogy is qualitatively true and appealing, it is inexact in ways yet to be fully understood. Indeed, it is a widely held belief that quantum computers offer a strictly more powerful set of tools than classical computers, even probabilistic ones [11], though this appears difficult to prove [12]. Notable examples of the power of quantum computation include exponential speedups for finding prime factors with Shor's algorithm [13], and square-root speedups for generic classes of search problems through Grover's algorithm [14]. Unsurprisingly, there is an ongoing search for ever more problems amenable to quantum attack [15–17].

For instance, the quantum rejection sampling algorithm for approximate inference was only developed quite recently [18], alongside a proof, relativized by an oracle, of a square-root speedup in runtime over the classical algorithm. The algorithm, just like its classical counterpart, is an extremely general method of doing approximate inference, requiring preparation of a quantum pure state representing the joint distribution P(Q,E) and amplitude amplification to amplify the part of the superposition with the correct evidence. Owing to its generality, the procedure assumes access to a state-preparation oracle Â_P, and the runtime is therefore measured by the query complexity [19], the number of times the oracle must be used. Unsurprisingly, such oracles may not be efficiently implemented in general, as the ability to prepare arbitrary states allows for witness generation to QMA-complete problems, the quantum analog to classically deterministic NP-complete problems [18,20]. This also corresponds consistently to the NP-hardness of classical sampling.

FIG. 1. (Color online) (a) An example of a directed acyclic graph which can represent a Bayesian network by associating with each node a conditional probability table. For instance, associated with the node X1 is the one value P(X1 = 1), while that of X5 consists of four values, the probabilities X5 = 1 given each setting of the parent nodes, X2 and X3. (b) A quantum circuit that efficiently prepares the q-sample representing the full joint distribution of panel (a). Notice in particular how the edges in the graph are mapped to conditioning nodes in the circuit. The |ψ_j⟩ represent the state of the system after applying the operator sequence U_1, ..., U_j to the initial state |0000000⟩.
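The classical baseline described above is easy to make concrete. The following sketch uses a hypothetical three-node chain network (an illustration, not an example from the paper) to show ancestral sampling from the joint distribution, rejection on the evidence, and how the measured cost per accepted sample tracks 1/P(e), compared against the roughly 1/sqrt(P(e)) amplification count the quantum algorithm aims for:

```python
import math
import random

random.seed(0)

# Hypothetical network A -> B -> C on binary variables.
# Conditional probability tables give P(node = 1 | parent value).
P_A1 = 0.3
P_B1_given_A = {0: 0.2, 1: 0.9}
P_C1_given_B = {0: 0.1, 1: 0.7}

def sample_joint():
    """Ancestral sampling: one CPT lookup per node, O(nm) work total."""
    a = int(random.random() < P_A1)
    b = int(random.random() < P_B1_given_A[a])
    c = int(random.random() < P_C1_given_B[b])
    return a, b, c

def rejection_sample(evidence_c):
    """Draw joint samples, discarding any whose evidence C != c."""
    trials = 0
    while True:
        trials += 1
        a, b, c = sample_joint()
        if c == evidence_c:
            return (a, b), trials

# Empirical cost per accepted sample.
results = [rejection_sample(evidence_c=1)[1] for _ in range(20000)]
avg_trials = sum(results) / len(results)

# Exact P(e) = P(C = 1), by summing the joint over A and B.
P_e = sum(
    (P_A1 if a else 1 - P_A1)
    * (P_B1_given_A[a] if b else 1 - P_B1_given_A[a])
    * P_C1_given_B[b]
    for a in (0, 1) for b in (0, 1)
)

print(f"P(e) = {P_e:.4f}")
print(f"classical cost ~ 1/P(e)      = {1 / P_e:.2f} (measured {avg_trials:.2f})")
print(f"quantum  cost ~ 1/sqrt(P(e)) = {1 / math.sqrt(P_e):.2f} amplification steps")
```

The accepted-sample cost is geometric in P(e), which is exactly why the marginal's exponential decay in the number of evidence variables dominates the classical runtime.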
In this paper, we present an unrelativized (i.e., no oracle) square-root quantum speedup to rejection sampling on a Bayesian network. Just as the graphical structure of a Bayesian network speeds up classical sampling, we find that the same structure allows us to construct the state-preparation oracle Â_P efficiently. Specifically, quantum sampling from P(Q|E = e) takes time O(n 2^m P(e)^{-1/2}), compared with O(nm P(e)^{-1}) for classical sampling, where m is the maximum number of parents of any one node in the network. We exploit the structure of the Bayesian network to construct an efficient quantum circuit Â_P composed of O(n 2^m) controlled-NOT gates and single-qubit rotations that generates the quantum state |ψ⟩ representing the joint P(Q,E). This state must then be evolved to |Q⟩ representing P(Q|E = e), which can be done by performing amplitude amplification [21], the source of our speedup and the heart of quantum rejection sampling in general [18]. The desired sample is then obtained in a single measurement of |Q⟩.

We better define the problem of approximate inference with a review of Bayesian networks in Sec. II. We discuss the sensible encoding of a probability distribution in a quantum state axiomatically in Sec. III. This is followed by an overview of amplitude amplification in Sec.

II. BAYESIAN NETWORKS

We adopt the standard convention of capital letters (e.g., X) representing random variables while lowercase letters (e.g., a) are particular fixed values of those variables. For simplicity, the random variables are taken to be binary. Accordingly, probability vectors are denoted P(X) = {P(X = 0), P(X = 1)}, while P(x) ≡ P(X = x). Script letters represent a set of random variables X = {X1, X2, ..., Xn}.

An arbitrary joint probability distribution P(x1, x2, ..., xn) on n bits can always be factored by recursive application of Bayes's rule P(X,Y) = P(X)P(Y|X),

    P(x1, x2, ..., xn) = P(x1) ∏_{i=2}^{n} P(xi | x1, ..., x_{i-1}).    (1)

However, in most practical situations a given variable Xi will be dependent on only a few of its predecessors' values, those we denote by parents(Xi) ⊆ {x1, ..., x_{i-1}} [see Fig. 1(a)]. Therefore, the factorization above can be simplified to

    P(x1, x2, ..., xn) = P(x1) ∏_{i=2}^{n} P(xi | parents(Xi)).    (2)

A Bayes net diagrammatically expresses this simplification,
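The factorization in Eq. (2) is the compactness statement that makes Bayes nets practical: each node stores a table of at most 2^m entries rather than a share of the full 2^n joint. A minimal sketch, using a hypothetical four-node network with m = 2 (an illustration, not the seven-node example of Fig. 1), that evaluates the joint via Eq. (2) and checks it is a normalized distribution:

```python
from itertools import product

# Hypothetical network: X1 -> X2, X1 -> X3, (X2, X3) -> X4, all binary.
# Each table maps a tuple of parent values to P(node = 1 | parents).
parents = {1: (), 2: (1,), 3: (1,), 4: (2, 3)}
p1 = {
    1: {(): 0.4},
    2: {(0,): 0.3, (1,): 0.8},
    3: {(0,): 0.5, (1,): 0.1},
    4: {(0, 0): 0.05, (0, 1): 0.6, (1, 0): 0.7, (1, 1): 0.99},
}

def joint(x):
    """P(x1,...,x4) = prod_i P(x_i | parents(X_i)), per Eq. (2)."""
    prob = 1.0
    for i in (1, 2, 3, 4):
        key = tuple(x[j - 1] for j in parents[i])
        pi = p1[i][key]
        prob *= pi if x[i - 1] == 1 else 1 - pi
    return prob

# Summing over every assignment recovers a normalized distribution,
# even though we only ever stored 1 + 2 + 2 + 4 = 9 table entries.
total = sum(joint(x) for x in product((0, 1), repeat=4))
print(f"sum over all 2^4 assignments = {total:.6f}")
```

The same per-node tables are what the q-sample preparation circuit of Fig. 1(b) consumes, one conditioned rotation per table row.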