Limitations of error corrected quantum annealing in improving the performance of Boltzmann machines

Richard Y. Li,1,2 Tameem Albash,3,4,5 and Daniel A. Lidar1,2,6,7

1 Department of Chemistry, University of Southern California, Los Angeles, California 90089, USA
2 Center for Quantum Information Science & Technology, University of Southern California, Los Angeles, California 90089, USA
3 Department of Electrical & Computer Engineering, University of New Mexico, Albuquerque, New Mexico 87131, USA
4 Department of Physics and Astronomy, University of New Mexico, Albuquerque, New Mexico 87131, USA
5 Center for Quantum Information and Control, University of New Mexico, Albuquerque, New Mexico 87131, USA
6 Department of Electrical Engineering, University of Southern California, Los Angeles, California 90089, USA
7 Department of Physics and Astronomy, University of Southern California, Los Angeles, California 90089, USA

(Dated: July 8, 2020)

Boltzmann machines, a class of machine learning models, are the basis of several deep learning methods that have been successfully applied to both supervised and unsupervised machine learning tasks. These models assume that a given dataset is generated according to a Boltzmann distribution, and the goal of the training procedure is to learn the set of parameters that most closely match the input data distribution. Training such models is difficult due to the intractability of traditional sampling techniques, and proposals using quantum annealers for sampling aim to mitigate this sampling cost. However, real physical devices will inevitably be coupled to the environment, and the strength of this coupling affects the effective temperature of the distributions from which a quantum annealer samples. To counteract this problem, error correction schemes that can effectively reduce the temperature are needed if there is to be a benefit in using quantum annealing for problems at a larger scale, where we might expect the effective temperature of the device to be too high. To this end, we applied nested quantum annealing correction (NQAC) to unsupervised learning with a small bars-and-stripes dataset, and to supervised learning with a coarse-grained MNIST dataset, which consists of black-and-white images of hand-written integers. For both datasets we demonstrate improved training and a concomitant effective temperature reduction at higher noise levels relative to the unencoded case. We also find better performance overall with longer anneal times and offer an interpretation of the results based on a comparison to simulated quantum annealing and spin vector Monte Carlo. A counterintuitive aspect of our results is that the output distribution generally becomes less Gibbs-like with increasing nesting level and increasing anneal times, which shows that improved training performance can be achieved without equilibration to the target Gibbs distribution.

I. INTRODUCTION

The existence of commercially available quantum annealers of a non-trivial size [1–3], along with the experimental verification of entanglement [4] and multi-qubit tunneling [5], have ignited interest and a healthy debate concerning whether quantum annealing (QA) may provide an advantage in solving classically hard problems. QA can be considered a special case of adiabatic quantum computation (AQC) (for a review see Ref. [6]). In AQC, computation begins from an initial Hamiltonian, whose ground state is easy to prepare, and ends in a final Hamiltonian, whose ground state encodes the solution to the computational problem. In a closed-system setting with no coupling to the external environment, the adiabatic theorem guarantees that the system will remain in an instantaneous ground state, provided the interpolation from the initial to the final Hamiltonian is sufficiently slow, such that all non-adiabatic transitions are suppressed. The runtime needed to ensure this is related to the inverse of the minimum gap encountered during the computation [7, 8]. In this setting, errors arise only from non-adiabatic transitions and any control errors (e.g., in the annealing schedule or in the initial and final Hamiltonians). In QA, instead of running the computation once at a sufficiently long anneal time such that the adiabatic theorem is obeyed, one may choose to run the computation multiple times at a shorter anneal time, such that the time it takes to find the ground state with high probability is minimized [9].

Physical implementations of QA, however, will be coupled to the environment, which may introduce additional sources of errors, such as dephasing errors and thermal excitation errors [10–12]. Theoretical and experimental studies have indicated that, due to relatively strong coupling to a thermal bath, current quantum annealing devices operate in a quasi-static regime [13–17]. In this regime there is an initial phase of quasi-static evolution in which thermalization times are much shorter than the anneal time, and thus the system closely matches a Gibbs distribution of the instantaneous Hamiltonian. Towards the end of the anneal, thermalization times grow and eventually become longer than the anneal time, and the system enters a regime in which the dynamics are frozen. The states returned by a quantum annealer operating in this regime therefore more closely match a Gibbs distribution not of the final Hamiltonian, but of the Hamiltonian at the freezing point.

The fact that open-system QA prepares a Gibbs state may be a bug for optimization problems [17], but it could be a feature for sampling applications. Recently, there has been interest in using QA to sample from classical or quantum Gibbs distributions (see Ref. [18] for a review), and there is interest in whether QA can prepare such distributions faster than temperature annealing methods [19]. One application where sampling plays an important role is the training of Boltzmann machines (BMs). These are a class of probabilistic, energy-based graphical models which are the basis for powerful deep learning models that have been used for both supervised (with labels) and unsupervised (without labels) learning tasks [20, 21].

As its name suggests, a Boltzmann machine assumes that the target dataset is generated by some underlying probability distribution that can be modeled by a Boltzmann or Gibbs distribution. Whereas the temperature does not play a large role for classical methods, as the values of the model parameters can simply be scaled as needed, physical constraints on current (and future) quantum annealers limit the range of model parameters that can be programmed. As such, an important parameter for using a physical quantum annealer is the effective temperature Teff, corresponding to the best-fit classical Gibbs temperature for the distribution output by the annealer. For example, Teff is effectively infinite for the initial state of the quantum annealer, the uniform-superposition state, since the samples drawn from a quantum annealer at this point would be nearly random, and this would make training of a Boltzmann machine nearly impossible.

Nested quantum annealing correction (NQAC) [22] is a form of quantum annealing correction (QAC) [23–26] tailored for use on quantum annealing devices, including commercially available ones, that was developed to address some of these concerns. NQAC achieves error suppression by introducing an effective temperature reduction [22, 27, 28], and previous work has shown that NQAC can be used to improve optimization performance and to obtain more accurate estimates of the gradient in the training step of a BM [27]. In this work we apply NQAC to an entire training procedure of fully-visible Boltzmann machines. We demonstrate an improvement for both supervised and unsupervised machine learning tasks using NQAC, explore the effects of increased anneal times, and make comparisons to spin-vector Monte Carlo (SVMC) [29] and simulated quantum annealing (SQA) [30] to probe the underlying physics of using a D-Wave (DW) quantum annealer as a sampler.

The remainder of the paper is structured as follows. We provide more technical background, on both Boltzmann machines and the NQAC construction, in Sec. II. In Sec. III we give an overview of the methods, and results are presented in Sec. IV. We conclude in Sec. V.

II. TECHNICAL BACKGROUND

A standard quantum annealing protocol defines the following time-dependent Hamiltonian:

H(s) = A(s) HX + B(s) HP,   s ∈ [0, 1],   (1)

where s = t/tf is the dimensionless time (tf is the total anneal time), and A(s) and B(s) are the transverse-field and longitudinal-field annealing schedules, respectively [31]. HX, the initial (or "driver") Hamiltonian, is usually defined as HX = −Σ_i σ^x_i, such that the initial state of the system is a uniform superposition over all computational basis states (defined in the σ^z basis). HP, the problem Hamiltonian, is defined on a graph G = (V, E) composed of a set of N = |V| vertices and edges E:

HP = Σ_{i∈V} h_i σ^z_i + Σ_{(i,j)∈E} J_ij σ^z_i σ^z_j.   (2)

The local fields {h_i} and the couplings {J_ij} are used to represent the computational problem, and are programmable parameters in hardware implementations of quantum annealing. In the remainder of this section we present an overview of NQAC and Boltzmann machines.

A. NQAC

NQAC is an implementation of a repetition code that can encode problems with arbitrary connectivity, allows for a variable code size, and can be implemented on a generic quantum annealing device [22].

In general, to implement QAC, we encode the original (or "logical") quantum annealing Hamiltonian H(s) in an "encoded physical Hamiltonian" H̄(s), using a repetition code [23, 24, 32]:

H̄(s) = A(s) HX + B(s) H̄P,   s ∈ [0, 1],   (3)

where H̄P is the "encoded physical problem Hamiltonian" and all terms in H̄P are defined over a set of physical qubits that is larger than the number of logical qubits in the original unencoded problem. The states of the logical problem Hamiltonian HP can then be recovered by properly decoding the states of H̄P. Encoding the driver Hamiltonian as well would make this a full stabilizer code [33] and would provide improved performance with error correction, since it would enable the implementation of fully encoded adiabatic quantum computation, for which rigorous error suppression results have been proven [34–39]. Unfortunately, however, this is not possible with present implementations of quantum annealers; i.e., only HP is encoded in QAC.

NQAC encodes the logical Hamiltonian in two steps: (1) an "encoding" step that maps logical qubits to code qubits, and (2) an "embedding" step that maps code qubits to the physical qubits, e.g., those on the Chimera graph of the DW quantum annealer. These two steps are depicted in Fig. 1 for a fully connected problem of 4 logical spins.

For the encoding step, the repetition code has length C, which we also refer to as the "nesting level" for reasons apparent from Fig. 1. C determines the amount of hardware resources (qubits, couplers, etc.) used, and allows the error correction method to be scaled to provide protection against thermal and control errors. Each logical qubit i (i = 1, ..., N) in HP is represented by a C-tuple of code qubits (i, c), with c = 1, ..., C.

Scaling of hardware resources with each step of the NQAC construction (shown at the top of Fig. 1):

                  Logical problem   Encoded problem        Physical problem
  Num. qubits     N                 C×N                    C×N×L
  Num. couplers   N(N−1)/2          C×N(C×N−1)/2           C×N(C×N−1)/2 + C×N×(L−1)
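For concreteness, the scaling in this table can be computed as follows (a trivial sketch of our own; the function name is ours):

```python
def nqac_resource_counts(N, C, L):
    """Hardware resources used by the NQAC construction, per the scaling table above."""
    code_qubits = C * N
    physical_qubits = C * N * L
    code_couplers = code_qubits * (code_qubits - 1) // 2
    physical_couplers = code_couplers + code_qubits * (L - 1)  # plus intra-chain couplers
    return {"code qubits": code_qubits,
            "physical qubits": physical_qubits,
            "physical couplers": physical_couplers}

# Example: N = 4 logical spins, nesting level C = 2, chains of length L = 3
print(nqac_resource_counts(4, 2, 3))
# {'code qubits': 8, 'physical qubits': 24, 'physical couplers': 44}
```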

[Fig. 1 panels (graphical content omitted): the logical problem has N = 4 spins; the encoded problems shown have nesting levels C = 2 and C = 4, with minor-embedding chain lengths L = 3 and L = 5, respectively.]

FIG. 1. Illustration of NQAC. The left panel represents the logical problem, the middle panels represent the encoded Hamiltonian with nesting levels C = 2, 4, and the right panel represents the encoded physical problem embedded onto the DW Chimera graph. In going from the left panel to the middle panel, each logical qubit is replaced by C copies that are ferromagnetically coupled; the thick bright red lines in the middle and right panels represent γ1, the ferromagnetic coupling between code qubits in the nested Hamiltonian, which we call the ‘code penalty’. In going from the middle to the right panel, each code qubit (middle panel) is replaced by a chain of physical qubits; brown lines in the right panel represent γ2, the ferromagnetic coupling between physical qubits in the encoded problem, which we call the ‘minor embedding penalty’. L is the number of qubits per chain (the chain length). The general scaling of the number of qubits and couplers with each step is shown at the top.

The values of the 'nested couplers' J̃_(i,c),(j,c′) and 'nested local fields' h̃_(i,c) are given as follows in terms of the logical problem parameters and C:

J̃_(i,c),(j,c′) = J_ij,   ∀ c, c′, i ≠ j,   (4a)
h̃_(i,c) = C h_i,   ∀ c, i,   (4b)
J̃_(i,c),(i,c′) = −γ1,   ∀ c ≠ c′.   (4c)

Code qubits c and c′ that belong to different logical qubits i and j are coupled with the strength of the original logical problem couplings J_ij. Since each logical qubit is encoded in C code qubits, there are a total of C² nested couplers J̃_(i,c),(j,c′) between all the C code qubits that comprise two logical qubits i and j. In order to uniformly scale the overall energy scale by C², each nested local field h̃_(i,c) is set to be C times as large as the logical problem local field. Finally, in order to facilitate alignment of the C code qubits that make up a logical qubit i, C(C − 1)/2 ferromagnetic couplings J̃_(i,c),(i,c′) (couplings between code qubits corresponding to the same logical qubit i) are introduced, the strength of which is given by γ1 > 0. We refer to γ1 as the 'code penalty'.

The encoded problem must then be implemented on physical QA hardware, which typically has less than full connectivity. Because we collected our results in this work on the DW quantum annealers (described in Appendix A), we used a minor embedding [40–42] onto the Chimera hardware graph. The minor embedding step replaces each code qubit by a ferromagnetically coupled chain of physical qubits, with the intra-chain coupling for physical qubits given by another penalty term γ2 > 0, which helps keep the physical qubits aligned. We refer to γ2 as the 'minor embedding penalty'. Finally, in order to extract states in terms of the original logical problem, a proper decoding scheme must be used. In this work, we used a majority vote to go from physical qubits to code qubits, and another majority vote to go from the code problem states to the logical problem states. Other decoding strategies, such as local energy minimization, are also possible [32]. In Appendix C we study the case of no decoding (i.e., only using unbroken chains) and find that learning suffers, so a decoding strategy is needed for training to be effective at higher nesting levels.

It is important to note that all local fields and couplers must satisfy |h_i| ≤ 2, |J_ij| ≤ 1 as hard constraints imposed by the physical processor. For the same reason, we must have γ1, γ2 ≤ 1. This implies that in some cases we will find NQAC to be 'penalty-limited' [27], meaning that the optimal values of γ1 and γ2 exceed 1 and thus cannot be attained in practice. As we shall see, this has important performance implications.
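As an illustration, Eqs. (4a)-(4c) and the majority-vote decoding can be written in a few lines of Python. This is a sketch of our own (a dictionary-based problem representation is assumed), not code tied to any particular annealing library; the same decode function, applied chain-wise, would also handle the physical-to-code step.

```python
import itertools
from collections import defaultdict

def nqac_encode(h, J, C, gamma1):
    """Map logical fields h[i] and couplers J[(i, j)] to nested ones per Eqs. (4a)-(4c)."""
    h_nested, J_nested = {}, {}
    for i, hi in h.items():
        for c in range(C):
            h_nested[(i, c)] = C * hi                      # Eq. (4b)
    for (i, j), Jij in J.items():
        for c, cp in itertools.product(range(C), repeat=2):
            J_nested[((i, c), (j, cp))] = Jij              # Eq. (4a)
    for i in h:
        for c, cp in itertools.combinations(range(C), 2):
            J_nested[((i, c), (i, cp))] = -gamma1          # Eq. (4c), code penalty
    return h_nested, J_nested

def decode_majority(code_sample, C):
    """Majority vote over the C code qubits of each logical qubit (+/-1 spins).
    Ties (possible for even C) are broken here in favor of +1 -- our convention."""
    votes = defaultdict(int)
    for (i, c), spin in code_sample.items():
        votes[i] += spin
    return {i: 1 if v >= 0 else -1 for i, v in votes.items()}
```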
B. Boltzmann machines

We combined NQAC with the training of a Boltzmann machine. Boltzmann machines are a class of probabilistic graphical models that can be stacked together as deep belief networks or as deep Boltzmann machines [43, 44]. Deep learning methods such as these have the potential to learn complex representations of the data, which is useful for applications such as image processing or speech recognition. Typically, Boltzmann machines include both visible and hidden units, which give greater modeling power. Here we restrict ourselves to examining fully-visible Boltzmann machines, which are less powerful [45], but allow us to demonstrate the effect of NQAC more directly.

A Boltzmann machine is defined on a graph G = (V, E) with binary variables on the vertices V. As before, let N = |V| be the number of vertices. In exact analogy to Eq. (2), the energy of a particular configuration x = (x_1, ..., x_N) is

E(x) = Σ_{i∈V} b_i x_i + Σ_{(i,j)∈E} w_ij x_i x_j,   (5)

where the b_i (the biases) and the w_ij (the weights) are the parameters of the model to be learned from training. Each x_i is a binary variable, and to be consistent with quantum annealing conventions, we define x_i = ±1 (spins) instead of 0 or 1, so that x ∈ {−1, 1}^N. The associated Boltzmann or Gibbs distribution is

P(x) = exp[−βE(x)] / Z,   Z = Σ_x exp[−βE(x)],   (6)

where β = 1/T is the inverse temperature and Z is the partition function. To use a Boltzmann machine for machine learning, we assume that a given training dataset is generated according to some probability distribution. The goal of the machine learning procedure is to find the set of parameters (i.e., the b_i's and w_ij's) that best models this distribution. Typically this is done by maximizing the log-likelihood over the data distribution (or minimizing the negative log-likelihood):

L = (1/|D|) Σ_{x∈D} log P(x),   (7)

where D represents the target training data distribution.

We can derive the following update rule for the model parameters by taking gradients with respect to the model parameters:

b_i(t + 1) = b_i(t) + η ∂L/∂b_i,   (8a)
w_ij(t + 1) = w_ij(t) + η ∂L/∂w_ij,   (8b)

where η is the learning rate that controls the size of the update, and the gradients are

∂L/∂b_i = ⟨x_i⟩_D − ⟨x_i⟩_β,   (9a)
∂L/∂w_ij = ⟨x_i x_j⟩_D − ⟨x_i x_j⟩_β,   (9b)

where ⟨·⟩ is the expectation value with respect to the appropriate distribution. In practice, rather than updating the model parameters based on averages over the entire dataset, "mini-batches" of a fixed number of training samples are used to reduce computational cost and insert some randomness. One pass through all the training samples is referred to as an epoch. In general, we implemented the training procedure as follows:

Training
1. Initialize the problem parameters (biases b_i and weights w_ij) to some random values;
2. Compute the averages needed in Eq. (9) by sampling from the Gibbs distribution of the current model parameters;
3. Update the problem parameters using Eq. (8);
4. Repeat steps (2) and (3) until convergence (e.g., of the gradients) to within some tolerance, or until a certain number of epochs is reached.

The first term in the gradient step, ⟨·⟩_D, is referred to as the positive phase, and is an average over the training data distribution. The second term, ⟨·⟩_β, is referred to as the negative phase, and is an average over the current Boltzmann model. The positive phase serves to move the probability density towards the training data distribution, and the negative phase moves probability away from states that do not match the data distribution. Calculating exact averages for the negative phase becomes intractable as the number of possible states grows exponentially in N. Nevertheless, the averages can still be estimated if there is an efficient way to perform importance sampling. Classically, this is done by Markov chain Monte Carlo methods, or by replacing the likelihood function altogether with a different objective function, such as contrastive divergence (whose derivatives with respect to the parameters can be approximated accurately and efficiently) [46] for restricted Boltzmann machines.
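For concreteness, the update of Eqs. (8)-(9) for a fully visible Boltzmann machine can be sketched as follows (array names are ours); the samples from the current model would come from a Gibbs sampler or, as discussed below, from a quantum annealer:

```python
import numpy as np

def gradient_step(b, W, data_batch, model_samples, eta=0.01):
    """One update of Eqs. (8)-(9) for a fully visible Boltzmann machine.

    b: (N,) biases; W: (N, N) symmetric weights with zero diagonal;
    data_batch, model_samples: arrays of +/-1 spins with shape (num_samples, N).
    """
    # positive phase: averages over the data minibatch
    pos_x = data_batch.mean(axis=0)
    pos_xx = data_batch.T @ data_batch / len(data_batch)
    # negative phase: averages over samples drawn from the current model
    neg_x = model_samples.mean(axis=0)
    neg_xx = model_samples.T @ model_samples / len(model_samples)
    # gradient ascent on the log-likelihood, Eqs. (8a)-(8b)
    b = b + eta * (pos_x - neg_x)
    W = W + eta * (pos_xx - neg_xx)
    np.fill_diagonal(W, 0.0)   # no self-couplings
    return b, W
```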

Alternatively, one may try to use a quantum annealer to perform importance sampling by directly preparing a Gibbs distribution on a physical device [47–49]. The hope is that this may speed up the bottleneck of computing the negative phase relative to classical methods for Gibbs sampling (we stress that this uses quantum annealing not for optimization, as originally envisioned [31], but for sampling). However, using a quantum annealer introduces at least two new complications. The first is that the Gibbs distribution sampled from need not be the one associated with the cost function, due to the quasi-static evolution phenomenon alluded to in the Introduction. The reason is that under such evolution the Gibbs distribution prepared by the annealer freezes in a state associated with an intermediate, rather than the final, Hamiltonian. The second is the temperature of the distribution. In the classical method, β in Eq. (6) is set to 1, as it amounts to an overall change in the energy scale of the cost function [Eq. (5)], which can be easily rescaled. For quantum annealers, the magnitude of the problem parameters depends on physical quantities, such as a biasing current; hence, the model parameters cannot become arbitrarily large, and both the freezing point and the physical β play an important role. Because of these factors, one might expect to find a sweet spot in effective temperature for quantum annealers. When the effective temperature of a sampler is too high, the samples drawn will essentially look random and will not provide any meaningful training. When the effective temperature is too low, the samples from this distribution will resemble delta-functions on the training examples, and thus may fail to generalize to data not within the training set.

C. NQAC and Boltzmann Machines

NQAC is a method that may offer more control over the effective temperature, but there are important caveats to keep in mind. First, the low-lying spectrum associated with the physical NQAC Hamiltonian H̄P may not be a faithful representation of the spectrum of the associated logical Hamiltonian HP. We only expect this to be the case for sufficiently strong penalty strengths γ1, γ2, but these parameters cannot be made arbitrarily large in practice. Furthermore, in practice these penalties are picked to maximize some performance metric, which does not guarantee a faithful representation of the logical spectrum. Therefore, after decoding the physical problem states to the encoded problem states and then to the logical problem states, the Gibbs distribution of H̄P at an inverse temperature β may not be close to the Gibbs distribution of HP at any inverse temperature.

Second, there is no expectation that NQAC will be effective in solving the issues, described earlier, associated with using a quantum annealer for preparing a Gibbs state. In fact, already the first work on QAC suggested that freezing may occur even earlier [23]. Note that this need not be a disadvantage for machine learning purposes, since the (quantum) Gibbs distribution associated with the intermediate state may be preferable for sampling and updates.

We shall see all these considerations play out when we discuss our results in Section IV.

III. METHODS

A. Datasets and performance metrics

In this work we explored both supervised and unsupervised machine learning with two different datasets.

1. Supervised machine learning of MNIST

We trained a Boltzmann machine to do supervised machine learning with a coarse-grained version of the MNIST dataset [50], a set of hand-written digits. In supervised machine learning, the dataset consists of input states and response variables (labels), i.e., D = {(x_n, y_n)}_{n=1}^{S}, where S is the size of the training dataset. The goal of the supervised machine learning task is to learn some function f that maps inputs to response variables such that y_n = f(x_n). For our purposes x_n corresponds to the nth coarse-grained image and y_n is the corresponding label of the image (i.e., an integer digit). Our metric of performance for the MNIST dataset is classification accuracy. Due to limitations on the number of qubits available on the DW processor, we coarse-grained the original MNIST digits, which are 28 × 28 pixels, to 4 × 4 images which bear little resemblance to the original digits (see Fig. 2 for some examples of coarse-grained images). We then removed the corners of the coarse-grained images and only selected images that were labeled 1–4, adding four "label" bits to each image (i.e., the last four bits set to 1000 means the image is labeled "1", 0100 represents "2", 0010 represents "3", and 0001 represents "4"). After this, each x_n is a vector of length 16. Of the 50,000 training images in the original dataset, we used 5000 samples for training and about 2500 for testing. Minibatches of 50 images were used, so each training epoch consisted of 100 updates to the model parameters. The training examples were randomly permuted at the start of each epoch. Classification accuracies were evaluated on the held-out test dataset, which was unseen during training.

Labels were predicted by "clamping" the values in the logical problem Hamiltonian to the input [47, 48] and sampling from the clamped Hamiltonian H_clamp = HP(σ^z_i = x_i, σ^z_j = x_j), where i, j ∈ {1, ..., 12} and x_i, x_j are the image pixels (recall that the last four bits were used as label bits). Because we obtain H_clamp by setting the values σ^z_i and σ^z_j of HP [Eq. (2)] to the values of the pixels in the image, H_clamp is only defined on the last four label qubits, and the effect of clamping is to add a local field to the remaining unclamped label qubits. In order to determine the predicted label ŷ_n of the nth image, we then sampled from the four unclamped label qubits, as we describe next.

We acquired K = 100 anneals from DW per image, with the kth anneal producing a vector l_k of length four, corresponding to the four possible labels. Note that in principle, depending on the value of H_clamp and due to the probabilistic nature of quantum annealing, l_k could have more than one non-zero entry. We assigned the predicted label via ŷ_n = arg max (1/K) Σ_{k=1}^{K} l_k, where the arg max is over the four indices of l_k (i.e., we selected the label qubit with the largest appearance frequency). Once we obtained the prediction ŷ_n of a class label, we compared it to the true label y_n and calculated the corresponding classification accuracy A:

A = (1/S_test) Σ_{n=1}^{S_test} δ_{y_n, ŷ_n},   (10)

where δ is the Kronecker delta and S_test is the size of the test dataset.
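A sketch of this prediction rule and of the accuracy metric of Eq. (10), with hypothetical array names of our own:

```python
import numpy as np

def predict_labels(label_counts):
    """label_counts: (num_images, 4) array counting how often each label qubit was
    'on' over the K anneals. Returns predicted labels 1-4 via the arg max of Eq. (10)'s
    preceding rule (the label qubit with the largest appearance frequency)."""
    return np.argmax(label_counts, axis=1) + 1

def classification_accuracy(y_true, y_pred):
    """Eq. (10): fraction of test images whose predicted label matches the true one."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

# Example with made-up counts for three test images:
counts = np.array([[90, 5, 3, 2], [10, 70, 15, 5], [20, 25, 30, 25]])
print(predict_labels(counts))                                       # [1 2 3]
print(classification_accuracy([1, 2, 4], predict_labels(counts)))   # 0.666...
```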

FIG. 2. Some representative images of the training datasets: BAS (left) and coarse-grained MNIST (right). Although it is difficult for a human to tell, starting from the upper-left and going across the rows and down the columns, the digits are {3, 6, 1, 5}, {2, 6, 7, 8}, {3, 1, 6, 4}, {0, 6, 8, 7}.

2. Unsupervised machine learning of Bars and Stripes

In unsupervised machine learning, the data is provided without a response variable (i.e., D = {x_n}_{n=1}^{S}), and the goal is to learn the underlying structure within the data. Our main focus in this work was on exploring unsupervised machine learning with a "bars and stripes" (BAS) dataset [45], which has a very simple underlying probability distribution. A BAS image is generated by first randomly selecting an orientation with probability 1/2 (rows or columns). Once an orientation is selected, all pixels in each row or column are set to 1 with probability 1/2 (see Fig. 2 for some examples). A BAS dataset of size D × D has a total of |D| = 2^(D+1) images (2^D images for each orientation times 2 orientations), of which 2^(D+1) − 2 are unique (there are 2 duplicate all-white and all-black images). Ideally, each of the 2^(D+1) − 4 non all-white or all-black images would be generated with probability p_0 = 1/|D|, with the all-white and all-black images generated with probability 2p_0. Therefore, for an ideally distributed BAS dataset, all-black and all-white images (or their corresponding bit representations) appear with probability 2p_0, the probability of all other bar and stripe images is p_0, and all other (non-bar and non-stripe) images appear with probability 0. Thus, for an ideally distributed BAS dataset, the log-likelihood [Eq. (7)] is L = p_0[4 log(2p_0) + (2^(D+1) − 4) log(p_0)]. We constructed a dataset of 4 × 4 images (thus containing 30 distinct images), generating |D| = S = 5000 images for the dataset. The frequencies of each type of image appearing in our randomly generated dataset do not exactly match the expected average number due to finite sampling, so the log-likelihood on the test dataset was −3.37 (compared to L = −3.38 for an ideally distributed BAS dataset with D = 4). As with the MNIST dataset, we used minibatches of 50 images.

Our metric of performance for the BAS dataset is the "empirical" log-likelihood, i.e., the log-likelihood of the distribution returned by drawing actual samples from DW. The goal (for DW) was to maximize the empirical log-likelihood by finding the optimal values of the biases and weights. In doing so, it would ideally have discovered the distribution of images described above. More formally, let Q(x) be the empirical probability of an image (or state) x in the distribution returned from a quantum annealer. We define the empirical log-likelihood as

L̃ = (1/|D|) Σ_{x∈D} log Q(x),   (11)

where the sum is over the images in the dataset D. Note that, in contrast, the "exact" log-likelihood in Eq. (7) is defined in terms of the exact Boltzmann distribution for the current model parameters. To calculate the empirical log-likelihood, we sampled from DW 10^4 times per gradient step update (since we used minibatches of 50 and the training data comprised 5000 samples, there were a total of 100 such gradient step updates per epoch), i.e., Q(x) was determined by finding the frequencies of the images of the training distribution in the 10^4 samples returned by DW.
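A minimal sketch (our own) of the BAS generation procedure and of the empirical log-likelihood of Eq. (11) follows. How zero-frequency images are handled is our assumption, since Eq. (11) is undefined for images the sampler never returns:

```python
import numpy as np
from collections import Counter

def sample_bas_image(D=4, rng=None):
    """Draw one D x D bars-and-stripes image with +/-1 pixels."""
    rng = rng or np.random.default_rng()
    img = np.full((D, D), -1)
    on = rng.random(D) < 0.5          # each bar/stripe turned on with probability 1/2
    img[on, :] = 1
    return img if rng.random() < 0.5 else img.T   # random orientation: rows or columns

def empirical_log_likelihood(dataset, samples):
    """Eq. (11): mean log empirical frequency Q(x) of each data image among the samples.
    Images never returned by the sampler are floored at 1/len(samples) -- our assumption."""
    counts = Counter(tuple(s.flatten()) for s in samples)
    n = len(samples)
    return float(np.mean([np.log(max(counts[tuple(x.flatten())], 1) / n) for x in dataset]))
```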

B. Temperature estimation

Based on the intuition that a higher nesting level should result in a lower effective temperature, we estimated the temperature by comparing the distribution of energies of the decoded logical problem to an exact Gibbs distribution of the logical problem Hamiltonian. To do so, we found the value of β such that the total variation distance

d(β) = (1/2) Σ_{E_i} |Q(E_i) − P(E_i; β)|,   (12)

between the empirical probability distribution Q and the exact Gibbs distribution P [Eq. (6)] is minimized, i.e.,

β_eff = arg min_β d(β).   (13)

We calculated the distribution distance in terms of energy levels, not in terms of distinct states, in order to speed up the calculation. We also note that this method of estimating the temperature can only be used for sufficiently small problems, because of the challenge of computing the partition function. The data we used was composed of 16 pixels, and thus calculating the exact Boltzmann probabilities is still feasible. We emphasize that simply because we associate an effective inverse temperature with the distribution Q, the distribution is not necessarily close to the corresponding Gibbs state of the Hamiltonian HP. This is due to the restriction to a single fitting parameter β; the existence of a generalized Gibbs distribution, with β ↦ β⃗ = {β_i}, such that d(β⃗) = 0, is guaranteed by Jaynes' principle [51], but in general this would require higher-weight classical terms (of which the β_i would be prefactors). Similarly, in Ref. [52] it was shown that the ground state of quantum annealing maps to classical thermal states, but with many-body terms.
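A minimal sketch of this estimate (our own illustration, not the authors' code), assuming the energies of all 2^N logical states can be enumerated and that the sample energies lie on the model's energy levels; the search interval for β is also our assumption:

```python
import numpy as np
from scipy.optimize import minimize_scalar

def beta_eff(sample_energies, energies_all):
    """Eqs. (12)-(13): beta minimizing the total variation distance between the empirical
    energy distribution Q and the exact Gibbs distribution P over the energy levels of HP.
    energies_all holds the energies of all 2^N logical states (feasible here, N = 16)."""
    rounded = np.round(energies_all, 8)
    levels, deg = np.unique(rounded, return_counts=True)     # levels and their degeneracies
    # empirical distribution Q over the same (sorted) levels
    idx = np.clip(np.searchsorted(levels, np.round(sample_energies, 8)), 0, len(levels) - 1)
    Q = np.bincount(idx, minlength=len(levels)) / len(sample_energies)

    def tv(beta):                                             # Eq. (12)
        w = deg * np.exp(-beta * (levels - levels.min()))     # shifted for numerical stability
        P = w / w.sum()
        return 0.5 * np.abs(Q - P).sum()

    return minimize_scalar(tv, bounds=(1e-3, 100.0), method="bounded").x   # Eq. (13)
```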

We also consider the dimensionless inverse temperature

β̂_eff = β_eff ‖HP‖,   (14)

where ‖·‖ denotes the operator norm (largest singular value), which allows us to also take into account the magnitude of the Ising parameters in the logical Hamiltonian.

C. Distance from data

For some of our results, we also plot the distance to the target data distribution, instead of the distance from a Gibbs distribution at a given inverse temperature. This quantity is defined as follows:

d_data = (1/2) Σ_{E_i} |Q(E_i) − R(E_i)|,   (15)

where R(E_i) is the frequency of finding a state with energy E_i in the data distribution; i.e., we first calculated the energies of every state that appeared in the data distribution under the current model parameters. Unlike the distance from Gibbs, the distance from data does not depend on a value of the effective temperature; we nevertheless calculated the distance this way, in terms of energy levels, to be consistent with how the distribution distance from Gibbs is calculated.

The distance to the target data distribution gives more information about how well an algorithm is learning the data distribution (and thus has a similar interpretation to the empirical log-likelihood), while the distance from a Gibbs distribution provides more information about how close the distribution of the samples is to the Gibbs distribution of HP. With a perfect noiseless thermal sampler, and under the assumption that the data can be modeled by a Gibbs distribution of HP, these two quantities should be perfectly correlated. However, as will be seen in our results, this is not always the case when dealing with imperfect samplers; the sampled distributions may be somewhat far from Gibbs and yet closely resemble the target data distribution.
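The same energy-level binning used in the β_eff sketch above gives the distance from data of Eq. (15); again a small sketch of our own, with hypothetical array arguments:

```python
import numpy as np

def distance_from_data(sample_energies, data_energies, levels):
    """Eq. (15): total variation distance between the sampled and the data energy
    histograms, both binned on the sorted energy levels of the current model Hamiltonian."""
    def hist(energies):
        idx = np.clip(np.searchsorted(levels, np.round(energies, 8)), 0, len(levels) - 1)
        return np.bincount(idx, minlength=len(levels)) / len(energies)
    return 0.5 * np.abs(hist(sample_energies) - hist(data_energies)).sum()
```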
How- able approaches to effectively reduce these error sources. ever, as will be seen in our results, this is not always the case when dealing with imperfect samplers; the sampled distribu- tions may be somewhat far away from Gibbs and yet closely E. Classical repetition vs NQAC resemble the target data distribution. Because higher nesting levels use more qubits than C = 1 D. Training procedure it is fairer to compare the performance at a nesting level C > 1 with NQAC to the C = 1 problem replicated to use approximately the same amount of physical resources For the sake of comparison, we used a fixed set of initial [recall from Fig.1 that the physical problem uses at least weights {w } and biases {b } (the model parameters). First, ij i CN(CN − 1)/2 + CN(L − 1) qubits, where C is the nest- to compare the best performance at each nesting level C, we ing level, N is the size of the logical problem, and L is the need to find the optimal values of the penalty terms γ and γ . 1 2 length of the chain needed for the embedding]. To implement To do so, we did a grid search from 0.2 to 1.0 in steps of 0.2 to this, we created MC replicas, where MC is the closest in- teger multiple of the number of physical qubits needed for nesting level C compared to C = 1; we find that M2 = 4 and M = 8. Then, at each training iteration, we generated 1 This is due to the restriction to a single fitting parameter β; the existence 3 MC replicas of the C = 1 encoded physical problem and per- of a generalized Gibbs distribution, with β 7→ β~ = {βi}, such that d(β~) = 0, is guaranteed by Jaynes’ principle [51], but in general this formed the update of the parameters according to Eqs. (9a) would require higher weight classical terms (of which the βi would be and (9b). We then sampled from the updated parameters and prefactors). Similarly, in [52] it was shown that the ground state of quan- selected the replica whose parameters gave the best empirical tum annealing maps to classical thermal states but with many-body terms. log-likelihood and set the parameters of the remaining replicas 8 to the best-performing replica. We then repeated the process until convergence was reached.

F. Quenching

In addition to evaluating the effect of NQAC on machine learning performance, we also explored various control parameters affecting the anneal. Specifically, we used the "annealing schedule variation" feature of the DW2000Q devices [54], which allows for different annealing rates along different segments of the annealing schedule. We can therefore approximate a quench by choosing a faster annealing rate for s > s_int than for s ≤ s_int, where 0 < s_int < 1. To probe the distributions at intermediate points in the anneal, we quenched at s_int between 0.1 and 0.9 in steps of 0.1. Because the annealing parameters cannot be changed instantaneously, we set the quench time to 1 µs. We emphasize that we do not expect this quench to be sufficiently fast for the measurement outcomes to be an accurate reflection of the state at the beginning of the quench. For each quench point, we performed 20 programming cycles.

G. SQA and SVMC Simulations

We also compared performance to simulated quantum annealing (SQA) [30] and spin vector Monte Carlo (SVMC) [29] at intermediate points in the anneal. SQA is a version of the quantum Monte Carlo method that can capture thermal averages at intermediate points in the anneal, but it is unable to simulate unitary quantum dynamics. SVMC is also a Monte Carlo algorithm, but it replaces the qubits with classical angles; it can be considered the semi-classical limit of quantum annealing, where each qubit is a pure state on the x–z plane of the Bloch sphere, but correlations between qubits are purely classical [55, 56]. For both SQA and SVMC, we used a temperature of 12 mK. To simulate the effect of the quenching described above, we used 500 sweeps to change the annealing parameters from their intermediate values to the final values. For each Hamiltonian at each quench point, we ran both SQA and SVMC with between 2500 and about 10^7 sweeps (incremented on a logarithmic scale) on the same encoded physical problem Hamiltonian sent to DW. We then selected the number of sweeps that gave an effective inverse temperature on the logical problem distribution that was closest to DW's for each Hamiltonian at each quench point. Because each (noisy) Hamiltonian realization generated by the DW programming cycle produces a slightly different distribution, resulting in a slightly different effective temperature and distance from Gibbs, instead of adding noise to the SQA and SVMC simulations we ran SQA and SVMC once at each quench point for each final Hamiltonian, and selected the number of sweeps that gave the closest effective temperature for each of the outputs of the 20 noisy realizations of the DW runs.
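The SVMC model just described admits a compact implementation. The sketch below is our own minimal version (not the code used in this work): it performs one Metropolis sweep over classical angles under the standard semiclassical Hamiltonian, with A and B the schedule values at the current point s and beta the inverse temperature; an anneal or quench is emulated by repeating sweeps while stepping s toward 1.

```python
import numpy as np

def svmc_sweep(theta, h, J, A, B, beta, rng):
    """One Metropolis sweep of spin-vector Monte Carlo (SVMC). Each qubit is a classical
    angle theta_i in the x-z plane with semiclassical energy
      E(theta) = -A * sum_i sin(theta_i)
                 + B * ( sum_i h_i cos(theta_i) + sum_{i<j} J_ij cos(theta_i) cos(theta_j) ).
    h: (N,) fields; J: (N, N) symmetric coupling matrix with zero diagonal."""
    N = len(theta)
    for i in rng.permutation(N):
        local = h[i] + J[i] @ np.cos(theta)          # effective longitudinal field on qubit i
        old_e = -A * np.sin(theta[i]) + B * np.cos(theta[i]) * local
        new = rng.uniform(0.0, np.pi)                # propose a fresh angle in [0, pi]
        new_e = -A * np.sin(new) + B * np.cos(new) * local
        dE = new_e - old_e
        if dE <= 0 or rng.random() < np.exp(-beta * dE):   # Metropolis acceptance
            theta[i] = new
    return theta
```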
IV. RESULTS

In this section we present our results testing NQAC's ability to improve machine learning performance on the BAS dataset and the coarse-grained MNIST dataset. We consider performance as a function of the scaling parameter α [Eq. (16)] and the anneal time tf, for different NQAC nesting levels. We also study the dependence on the quench point, and compare to SQA and SVMC simulations.

A. Training performance as a function of scaling parameter α

1. MNIST dataset

FIG. 3. Training performance using NQAC on the coarse-grained MNIST dataset as a function of the scaling parameter α, for different nesting levels C and tf = 10 µs. Data points correspond to the mean performance over 50 tests, where each test used 100 random images. Performance across C values remains consistent across the entire range of α, although C = 3 gains a statistically significant advantage at α = 10^{-2}. Here and in all subsequent figures the error bars represent two standard deviations (95% confidence intervals), unless explicitly stated otherwise.

For the MNIST data, we show in Fig. 3 the classification accuracy over 50 different groups of 100 random images. We observe two distinct regimes in the classification accuracy as a function of the scaling parameter α. For small α, where we expect the noise to be high, the classification accuracy grows almost linearly with log α before entering a regime of slower growth. In both regimes, we observe that there is no significant performance difference between C = 1 and higher C values, although for α ≤ 10^{-2} (when the effect of noise should be very high) we begin to observe a small improvement for C = 3, which is most pronounced at α = 10^{-2}.

[Figure 4 (plots omitted): three columns of panels for anneal times tf = 10 µs, 50 µs, and 100 µs; the rows show the empirical log-likelihood, the dimensionless inverse temperature, and the distance from Gibbs as functions of the training epoch, for C = 1, 2, 3.]

FIG. 4. NQAC training results for the BAS dataset at α = 1, with different anneal times tf . The empirical log-likelihood is computed using Eq. (11), whose maximum possible value in this case is −3.41, shown with a dotted line. Training consistently improves the empirical log-likelihood for C = 1, but C = 2, 3 exhibit fluctuations. The dimensionless temperature 1/βeff is constant for C = 1 but increases for C = 2, 3. Likewise, the distance from the Gibbs distribution increases. For C = 1, the empirical log-likelihood exhibits minimal variation with tf , while the maximum βeff exhibits some growth with tf . For C = 2, 3, the performance is hampered by sub-optimality of the penalty strength, which reaches the maximum attainable value imposed by the hardware. This is reflected in the dimensionless inverse-temperature, in that the training for C = 2, 3 is unable to reach the same values as the C = 1 case.

FIG. 5. Optimal penalty values found for the training results on the BAS dataset at α = 0.03 and 0.1 for different annealing times.

As α increases, NQAC becomes less and less effective, and eventually C = 1 overtakes C = 2 before it catches up with C = 3. The performance results shown always use the optimal penalty values for both γ1 and γ2, except for C = 1, where only the minor-embedding penalty strength γ2 needs to be optimized. The hardware-imposed limit γ1, γ2 ≤ 1 is not reached, except for C = 2 at the highest α value, which may explain why C = 1 overtakes C = 2; see Appendix B for further details. We stress, however, that because the results for each α correspond to a single Boltzmann machine instance, we cannot conclude whether this is the typical classification accuracy over different training runs.

[Figure 6 (plots omitted): panels showing, as a function of the anneal time, the empirical log-likelihood, the dimensionless inverse temperature, and the distance from Gibbs for C = 1, 2, 3, at (a) α = 0.03 and (b) α = 0.1.]

FIG. 6. Effect of using NQAC with different anneal times (measured in microseconds) for the BAS dataset, for α = 0.03 (top row) and α = 0.1 (bottom row). The maximum possible value of the empirical log-likelihood in this case is −3.41, shown with a dotted line. Both the empirical log-likelihood and the inverse temperature exhibit statistically significant growth with anneal time tf and nesting level C. The distance from Gibbs grows with C, and also with tf for C = 2, 3.

Note that simple out-of-the-box classical methods like a logistic regression or a nearest-neighbor classifier achieve accuracies of more than 90% (e.g., using the scikit-learn package in Python [57]). The results shown in Fig. 3 are not aimed at comparing to classical methods, but at testing whether the use of NQAC improves machine learning performance with a quantum annealer. Despite the small but statistically significant performance improvement seen in Fig. 3 for C = 3 at α ≤ 10^{-2}, it is certainly not the case that NQAC affords a compelling advantage. This is likely due to the fact that the minor-embedding chains grow longer as a function of C, making the system more susceptible to 'chain-breaking' errors. Indeed, we find that the associated penalty parameter γ2 is generally larger as C grows (see Appendix B), as would be required to suppress such errors. Hence, our results suggest that the performance on the MNIST dataset is dominated by the cost of minor embedding, and there may only be a small window where NQAC is effective at overcoming this cost. We likely need to await the next generation of quantum annealers, with better qubit connectivity [58], to test whether an increase in nesting level can provide a significant advantage for the MNIST dataset.

2. BAS dataset

We now consider the BAS dataset and training at several anneal times. At each tf, we repeated the same procedure: we did a grid search in steps of 0.2 for the values of γ1 and γ2 and selected the values that gave the best performance at the end of 5 epochs. We then took those optimal values of the penalties and trained on the entire dataset for 10 epochs. (We note that there could be a drift in the optimal penalty values after 5 epochs. However, in most machine learning contexts, hyperparameters are tuned for a few iterations and then assumed to remain constant. We adopt the same strategy here to avoid what could otherwise become an optimization over the entire space of parameters.)

[Figure 7 (plots omitted): empirical log-likelihood after 10 training epochs as a function of the scaling parameter α (log scale), comparing nesting levels C = 1, 2, 3 (left panel) and comparing C = 2 and C = 3 to the resource-matched classical repetition of C = 1 (middle and right panels).]

FIG. 7. Training performance using NQAC on the BAS dataset after 10 training epochs. A small advantage with increasing nesting level C is noticeable for the BAS dataset at intermediate values of the scaling parameter α. For the results shown here, all runs were done with an anneal time tf of 10 µs. The maximum possible value of the empirical log-likelihood for the BAS dataset used is −3.41, indicated by the dotted lines. The left panel compares different nesting levels. The middle and right panels compare the performance of a classical repetition versus NQAC (see Sec. III E for a description).

Figure 4 shows our results on the BAS dataset for α = 1 and several anneal times. The case α = 1 is the one of most practical interest.

We first discuss the case of C = 1, i.e., without nesting. For C = 1, we observe that the dimensionless inverse temperature β̂_eff [Eq. (14)] remains effectively constant, even as far back as the first few epochs of training. Since the early training Hamiltonians have small norm, the effective temperature 1/β_eff of the distribution is initially small and grows with increasing training epochs. This suggests two things: first, lower temperatures are likely to be useful earlier in the training, and second, the current DW device provides an adequate temperature range for training on the BAS dataset at the sizes considered here.

As the anneal time increases, the C = 1 distributions attain a lower dimensionless effective temperature, as well as a slight reduction in the distance from the associated Gibbs distribution. The latter is consistent with the system having more time to thermalize. However, the Gibbs distance consistently increases with training epochs, indicating that as the trained Hamiltonian becomes more complex, thermalization to the logical Gibbs state becomes more difficult.

We now consider the effect of nesting (C > 1). We observe that the training with NQAC underperforms, with higher C values performing consistently worse than lower C values as a function of both training time and the anneal times used. In particular, we note that C = 1 consistently has a lower dimensionless temperature, suggesting (perhaps counterintuitively) that the distributions generated for C > 1 are effectively too warm. This is very likely due to entering the 'penalty-limited regime,' as was also the case in previous NQAC work [27]. In this regime, the optimal strength of γ1 and γ2 is greater than 1, but as noted earlier, hardware limits prevent the penalties from reaching their optimal values. In this regime, we might expect that the decoding of the NQAC states does not preserve the ordering of the low-lying states of HP, resulting in a distribution that appears warm. As can be seen in Appendix B, the minor embedding penalty indeed becomes hardware limited for α = 1.

Next, we consider the same performance metrics for smaller α values, a proxy for increased physical temperature. We also expect datasets with larger problem sizes to require lower effective temperatures [17], and in this case one would expect NQAC to provide an advantage by lowering the effective temperature [22, 27, 28].

We first show in Fig. 5 the optimal penalty values at two smaller α values, α = 0.03 and 0.1. We find that while reducing the overall energy scale of the problem Hamiltonian allows the C = 2 case to have optimal penalties below 1 for all anneal times, the C = 3 case continues to require penalty values greater than 1. We therefore expect our results for C = 3 to also be penalty limited.

We note further that for both α values shown, for C = 2, 3 the code penalty γ1 attains the minimum value of 0.2 in the set of values we tried. This appears counterintuitive, since this penalty is expected to energetically suppress bit-flip errors. However, the fact that the minor embedding penalty γ2 is maximized at the same time suggests a tradeoff whereby the low γ1 value allows clusters of code qubits to be flipped more easily, while the large γ2 value reduces the number of broken chains, ensuring successful chain decoding. It is also possible that the values found by our optimization procedure reflect a local optimum, and that a more balanced combination of γ1 and γ2 could be found via a more exhaustive search.

[Figure 8 (plots omitted): for (a) C = 2 and (b) C = 3, the top rows show the average energy on the logical problem for DW, SQA, and SVMC as a function of the ramp (quench) point, and the bottom rows show the pairwise distribution distances DW–SQA, DW–SVMC, and SQA–SVMC.]

FIG. 8. Direct comparison between DW, SQA, and SVMC for (a) C = 2 and (b) C = 3 at α = 0.03. In the first row, blue, red and green represent the average energy of DW, SQA and SVMC, respectively. In the second row, black, pink and yellow represent the distribution distances between DW and SQA, DW and SVMC, and SQA and SVMC, respectively. We chose the number of sweeps of SQA and SVMC to match DW's βeff at each anneal time and quench point (see Appendix D).

Figure6 shows the empirical log-likelihood for the BAS with increasing anneal time. This indicates that training with dataset versus tf , for different nesting levels C and two longer anneal times results in a distribution that is effectively smaller α values. As shown in the left panels, the empirical colder, and based on the results from the left panels, in bet- log-likelihood for the BAS dataset exhibits a small but statis- ter performance. For α = 0.03 [Fig. 6(a)], the dimension- tically significant (at the 2σ level) improvement over a wide less effective temperature for C = 3 is consistently lower range of anneal times. We note that for α = 0.03 [Fig. 6(a)], than for C = 2, which in turn is consistently lower than for the advantage grows with increasing anneal time, although C = 1. This suggests a correlation between improved perfor- there is only a small difference between C = 2 and C = 3. mance and the ability to achieve effectively colder distribu- For α = 0.1 [Fig. 6(b)], the advantage for C = 2 and C = 3 tions. However, the right panel provides an important caveat. is present at almost all anneal times, although the advantage The sampled states for C = 3 are significantly further away starts to decrease for the largest anneal times. For C = 3, the from the corresponding Gibbs distribution relative to C = 2 advantage vanishes at the longest anneal time. for long anneal times. Together, the three panels suggest that while a colder effective distribution is generically useful in In order to better understand this performance behavior, we the high noise regime and can be achieved with a higher C, can consider the middle panels of Fig.6. We first note that the there is a price to be paid with increasing distance from Gibbs, dimensionless effective inverse temperature increases steadily 13 which can then begin to hurt performance. values of α, classical repetition performs better than NQAC. Similar observations can be made for α = 0.1 [Fig. 6(b)]. There is no improvement for C = 2 (middle panel of Fig.7). For C = 2, the effective temperature is consistently lower However, for C = 3 there is a small, but statistically signif- than that of C = 1, and the performance is consistently better icant advantage at the 95% confidence level in using NQAC than C = 1 for all anneal times. We observe a decreasing at intermediate values of α. Given the sub-optimal penalty advantage for large tf , which may be attributed again to the values for C = 3 (Fig.5), it is unclear whether the enhance- growing distance from Gibbs. For C = 3, the interpretation ment can actually be more substantial. What is clear is that at of the results is complicated by the fact that we are penalty the highest α values, the performance of C = 3 is seriously limited for most annealing times (see Fig.5). We believe this hindered. is the reason that we observe a higher dimensionless effec- tive temperature even if the performance beats C = 2 at low anneal times. B. Comparison to SQA and SVMC For C = 2, the growing distance from Gibbs cannot be at- tributed to being ‘penalty-limited’, since the optimal penalties Our discussion so far has only relied on the output distri- are always below the device limit of 1, as seen in Fig.5 (ex- bution of the DW processor at the end of the anneal. We cept at α = 1; see AppendixB). 
The growing deviation from can attempt to better understand the origin of this distribu- Gibbs may appear unexpected in light of the adiabatic theorem tion by using the ‘annealing schedule variation’ feature of the for open systems [14, 59, 60], but we recall that the optimal DW2000Q devices (see Sec. III F for more details) and com- penalty is chosen to maximize the log-likelihood and not to paring it to the distributions generated by SQA and SVMC. minimize the distance from Gibbs. It is likely that in the high Recall that SVMC is a purely classical model of interact- noise (low α) case, the optimal penalty values are such that ing O(2) rotors subject to the semiclassical DW Hamiltonian, the decoding procedure does not map the low lying spectrum while SQA is a quantum Monte Carlo algorithm that is de- of the implemented Hamiltonian to the logical Hamiltonian. signed to converge to the instantaneous quantum Gibbs dis- This would explain why the distance from Gibbs grows, since tribution (of the actual DW Hamiltonian) for a sufficiently the latter is measured relative to the Gibbs distribution of the large number of sweeps (see Sec. III G for more details). Fig- logical problem. The mechanism could be that the optimal ure8 shows the average energy and the distribution distance penalty values favor weakly coupled clusters of spins to help between the DW, SQA, and SVMC for C = 2 and C = 3 them flip. This is likely an outcome of decoherence that pre- on the 11 trained models (each corresponding to using a dif- vents coherent multi-qubit spin flips. ferent tf ). We observe almost identical behavior among the Taken together, these results indicate a complicated rela- three methods for different quench points, in particular a large tionship between performance and anneal time for NQAC. For change in the average energy as we increase the position of C = 1, the trend is clear: longer anneal times result in colder the quench. This change is likely associated with crossing the and more Gibbs-like distributions. For NQAC, the decoding minimum gap of the system, and the associated freezing tran- procedure likely distorts this picture: better performance can sition. This point moves to earlier in the anneal for larger C, be achieved by lowering the effective temperature (maybe by which is consistent with the encoding raising the Ising energy facilitating large-cluster spin flips via low penalties) even if scale and hence moving the global minimum gap to an earlier the resulting distribution departs from being Gibbs (because point in the anneal [61]. We also notice that the larger C value the ordering of states is not preserved). Nevertheless, despite seems to make the transition sharper. the increase in distance from the Gibbs distribution with C, There is a consistent discrepancy between the approaches the measures of machine learning performance improve with at the smallest quench points, where DW has a lower average longer anneal times, indicating that some meaningful learn- energy than SVMC and SQA. Since these small quench points ing is still taking place. However, the advantage this can give have the system crossing the minimum gap, the discrepancy does not appear to be indefinite and appears to be related to is likely due to using too few sweeps in SVMC and SQA to the noise level, as possibly indicated by the leveling of the accurately represent the quench. The number of sweeps at performance results for α = 0.1. each quench point is presented in AppendixD. 
So far, our training performance comparison has ignored the extra qubit resources required by NQAC. Recall, as explained in Sec. III E, that a fair comparison should account for these resources, and compare NQAC at C > 1 to a classical repetition of C = 1 with the same number of physical qubits. The result is shown in Fig. 7, for tf = 10 µs. First, in the left panel we compare different nesting levels and observe that increasing C helps for sufficiently low α, where the results are not penalty limited. The middle and right panels then address the equalization-of-resources question, by comparing C = 2 and C = 3 to repetitions of C = 1. In the repetition case we simply used C = 1 multiple times and took the best of these repetitions, as in previous work [22–24]. At very low values of α, classical repetition performs better than NQAC. There is no improvement for C = 2 (middle panel of Fig. 7). However, for C = 3 there is a small, but statistically significant advantage at the 95% confidence level in using NQAC at intermediate values of α. Given the sub-optimal penalty values for C = 3 (Fig. 5), it is unclear whether the enhancement can actually be more substantial. What is clear is that at the highest α values, the performance of C = 3 is seriously hindered.

B. Comparison to SQA and SVMC

Our discussion so far has only relied on the output distribution of the DW processor at the end of the anneal. We can attempt to better understand the origin of this distribution by using the ‘annealing schedule variation’ feature of the DW2000Q devices (see Sec. III F for more details) and comparing it to the distributions generated by SQA and SVMC. Recall that SVMC is a purely classical model of interacting O(2) rotors subject to the semiclassical DW Hamiltonian, while SQA is a quantum Monte Carlo algorithm that is designed to converge to the instantaneous quantum Gibbs distribution (of the actual DW Hamiltonian) for a sufficiently large number of sweeps (see Sec. III G for more details). Figure 8 shows the average energy and the distribution distance between DW, SQA, and SVMC for C = 2 and C = 3 on the 11 trained models (each corresponding to using a different tf). We observe almost identical behavior among the three methods for different quench points, in particular a large change in the average energy as we increase the position of the quench. This change is likely associated with crossing the minimum gap of the system, and the associated freezing transition. This point moves to earlier in the anneal for larger C, which is consistent with the encoding raising the Ising energy scale and hence moving the global minimum gap to an earlier point in the anneal [61]. We also notice that the larger C value seems to make the transition sharper.

There is a consistent discrepancy between the approaches at the smallest quench points, where DW has a lower average energy than SVMC and SQA. Since these small quench points have the system crossing the minimum gap, the discrepancy is likely due to using too few sweeps in SVMC and SQA to accurately represent the quench. The number of sweeps at each quench point is presented in Appendix D.

We note that after crossing the minimum gap, the average energy of the decoded state does not change significantly anymore, indicative of the freeze-out point. Recall that this point is associated with the quasi-static regime effect [13–17], whereby the dynamics of a quantum annealer are dramatically slowed down after an intermediate point in the anneal. This has the important consequence that if the system reaches a Gibbs state at an intermediate point in the anneal and freezes there, then we should not expect agreement with the Gibbs state at the end of the anneal (which would correspond to the Gibbs distribution over the logical problem if the ordering of states is preserved).

The strong agreement between DW and SVMC suggests that quantum effects are not playing a significant role in determining the final distribution. Inasmuch as we trust SQA and SVMC as realistic simulations of the underlying physics, the fact that SQA and SVMC consistently yield the smallest distance strongly suggests that the temperature is simply too high to observe meaningful quantum effects like tunneling. This does not necessarily mean that tunneling cannot play a role at lower device temperatures. In Ref. [62], simulations demonstrated that at high temperature SQA and SVMC performed similarly, but at lower temperatures the two performed differently, most likely because SQA is able to simulate tunneling events, whereas SVMC can only make use of thermal hopping.

Additional results comparing SQA and SVMC to DW are provided in Appendix E. In particular, we study the freezing transition in more detail using other metrics.
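As a rough illustration of the comparison just described, the average decoded energy and a pairwise distribution distance between two samplers can be tabulated against the quench point along the following lines. The total-variation distance and the helper names are assumptions made for this sketch; the distance measure actually used in Fig. 8 is the one defined in the main text.

```python
# Illustrative sketch of the DW / SQA / SVMC comparison at a sequence of quench
# points; `dw[s]`, `sqa[s]`, `svmc[s]` are assumed to hold decoded logical
# samples for each method at quench point s.
import numpy as np
from collections import Counter

def distribution(samples):
    counts = Counter(map(tuple, samples))
    n = sum(counts.values())
    return {state: c / n for state, c in counts.items()}

def total_variation(samples_a, samples_b):
    pa, pb = distribution(samples_a), distribution(samples_b)
    support = set(pa) | set(pb)
    return 0.5 * sum(abs(pa.get(s, 0.0) - pb.get(s, 0.0)) for s in support)

def average_energy(samples, h, J):
    S = np.asarray(samples, dtype=float)       # J assumed upper triangular
    return float(np.mean(S @ h + np.einsum("ki,ij,kj->k", S, J, S)))

def compare_at_quench_points(dw, sqa, svmc, h, J, quench_points):
    """Per quench point: average DW energy and DW-SQA / DW-SVMC distribution
    distances (the analogue of two rows of Fig. 8)."""
    rows = []
    for s in quench_points:
        rows.append({
            "s": s,
            "E_dw": average_energy(dw[s], h, J),
            "d_dw_sqa": total_variation(dw[s], sqa[s]),
            "d_dw_svmc": total_variation(dw[s], svmc[s]),
        })
    return rows
```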
V. DISCUSSION

Previous work has demonstrated that NQAC can offer an effective temperature reduction by trading more physical qubits to create ‘colder’ code qubits in optimization problems and estimation of gradients in the training of Boltzmann machines [27]. This previous result used a fixed Hamiltonian for a given problem size. Here, we have incorporated NQAC into training a machine learning model, where the parameters of the Hamiltonian are iteratively adjusted to match some target distribution. In order to be able to study the performance of NQAC on both the BAS and MNIST datasets, we had to rescale the Ising Hamiltonian by the scaling parameter α, in order to reach accessible optimal penalty values. Even so, we found that only for C = 2 was the optimal penalty value generally within the accessible range of the DW2000Q device. Therefore any conclusions we draw about the performance with increasing C are subject to this caveat.

We have shown, using both the BAS and MNIST datasets, that for intermediate values of α NQAC offers an increase in training performance with higher nesting level C (in agreement with previous results), but when α is close to 1, encoding is limited by the strength of the penalties and does not provide an advantage. For these intermediate values of α, we tested whether NQAC outperforms classical repetition for the BAS dataset, and found this to be the case at the largest accessible nesting level (C = 3), though the effect is small. However, we have also shown that the advantage in using NQAC increases with longer anneal times for α = 0.03 [see Fig. 6(a)]. Additional results at different values of α are presented in Appendix F, where trends are not as consistent.

In the unsupervised case (BAS dataset) we also studied how the improvement in performance correlates with the effective temperature of the associated Gibbs distribution; our results show that the improved learning is associated with a decrease in the effective temperature. In addition, we found that longer anneal times increase machine learning training performance across all nesting levels, but — counterintuitively — the distribution distance from a Gibbs distribution with respect to the final Hamiltonian increased with anneal time. This shows that improved unsupervised learning can happen without improved equilibration with respect to the target Hamiltonian.

Our results indicate a complicated relationship between performance, penalty strengths, and anneal time for NQAC. It is clear that improved performance can be had at the expense of closeness to the Gibbs distribution for short and mid-range annealing times, but this gain diminishes at large annealing times.

To understand the details of the changes in the effective temperature and distribution distance, we used the “annealing schedule variation” feature of the DW2000Q device to probe the state of the system at intermediate points in the anneal, and made comparisons to SQA and SVMC to gain some physical insight into the sampling process. Overall, our results agree quite well with previous understanding of the dynamics of a quantum annealer in the quasi-static regime: at some intermediate point in the anneal, dynamics are frozen and the distribution (and various expectation values of the distribution) do not change significantly after this freezing point. Strong agreement with SQA and SVMC suggests that DW is sampling from the same semiclassical distribution at intermediate points in the anneal, though this agreement weakens at longer anneal times.

Caveats about our conclusions include generalizability to problems of larger size and other datasets, and whether increasing the nesting level will continue to provide an advantage. For the size of the problems we have examined here, misspecification of the Ising Hamiltonian parameters, which is typically estimated to be around 0.03 in units of the coupling parameters Jij of the DW device, does not impact the results to the point where learning cannot be achieved. The effect of misspecification errors is to yield distributions that can resemble warmer Boltzmann distributions, at least in terms of the distribution of energy eigenstates [53]. For sufficiently large misspecification errors and temperature of the distribution, we should not expect any meaningful learning to take place. Significant mitigation against misspecification errors can be achieved even with simple QAC at C = 1 [63], but we do not expect the behavior shown here to be scalable unless we can increase the code distance without entering the penalty-limited regime and simultaneously avoid affecting the thermalization.

One additional complication that we did not discuss here in detail is the effect of the decoding strategy for broken chains. Majority voting can distort the distribution and has been shown to be a suboptimal decoding strategy for thermal sampling problems [64]; replacing it in the context of our work with a better strategy is an interesting topic for future investigations.
Nonetheless, without any decoding, learning suffers significantly for higher C. In Appendix C we present additional results where no decoding is used (i.e., broken chains are discarded) and find worse performance. These caveats notwithstanding, we remain hopeful that, building upon the results demonstrated here, NQAC or improved error suppression and correction methods may help in finding a machine learning advantage for noisy intermediate-scale quantum annealers.

VI. ACKNOWLEDGEMENTS

Computation for the work described in this paper was supported by the University of Southern California's Center for High-Performance Computing (hpcc.usc.edu). This research is based upon work (partially) supported by the Office of the Director of National Intelligence (ODNI), Intelligence Advanced Research Projects Activity (IARPA), via the U.S. Army Research Office contract W911NF-17-C-0050. The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of the ODNI, IARPA, DARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. This work is partially supported by a DOE/HEP QuantISED program grant, Quantum Machine Learning and Quantum Computation Frameworks (QMLQCF) for HEP, award number DE-SC0019227.

[1] R. Harris, M. W. Johnson, T. Lanting, A. J. Berkley, J. Johansson, P. Bunyk, E. Tolkacheva, E. Ladizinsky, N. Ladizinsky, T. Oh, F. Cioata, I. Perminov, P. Spear, C. Enderud, C. Rich, S. Uchaikin, M. C. Thom, E. M. Chapple, J. Wang, B. Wilson, M. H. S. Amin, N. Dickson, K. Karimi, B. Macready, C. J. S. Truncik, and G. Rose, “Experimental investigation of an eight-qubit unit cell in a superconducting optimization processor,” Phys. Rev. B 82, 024511 (2010).
[2] M. W. Johnson, M. H. S. Amin, S. Gildert, T. Lanting, F. Hamze, N. Dickson, R. Harris, A. J. Berkley, J. Johansson, P. Bunyk, E. M. Chapple, C. Enderud, J. P. Hilton, K. Karimi, E. Ladizinsky, N. Ladizinsky, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, C. J. S. Truncik, S. Uchaikin, J. Wang, B. Wilson, and G. Rose, “Quantum annealing with manufactured spins,” Nature 473, 194–198 (2011).
[3] P. I. Bunyk, E. M. Hoskinson, M. W. Johnson, E. Tolkacheva, F. Altomare, A. J. Berkley, R. Harris, J. P. Hilton, T. Lanting, A. J. Przybysz, and J. Whittaker, “Architectural considerations in the design of a superconducting quantum annealing processor,” IEEE Transactions on Applied Superconductivity 24, 1–10 (2014).
[4] T. Lanting, A. J. Przybysz, A. Yu. Smirnov, F. M. Spedalieri, M. H. Amin, A. J. Berkley, R. Harris, F. Altomare, S. Boixo, P. Bunyk, N. Dickson, C. Enderud, J. P. Hilton, E. Hoskinson, M. W. Johnson, E. Ladizinsky, N. Ladizinsky, R. Neufeld, T. Oh, I. Perminov, C. Rich, M. C. Thom, E. Tolkacheva, S. Uchaikin, A. B. Wilson, and G. Rose, “Entanglement in a quantum annealing processor,” Phys. Rev. X 4, 021041 (2014).
[5] Sergio Boixo, Vadim N. Smelyanskiy, Alireza Shabani, Sergei V. Isakov, Mark Dykman, Vasil S. Denchev, Mohammad H. Amin, Anatoly Yu Smirnov, Masoud Mohseni, and Hartmut Neven, “Computational multiqubit tunnelling in programmable quantum annealers,” Nat. Commun. 7 (2016).
[6] Tameem Albash and Daniel A. Lidar, “Adiabatic quantum computation,” Reviews of Modern Physics 90, 015002 (2018).
[7] Sabine Jansen, Mary-Beth Ruskai, and Ruedi Seiler, “Bounds for the adiabatic approximation with applications to quantum computation,” J. Math. Phys. 48, 102111 (2007).
[8] Daniel A. Lidar, Ali T. Rezakhani, and Alioscia Hamma, “Adiabatic approximation with exponential accuracy for many-body systems and quantum computation,” J. Math. Phys. 50, 102106 (2009).
[9] Troels F. Rønnow, Zhihui Wang, Joshua Job, Sergio Boixo, Sergei V. Isakov, David Wecker, John M. Martinis, Daniel A. Lidar, and Matthias Troyer, “Defining and detecting quantum speedup,” Science 345, 420–424 (2014).
[10] Andrew M. Childs, Edward Farhi, and John Preskill, “Robustness of adiabatic quantum computation,” Phys. Rev. A 65, 012322 (2001).
[11] S. Ashhab, J. R. Johansson, and Franco Nori, “Decoherence in a scalable adiabatic quantum computer,” Phys. Rev. A 74, 052330 (2006).
[12] Tameem Albash and Daniel A. Lidar, “Decoherence in adiabatic quantum computation,” Phys. Rev. A 91, 062320 (2015).
[13] Mohammad H. Amin, “Searching for quantum speedup in quasistatic quantum annealers,” Physical Review A 92, 052323 (2015).
[14] Lorenzo Campos Venuti, Tameem Albash, Daniel A. Lidar, and Paolo Zanardi, “Adiabaticity in open quantum systems,” Phys. Rev. A 93, 032118 (2016).
[15] Jeffrey Marshall, Eleanor G. Rieffel, and Itay Hen, “Thermalization, freeze-out, and noise: Deciphering experimental quantum annealers,” Phys. Rev. Applied 8, 064025 (2017).
[16] Nicholas Chancellor, Szilard Szoke, Walter Vinci, Gabriel Aeppli, and Paul A. Warburton, “Maximum-Entropy Inference with a Programmable Annealer,” Scientific Reports 6, 22318 (2016).
[17] Tameem Albash, Victor Martin-Mayor, and Itay Hen, “Temperature scaling law for quantum annealing optimizers,” Physical Review Letters 119, 110502 (2017).
[18] Alejandro Perdomo-Ortiz, Marcello Benedetti, John Realpe-Gómez, and Rupak Biswas, “Opportunities and challenges for quantum-assisted machine learning in near-term quantum computers,” Quantum Sci. Technol. 3, 030502 (2018), arXiv:1708.09757.
[19] Lorenzo Campos Venuti, Tameem Albash, Milad Marvian, Daniel Lidar, and Paolo Zanardi, “Relaxation versus adiabatic quantum steady-state preparation,” Phys. Rev. A 95, 042302 (2017).
[20] Ruslan Salakhutdinov and Geoffrey Hinton, “An efficient learning procedure for deep Boltzmann machines,” Neural Comput. 24, 1967–2006 (2012).
[21] Y. Tang, R. Salakhutdinov, and G. Hinton, “Robust Boltzmann machines for recognition and denoising,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition (2012) pp. 2264–2271.
[22] Walter Vinci, Tameem Albash, and Daniel A. Lidar, “Nested quantum annealing correction,” npj Quant. Inf. 2, 16017 (2016).
[23] Kristen L. Pudenz, Tameem Albash, and Daniel A. Lidar, “Error-corrected quantum annealing with hundreds of qubits,” Nat. Commun. 5, 3243 (2014).
[24] Kristen L. Pudenz, Tameem Albash, and Daniel A. Lidar, “Quantum annealing correction for random Ising problems,” Phys. Rev. A 91, 042302 (2015).
[25] Shunji Matsuura, Hidetoshi Nishimori, Tameem Albash, and Daniel A. Lidar, “Mean field analysis of quantum annealing correction,” Physical Review Letters 116, 220501 (2016).

[26] Anurag Mishra, Tameem Albash, and Daniel A. Lidar, “Performance of two different quantum annealing correction codes,” Quant. Inf. Proc. 15, 609–636 (2015).
[27] Walter Vinci and Daniel A. Lidar, “Scalable effective-temperature reduction for quantum annealers via nested quantum annealing correction,” Physical Review A 97, 022308 (2018).
[28] Shunji Matsuura, Hidetoshi Nishimori, Walter Vinci, and Daniel A. Lidar, “Nested quantum annealing correction at finite temperature: p-spin models,” Physical Review A 99, 062307 (2019).
[29] Seung Woo Shin, Graeme Smith, John A. Smolin, and Umesh Vazirani, “How “quantum” is the D-Wave machine?” arXiv:1401.7087 (2014).
[30] Giuseppe E. Santoro, Roman Martoňák, Erio Tosatti, and Roberto Car, “Theory of quantum annealing of an Ising spin glass,” Science 295, 2427–2430 (2002).
[31] Tadashi Kadowaki and Hidetoshi Nishimori, “Quantum annealing in the transverse Ising model,” Phys. Rev. E 58, 5355 (1998).
[32] Walter Vinci, Tameem Albash, Gerardo Paz-Silva, Itay Hen, and Daniel A. Lidar, “Quantum annealing correction with minor embedding,” Phys. Rev. A 92, 042310 (2015).
[33] Daniel Gottesman, “Stabilizer Codes and Quantum Error Correction,” arXiv:quant-ph/9705052 (1997).
[34] S. P. Jordan, E. Farhi, and P. W. Shor, “Error-correcting codes for adiabatic quantum computation,” Phys. Rev. A 74, 052322 (2006).
[35] Adam D. Bookatz, Edward Farhi, and Leo Zhou, “Error suppression in Hamiltonian-based quantum computation using energy penalties,” Phys. Rev. A 92, 022317 (2015).
[36] Zhang Jiang and Eleanor G. Rieffel, “Non-commuting two-local Hamiltonians for quantum error suppression,” Quant. Inf. Proc. 16, 89 (2017).
[37] Milad Marvian and Daniel A. Lidar, “Error Suppression for Hamiltonian-Based Quantum Computation Using Subsystem Codes,” Phys. Rev. Lett. 118, 030504 (2017).
[38] Milad Marvian and Daniel A. Lidar, “Error suppression for Hamiltonian quantum computing in Markovian environments,” Physical Review A 95, 032302 (2017).
[39] Daniel A. Lidar, “Arbitrary-time error suppression for Markovian adiabatic quantum computing using stabilizer subspace codes,” Phys. Rev. A 100, 022326 (2019).
[40] V. Choi, “Minor-embedding in adiabatic quantum computation: II. Minor-universal graph design,” Quant. Inf. Proc. 10, 343–353 (2011).
[41] Christine Klymko, Blair D. Sullivan, and Travis S. Humble, “Adiabatic quantum programming: minor embedding with hard faults,” Quant. Inf. Proc. 13, 709–729 (2014).
[42] Jun Cai, William G. Macready, and Aidan Roy, “A practical heuristic for finding graph minors,” arXiv:1406.2741 (2014).
[43] Geoffrey E. Hinton, Simon Osindero, and Yee-Whye Teh, “A fast learning algorithm for deep belief nets,” Neural Computation 18, 1527–1554 (2006).
[44] Ruslan Salakhutdinov and Hugo Larochelle, “Efficient learning of deep Boltzmann machines,” in Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, Vol. 9, edited by Yee Whye Teh and Mike Titterington (PMLR, Chia Laguna Resort, Sardinia, Italy, 2010) pp. 693–700.
[45] David J. C. MacKay, Information Theory, Inference, and Learning Algorithms (Cambridge University Press, 2003).
[46] Geoffrey E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation 14, 1771–1800 (2002).
[47] Mohammad H. Amin, Evgeny Andriyash, Jason Rolfe, Bohdan Kulchytskyy, and Roger Melko, “Quantum Boltzmann machine,” Physical Review X 8, 021050 (2018).
[48] Marcello Benedetti, John Realpe-Gómez, Rupak Biswas, and Alejandro Perdomo-Ortiz, “Quantum-assisted learning of hardware-embedded probabilistic graphical models,” Physical Review X 7, 041052 (2017).
[49] Steven H. Adachi and Maxwell P. Henderson, “Application of quantum annealing to training of deep neural networks,” arXiv:1510.06356 (2015).
[50] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE 86, 2278–2324 (1998).
[51] E. T. Jaynes, “Information theory and statistical mechanics,” Physical Review 106, 620–630 (1957).
[52] Hidetoshi Nishimori, Junichi Tsuda, and Sergey Knysh, “Comparative study of the performance of quantum annealing and simulated annealing,” Phys. Rev. E 91, 012104 (2015).
[53] T. Albash, V. Martin-Mayor, and I. Hen, “Analog errors in Ising machines,” Quantum Sci. Technol. 4, 02LT03 (2019).
[54] D-Wave Systems Inc., “The D-Wave 2000Q Quantum Computer Technology Overview,” (2018).
[55] T. Albash, T. F. Rønnow, M. Troyer, and D. A. Lidar, “Reexamining classical and quantum models for the D-Wave One processor,” Eur. Phys. J. Spec. Top. 224, 111–129 (2015).
[56] Philip J. D. Crowley and A. G. Green, “Anisotropic Landau-Lifshitz-Gilbert models of dissipation in qubits,” Physical Review A 94, 062106 (2016).
[57] Fabian Pedregosa, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel, Bertrand Thirion, Olivier Grisel, Mathieu Blondel, Peter Prettenhofer, Ron Weiss, Vincent Dubourg, Jake Vanderplas, Alexandre Passos, David Cournapeau, Matthieu Brucher, Matthieu Perrot, and Édouard Duchesnay, “Scikit-learn: Machine learning in Python,” Journal of Machine Learning Research 12, 2825–2830 (2011).
[58] Kelly Boothby, Paul Bunyk, Jack Raymond, and Aidan Roy, Next-Generation Topology of D-Wave Quantum Processors, Tech. Rep. (D-Wave Systems Inc., 2019).
[59] Ognyan Oreshkov and John Calsamiglia, “Adiabatic Markovian Dynamics,” Phys. Rev. Lett. 105, 050503 (2010).
[60] J. E. Avron, M. Fraas, G. M. Graf, and P. Grech, “Adiabatic theorems for generators of contracting evolutions,” Comm. Math. Phys. 314, 163–191 (2012).
[61] Vicky Choi, “The effects of the problem Hamiltonian parameters on the minimum spectral gap in adiabatic quantum optimization,” Quantum Information Processing 19, 90 (2020).
[62] Tameem Albash and Daniel A. Lidar, “Demonstration of a scaling advantage for a quantum annealer over simulated annealing,” Physical Review X 8, 031016 (2018).
[63] Adam Pearson, Anurag Mishra, Itay Hen, and Daniel A. Lidar, “Analog errors in quantum annealing: doom and hope,” npj Quantum Information 5, 107 (2019).
[64] Jeffrey Marshall, Andrea Di Gioacchino, and Eleanor G. Rieffel, “Perils of embedding for sampling problems,” Phys. Rev. Research 2, 023020 (2020).
[65] “D-Wave White Paper: Improved coherence leads to gains in quantum annealing performance,” (2019).

Appendix A: DW processor used in this work

In this work we used the DW 2000Q processor located at Moffett Field and managed by NASA-Ames. The hardware graph consists of 2031 qubits and is depicted in Fig. 9.

Appendix B: Optimal penalty values

Figure 10 shows the optimal penalty values we obtained for training DW subject to the MNIST dataset. The key observation is that the penalty remains below the hardware limit of 1, with the one exception of C = 2, where γ1 = 1 at α = 1. This might explain why C = 2 is overtaken by C = 1 in terms of the performance seen in Fig. 3 in the main text.

FIG. 10. Optimal penalty values for the MNIST dataset as a function of the scaling parameter α, for different nesting levels C.

FIG. 11. Optimal penalty values for the BAS dataset, as a function of the anneal time tf, for different values of the scaling parameter α, and for different nesting levels C. This figure complements Fig. 5 in the main text.

Also note that the chain penalty strength γ2 generally grows with C, which means that the penalty plays a larger role in suppressing errors due to broken chains.

Figure 11 shows the optimal penalty values we obtained for training DW subject to the BAS dataset, for additional values not shown in Fig. 5. For α ≥ 0.03 the minor embedding penalty γ2 becomes hardware-limited for C = 3, except at a few anneal time values. For α = 1 the minor embedding penalty is hardware-limited for all C values, except for C = 1 at the highest anneal time.

Figure 12 shows a heatmap of the empirical log-likelihood after tuning the hyper-parameters. For the two smallest values of α the exact values of γ1 and γ2 do not have much of an effect; the empirical log-likelihood after training for 5 epochs (see Sec. IV A 2) varies by around 0.5 at most.

FIG. 9. Chimera graph of the NASA DW 2000Q processor. Green circles indicate active qubits, red circles are inactive qubits, and active physical couplers are indicated by black lines.

In other words, as might be expected, noise is dominant and learning is difficult. This makes it difficult to draw conclusions about the optimal values of the γ's. At α = 10−2 the values of γ1 and γ2 do not seem to be penalty-limited for C = 2 and C = 3, with a clear preference for γ1 = 0.2. However, as α continues to increase, the optimal value of at least one of the γ's begins to increase: γ2 quickly shifts towards 1. There seems to be a very limited range of α where the γ's are not penalty-limited for higher values of C; C = 3 appears to be penalty-limited by α = 3 × 10−2, C = 2 at α = 10−1, and C = 1 by α = 1.
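A minimal sketch of the scan behind this heat map is given below: train for a few epochs at each (γ1, γ2) pair and keep the pair with the highest empirical log-likelihood. The grid values and the train_and_evaluate callable are assumptions standing in for the full NQAC training loop (embedding, annealing, decoding, and gradient updates); they are not the paper's actual code.

```python
# Illustrative hyper-parameter scan over the NQAC penalty gamma1 and the
# minor-embedding penalty gamma2 (both capped at the device limit of 1).
import itertools
import numpy as np

GAMMA_GRID = [0.2, 0.4, 0.6, 0.8, 1.0]   # assumed grid, in units of max |J_ij|

def tune_penalties(train_and_evaluate, n_epochs=5):
    """Return the (gamma1, gamma2) pair maximizing the empirical log-likelihood
    after a short training run, plus the full score grid (one panel of Fig. 12)."""
    scores = np.full((len(GAMMA_GRID), len(GAMMA_GRID)), -np.inf)
    for (i, g1), (j, g2) in itertools.product(list(enumerate(GAMMA_GRID)), repeat=2):
        scores[i, j] = train_and_evaluate(gamma1=g1, gamma2=g2, epochs=n_epochs)
    best = np.unravel_index(np.argmax(scores), scores.shape)
    return (GAMMA_GRID[best[0]], GAMMA_GRID[best[1]]), scores
```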

Appendix C: Discarding broken chains

For the results in the main text, we employed a simple majority vote to decode broken chains. If prior to this ‘correction’ step the states obeyed a classical Boltzmann distribution, then the correction step can result in a distortion such that the resulting distribution is no longer Boltzmann [64]. Here we focus on the subset of the states without broken chains and discard all the rest. The reason is that if the states were originally Boltzmann distributed, then so will be the subset of unbroken-chain states, although a decoding step is nevertheless still needed to go from the physical problem to the original logical problem. To understand the effect of the penalties without decoding, we study the case where broken chains were discarded at α = 0.03. We repeat the same training procedure on the BAS dataset as in the main text except that we discard any samples with broken chains: we first find the optimal values of γ1 and γ2, then use these values to train for 10 epochs. Based on the excellent agreement between D-Wave and SQA found in the main text, we use SQA with a temperature of 12 mK and 8 × 10^4 sweeps and 10^4 repetitions to generate these results.

Figure 13 shows the results for training. When discarding broken chains, the optimal values of both γ1 and γ2 are 0.2 for all nesting levels. For C = 1 and C = 2 the majority of the samples do not have broken chains, but for C = 3 the fraction of samples with unbroken chains decreases significantly with the number of epochs before plateauing. We observe the same plateauing in our performance metrics, suggesting that the small fraction of available samples with unbroken chains is hindering learning. It is possible that calling for more samples until a minimum number of unbroken chains is reached may improve performance.

Figure 14 shows the parameter tuning results for three different values of the minor-embedding penalty γ2. As can be seen, when γ2 increases, we find fewer broken chains; for C = 1 and C = 2, the chains are short enough that the vast majority of chains are unbroken with a small γ2, but for C = 3, γ2 = 0.2 is too small to keep the chains intact, and a higher γ2 helps alleviate this problem. However, we also find that the performance as measured by the empirical log-likelihood decreases for all C values. We observe the largest drops in performance for C = 1, where we believe that the strongly coupled chains, corresponding to higher γ2 values, dominate over the problem implementation and effectively hinder learning. The energy boost associated with the repetition code helps alleviate this problem in the case of C = 2 and C = 3, so we observe less dramatic drops in performance. Nevertheless, learning generally suffers in the case of more strongly coupled chains, and we believe this is because such chains are harder to collectively flip thermally, making it harder to explore the configuration space even if the fraction of unbroken chains is higher. Hence, our results indicate that the optimal strategy is to use values of γ1 and γ2 that are relatively weak and to employ a decoding strategy to “fix” the broken chains.
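For reference, the two treatments of broken chains compared in this appendix can be sketched as follows. The chain map and the random tie-breaking in the majority vote are assumptions made for the sketch; they are not meant to reproduce the exact decoder used for the reported results.

```python
# Illustrative chain decoding: majority vote per chain versus discarding every
# sample that contains a broken chain. `chains` maps each encoded variable to
# the indices of its physical qubits; physical samples are arrays of +/-1 spins.
import numpy as np

def majority_vote_decode(physical_sample, chains, rng=None):
    rng = rng or np.random.default_rng()
    decoded = {}
    for logical, qubits in chains.items():
        total = int(np.sum(physical_sample[list(qubits)]))
        if total == 0:                       # tied (even-length broken chain)
            decoded[logical] = int(rng.choice([-1, 1]))
        else:
            decoded[logical] = 1 if total > 0 else -1
    return decoded

def keep_unbroken(physical_samples, chains):
    """Retain only the samples in which every chain is intact (all spins agree)."""
    kept = []
    for sample in physical_samples:
        if all(np.ptp(sample[list(qubits)]) == 0 for qubits in chains.values()):
            kept.append(sample)
    return kept
```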
Appendix D: SQA and SVMC sweeps

To recap Sec. III G, for each Hamiltonian at each quench point, we ran both SQA and SVMC on the encoded physical problem Hamiltonian sent to DW, and selected the number of sweeps that gave an effective inverse temperature on the logical problem distribution that was closest to DW's. We compared SQA and SVMC at each quench point for each instance to 20 noisy realizations of DW, and thus the closest βeff could vary slightly from realization to realization.

Figure 15 shows (the mode of) the number of sweeps at each quench point for C = 2, 3, for both SQA and SVMC. Before s = 0.5 the number of sweeps for both SQA and SVMC varies significantly, with no discernible trend for the longer anneal times. This is unsurprising, as freezing and the minimum gap likely occur before this point, and the contribution from the transverse field is still large. The 1 µs DW quench time is probably not fast enough, so the trends in the number of sweeps in the early stages of the anneal are not consistent, as can be seen in Fig. 8. At later points in the anneal, trends for the number of sweeps are stable. In particular, as expected, shorter anneal times require fewer sweeps.
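The sweep-selection step described above can be sketched as follows; run_sampler stands in for an SQA or SVMC run quenched at the chosen point, fit_beta_eff for an effective-temperature estimator of the kind sketched earlier (returning just βeff), and the candidate sweep counts are illustrative assumptions.

```python
# Illustrative selection of the SQA/SVMC sweep count whose effective inverse
# temperature on the decoded logical problem is closest to the DW value.
import numpy as np

CANDIDATE_SWEEPS = [10, 30, 100, 300, 1000, 3000, 10000, 30000]  # assumed grid

def match_sweeps(run_sampler, fit_beta_eff, beta_eff_dw, quench_point):
    best_sweeps, best_gap = None, np.inf
    for n_sweeps in CANDIDATE_SWEEPS:
        samples = run_sampler(n_sweeps=n_sweeps, quench_at=quench_point)
        beta_eff = fit_beta_eff(samples)
        gap = abs(beta_eff - beta_eff_dw)
        if gap < best_gap:
            best_sweeps, best_gap = n_sweeps, gap
    return best_sweeps
```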

FIG. 12. Heat map of the empirical log-likelihood. γ1 is on the vertical axis and γ2 on the horizontal axis. The value shown is the average empirical log-likelihood after training for 5 epochs.

Appendix E: More details on the comparison with SQA and SVMC

To complement Sec. IV B, we present results obtained using the quench feature of the DW2000Q devices, using the final trained Hamiltonians at α = 0.03 [see Fig. 6(a)]. Figure 16 shows the effect of quenching at intervals of δs = 0.1. For these results we did not repeat the machine learning procedure from Sec. IV A 2 (i.e., starting from a random set of weights and biases and training for 10 epochs), but instead we used the final problem Hamiltonian after training for 10 epochs with a fixed anneal time tf. I.e., each colored curve in Fig. 16 at a different tf corresponds to a different logical problem Hamiltonian. The results with a fixed Hamiltonian [27] are qualitatively very similar: see Fig. 17. Statistics were then obtained by sampling 20 times from each trained Hamiltonian at its corresponding tf.

Similarly to Fig. 8, which used the average energy over the logical problem as a metric, at all nesting levels shown in Fig. 16 there is a clear transition point, after which all quantities plotted remain nearly constant. Hence this is a “freeze-out” point, signifying the effective freezing of the dynamics. Beyond this point, the empirical log-likelihood [Eq. (11)], the distance from data [Eq. (15)], the distance from the final Gibbs distribution [Eq. (12)], and the effective inverse temperature [Eq. (13)] do not change significantly for most anneal times. Note that this freeze-out point happens earlier in the anneal with increasing C. This is expected because we are only able to encode the final Hamiltonian, HP, which corresponds to increasing the scale of B(s) relative to A(s) [see Eq. (3)] [22, 23].

Note, though, that while the freeze-out point on the physical problem happens earlier with increasing nesting level, the effective temperature of the logical problem is still decreasing. In this case, the final distribution will depend on the dynamics in this interval and may not correspond to a Gibbs distribution at the end or at any particular point [47].

When the transverse field is very strong, the spins are mostly random and there is very poor agreement with the target data distribution (second row of Fig. 16). However, the sharp change in distribution distances at the freezing point is consistent with a large reorganization of the state, in which spins flip such that the distribution more closely resembles the data distribution. This trend is consistent across nesting levels. While the behavior after the freeze-out point is suggestive of a tunneling event, the tunneling interpretation is not tenable due to agreement of the DW data with the classical SVMC model, as discussed in Sec. IV B, and in more detail below.

FIG. 13. NQAC training results for BAS dataset at α = 0.03 when discarding broken chains using SQA. The optimal values of γ1 and γ2 are both 0.2 for all nesting levels.

Figure 16 shows that in addition to the distribution distance increasing with anneal time, the distribution distance also increases slightly even after the freeze-out point, most noticeably at C = 3 and for the largest tf values. In order to gain more insight into the behavior of DW and to explore whether tunneling plays a role in DW's performance, we show the results of SQA and SVMC in Fig. 18 for C = 2 and C = 3.

For the most part, both SQA and SVMC exhibit remarkably similar behavior to DW at C = 2 and C = 3. SQA and SVMC both exhibit a transition in the middle of the anneal, and trends with anneal time of the empirical log-likelihood, distance from data, and effective inverse temperature all echo the DW trends. The distance from the Gibbs distribution generally increases with the final annealed Hamiltonian, but not as consistently as the DW results. This is probably because we did not fine-tune the number of SQA and SVMC sweeps to match DW's results.

A direct comparison of DW, SQA, and SVMC at C = 2 and C = 3 is presented in Fig. 19. There is very clear agreement between all three in almost every respect, except at early points in the anneal and at late points in the anneal for the Hamiltonians trained at longer anneal times (the rightmost columns in Fig. 8). At s = 0.3 (the first data point in each column) and C = 3, DW differs significantly from SQA and SVMC, in that SQA and SVMC tend to have much lower empirical log-likelihoods and much smaller βeff. The reason for this may be an artifact in the length of the quench. For DW, the quench used is 1 µs, and for SQA and SVMC we set the quench at 500 sweeps. For the trained Hamiltonian at 5 µs, quenching at s = 0.3 means that the total time the system has been allowed to evolve is 1.5 µs, comparable to the 1 µs quench. For the Hamiltonian trained with a total anneal time of 2000 µs, quenching at s = 0.3 means that the system is allowed to evolve for 600 µs before a 1 µs quench.

FIG. 14. NQAC training results for BAS dataset at α = 0.03 with (suboptimal) values of the penalty strengths γ1 and γ2 when discarding broken chains using SQA.

FIG. 15. Number of SQA and SVMC sweeps (logarithmic scale) as a function of quench point. Top row: C = 2, bottom row: C = 3. Left: SQA, right: SVMC. The legend is the corresponding DW anneal time in µs. What is plotted is the mode, i.e., the number of sweeps that appeared the most often. We used the mode because SQA and SVMC were compared to 20 different noisy realizations from DW, so each realization of DW had a distribution with a slightly different temperature.

In other words, if the quench is relatively long, we may not have an accurate representation of the state of the system in the middle of the anneal. The quench may still be too long for DW, such that even at s = 0.3 across all anneal times, the system is able to thermalize as the quench is taking place.

For long anneal times at later points in the anneal, the distance from Gibbs does not increase as noticeably with quench point for SQA and SVMC as it does with DW. For both SQA and SVMC, the distance from a Gibbs distribution stays approximately the same after the “freeze-out” point, whereas DW's distance from Gibbs increases. This may be attributable to the presence of 1/f noise, whose low-frequency components impact performance at long anneal times [65].

The fact that SVMC and SQA yield nearly the same trends as DW suggests that for the most part DW is performing similarly to both SQA and SVMC, including in having larger distribution distances from Gibbs with longer anneal times. To the extent that we trust SQA and SVMC as realistic simulations of the underlying physics, this increase in distribution distance is not an artifact of DW but is consistent with other physical models. We offer two possible, complementary explanations for the increase in distance from Gibbs. The first relates primarily to the increase of distance from Gibbs with the anneal time: longer anneal times allow better thermalization, but to the physical problem. The embedding procedure preserves the structure of low-lying states but may not guarantee the ordering of higher-energy states. In the process of decoding the physical problem states to the encoded problem states and then the logical problem states, the lowest part of the energy spectrum may be preserved, but upon decoding the order of the higher-energy states may not be preserved; as a result βeff increases, because more probability is placed on the lower-energy states, while the distribution in terms of the logical problem appears less Gibbsian. The fact that the increase in distribution distance is larger for larger C supports this idea; with larger C there are more physical qubits and hence more energy levels of the physical problem, but the same number of states for the logical problem. The second explanation relates primarily to the increase of distance from Gibbs with the quench point: the system may be thermalizing to a Gibbs state at some intermediate point in the anneal, where the dynamics freezes. The distance from Gibbs is calculated with reference to the final classical Hamiltonian, but if DW, SQA, and SVMC are thermalizing to a different (but close) Gibbs distribution, the distance may increase as a function of s.

FIG. 16. Comparison of DW results for nesting levels C = 1, 2, 3 at different anneal times with quench at various intermediate points in the anneal, for the BAS dataset. A 1µs quench was used throughout; α was fixed at 0.03. Here error bars represent only one standard deviation to improve readability. For all C values there is a transition that becomes sharper with increasing anneal time. The transition point shifts to earlier in the anneal with increasing C; after the transition all quantities freeze, except at the largest anneal times. The trend with increasing anneal time is consistent throughout: the empirical log-likelihood, fidelity with data (1−distance) and inverse temperature all increase. In contrast, the distance from the Gibbs state at tf is at first smaller as a function of quench point for larger tf , then becomes larger for larger tf past the freeze-out point.

Appendix F: Anneal times at different values of α

To complement the results in Fig. 6, we explored other values of α in Fig. 20. Data was collected for a smaller set of anneal times in this case (data collection is very time consuming) to get a sense of the trends.

The overall trend is one of improvement with increasing anneal times, and of higher C leading to higher (but not statistically significant) empirical log-likelihood, except for α = 1, where the ordering is reversed. This might be expected because for α = 1 both C = 2, 3 are in the penalty-limited regime, meaning that γ2 is pegged at 1. Except for α = 1, the effective inverse temperature and distance from a Gibbs distribution on the logical problem do not exhibit trends, and it is difficult to interpret their behavior. However, note that for α < 1 the maximum value of the empirical log-likelihood attained is very far from the maximum possible value of −3.41, so the inconsistent trends observed may be attributable to the anneal time being too short for the results to have converged.

FIG. 17. As in Fig. 16, but with a fixed logical problem Hamiltonian, as in Ref. [27]. The trends are similar to Fig. 16, where the Hamiltonian was updated for each anneal time. Freezing occurs earlier with higher nesting level. For C = 3, the distribution distance from Gibbs increases with longer anneal times. In addition, at longer anneal times, the distance from the Gibbs distribution of the final Hamiltonian increases at later quench points.

FIG. 18. SQA and SVMC simulations with quench point for C = 2 (left) and C = 3 (right). Error bars are one standard deviation. The “anneal time” in the legend corresponds to the final physical problem Hamiltonian trained by DW at the specified anneal time, as in Fig. 16. The corresponding number of sweeps for SQA and SVMC was selected such that the effective inverse temperature of the distribution was closest to that found by DW at the specified anneal time.

(a) C = 2

(b) C = 3

FIG. 19. Direct comparison between DW, SQA, and SVMC for (a) C = 2 and (b) C = 3. Each column represents the final logical Hamiltonian after training with a particular anneal time, i.e., each column is a different color curve in Fig. 16. The SQA and SVMC data is the same as in Fig. 18 (left). In the first three rows, blue, red, and green represent DW, SQA, and SVMC, respectively. In the last row, black, pink, and yellow represent the distribution distances between DW and SQA, DW and SVMC, and SQA and SVMC, respectively. The close agreement in the second row is by design: we chose the number of sweeps of SQA and SVMC to match DW's βeff at each anneal time and quench point (see Appendix D). For C = 3 the SQA and SVMC data is the same as in Fig. 18 (right).

FIG. 20. Effect of using NQAC with different anneal times for the BAS dataset, measured in microseconds, for several values of α. The maximum possible value of the empirical log-likelihood in this case is −3.41. Error bars are two standard deviations.