
New Trends in Quantum Machine Learning

Lorenzo Buffoni (1, 2) and Filippo Caruso (1, 3)

(1) Dipartimento di Fisica e Astronomia, Università di Firenze, I-50019 Sesto Fiorentino, Italy
(2) Dipartimento di Ingegneria dell'Informazione, Università di Firenze, I-50139 Firenze, Italy
(3) LENS, QSTAR and CNR-INO, I-50019 Sesto Fiorentino, Italy

Here we will give a perspective on new possible interplays between Machine Learning and Quantum Physics, including also practical cases and applications. We will explore the ways in which machine learning could benefit from new quantum technologies and algorithms, speeding up its computations through breakthroughs in physical hardware, as well as improving existing models or devising new learning schemes in the quantum domain. Moreover, many experiments in quantum physics generate incredible amounts of data, and machine learning is a great tool to analyze them and make predictions, or even to control the experiment itself. On top of that, data visualization techniques and other schemes borrowed from machine learning can be of great use to theoreticians, providing better intuition on the structure of complex manifolds and helping to make predictions on theoretical models. This new research field, named Quantum Machine Learning, is growing very rapidly since it is expected to provide huge advantages over its classical counterpart; deeper investigations are timely, as these schemes can already be tested on commercially available quantum machines.

I. INTRODUCTION

Machine learning (ML) [1–4] is a broad field of study, with multifaceted applications of cross-disciplinary breadth. ML ultimately aims at developing computer algorithms that improve automatically through experience. The core idea of artificial intelligence (AI) technology is that systems can learn from data, so as to identify distinctive patterns and consequently make decisions, with minimal human intervention. The range of applications of ML methodologies is extremely vast [5–8], and still growing at a steady pace due to the pressing need to cope efficiently with big data [9]. Training and deployment of large-scale machine learning models face computational challenges [10] that are only partially met by the development of special-purpose classical computing units such as GPUs. This has led to an interest in applying quantum computing to machine learning tasks [11–15] and to the development of several quantum algorithms [16–19] with the potential to accelerate training. Most quantum machine learning algorithms need fault-tolerant quantum computation [20–22], which requires the large-scale integration of millions of qubits and is still not available today. It is however possible that quantum machine learning (QML) will provide the first breakthrough algorithms to be implemented on commercially available Noisy Intermediate Scale Quantum (NISQ) devices [23–26]. Indeed, a number of interesting breakthroughs have already been made at the interface of Quantum Physics and Machine Learning [27]. For example, in many-body quantum physics Machine Learning has been successfully employed to speed up simulations [28], predict phases of matter [29] or find variational ansatzes for many-body quantum states [30]. Similarly, in quantum computation Machine Learning has recently found success in quantum control [31] and in providing error correction [32].

In this paper we will give our perspective on new possible trends of machine learning in the quantum domain, covering all the main learning paradigms and listing some very recent results. Thus we will try to briefly define the scope of each one of these ML domains, referring to the literature for a complete overview of the specific topics. The three main classes of learning algorithms are the following:

• Supervised learning: In supervised learning [1, 33] one deals with an annotated dataset {(x_i, y_i)}_{i=1}^N. Each element x_i is called an input or feature vector. It can be the vector of pixel values of an image or a list of features such as height, weight, gender and so on. All input data x_i of the same dataset share the same features (with different values). The label y_i is the ground truth upon which we build the knowledge of our learning algorithm. It can be a discrete class in a set of possible objects, a real number representing some property we want to predict, or even some complex data structure. For example, if we want to build a spam classifier the labels will be y_i = 1 (spam) or y_i = 0 (not spam). The goal of a supervised learning algorithm is to use the dataset to produce a model that, given an input vector x, can predict the correct label y (a minimal classical example is sketched right after this list).

• Unsupervised learning: In unsupervised learning [34–36] the dataset is a collection of unlabeled vectors {x_i}_{i=1}^N. The goal of unsupervised learning is to take these input vectors and extract some useful property from the single data points or from the overall data distribution of the dataset. Examples of unsupervised learning are: clustering, where the predicted property is the cluster assignment; dimensionality reduction, where the distribution of data is mapped into a lower-dimensional manifold; outlier detection, where the predicted property is the 'typicality' of the data with respect to its distribution; and generative models, where we want to learn to generate new points from the same distribution of the dataset.

• Reinforcement learning: The ML subfield of reinforcement learning [5] assumes that the machine 'lives' in an environment and can probe the state of the environment as a feature vector. The machine can perform different actions in different states, bringing different rewards. The goal of this machine (or agent) is to learn a policy/strategy, i.e. a function that associates to a particular feature vector, representing the state of the environment, the best action to execute. The optimal policy maximizes the expected average reward. Reinforcement learning has been widely employed in scenarios where decision making and long-term goals are crucial, for example in playing chess, controlling robots or handling logistics in complex environments.

FIG. 1: Scheme of the various types of QML. The data to be processed can be either classical or quantum, and the algorithm that processes the data can also be either classical or quantum. That gives four possible combinations of algorithms, three of which (QQ, QC, CQ) usually fall under the umbrella of QML.
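To make the supervised paradigm above concrete, the following is a minimal classical sketch of the spam-classifier example, written by us purely for illustration (it is not taken from the paper): the feature choices and dataset values are made up, and scikit-learn's LogisticRegression stands in for a generic supervised model.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up annotated dataset {(x_i, y_i)}: each x_i is a feature vector
# (here, count of suspicious words and number of links in an email);
# the label y_i = 1 means spam, y_i = 0 means not spam.
X = np.array([[8, 5], [7, 3], [9, 6], [0, 1], [1, 0], [2, 1]])
y = np.array([1, 1, 1, 0, 0, 0])

# Fit a model on the annotated dataset.
model = LogisticRegression().fit(X, y)

# Predict the label y for a previously unseen input vector x.
print(model.predict([[6, 4]]))  # -> [1], i.e. classified as spam
```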
Now it is important to point out that there is no rigorous universal definition of QML; instead, it has assumed different meanings in the contexts where it has been studied in the literature. The main subjects are the data, which can be either classical or quantum, and the algorithm used to perform ML on the data, which can also be classical or quantum. One can indeed define four macro categories of algorithms [37], as depicted in Fig. 1, three of which are generally considered QML. There are also a lot of gray areas where the algorithms are hybrid quantum-classical, or where only a subroutine or an optimization task is carried out by the quantum processor. An exhaustive list of all possible interplays between quantum and classical ML algorithms is outside the scope of this paper, so we will stick to this simplified representation for the sake of clarity.

II. SUPERVISED LEARNING

There are already in the literature several examples of supervised learning applications for NISQ devices. For instance, small gate-model devices and quantum annealers have been used to perform quantum heuristic optimization [26, 38–42] and to solve classification problems [43–46]. An interesting approach to classification was devised in Ref. [47], where classical data are embedded into a larger quantum (Hilbert) space describing the state of a quantum system. The idea, which is similar in spirit to classical Support Vector Machines (SVMs) [1], is to map classical data into a high-dimensional space where the classes can be well separated by a hyperplane. In the quantum case the embedding is done by a quantum circuit composed of single- and multi-qubit gates, effectively mapping the classical data into the Hilbert space of the qubits. For example, an embedding of a classical point x into a single-qubit state |x⟩ could be the following:

$R_X(x)\,R_Y(\theta_3)\,R_X(x)\,R_Y(\theta_2)\,R_X(x)\,R_Y(\theta_1)\,R_X(x)\,|0\rangle \rightarrow |x\rangle$,    (1)

where R_X and R_Y are rotation operators along the axes X and Y, respectively, while the rotation angles {θ_1, θ_2, θ_3} are the trainable parameters of our learning model. Once a dataset {x_i}_{i=1}^N is embedded, one can compute (using a SWAP test) the overlaps |⟨x_i|x_j⟩|². Points (states) belonging to the same class will have an overlap close to 1, while points from different classes will have an overlap close to 0, hence enabling the classification of the dataset. The training can be done using software such as PennyLane [48], which computes the gradients of the parametric quantum gates.
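As a concrete illustration, below is a minimal sketch of how such an embedding could be trained on a simulator; it is our own toy construction rather than the code of Refs. [47, 49]. The two one-dimensional classes are made up, and the overlap |⟨x_j|x_i⟩|² is evaluated with the adjoint trick (preparing |x_i⟩ and un-preparing |x_j⟩), which on a simulator yields the same quantity that the SWAP test estimates on hardware.

```python
import pennylane as qml
from pennylane import numpy as np  # autograd-enabled NumPy shipped with PennyLane

dev = qml.device("default.qubit", wires=1)

def embed(x, thetas):
    # Eq. (1), applied right to left: R_X(x), then alternating R_Y(theta_k), R_X(x).
    qml.RX(x, wires=0)
    for theta in thetas:
        qml.RY(theta, wires=0)
        qml.RX(x, wires=0)

@qml.qnode(dev)
def overlap(x1, x2, thetas):
    # Prepare |x1> and undo the embedding of x2: the probability of |0>
    # equals |<x2|x1>|^2, the quantity a SWAP test estimates on hardware.
    embed(x1, thetas)
    qml.adjoint(embed)(x2, thetas)
    return qml.probs(wires=0)

# Made-up 1D dataset: two classes that no single threshold can separate.
class_a = np.array([-1.0, 0.0, 1.0])
class_b = np.array([-2.0, 2.0])

def cost(thetas):
    # Push same-class overlaps towards 1 and cross-class overlaps towards 0.
    loss = 0.0
    for xa in class_a:
        for xb in class_b:
            loss = loss + overlap(xa, xb, thetas)[0]
    for group in (class_a, class_b):
        for i in range(len(group)):
            for j in range(i + 1, len(group)):
                loss = loss + (1.0 - overlap(group[i], group[j], thetas)[0])
    return loss

thetas = np.array([0.1, 0.2, 0.3], requires_grad=True)
opt = qml.GradientDescentOptimizer(stepsize=0.2)
for step in range(50):
    thetas = opt.step(cost, thetas)

# Gram matrix of trained overlaps, analogous in spirit to Fig. 3(a).
points = np.concatenate([class_a, class_b])
gram = np.array([[overlap(xi, xj, thetas)[0] for xj in points] for xi in points])
print(np.round(gram, 2))
```

On real hardware each overlap would instead be estimated from a finite number of shots through the SWAP test, as done with the 5-qubit Valencia QPU in Ref. [49].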

In Fig. 2 we show an embedding circuit for a one-dimensional dataset that can be trained on small quantum processors with the circuit proposed above. In Ref. [49] we train an embedding of that dataset and test it with 10 validation points on a real quantum processor. The process has been carried out on the IBM quantum platform using the Valencia QPU (Quantum Processing Unit), composed of 5 qubits. The number of samples was rather small, namely 100 samples per point. That is due to the queues to access the system and to the fact that we need to compute 100 overlaps to build the Gram matrix for our 10 test points. As one can see in Fig. 3, our results [49], even if they contain some noise, are good and achieve a good classification boundary between the two classes. That is quite remarkable, as this scheme is implemented on a real NISQ device via the cloud. Let us also point out that the error on the overlap depends on the number of samples we take, and it could be the case that our 100 samples do not contain enough statistics. This method has been proved effective on small datasets (embedded in 1 or 2 qubits) and should be quite robust to experimental noise, yielding some promise for the future of NISQ devices. Indeed, as argued in Ref. [47], we can map high-dimensional data into an n-qubit state using the same protocol. And, if we could build a circuit of 100 qubits with circuit depth 100 and a decoherence time of 10^-3 s, it would be capable of embedding O(10^10) bits of classical information, i.e. a task which is classically unattainable. Therefore, high-dimensional embeddings of large data sets for (QC-type) QML could be accessible in the future on NISQ devices.

FIG. 2: In panel (a) the 1D synthetic dataset used as classification benchmark. Class A are the blue dots while class B are the red crosses. Note that in the 1D space the dataset cannot be linearly classified, i.e. there is not a simple threshold allowing us to separate the two classes. In panel (b) an example of the embedding circuit, including the final SWAP test to compute the overlap between the two embeddings [49].

FIG. 3: (a) Theoretical Gram matrix, representing the overlaps between 10 test embeddings; (b) Experimental Gram matrix on the IBM Valencia QPU. The achieved results are pretty consistent with the theory, albeit a bit noisier than other setups that are exploited in Ref. [49].

Once classical data are embedded on quantum states, one has to deal with a fundamental problem in quantum physics, i.e. the discrimination amongst a set of non-orthogonal quantum states of a system. This can be addressed for instance by following a recent approach [50] based on the quantum stochastic walk model [51, 52] and inspired by the structure of classical neural networks. In particular, the quantum states to discriminate can be encoded on the input nodes of a network, while the discrimination is achieved on the output nodes. Then, the parameters of the underlying network (e.g. hopping rates or link weights) are optimized to obtain the highest probability of correct discrimination. Interestingly enough, this probability approaches the theoretical optimal quantum limit known as the Helstrom bound [53]. Our numerical and analytical results in Ref. [50] also show the robustness and reconfigurability of the network for different sets of quantum states. Our proposed (QQ-type) architecture can pave the way for experimental realizations of QML protocols as well as novel quantum generalizations of deep learning algorithms running over larger complex networks.

Finally, let us mention a CQ-type QML application we are proposing in Ref. [54], where artificial neural network models [1, 55] are exploited for quantum noise discrimination in stochastic quantum dynamics. In particular, some Hamiltonian parameters are affected by noise, which is described as a stochastic process associated to a specific probability distribution. Our aim is to discriminate among different noise probability distributions and correlation parameters, by measuring the quantum state at discrete time instants. In particular, we proposed the use of classical ML algorithms such as SVMs and Recurrent Neural Networks (RNNs), i.e. Long Short Term Memory (LSTM) [56] and Gated Recurrent Unit (GRU) [57] networks, to perform supervised classification tasks on simulated quantum data. We believe that these are preliminary steps towards QML algorithms for quantum sensing [58], e.g. to detect Markovian vs. non-Markovian noise in real quantum systems such as optical and atomic platforms, which are very promising candidates for NISQ devices.
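To give a flavor of this CQ-type pipeline, here is a hedged PyTorch sketch in the spirit of Ref. [54] (it is not the paper's actual code or data): an LSTM classifier is trained to distinguish trajectories driven by uncorrelated noise from trajectories driven by time-correlated noise. The synthetic 'trajectories' below are our own toy stochastic process, standing in for the simulated quantum measurement records of the paper.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

def trajectories(n, T=50, correlated=False):
    # Toy stand-in for measured observables under stochastic Hamiltonian
    # noise: white noise vs. an AR(1) (time-correlated) process.
    if not correlated:
        noise = torch.randn(n, T)
    else:
        noise = torch.zeros(n, T)
        for t in range(1, T):
            noise[:, t] = 0.9 * noise[:, t - 1] + 0.45 * torch.randn(n)
    return torch.cos(noise.cumsum(dim=1)).unsqueeze(-1)  # shape (n, T, 1)

class NoiseClassifier(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, x):
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # classify from the last hidden state

X = torch.cat([trajectories(200), trajectories(200, correlated=True)])
y = torch.cat([torch.zeros(200, dtype=torch.long), torch.ones(200, dtype=torch.long)])

model = NoiseClassifier()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()
for epoch in range(20):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()
```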
III. UNSUPERVISED LEARNING

Unlike supervised learning [33], unsupervised learning is a much harder, and still largely unsolved, problem. And yet, it has the appealing potential to learn the hidden statistical correlations of large unlabeled datasets [34–36], which constitute the vast majority of the data available today.

For instance, clustering algorithms have the ability to separate unlabeled data into different classes (or clusters) without the need for external labeling and supervision. Hybrid approaches have been tried to solve clustering [59] by applying a quantum-based optimization scheme to a classical clustering algorithm. In order to represent the clusters, we need to define a distance measure d(x_i, x_j) between two data elements x_i and x_j. For example, a choice could be the Euclidean distance, but specific applications may naturally lead to very different metrics. One can then calculate the distance matrix C with elements C_ij = d(x_i, x_j). This matrix can also be interpreted as the adjacency matrix of a weighted graph G and, by a suitable choice of metric and some coarse graining, the problem of clustering reduces to the MAXCUT optimization problem [60] on the graph G. The MAXCUT problem is an example of the class of NP-complete problems, which are notoriously hard to solve. In Ref. [59] a hybrid scheme is devised where the MAXCUT optimization is solved by the QAOA algorithm [41] and applied to a synthetic dataset composed of 2 clusters of 10 points living in a 2-dimensional space. The model was also successfully tested on a real 19-qubit chip produced by Rigetti, as a demonstration of a QC-type application.
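The reduction from clustering to MAXCUT can be made concrete in a few lines. The sketch below (our own construction, with a made-up two-blob dataset) builds the distance matrix C_ij and finds the maximum cut by brute force; in the hybrid scheme of Ref. [59] this last, exponentially expensive step is the one handed to QAOA on the quantum processor.

```python
import itertools
import numpy as np

rng = np.random.default_rng(seed=0)
# Made-up dataset: two blobs of 5 points each in 2D (a stand-in for the
# 2-cluster dataset of Ref. [59]); 10 points keep brute force tractable.
points = np.vstack([rng.normal(0.0, 0.5, (5, 2)),
                    rng.normal(3.0, 0.5, (5, 2))])

# Distance matrix C_ij = d(x_i, x_j) with the Euclidean metric, read as
# the weighted adjacency matrix of a graph G.
C = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

def cut_weight(bits):
    # Total weight of the edges crossing the bipartition encoded by `bits`.
    mask = np.array(bits, dtype=bool)
    return C[np.ix_(mask, ~mask)].sum()

# Brute-force MAXCUT over all 2^10 bipartitions; this is the NP-hard step
# that QAOA replaces on a quantum processor.
best = max(itertools.product([0, 1], repeat=len(points)), key=cut_weight)
print("cluster assignment:", best)  # points in the same blob share a bit
```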
Another opportunity to use quantum technologies in unsupervised learning tasks comes from Variational Autoencoders (VAEs) [61]. VAEs are generative models in which the goal is to learn a complex data distribution (e.g. the pixels of images representing cats) in order to later sample from it, 'generating' new objects. In particular, VAEs learn a probability distribution p_θ(x, z) = p_θ(x|z) p_θ(z), where usually the posterior distribution p_θ(x|z) is implemented by a deep neural network and p_θ(z) is a simple (prior) distribution (e.g., i.i.d. Gaussian or Bernoulli variables). The idea here is to replace the prior distribution p_θ(z) with a complex distribution sampled from a quantum device [62, 63]. This setup allows for the construction of quantum-classical hybrid generative models that can be scaled to large, realistic datasets. For example, in Ref. [63] a quantum-classical hybrid VAE was trained using a D-Wave 2000Q quantum annealer on the popular MNIST [64] dataset of handwritten digits. This quantum-classical hybrid VAE still employs a large amount of classical computing power, performed on modern GPUs. The computational task that we offloaded to the quantum annealer (sampling from a complex distribution) can still be performed classically at a fraction of the overall computational cost. To achieve any form of quantum advantage in this framework, we need to offload generative capacity to the prior distribution, by exploiting large graphs capable of representing complex probability distributions from which classical sampling becomes too expensive. We have evidence that this path to quantum advantage is possible by deploying annealers with denser connectivities and lower noise, engineering classical neural networks that better exploit physical connectivities, and working with more complex datasets. All these improvements should be achievable in the near future, and represent possible interesting lines of research in the area of QC-type QML models.
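A hedged sketch of the generative side of such a hybrid model is shown below: a classical decoder network turns latent samples into images, while the prior sampler is a stub standing in for the quantum annealer of Refs. [62, 63] (here just i.i.d. Bernoulli variables, so the snippet runs anywhere); the layer sizes are our own arbitrary choices.

```python
import torch
import torch.nn as nn

def prior_sampler(batch_size, latent_dim):
    # Placeholder for the quantum device: in Refs. [62, 63] these latent
    # samples come from a D-Wave annealer; here we fall back to i.i.d.
    # Bernoulli variables so the sketch stays self-contained.
    return torch.bernoulli(0.5 * torch.ones(batch_size, latent_dim))

# Classical decoder implementing p_theta(x|z) for 28x28 MNIST-like images.
latent_dim = 32
decoder = nn.Sequential(
    nn.Linear(latent_dim, 256), nn.ReLU(),
    nn.Linear(256, 28 * 28), nn.Sigmoid(),
)

z = prior_sampler(16, latent_dim)   # z ~ p_theta(z), the (quantum) prior
x = decoder(z).reshape(-1, 28, 28)  # new 'generated' samples
print(x.shape)                      # torch.Size([16, 28, 28])
```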

Furthermore, one of the most remarkable results in unsupervised ML is provided by Generative Adversarial Networks (GANs) [65, 66], where game-theoretic models aim to learn how to faithfully reproduce some given distribution of data. In particular, the idea is that two agents, named the generator and the discriminator, compete against each other in a zero-sum game: on one side, the generator aims to generate 'fake' data that the discriminator is not able to distinguish from the real data, while on the other side the latter optimizes its discrimination strategy. Under some reasonable assumptions, it is possible to demonstrate that the game reaches the unique (Nash) equilibrium point where the generator is able to exactly reproduce the desired (real) data distribution. GANs have been generalized to the quantum case, leading to the so-called Quantum Generative Adversarial Networks (QGANs) [67, 68]. They are an example of QQ-type QML. Again, the goal is to learn to reproduce the state of a physical system that is now quantum, e.g. a register of qubits. This can be for instance implemented by exploiting parametrized quantum circuits (where quantum gates are controlled by real tunable parameters) [69], allowing to realize, among others, quantum approximate optimization algorithms [41], VAEs [70], and eigensolvers [42]. However, recent efforts have been mainly focused on learning pure states [68, 71], while the scenario of mixed quantum states is currently under investigation and some preliminary results are appearing in the literature [72, 73]. In particular, in Ref. [73] we show how the emergence of limit cycles may considerably extend the convergence time in the case of mixed quantum (generated data) states, which of course play a key role for practical implementations on commercially available NISQ devices.

IV. REINFORCEMENT LEARNING

Reinforcement learning has already been applied to closed quantum systems, i.e. systems following a unitary dynamical evolution [37]. Yet, the setting of an agent acting on an environment has a natural analogue in open quantum systems [74]. In Ref. [75] we propose to generalize RL to the quantum domain in order to solve a Quantum Maze problem; it is an example of QQ-type QML. By Quantum Maze we mean a network whose topology is represented by a perfect maze, i.e. there is always one unique path between any two points in the maze, and where the dynamical evolution of the physical state (described by its density matrix ρ) of this network is quantum. The paths of the maze are defined by links between pairs of nodes, which are described by an adjacency matrix A. The coefficient A_ij = 1 indicates the presence of a link, while A_ij = 0 indicates its absence. The evolution of the quantum system is based on the quantum stochastic walk model [51, 52] and can be described by a Lindblad master equation [76] where the Hamiltonian of the system H corresponds to the adjacency matrix itself, i.e. H = A, and with a set of Lindblad operators acting as noise for the quantum walker, i.e.

$\dot{\rho} = -(1-p)\, i[H, \rho] + p\, \mathcal{L}_{CRW}(\rho) + \mathcal{L}_{sink}(\rho)$,    (2)

with a term describing (for p = 1) the regime of a classical random walk (CRW), as

$\mathcal{L}_{CRW}(\rho) = \sum_{i,j} L_{ij}\,\rho\, L_{ij}^{\dagger} - \frac{1}{2}\{L_{ij}^{\dagger} L_{ij}, \rho\}$,    (3)

with the Lindblad operators being defined as $L_{ij} = (A_{ij}/d_j)\, |i\rangle\langle j|$, where {d_j} are the node degrees. When p = 0 one recovers the pure quantum walker regime, while p = 1 corresponds to the classical scenario of a random walker. For intermediate values of p in the range ]0, 1[, one has a quantum stochastic walker with an interplay between coherent evolution and noise effects. In addition, there is a Lindblad operator that irreversibly transfers the population from the 'exit' node n to the sink S with a rate Γ, as follows:

$\mathcal{L}_{sink}(\rho) = \Gamma\,[\, 2\,|S\rangle\langle n|\,\rho\,|n\rangle\langle S| - \{|n\rangle\langle n|, \rho\}\,]$.    (4)

Moreover, we assume that all the population is initially at the entrance node of the maze. The exit of the maze corresponds to the sink S, which irreversibly traps the population. Equation (2) leads to the following expression for the escaping probability from the maze:

$p_{sink}(t) = 2\Gamma \int_0^t \rho_{n,n}(t')\, dt'$.    (5)

The goal is to maximize this escaping probability in the shortest amount of time. Since we consider a perfect maze, there is a single path to exit the maze from the entrance node, in the presence of several dead ends. In this scenario, an external controller (the computer user or the quantum system itself) is the agent, which has some information about the quantum state of the system, while the maze is the RL environment. The available actions for the agent, during the evolution, are:

1 – Building walls. During the evolution, at given (periodic) time instants, the system can change the adjacency matrix of the environment, removing a link (changing an entry A_ij from 1 to 0). This emulates the closing of a door through a passage that, for instance, leads to a dead end or does not bring the walker efficiently to the exit.

2 – Breaking walls. The system can change the adjacency matrix of the environment, adding a link (changing an entry A_ij from 0 to 1). This mimics the creation of a hole in a maze wall, hence opening a shortcut.

In this setup, the objective function is the amount of time required to exit the maze (to be minimized) or the amount of population that exits to the sink in a given amount of time (to be maximized). If the adjacency matrix can change – either intrinsically by random flips or as a result of actions taken by the agent – we are in the canonical scenario for RL, but in a quantum domain (quantum RL). We thus want to learn a policy by which, dynamically changing the topology of the maze, even by just adding or removing a small number of walls, we can significantly improve the transfer rate of the walker to the sink. We can also imagine the action of modifying the adjacency matrix as a sort of additional, engineered noise that, if carefully tuned, can improve the exit performance of the stochastic quantum walker. An example of such an improvement is shown in Fig. 4, where an agent was trained to perform the described actions on a 6 × 6 perfect maze. We can see how the performance improves from the baseline stochastic quantum walker as the agent learns how to get better rewards, i.e. transferring more population into the sink. The number of actions made by the agent was quite small (8 in this particular case) but carefully planned in order to yield a great boost to the efficiency of the walker.

FIG. 4: In panel (a) a sample of a 6 × 6 perfect maze: in white the possible paths, in black the walls. In the lower left corner, the node in blue is the entrance to the maze, corresponding to the initial quantum state, while the node in red is the exit, i.e., the node connected to the sink. In panel (b) an example of a training curve for an agent performing actions at periodic time instants, in the regime of p = 0.8 on the 6 × 6 maze in (a). The curves show the rewards from the single episodes and their running average over 100 episodes, as well as the training curve of the agent with the running average again over 100 episodes. The two constant lines are the baseline quantum walker with no actions performed by the agent and the final trained policy. [75]
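To make the dynamics of Eqs. (2)–(5) concrete, here is a hedged NumPy sketch (our own construction, not the code of Ref. [75]): it integrates the quantum stochastic walk on a toy 4-node path graph standing in for a maze, with the sink attached to the last node, and accumulates the escape probability of Eq. (5). A wall-breaking or wall-building action would simply flip an entry of A between integration steps.

```python
import numpy as np

# Toy "maze": a 4-node path graph; node 0 is the entrance, node 3 feeds the sink.
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
n_nodes, exit_node = 4, 3
p, gamma, dt, steps = 0.8, 1.0, 0.001, 20000

deg = A.sum(axis=0)
H = A.copy()  # Hamiltonian equals the adjacency matrix, H = A

# CRW Lindblad operators L_ij = (A_ij / d_j) |i><j|, as in Eq. (3).
L_ops = []
for i in range(n_nodes):
    for j in range(n_nodes):
        if A[i, j] != 0:
            L = np.zeros((n_nodes, n_nodes))
            L[i, j] = A[i, j] / deg[j]
            L_ops.append(L)

def lindblad_rhs(rho):
    # Eq. (2): coherent part, CRW dissipator, and sink drain on the exit node.
    comm = -1j * (H @ rho - rho @ H)
    diss = sum(L @ rho @ L.conj().T
               - 0.5 * (L.conj().T @ L @ rho + rho @ L.conj().T @ L)
               for L in L_ops)
    # Sink term of Eq. (4) restricted to the maze nodes: the sink population
    # itself is tracked separately through Eq. (5).
    Pn = np.zeros((n_nodes, n_nodes))
    Pn[exit_node, exit_node] = 1.0
    sink = -gamma * (Pn @ rho + rho @ Pn)
    return (1 - p) * comm + p * diss + sink

# All population starts at the entrance node; crude forward-Euler integration.
rho = np.zeros((n_nodes, n_nodes), dtype=complex)
rho[0, 0] = 1.0
p_sink = 0.0
for _ in range(steps):
    p_sink += 2 * gamma * rho[exit_node, exit_node].real * dt  # Eq. (5)
    rho = rho + dt * lindblad_rhs(rho)

print(f"p_sink(T={steps * dt:.1f}) ~ {p_sink:.3f}")
```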
[75]) might represent classical-quantum protocols allowing to evaluate the po- an interesting and promising step towards novel QML tential advantages of exploiting them over their classical schemes for NISQ devices and quantum technologies in ML counterparts and then to exploit them on commer- general. cially available or coming soon NISQ devices and quan- tum technologies. We have hereby presented a perspec- tive on new recent algorithms covering the main areas of V. CONCLUSIONS ML (supervised, unsupervised and reinforcement learn- ing) and different combinations of quantum-classical data Quantum Machine Learning is where nowadays ma- and algorithms (QQ, QC, CQ models). We have also chine learning is going to meet quantum information sci- pointed out that there is still plenty of work to do, as it is ence in order to realize more powerful quantum technolo- definitely necessary to scale up the algorithms listed here gies. However, a much deeper understanding of their (and all the other models that have been proposed re- underlying mechanisms is still required in order to de- cently) to the limits of existing devices performing an ac- velop new algorithms and especially to apply them to curate scaling analysis of performances and correspond- address real problems and then to lead to commercial ing errors. Ultimately, the crucial question we will need applications. This is a very young but very rapidly de- to answer in this field is whether a quantum speedup veloping research field where the the first results pave the is theoretically and experimentally feasible via Quantum way for new experimental demonstrations of such hybrid Machine Learning models running on NISQ machines.

[1] Christopher M. Bishop. Springer, New York, 1st ed. 2006, corr. 2nd printing 2011 edition, April 2011.
[2] T. M. Cover and Joy A. Thomas. Wiley Series in Telecommunications. Wiley, New York, 1991.
[3] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. Springer Science & Business Media, 2009.
[4] http://themlbook.com/.
[5] Richard S Sutton and Andrew G Barto. MIT Press, 2018.
[6] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 6645–6649. IEEE, 2013.
[7] Nicu Sebe, Ira Cohen, Ashutosh Garg, and Thomas S Huang. Volume 29. Springer Science & Business Media, 2005.
[8] Sorin Grigorescu, Bogdan Trasnea, Tiberiu Cocias, and Gigel Macesanu. Journal of Field Robotics, 37(3):362–386, 2020.
[9] Min Chen, Shiwen Mao, and Yunhao Liu. Mobile Networks and Applications, 19(2):171–209, 2014.
[10] Tijmen Tieleman. In Proceedings of the 25th International Conference on Machine Learning, pages 1064–1071. ACM, 2008.
[11] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. Contemporary Physics, 56(2):172–185, 2015.
[12] Peter Wittek. Academic Press, 2014.
[13] Jeremy Adcock, Euan Allen, Matthew Day, Stefan Frick, Janna Hinchliff, Mack Johnson, Sam Morley-Short, Sam Pallister, Alasdair Price, and Stasja Stanisic. arXiv preprint arXiv:1512.02900, 2015.
[14] Srinivasan Arunachalam and Ronald de Wolf. arXiv preprint arXiv:1701.06806, 2017.
[15] Jacob Biamonte, Peter Wittek, Nicola Pancotti, Patrick Rebentrost, Nathan Wiebe, and Seth Lloyd. Nature, 549:195–202, 2017.
[16] Aram W Harrow, Avinatan Hassidim, and Seth Lloyd. Physical Review Letters, 103(15):150502, 2009.
[17] Nathan Wiebe, Daniel Braun, and Seth Lloyd. Physical Review Letters, 109(5):050505, 2012.
[18] Andrew M Childs, Robin Kothari, and Rolando D Somma. arXiv preprint arXiv:1511.02306, 2015.
[19] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Nature Physics, 10(9):631–633, 2014.
[20] Michael A Nielsen and Isaac Chuang, 2002.
[21] Daniel A Lidar and Todd A Brun. Cambridge University Press, 2013.
[22] Austin G Fowler, Matteo Mariantoni, John M Martinis, and Andrew N Cleland. Physical Review A, 86(3):032324, 2012.
[23] Edward Farhi, Jeffrey Goldstone, Sam Gutmann, Joshua Lapan, Andrew Lundgren, and Daniel Preda. Science, 292(5516):472–475, 2001.
[24] Mark W Johnson, Mohammad HS Amin, Suzanne Gildert, Trevor Lanting, Firas Hamze, Neil Dickson, R Harris, Andrew J Berkley, Jan Johansson, Paul Bunyk, et al. Nature, 473(7346):194–198, 2011.
[25] C Neill, P Roushan, K Kechedzhi, S Boixo, SV Isakov, V Smelyanskiy, R Barends, B Burkett, Y Chen, Z Chen, et al. arXiv preprint arXiv:1709.06678, 2017.
[26] Abhinav Kandala, Antonio Mezzacapo, Kristan Temme, Maika Takita, Markus Brink, Jerry M Chow, and Jay M Gambetta. Nature, 549(7671):242, 2017.
[27] Giuseppe Carleo, Ignacio Cirac, Kyle Cranmer, Laurent Daudet, Maria Schuld, Naftali Tishby, Leslie Vogt-Maranto, and Lenka Zdeborová. Reviews of Modern Physics, 91(4):045002, 2019.
[28] Louis-François Arsenault, Richard Neuberg, Lauren A Hannah, and Andrew J Millis. Inverse Problems, 33(11):115007, 2017.
[29] Juan Carrasquilla and Roger G Melko. Nature Physics, 13(5):431–434, 2017.
[30] Giuseppe Carleo and Matthias Troyer. Science, 355(6325):602–606, 2017.
[31] Marin Bukov, Alexandre GR Day, Dries Sels, Phillip Weinberg, Anatoli Polkovnikov, and Pankaj Mehta. Physical Review X, 8(3):031086, 2018.
[32] Savvas Varsamopoulos, Koen Bertels, and Carmen G Almudever. Quantum Machine Intelligence, 2(1):1–12, 2020.
[33] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. http://www.deeplearningbook.org, 2016.
[34] Pascal Vincent, Hugo Larochelle, Yoshua Bengio, and Pierre-Antoine Manzagol. In Proceedings of the 25th International Conference on Machine Learning, pages 1096–1103. ACM, 2008.
[35] Geoffrey E Hinton, Peter Dayan, Brendan J Frey, and Radford M Neal. Science, 268(5214):1158, 1995.
[36] Yoshua Bengio, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. In Advances in Neural Information Processing Systems, pages 153–160, 2007.
[37] Vedran Dunjko, Jacob M. Taylor, and Hans J. Briegel. Physical Review Letters, 117(13):130501, September 2016.
[38] Tadashi Kadowaki and Hidetoshi Nishimori. Physical Review E, 58(5):5355, 1998.
[39] Giuseppe E Santoro, Roman Martoňák, Erio Tosatti, and Roberto Car. Science, 295(5564):2427–2430, 2002.
[40] J Brooke, TF Rosenbaum, and G Aeppli. Nature, 413(6856):610–613, 2001.
[41] Edward Farhi, Jeffrey Goldstone, and Sam Gutmann. arXiv preprint arXiv:1411.4028, 2014.
[42] Alberto Peruzzo, Jarrod McClean, Peter Shadbolt, Man-Hong Yung, Xiao-Qi Zhou, Peter J Love, Alán Aspuru-Guzik, and Jeremy L O'Brien. Nature Communications, 5, 2014.
[43] Hartmut Neven, Vasil S Denchev, Geordie Rose, and William G Macready. arXiv preprint arXiv:0811.0416, 2008.
[44] Vasil S Denchev, Nan Ding, SVN Vishwanathan, and Hartmut Neven. arXiv preprint arXiv:1205.1148, 2012.
[45] Kristen L Pudenz and Daniel A Lidar. Quantum Information Processing, 12(5):2027–2070, 2013.
[46] Alex Mott, Joshua Job, Jean-Roch Vlimant, Daniel Lidar, and Maria Spiropulu. Nature, 550(7676):375, 2017.
[47] Seth Lloyd, Maria Schuld, Aroosa Ijaz, Josh Izaac, and Nathan Killoran. arXiv preprint arXiv:2001.03622, 2020.
[48] Ville Bergholm, Josh Izaac, Maria Schuld, Christian Gogolin, Carsten Blank, Keri McKiernan, and Nathan Killoran. arXiv preprint arXiv:1811.04968, 2018.
[49] Ilaria Gianani, Ivana Mastroserio, Lorenzo Buffoni, Natalia Bruno, Ludovica Donati, Valeria Cimini, Marco Barbieri, Francesco S Cataliotti, and Filippo Caruso. arXiv preprint arXiv:2106.13835, 2021.
[50] Nicola Dalla Pozza and Filippo Caruso. Phys. Rev. Research, 2:043011, 2020.
[51] James D. Whitfield, César A. Rodríguez-Rosario, and Alán Aspuru-Guzik. Physical Review A, 81:022323, 2010.
[52] Filippo Caruso. New Journal of Physics, 16(5):055015, May 2014.
[53] C. Helstrom. Mathematics in Science and Engineering: a Series of Monographs and Textbooks, 1976.
[54] Stefano Martina, Stefano Gherardini, and Filippo Caruso. arXiv preprint arXiv:2101.03221, 2021.
[55] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. MIT Press, 2016. http://www.deeplearningbook.org.
[56] K. Cho, B. v. Merrienboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, and Y. Bengio. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1724–1734, 2014.
[57] F.A. Gers, J. Schmidhuber, and F. Cummins. 1999 Ninth International Conference on Artificial Neural Networks ICANN 99, 2:850–855, 1999.
[58] C.L. Degen, F. Reinhard, and P. Cappellaro. Rev. Mod. Phys., 89:035002, 2017.
[59] JS Otterbach, R Manenti, N Alidoust, A Bestwick, M Block, B Bloom, S Caldwell, N Didier, E Schuyler Fried, S Hong, et al. arXiv preprint arXiv:1712.05771, 2017.
[60] Meena Mahajan and Venkatesh Raman. J. Algorithms, 31(2):335–354, 1999.
[61] Diederik P Kingma, Shakir Mohamed, Danilo Jimenez Rezende, and Max Welling. In Advances in Neural Information Processing Systems, pages 3581–3589, 2014.
[62] Amir Khoshaman, Walter Vinci, Brandon Denis, Evgeny Andriyash, and Mohammad H Amin. Quantum Science and Technology, 4(1):014001, 2019.
[63] Walter Vinci, Lorenzo Buffoni, Hossein Sadeghi, Amir Khoshaman, Evgeny Andriyash, and Mohammad Amin. Machine Learning: Science and Technology, 2020.
[64] Yann LeCun. http://yann.lecun.com/exdb/mnist/, 1998.
[65] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. In Advances in Neural Information Processing Systems, pages 2672–2680, 2014.
[66] Ian Goodfellow. arXiv preprint arXiv:1701.00160, 2016.
[67] Seth Lloyd and Christian Weedbrook. Physical Review Letters, 121(4):040502, 2018.
[68] Pierre-Luc Dallaire-Demers and Nathan Killoran. Physical Review A, 98(1):012324, 2018.
[69] Marcello Benedetti, Erika Lloyd, Stefan Sack, and Mattia Fiorentini. Quantum Science and Technology, 4(4):043001, 2019.
[70] Alex Pepper, Nora Tischler, and Geoff J Pryde. Physical Review Letters, 122(6):060501, 2019.
[71] Marcello Benedetti, Edward Grant, Leonard Wossnig, and Simone Severini. New Journal of Physics, 21(4):043023, 2019.
[72] Ling Hu, Shu-Hao Wu, Weizhou Cai, Yuwei Ma, Xianghao Mu, Yuan Xu, Haiyan Wang, Yipu Song, Dong-Ling Deng, Chang-Ling Zou, et al. Science Advances, 5(1):eaav2761, 2019.
[73] Paolo Braccia, Filippo Caruso, and Leonardo Banchi. arXiv preprint arXiv:2012.05996, 2020.
[74] Heinz-Peter Breuer and Francesco Petruccione. Oxford University Press, 2002.
[75] Nicola Dalla Pozza, Lorenzo Buffoni, Stefano Martina, and Filippo Caruso. arXiv preprint arXiv:2108.04490, 2021.
[76] Goran Lindblad. Communications in Mathematical Physics, 48(2):119–130, 1976.
Physical Killoran. arXiv preprint arXiv:1811.04968, 2018. review letters, 122(6):060501, 2019. [49] Ilaria Gianani, Ivana Mastroserio, Lorenzo Buffoni, Na- [71] Marcello Benedetti, Edward Grant, Leonard Woss- talia Bruno, Ludovica Donati, Valeria Cimini, Marco nig, and Simone Severini. New Journal of Physics, Barbieri, Francesco S Cataliotti, and Filippo Caruso. 21(4):043023, 2019. arXiv preprint arXiv:2106.13835, 2021. [72] Ling Hu, Shu-Hao Wu, Weizhou Cai, Yuwei Ma, Xiang- [50] Nicola Dalla Pozza and Filippo Caruso. Phys. Rev. Re- hao Mu, Yuan Xu, Haiyan Wang, Yipu Song, Dong- search, 2:043011, 2020. Ling Deng, Chang-Ling Zou, et al. Science advances, [51] James D. Whitfield, C´esarA. Rodr´ıguez-Rosario, and 5(1):eaav2761, 2019. Al´anAspuru-Guzik. Physical Review A, 81:022323, 2010. [73] Paolo Braccia, Filippo Caruso, and Leonardo Banchi. [52] Filippo Caruso. New Journal of Physics, 16(5):055015, Eprint arXiv:2012.05996, 2020. May 2014. [74] Heinz-Peter Breuer and Francesco Petruccione. Oxford [53] C. Helstrom. Mathematics in Science and Engineering : University Press, 2002. a series of monographs and textbooks, 1976. [75] Nicola Dalla Pozza, Lorenzo Buffoni, Stefano Martina, [54] Stefano Martina, Stefano Gherardini, and Filippo and Filippo Caruso. arXiv preprint arXiv:2108.04490, Caruso. arXiv preprint arXiv:2101.03221, 2021. 2021. [55] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. [76] Goran Lindblad. Communications in Mathematical MIT Press, 2016. http://www.deeplearningbook.org. Physics, 48(2):119–130, 1976. [56] K. Cho, B.v. Merrienboer, C. Gulcehre, D. Bahdanau,