Quantum reservoir computing: a reservoir approach toward quantum machine learning on near-term quantum devices

Keisuke Fujii¹ and Kohei Nakajima²
¹Graduate School of Engineering Science, Osaka University, 1-3 Machikaneyama, Toyonaka, Osaka 560-8531, Japan.
²Graduate School of Information Science and Technology, The University of Tokyo, Bunkyo-ku, 113-8656 Tokyo, Japan
(Dated: November 11, 2020)

Quantum systems have a number of degrees of freedom that grows exponentially in the number of particles and hence provide rich dynamics that cannot be simulated on conventional computers. Quantum reservoir computing is an approach that uses such complex and rich quantum dynamics, as it is, for temporal machine learning. In this chapter, we explain quantum reservoir computing and related approaches, the quantum extreme learning machine and quantum circuit learning, starting from a pedagogical introduction to quantum mechanics and machine learning. All of these quantum machine learning approaches are experimentally feasible and effective on state-of-the-art quantum devices.

I. INTRODUCTION

Over the past several decades, we have enjoyed an exponential growth of computational power, namely, Moore's law. Nowadays even a smartphone or tablet PC is much more powerful than the supercomputers of the 1980s. Even so, people are still seeking more computational power, especially for artificial intelligence (machine learning), chemical and material simulations, and forecasting complex phenomena like economics, weather, and climate. In addition to improving the computational power of conventional computers, i.e., more Moore's law, a new generation of computing paradigms has started to be investigated to go beyond Moore's law. Among them, natural computing seeks to exploit natural physical or biological systems as computational resources. Quantum reservoir computing is an intersection of two different paradigms of natural computing, namely, quantum computing and reservoir computing.

Regarding quantum computing, the recent rapid experimental progress in controlling complex quantum systems motivates us to use quantum mechanical law as a new principle of information processing, namely, quantum information processing [2, 3]. For example, certain mathematical problems, such as integer factorisation, which are believed to be intractable on a classical computer, are known to be efficiently solvable by a sophisticatedly synthesized quantum algorithm [4]. Therefore, considerable experimental effort has been devoted to realising full-fledged universal quantum computers [5, 6]. In the near future, quantum computers of size > 50 qubits with fidelity > 99% for each elementary gate would appear to achieve quantum computational supremacy, beating simulation on the state-of-the-art classical supercomputers [7, 8]. While this does not directly mean that a quantum computer outperforms classical computers for a useful task like machine learning, applications of such near-term quantum devices to useful tasks, including machine learning, are now widely explored. On the other hand, quantum simulators are thought to be much easier to implement than a full-fledged universal quantum computer. In this regard, existing quantum simulators have already shed new light on the physics of complex many-body quantum systems [9–11], and a restricted class of quantum dynamics, known as adiabatic dynamics, has also been applied to combinatorial optimisation problems [12–15]. However, complex real-time quantum dynamics, which is one of the most difficult tasks for classical computers to simulate [16–18] and has great potential to perform nontrivial information processing, is now waiting to be harnessed as a resource for more general purpose information processing.

Physical reservoir computing, which is the main subject throughout this book, is another paradigm for exploiting complex physical systems for information processing. In this framework, the low-dimensional input is projected to a high-dimensional dynamical system, which is typically referred to as a reservoir, generating transient dynamics that facilitates the separation of input states [19]. If the dynamics of the reservoir involve both adequate memory and nonlinearity [20], emulating nonlinear dynamical systems only requires adding a linear and static readout from the high-dimensional state space of the reservoir. A number of different implementations of reservoirs have been proposed, such as abstract dynamical systems for echo state networks (ESNs) [21] or models of neurons for liquid state machines [22].
The implementations are not limited to programs running on a PC but also include physical systems, such as the surface of water in a laminar state [23], analogue circuits and optoelectronic systems [24–29], and neuromorphic chips [30]. Recently, it has been reported that the mechanical bodies of soft and compliant robots have also been successfully used as a reservoir [31–36]. In contrast to the refinements required by learning algorithms, such as deep learning in [37], the approach followed by reservoir computing, especially when applied to real systems, is to find an appropriate form of physics that exhibits rich dynamics, thereby allowing us to outsource a part of the computation.

Quantum reservoir computing (QRC) was born in the marriage of quantum computing and the physical reservoir computing above, to harness complex quantum dynamics as a reservoir for real-time machine learning tasks [38]. Since the idea of QRC was proposed in Ref. [38], its proof-of-principle experimental demonstration for non-temporal tasks [39] and performance analyses and improvements [40–42] have been explored. The QRC approach to quantum tasks such as quantum tomography and quantum state preparation has also been garnering attention recently [71–73]. In this book chapter, we will provide a broad picture of QRC and related approaches, starting from a pedagogical introduction to quantum mechanics and machine learning.

The rest of this paper is organized as follows. In Sec. II, we provide a pedagogical introduction to quantum mechanics for those who are not familiar with it and fix our notation. In Sec. III, we briefly mention several machine learning techniques, such as linear and nonlinear regression, temporal machine learning tasks, and reservoir computing. In Sec. IV, we explain QRC and related approaches, the quantum extreme learning machine [39] and quantum circuit learning [43]. The former is a framework that uses a quantum reservoir for non-temporal tasks; that is, the input is fed into a quantum system, and generalization or classification tasks are performed by a linear regression on a quantum enhanced feature space. In the latter, the parameters of the quantum system are further fine-tuned via gradient descent by measuring an analytically obtained gradient, just like back propagation for feedforward neural networks. Regarding QRC, we will also see chaotic time series predictions as demonstrations. Sec. V is devoted to conclusion and discussion.

II. PEDAGOGICAL INTRODUCTION TO QUANTUM MECHANICS

In this section, we would like to provide a pedagogical introduction to how quantum mechanical systems work for those who are not familiar with quantum mechanics. If you are already familiar with quantum mechanics and its notation, please skip to Sec. III.

A. Quantum state

A state of a quantum system is described by a state vector

    |ψ⟩ = (c₁, …, c_d)ᵀ   (1)

on a complex d-dimensional system C^d, where the symbol |·⟩ is called a ket and indicates a complex column vector. Similarly, ⟨·| is called a bra and indicates a complex row vector; the two are related by complex conjugation,

    ⟨ψ| = |ψ⟩† = (c₁*, …, c_d*).   (2)

With this notation, we can write an inner product of two quantum states |ψ⟩ and |φ⟩ as ⟨ψ|φ⟩. Let us define an orthogonal basis, where |k⟩ is the column vector whose kth entry is 1 and all other entries are 0:

    |1⟩ = (1, 0, …, 0)ᵀ, …, |k⟩ = (0, …, 1, …, 0)ᵀ, …, |d⟩ = (0, …, 0, 1)ᵀ.   (3)

A quantum state in the d-dimensional system can then be described simply by

    |ψ⟩ = Σ_{i=1}^{d} c_i |i⟩.   (4)

The state is said to be a superposition state of {|i⟩}. The coefficients {c_i} are complex and are called complex probability amplitudes. If we measure the system in the basis {|i⟩}, we obtain the measurement outcome i with probability

    p_i = |⟨i|ψ⟩|² = |c_i|²,   (5)

and hence the complex probability amplitudes have to be normalized as follows:

    ⟨ψ|ψ⟩ = Σ_{i=1}^{d} |c_i|² = 1.   (6)

In other words, a quantum state is represented as a normalized vector on a complex vector space.

Suppose the measurement outcome i corresponds to a certain physical value a_i, like energy, magnetization, and so on; then the expectation value of the physical variable is given by

    Σ_i a_i p_i = ⟨ψ|A|ψ⟩ ≡ ⟨A⟩,   (7)

where we define the hermitian operator

    A = Σ_i a_i |i⟩⟨i|,   (8)

which is called an observable and carries the information of the measurement basis and the physical variable.

The state vector in quantum mechanics is similar to a probability distribution, but essentially different from it, since it is much more primitive; it can take complex values and is more like a square root of a probability. The unique features of quantum systems come from this property.
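To make the notation concrete, the following minimal numpy sketch (not from the original chapter) builds a normalized state vector for d = 4, evaluates the outcome probabilities of Eq. (5), and checks the expectation value of Eq. (7) against the observable of Eq. (8); the physical values a_i are arbitrary illustrative choices.

```python
import numpy as np

# Minimal sketch (not from the paper) of Eqs. (4)-(8) for a d = 4 system.
d = 4
rng = np.random.default_rng(0)

c = rng.normal(size=d) + 1j * rng.normal(size=d)
psi = c / np.linalg.norm(c)                        # normalized state vector, Eq. (6)

p = np.abs(psi) ** 2                               # measurement probabilities, Eq. (5)
print(p, p.sum())                                  # probabilities sum to 1

a = np.array([0.0, 1.0, 2.0, 3.0])                 # physical values a_i attached to outcomes
A = np.diag(a)                                     # observable A = sum_i a_i |i><i|, Eq. (8)
expectation = np.real(np.conj(psi) @ (A @ psi))    # <psi|A|psi>, Eq. (7)
print(expectation, (a * p).sum())                  # the two expressions agree
```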

B. Time evolution

The time evolution of a quantum system is determined by a Hamiltonian H, which is a hermitian operator acting on the system. Let us denote the quantum state at time t = 0 by |ψ(0)⟩. The equation of motion of quantum mechanics, the so-called Schrödinger equation, is given by

    i ∂/∂t |ψ(t)⟩ = H |ψ(t)⟩.   (9)

This equation can be formally solved by

    |ψ(t)⟩ = e^{−iHt} |ψ(0)⟩.   (10)

Therefore the time evolution is given by the operator e^{−iHt}, which is a unitary operator, and hence the norm of the state vector is preserved, meaning probability conservation. In general, the Hamiltonian can be time dependent. Regarding the time evolution, if you are not interested in the continuous time evolution but just in its input and output relation, then the time evolution is nothing but a unitary operator U:

    |ψ_out⟩ = U |ψ_in⟩.   (11)

In quantum computing, the time evolution U is sometimes called a quantum gate.

C. Qubits

The smallest nontrivial quantum system is a two-dimensional quantum system C², which is called a quantum bit or qubit:

    α|0⟩ + β|1⟩,  (|α|² + |β|² = 1).   (12)

Suppose we have n qubits. The n-qubit system is defined by the tensor product space (C²)^⊗n of the two-dimensional systems as follows. A basis of the system is defined by a direct product of binary states |x_k⟩ with x_k ∈ {0, 1},

    |x₁⟩ ⊗ |x₂⟩ ⊗ ⋯ ⊗ |x_n⟩,   (13)

which is simply denoted by

    |x₁x₂⋯x_n⟩.   (14)

Then a state of the n-qubit system can be described as

    |ψ⟩ = Σ_{x₁,x₂,…,x_n} α_{x₁x₂⋯x_n} |x₁x₂⋯x_n⟩.   (15)

The dimension of the n-qubit system is 2ⁿ, and hence the tensor product space is nothing but a 2ⁿ-dimensional complex vector space C^{2ⁿ}. The dimension of the n-qubit system increases exponentially in the number n of qubits.

D. Density operator

Next, we would like to introduce the operator formalism of the above quantum mechanics. This describes exactly the same thing, but the operator formalism is sometimes more convenient. Let us consider an operator ρ constructed from the state vector |ψ⟩:

    ρ = |ψ⟩⟨ψ|.   (16)

If you choose the basis {|i⟩} of the system for the matrix representation, then the diagonal elements of ρ correspond to the probability distribution p_i = |c_i|² obtained when the system is measured in the basis {|i⟩}. Therefore the operator ρ is called a density operator. The probability distribution can also be given in terms of ρ by

    p_i = Tr[|i⟩⟨i| ρ],   (17)

where Tr is the matrix trace. The expectation value of an observable A is given by

    ⟨A⟩ = Tr[Aρ].   (18)

The density operator can handle a more general situation where a quantum state is sampled from a set of quantum states {|ψ_k⟩} with a probability distribution {q_k}. In this case, if we measure the system in the basis {|i⟩}, the probability to obtain the measurement outcome i is given by

    p_i = Σ_k q_k Tr[|i⟩⟨i| ρ_k],   (19)

where ρ_k = |ψ_k⟩⟨ψ_k|. By using linearity of the trace function, this reads

    p_i = Tr[|i⟩⟨i| Σ_k q_k ρ_k].   (20)

Now we interpret the density operator as being given by

    ρ = Σ_k q_k |ψ_k⟩⟨ψ_k|.   (21)

In this way, a density operator can represent a classical mixture of quantum states by a convex mixture of density operators, which is convenient in many cases. In general, any positive and hermitian operator ρ subject to Tr[ρ] = 1 can be a density operator, since it can be interpreted as a convex mixture of quantum states via its spectral decomposition:

    ρ = Σ_i λ_i |λ_i⟩⟨λ_i|,   (22)

where {|λ_i⟩} and {λ_i} are the eigenstates and eigenvalues, respectively. Because of Tr[ρ] = 1, we have Σ_i λ_i = 1.

From its definition, the time evolution of ρ can be given by

    ρ(t) = e^{−iHt} ρ(0) e^{iHt}   (23)

or

    ρ_out = U ρ_in U†.   (24)

Moreover, we can define more general operations for density operators. For example, if we apply unitary operators U and V with probabilities p and (1 − p), respectively, then we have

    ρ_out = p U ρ U† + (1 − p) V ρ V†.   (25)

As another example, if we perform the measurement of ρ in the basis {|i⟩} and we forget about the measurement outcome, then the state is given by the density operator

    Σ_i Tr[|i⟩⟨i| ρ] |i⟩⟨i| = Σ_i |i⟩⟨i| ρ |i⟩⟨i|.   (26)

Therefore, if we define a map from a density operator to another, which we call a superoperator,

    M(⋯) = Σ_i |i⟩⟨i| (⋯) |i⟩⟨i|,   (27)

the above non-selective measurement (forgetting about the measurement outcomes) is simply written as

    M(ρ).   (28)

In general, any physically allowed quantum operation K that maps a density operator to another can be represented in terms of a set of operators {K_i} subject to Σ_i K_i† K_i = I, with the identity operator I:

    K(ρ) = Σ_i K_i ρ K_i†.   (29)

The operators {K_i} are called Kraus operators.
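As a concrete illustration of Eqs. (16)–(28), here is a hedged single-qubit sketch in numpy (not part of the original text): it builds ρ = |ψ⟩⟨ψ|, evolves it unitarily, and applies the non-selective measurement channel; the rotation angle is an arbitrary choice.

```python
import numpy as np

# Minimal sketch (not from the paper): density operators and channels for one qubit.
ket0 = np.array([[1.0], [0.0]], dtype=complex)
ket1 = np.array([[0.0], [1.0]], dtype=complex)

psi = (ket0 + ket1) / np.sqrt(2)          # the |+> state
rho = psi @ psi.conj().T                  # Eq. (16): rho = |psi><psi|

Z = np.array([[1, 0], [0, -1]], dtype=complex)
print(np.trace(Z @ rho).real)             # Eq. (18): <Z> = Tr[Z rho] = 0 for |+>

# Eq. (24): unitary evolution rho -> U rho U^dagger (a Y-rotation by an arbitrary angle)
theta = 0.3
U = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]], dtype=complex)
rho_out = U @ rho @ U.conj().T

# Eqs. (26)-(28): non-selective measurement in the computational basis
proj = [ket0 @ ket0.conj().T, ket1 @ ket1.conj().T]
rho_meas = sum(P @ rho_out @ P for P in proj)   # off-diagonal elements are removed
print(np.diag(rho_meas).real)                   # outcome probabilities p_i
```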

E. Vector representation of density operators

Finally, we would like to introduce a vector representation of the above operator formalism. The operators themselves satisfy the axioms of a linear space. Moreover, we can also define an inner product for two operators, the so-called Hilbert-Schmidt inner product,

    Tr[A†B].   (30)

The operators on the n-qubit system can be spanned by the tensor products of Pauli operators {I, X, Y, Z}^⊗n,

    P(i) = ⊗_{k=1}^{n} σ_{i_{2k−1} i_{2k}},   (31)

where σ_{ij} are the Pauli operators:

    I = σ₀₀ = [[1, 0], [0, 1]],  X = σ₁₀ = [[0, 1], [1, 0]],  Z = σ₀₁ = [[1, 0], [0, −1]],  Y = σ₁₁ = [[0, −i], [i, 0]].   (32)

Since the Pauli operators constitute a complete basis on the operator space, any operator A can be decomposed into a linear combination of the P(i),

    A = Σ_i a_i P(i).   (33)

The coefficient a_i can be calculated by using the Hilbert-Schmidt inner product as follows:

    a_i = Tr[P(i)A]/2ⁿ,   (34)

by virtue of the orthogonality

    Tr[P(i)P(j)]/2ⁿ = δ_{i,j}.   (35)

The number of n-qubit Pauli operators {P(i)} is 4ⁿ, and hence a density operator ρ of the n-qubit system can be represented as a 4ⁿ-dimensional vector

    r = (r₀₀…₀, …, r₁₁…₁)ᵀ,   (36)

where r₀₀…₀ = 1/2ⁿ because of Tr[ρ] = 1. Moreover, because P(i) is hermitian, r is a real vector. A superoperator K is a linear map on the operators, and hence can be represented as a matrix K acting on the vector r:

    ρ′ = K(ρ)  ⇔  r′ = K r,   (37)

where the matrix elements are given by

    K_{ij} = Tr[P(i) K(P(j))]/2ⁿ.   (38)

In this way, a density operator ρ and a quantum operation K acting on it can be represented by a vector r and a matrix K, respectively.

III. MACHINE LEARNING AND RESERVOIR APPROACH

In this section, we briefly introduce machine learning and reservoir approaches.

A. Linear and nonlinear regression

Supervised machine learning is the task of constructing a model f(x) from a given set of teacher data {x^(j), y^(j)} and predicting the output for an unknown input x. Suppose x is d-dimensional data and f(x) is one-dimensional, for simplicity. The simplest model is linear regression, which models f(x) as a linear function with respect to the input:

    f(x) = Σ_{i=1}^{d} w_i x_i + w₀.   (39)

The weights {w_i} and bias w₀ are chosen such that the error between f(x) and the output of the teacher data, i.e. the loss, becomes minimum. If we employ a quadratic loss function for given teacher data {{x_i^(j)}, y^(j)}, the problem we have to solve is as follows:

    min_{w_i} Σ_j ( Σ_{i=0}^{d} w_i x_i^(j) − y^(j) )²,   (40)

where we introduced a constant node x₀ = 1. This corresponds to solving the set of simultaneous equations

    y = Xw,   (41)

where y_j = y^(j), X_{ji} = x_i^(j), and w_i are the weights. This can be solved by using the Moore-Penrose pseudo inverse X⁺, which can be defined from the singular value decomposition X = UDVᵀ to be

    X⁺ = VD⁺Uᵀ,   (42)

where D⁺ inverts the nonzero singular values in D. Unfortunately, linear regression results in poor performance on complicated machine learning tasks, and some kind of nonlinearity is essentially required in the model. A neural network is a way to introduce nonlinearity into the model, which is inspired by the human brain. In a neural network, the d-dimensional input data x is fed into N-dimensional hidden nodes with an N × d input matrix W^in:

    W^in x.   (43)

Then each element of the hidden nodes is processed by a nonlinear activation function σ such as tanh, which is denoted by

    σ(W^in x).   (44)

Finally, the output is extracted by an output weight W^out (a 1 × N dimensional matrix):

    W^out σ(W^in x).   (45)

The parameters in W^in and W^out are trained such that the error between the output and the teacher data becomes minimum. While this optimization problem is highly nonlinear, a gradient-based optimization, so-called back propagation, can be employed. To improve the representation power of the model, we can concatenate the linear transformations and the activation functions as follows:

    W^out σ( W^(l) ⋯ σ( W^(1) σ(W^in x) ) ),   (46)

which is called a multi-layer perceptron or deep neural network.
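The readout training of Eqs. (39)–(42) amounts to a single pseudo-inverse. The following short numpy sketch, using an arbitrary synthetic dataset, is only meant to illustrate that step; it is not code from the chapter.

```python
import numpy as np

# Minimal sketch of Eqs. (39)-(42): linear regression via the Moore-Penrose pseudo-inverse.
rng = np.random.default_rng(0)

x = rng.uniform(size=(200, 3))                  # 200 samples of 3-dimensional input
y = 1.5 * x[:, 0] - 2.0 * x[:, 1] + 0.3         # teacher data from a linear target

X = np.hstack([np.ones((len(x), 1)), x])        # constant node x_0 = 1 for the bias w_0
w = np.linalg.pinv(X) @ y                       # w = X^+ y, Eq. (42)

print(w)                                        # approximately [0.3, 1.5, -2.0, 0.0]
print(np.allclose(X @ w, y, atol=1e-8))
```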
B. Temporal task

The above task is not a temporal task, meaning that the input data is not sequential but given simultaneously, as in the recognition of images of handwritten language, pictures, and so on. However, for recognition of spoken language or prediction of time series like stock markets, which are called temporal tasks, the network has to handle input data that is given in a sequential way. To do so, the recurrent neural network feeds the previous states of the nodes back into the states of the nodes at the next step, which allows the network to memorize the past input. In contrast, a neural network without any recurrency is called a feedforward neural network.

Let us formalize a temporal machine learning task with the recurrent neural network. For a given input time series {x_k}_{k=1}^{L} and target time series {ȳ_k}_{k=1}^{L}, a temporal machine learning task is to generalize the nonlinear function

    ȳ_k = f({x_j}_{j=1}^{k}).   (47)

For simplicity, we consider one-dimensional input and output time series, but the generalization to a multi-dimensional case is straightforward. To learn the nonlinear function f({x_j}_{j=1}^{k}), the recurrent neural network can be employed as a model. Suppose the recurrent neural network consists of m nodes and is denoted by the m-dimensional vector

    r = (r₁, …, r_m)ᵀ.   (48)

To process the input time series, the nodes evolve by

    r(k+1) = σ[W r(k) + W^in x_k],   (49)

where W is an m × m transition matrix and W^in is an m × 1 input weight matrix. Nonlinearity comes from the nonlinear function σ applied on each element of the nodes. The output time series from the network is defined in terms of 1 × m readout weights by

    y_k = W^out r(k).   (50)

Then the learning task is to determine the parameters W^in, W, and W^out by using the teacher data {x_k, ȳ_k}_{k=1}^{L} so as to minimize the error between the teacher {ȳ_k} and the output {y_k} of the network.

C. Reservoir approach

While the representation power of the recurrent neural network can be improved by increasing the number of nodes, this makes the optimization process of the weights hard and unstable. Specifically, the back propagation based methods always suffer from the vanishing gradient problem. The idea of reservoir computing is to resolve this problem by mapping an input into a complex higher dimensional feature space, i.e., a reservoir, and by performing a simple linear regression on it.

Let us first see a reservoir approach on a feedforward neural network, which is called the extreme learning machine [44]. The input data x is fed into a network like a multi-layer perceptron, where all weights are chosen randomly. The states of the hidden nodes at some layer are now regarded as basis functions of the input x in the feature space:

    {φ₁(x), φ₂(x), …, φ_N(x)}.   (51)

The output is then defined as a linear combination of these,

    Σ_i w_i φ_i(x) + w₀,   (52)

and hence the coefficients are determined simply by the linear regression mentioned before. If the dimension and nonlinearity of the basis functions are high enough, we can model a complex task simply by linear regression.

The echo state network is similar but employs the reservoir idea for the recurrent neural network [21, 22, 45]; it was proposed before the extreme learning machine appeared. To be specific, the input weights W^in and weight matrix W are both chosen randomly, up to an appropriate normalization. Then the learning task is done by finding the readout weights W^out that minimize the mean square error

    Σ_k (y_k − ȳ_k)².   (53)

This problem can be solved stably by using the pseudo inverse, as we mentioned before.

For both the feedforward and recurrent types, the reservoir approach does not need to tune the internal parameters of the network depending on the task, as long as the network possesses sufficient complexity. Therefore, the system to which the machine learning tasks are outsourced is not necessarily a neural network anymore; any nonlinear physical system with a large number of degrees of freedom can be employed as a reservoir for information processing, namely, physical reservoir computing [23–36].

IV. QUANTUM MACHINE LEARNING ON NEAR-TERM QUANTUM DEVICES

In this section, we will see QRC and related frameworks for quantum machine learning. Before going deep into the temporal tasks done on QRC, we first explain how complicated natural quantum dynamics can be exploited for generalization and classification tasks. This can be viewed as a quantum version of the extreme learning machine [39]. While it goes in the opposite direction to reservoir computing, we will also see quantum circuit learning (QCL) [43], where the parameters in the complex dynamics are further tuned in addition to the linear readout weights. QCL is a quantum version of a feedforward neural network. Finally, we will explain quantum reservoir computing by extending the quantum extreme learning machine to temporal learning tasks.

A. Quantum extreme learning machine

The idea of the quantum extreme learning machine lies in using a Hilbert space, where quantum states live, as an enhanced feature space of the input data. Let us denote the set of input and teacher data by {x^(j), ȳ^(j)}. Suppose we have an n-qubit system, which is initialized to

    |0⟩^⊗n.   (54)

In order to feed the input data into the quantum system, a unitary operation parameterized by x, say V(x), is applied to the initial state:

    V(x)|0⟩^⊗n.   (55)

For example, if x is one-dimensional data normalized such that 0 ≤ x ≤ 1, then we may employ the Y-basis rotation e^{−iθY} with an angle θ = arccos(√x):

    e^{−iθY}|0⟩ = √x |0⟩ + √(1−x) |1⟩.   (56)

The expectation value of Z with respect to e^{−iθY}|0⟩ becomes

    ⟨Z⟩ = 2x − 1,   (57)

and hence is linearly related to the input x. To enhance the power of the quantum enhanced feature space, the input could be transformed by using a nonlinear function φ:

    θ = arccos(√φ(x)).   (58)

The nonlinear function φ could be, for example, a hyperbolic tangent, a Legendre polynomial, and so on. For simplicity, below we will use the simple linear input θ = arccos(√x).

If we apply the same operation on each of the n qubits, we have

    V(x)|0⟩^⊗n = (√x |0⟩ + √(1−x) |1⟩)^⊗n = (1−x)^{n/2} Σ_{i₁,…,i_n} Π_k (√(x/(1−x)))^{i_k} |i₁, …, i_n⟩.

Therefore, we have coefficients that are nonlinear with respect to the input x because of the tensor product structure. Still, the expectation value of the single-qubit operator Z_k on the kth qubit is 2x − 1. However, if we measure a correlated operator like Z₁Z₂, we can obtain a second-order nonlinear output

    ⟨Z₁Z₂⟩ = (2x − 1)²   (59)

with respect to the input x. To measure a correlated operator, it is enough to apply an entangling unitary operation like the CNOT gate Λ(X) = |0⟩⟨0| ⊗ I + |1⟩⟨1| ⊗ X:

    ⟨ψ|Λ_{1,2}(X) Z₁ Λ_{1,2}(X)|ψ⟩ = ⟨ψ|Z₁Z₂|ψ⟩.   (60)

In general, an n-qubit unitary operation U transforms the observable Z₁ under conjugation into a linear combination of Pauli operators:

    U† Z₁ U = Σ_i α_i P(i).   (61)
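A minimal numerical check of the product-state encoding of Eqs. (55)–(59) can be done directly with state vectors. The sketch below (helper names are our own, not the authors' code) verifies ⟨Z₁⟩ = 2x − 1 and ⟨Z₁Z₂⟩ = (2x − 1)².

```python
import numpy as np
from functools import reduce

# Minimal sketch (not the authors' code) of the encoding in Eqs. (55)-(59).
def encoded_state(x, n):
    """Product state (sqrt(x)|0> + sqrt(1-x)|1>)^{(x) n}."""
    single = np.array([np.sqrt(x), np.sqrt(1 - x)])
    return reduce(np.kron, [single] * n)

def pauli_z_on(k, n):
    """Z acting on qubit k (0-indexed) of an n-qubit register."""
    Z = np.diag([1.0, -1.0])
    I = np.eye(2)
    return reduce(np.kron, [Z if j == k else I for j in range(n)])

n, x = 4, 0.3
psi = encoded_state(x, n)
z1 = psi @ pauli_z_on(0, n) @ psi                          # <Z_1> = 2x - 1
z1z2 = psi @ (pauli_z_on(0, n) @ pauli_z_on(1, n)) @ psi   # <Z_1 Z_2> = (2x - 1)^2
print(z1, 2 * x - 1)
print(z1z2, (2 * x - 1) ** 2)
```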


FIG. 1. The expectation value ⟨Z⟩ of the output of a quantum circuit as a function of the input (x₀, x₁).

FIG. 2. (a) The quantum circuit for the quantum extreme learning machine. The box with θ_k indicates a Y-rotation by angle θ_k. The red and blue boxes correspond to X and Z rotations by random angles. Each dotted-line box represents a two-qubit gate consisting of two controlled-Z gates and 8 X-rotations and 4 Z-rotations. As denoted by the dashed-line box, the sequence of the 7 dotted boxes is repeated twice. The readout is defined by a linear combination of the ⟨Z_i⟩ with a constant bias term 1.0 and the input (x₀, x₁). (b) (Left) The training data for a two-class classification problem. (Middle) The readout after learning. (Right) Prediction from the readout with threshold at 0.5.

Thus, if you measure the output of the quantum circuit after applying a unitary operation U,

    U V(x)|0⟩^⊗n,   (62)

you can get a complex nonlinear output, which can be represented as a linear combination of exponentially many nonlinear functions. U should be chosen to be appropriately complex while keeping experimental feasibility, but it does not have to be fine-tuned.

To see how the output behaves in a nonlinear way with respect to the input, in Fig. 1 we plot the output ⟨Z⟩ for the input (x₀, x₁) and n = 8, where the inputs are fed into the quantum state by Y-rotations with angles

    θ_{2k} = k arccos(√x₀),   (63)
    θ_{2k+1} = k arccos(√x₁),   (64)

on the 2kth and (2k + 1)th qubits, respectively. Regarding the unitary operation U, random two-qubit gates are sequentially applied on pairs of qubits of the 8-qubit system.

Suppose the Pauli Z operator is measured on each qubit as an observable. Then we have

    z_i = ⟨Z_i⟩   (65)

for each qubit. In the quantum extreme learning machine, the output is defined by taking a linear combination of these n outputs:

    y = Σ_{i=1}^{n} w_i z_i.   (66)

The linear readout weights {w_i} are now tuned so that the quadratic loss function

    L = Σ_j (y^(j) − ȳ^(j))²   (67)

becomes minimum. As we mentioned previously, this can be solved by using the pseudo inverse. In short, the quantum extreme learning machine is a linear regression on randomly chosen nonlinear basis functions, which come from the quantum state in a space of exponentially large dimension, namely a quantum enhanced feature space. Furthermore, under some typical settings of the nonlinear function and the unitary operations that transform the observables, the output in Eq. (66) can approximate any continuous function of the input. This property is known as the universal approximation property (UAP), which implies that the quantum extreme learning machine can handle a wide class of machine learning tasks with at least the same power as the classical extreme learning machine [75].

Here we should note that a similar approach, quantum kernel estimation, has been taken in Ref. [46]. In the quantum extreme learning machine, a classical feature vector φ_i(x) ≡ ⟨Φ(x)|Z_i|Φ(x)⟩ is extracted from observables on the quantum feature space |Φ(x)⟩ ≡ V(x)|0⟩^⊗n. Then linear regression is performed using this classical feature vector. On the other hand, in quantum kernel estimation, the quantum feature space is fully employed by using a support vector machine with the kernel functions K(x, x′) ≡ ⟨Φ(x)|Φ(x′)⟩, which can be estimated on a quantum computer. While the classification power would be better for quantum kernel estimation, it requires more quantum computational cost, both for learning and for prediction, in contrast to the quantum extreme learning machine.
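To connect the pieces, the following hedged sketch assembles a toy quantum extreme learning machine along the lines of Eqs. (62)–(67): a fixed random unitary stands in for the circuit of Fig. 2(a), the features z_i = ⟨Z_i⟩ are computed by exact state-vector simulation, and the readout is fitted by the pseudo-inverse. The qubit number, circuit, and data sizes are illustrative choices, not those used for Fig. 2.

```python
import numpy as np
from functools import reduce

# Hedged sketch of a quantum extreme learning machine (Eqs. (62)-(67));
# a fixed random unitary U plays the role of the circuit in Fig. 2(a).
rng = np.random.default_rng(1)
n = 6                                            # number of qubits (illustrative)
dim = 2 ** n

A = rng.normal(size=(dim, dim)) + 1j * rng.normal(size=(dim, dim))
U, _ = np.linalg.qr(A)                           # fixed random unitary

Z = np.diag([1.0, -1.0])
I2 = np.eye(2)
Z_ops = [reduce(np.kron, [Z if j == k else I2 for j in range(n)]) for k in range(n)]

def features(x0, x1):
    """z_i = <Z_i> after encoding (x0, x1) and applying U (Eq. (65))."""
    q0 = np.array([np.sqrt(x0), np.sqrt(1 - x0)])
    q1 = np.array([np.sqrt(x1), np.sqrt(1 - x1)])
    psi = U @ reduce(np.kron, [q0, q1] * (n // 2))
    return np.array([np.real(np.conj(psi) @ (Zk @ psi)) for Zk in Z_ops])

# Two-class teacher data: classes separated by a circle, as in Fig. 2(b).
X_in = rng.uniform(size=(300, 2))
labels = ((X_in[:, 0] - 0.5) ** 2 + (X_in[:, 1] - 0.5) ** 2 > 0.15).astype(float)

Phi = np.array([np.concatenate(([1.0], xy, features(*xy))) for xy in X_in])
w = np.linalg.pinv(Phi) @ labels                 # linear readout, Eqs. (66)-(67)

pred = (Phi @ w > 0.5).astype(float)
print("training accuracy:", (pred == labels).mean())
```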
In Fig. 2, we demonstrate the quantum extreme learning machine on a two-class classification task for a two-dimensional input 0 ≤ x₀, x₁ ≤ 1. Classes 0 and 1 are defined as the points satisfying (x₀ − 0.5)² + (x₁ − 0.5)² ≤ 0.15 and > 0.15, respectively. The linear readout weights {w_i} are learned with 1000 randomly chosen training data, and prediction is performed on 1000 randomly chosen inputs. Class 0 or 1 is assigned according to whether or not the output y is larger than 0.5. The quantum extreme learning machine with the 8-qubit quantum circuit shown in Fig. 2(a) succeeds in predicting the class with 95% accuracy. On the other hand, a simple linear regression on (x₀, x₁) results in 39%. Moreover, the quantum extreme learning machine with U = I, meaning no entangling gate, also performs poorly, at 42%. In this way, the feature space enhanced by quantum entangling operations is important to obtain good performance in the quantum extreme learning machine.

B. Quantum circuit learning

In the spirit of reservoir computing, the dynamics of a physical system is not fine-tuned; rather, the natural dynamics of the system is harnessed for machine learning tasks. However, on the state-of-the-art quantum computing devices, the parameters of quantum operations can be finely tuned, as is done for universal quantum computing. Therefore it is natural to extend the quantum extreme learning machine by tuning the parameters in the quantum circuit, just like feedforward neural networks with back propagation. Using parameterized quantum circuits for supervised machine learning tasks, such as generalization of nonlinear functions and pattern recognition, has been proposed in Refs. [43, 47], which we call quantum circuit learning.

Let us consider the same situation as the quantum extreme learning machine. The state before the measurement is given by

    U V(x)|0⟩^⊗n.   (68)

In the case of the quantum extreme learning machine, the unitary operation for a nonlinear transformation with respect to the input parameter x is randomly chosen. However, the unitary operation U may also be parameterized:

    U({φ_k}) = Π_k u(φ_k).   (69)

Thereby, the output of the quantum circuit with respect to an observable A,

    ⟨A({φ_k}, x)⟩ = ⟨0|^⊗n V(x)† U({φ_k})† Z_i U({φ_k}) V(x) |0⟩^⊗n,

becomes a function of the circuit parameters {φ_k} in addition to the input x. The parameters {φ_k} are then tuned so as to minimize the error between the teacher data and the output, for example by using the gradient, just as for the output of a feedforward neural network.

Let us define a teacher dataset {x^(j), y^(j)} and a quadratic loss function

    L({φ_k}) = Σ_j ( ⟨A({φ_k}, x^(j))⟩ − y^(j) )².   (70)

The gradient of the loss function can be obtained as follows:

    ∂/∂φ_l L({φ_k}) = Σ_j ∂/∂φ_l ( ⟨A({φ_k}, x^(j))⟩ − y^(j) )²
                    = Σ_j 2( ⟨A({φ_k}, x^(j))⟩ − y^(j) ) ∂/∂φ_l ⟨A({φ_k}, x^(j))⟩.

Therefore, if we can measure the gradient of the observable ⟨A({φ_k}, x^(j))⟩, the loss function can be minimized by gradient descent. Suppose the unitary operation u(φ_k) is given by

    u(φ_k) = W_k e^{−i(φ_k/2)P_k},   (71)

where W_k is an arbitrary unitary and P_k is a Pauli operator. Then the partial derivative with respect to the lth parameter can be analytically calculated from the outputs ⟨A({φ_k}, x^(j))⟩ with the lth parameter shifted by ±ε [43, 61]:

    ∂/∂φ_l ⟨A({φ_k}, x^(j))⟩ = [ ⟨A({φ₁, …, φ_l + ε, φ_{l+1}, …}, x^(j))⟩ − ⟨A({φ₁, …, φ_l − ε, φ_{l+1}, …}, x^(j))⟩ ] / (2 sin ε).

Considering the statistical error in measuring the observable ⟨A⟩, ε should be chosen as ε = π/2 so as to make the denominator maximum. After measuring the partial derivatives for all parameters φ_k and calculating the gradient of the loss function L({φ_k}), the parameters are updated by gradient descent:

    φ_l^{(m+1)} = φ_l^{(m)} − α ∂/∂φ_l L({φ_k}).   (72)

The idea of using parameterized quantum circuits for machine learning is now widespread. After the proposal of quantum circuit learning based on the analytical gradient estimation above [43] and a similar idea [47], several studies have been performed with various types of parameterized quantum circuits [48–52] and various models and types of machine learning, including generative models [54, 55] and generative adversarial models [56–58]. Moreover, the expressive power of parameterized quantum circuits and its advantage over classical probabilistic models have been investigated [59]. Experimentally feasible ways to measure an analytical gradient of parameterized quantum circuits have been investigated [60–62]. An advantage of using such a gradient for parameter optimization has also been argued in a simple setting [63], while parameter tuning becomes difficult because of the vanishing gradient in an exponentially large Hilbert space [64]. Software libraries for optimizing parameterized quantum circuits are now being developed [65, 66]. Quantum machine learning on near-term devices, especially for quantum optical systems, has been proposed in Refs. [67, 68]. Quantum circuit learning with parameterized quantum circuits has already been experimentally demonstrated on superconducting qubit systems [46, 69] and a trapped ion system [70].
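The parameter-shift evaluation of the gradient can be illustrated numerically for a single rotation angle. The sketch below (a toy example, not device code) compares the shifted-expectation formula with ε = π/2 against a finite-difference estimate for ⟨A(φ)⟩ with u(φ) = e^{−iφY/2} and A = Z.

```python
import numpy as np

# Hedged sketch of the parameter-shift rule (Eqs. (71)-(72)) for a single parameter.
Y = np.array([[0, -1j], [1j, 0]])
Z = np.diag([1.0, -1.0])
ket0 = np.array([1.0, 0.0], dtype=complex)

def expectation(phi):
    """<A(phi)> = <0| U(phi)^dagger Z U(phi) |0> with U(phi) = exp(-i phi Y / 2)."""
    U = np.cos(phi / 2) * np.eye(2) - 1j * np.sin(phi / 2) * Y
    psi = U @ ket0
    return np.real(np.conj(psi) @ (Z @ psi))

phi = 0.7
eps = np.pi / 2
shift_grad = (expectation(phi + eps) - expectation(phi - eps)) / (2 * np.sin(eps))
finite_diff = (expectation(phi + 1e-6) - expectation(phi - 1e-6)) / 2e-6
print(shift_grad, finite_diff)   # both approximately -sin(0.7)
```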

FIG. 3. (a) Quantum reservoir computing. (b) Virtual nodes and temporal multiplexing.

C. Quantum reservoir computing

Now we return to the reservoir approach and extend the quantum extreme learning machine from non-temporal tasks to temporal ones, namely, quantum reservoir computing [38]. We consider a temporal task, as explained in Sec. III B. The input is given by a time series {x_k}_k^L, and the purpose is to learn a nonlinear temporal function:

    ȳ_k = f({x_j}_j^k).   (73)

To this end, the target time series {ȳ_k}_{k=1}^{L} is also provided as teacher data.

In contrast to the previous setting with non-temporal tasks, we have to feed the input into the quantum system sequentially. This requires us to perform an initialization process during the computation, and hence the quantum state of the system becomes a mixed state. Therefore, in the formulation of QRC, we will use the vector representation of density operators, which was explained in Sec. II E.

In the vector representation of density operators, the quantum state of an N-qubit system is given by a vector in a 4^N-dimensional real vector space, r ∈ R^{4^N}. In QRC, similarly to recurrent neural networks, each element of the 4^N-dimensional vector is regarded as a hidden node of the network. As we have seen in Sec. II E, any physical operation can be written as a linear transformation of the real vector by a 4^N × 4^N matrix W:

    r′ = W r.   (74)

Now we see, from Eq. (74), a time evolution similar to the recurrent neural network, r′ = tanh(Wr). However, there is no nonlinearity such as tanh in each quantum operation W. Instead, the time evolution W can be changed according to the external input x_k, namely W_{x_k}, which contrasts with the conventional recurrent neural network, where the input is fed additively as Wr + W^in x_k. This allows the quantum reservoir to process the input information {x_k} nonlinearly, by repetitively feeding the input.

Suppose the input {x_k} is normalized such that 0 ≤ x_k ≤ 1. As an input, we replace a part of the qubits by the quantum state whose density operator is given by

    ρ_{x_k} = [I + (2x_k − 1)Z] / 2.   (75)

For simplicity, below we consider the case where only one qubit is replaced for the input. The corresponding matrix S_{x_k} is given by

    (S_{x_k})_{ji} = Tr[ P(j) ( ρ_{x_k} ⊗ Tr_replace[P(i)] ) ] / 2^N,

where Tr_replace indicates a partial trace with respect to the replaced qubit. With this definition, we have

    ρ′ = ρ_{x_k} ⊗ Tr_replace[ρ]  ⇔  r′ = S_{x_k} r.   (76)

The unitary time evolution, which is necessary to obtain a nonlinear behavior with respect to the input variable x_k, is taken to be a Hamiltonian dynamics e^{−iHτ} over a given time interval τ. Let us denote its representation on the vector space by U_τ:

    ρ′ = e^{−iHτ} ρ e^{iHτ}  ⇔  r′ = U_τ r.   (77)

Then, a unit time step is written as an input-dependent linear transformation:

    r((k+1)τ) = U_τ S_{x_k} r(kτ),   (78)

where r(kτ) indicates the hidden nodes at time kτ.

Since the number of hidden nodes is exponentially large, it is not feasible to observe all nodes in experiments. Instead, a set of observed nodes {r̄_l}_{l=1}^{M}, which we call true nodes, is defined by an M × 4^N matrix R,

    r̄_l(kτ) = Σ_i R_{li} r_i(kτ).   (79)

The number of true nodes M has to be polynomial in the number of qubits N. That is, from the exponentially many hidden nodes, a polynomial number of true nodes are obtained to define the output from the quantum reservoir (see Fig. 3(a)):

    y_k = Σ_l W^out_l r̄_l(kτ),   (80)

where W^out are the readout weights, which are obtained by using the training data. For simplicity, we take the single-qubit Pauli Z operator on each qubit as the true nodes, i.e.,

    r̄_l = Tr[Z_l ρ],   (81)

so that if there is no dynamics, these nodes simply provide a linear output (2x_k − 1) with respect to the input x_k.
Moreover, in order to improve the performance, we also perform temporal multiplexing. Temporal multiplexing has been found to be useful to extract complex dynamics on the exponentially large set of hidden nodes through the restricted number of true nodes [38]. In temporal multiplexing, the signals are taken not only from the true nodes just after the time evolution U_τ, but also at each of V subdivided time intervals during the unitary evolution U_τ, to construct V virtual nodes, as shown in Fig. 3(b). After each input by S_{x_k}, the signals from the hidden nodes (via the true nodes) are measured at each subdivided interval after the time evolution by U_{vτ/V} (v = 1, 2, …, V), i.e.,

    r(kτ + (v/V)τ) ≡ U_{(v/V)τ} S_{x_k} r(kτ).   (82)

In total, we now have N × V nodes, and the output is defined as their linear combination:

    y_k = Σ_{l=1}^{N} Σ_{v=1}^{V} W^out_{l,v} r̄_l(kτ + (v/V)τ).   (83)

By using the teacher data {ȳ_k}_k^L, the linear readout weights W^out_{l,v} can be determined by using the pseudo inverse. In Ref. [38], the performance of QRC has been investigated extensively for both binary and continuous inputs. The results show that even if the number of qubits is small, like 5-7 qubits, performance as powerful as an echo state network with 100-500 nodes is obtained, both in short-term memory and in parity check capacities. Note that, although we do not go into detail in this chapter, a technique called spatial multiplexing [40], which exploits multiple quantum reservoirs with a common injected input sequence, has also been introduced to harness quantum dynamics as a computational resource. Recently, QRC has been further investigated in Refs. [41, 71, 74]. Specifically, in Ref. [71], the authors use quantum reservoir computing to detect many-body entanglement by estimating nonlinear functions of density operators, like entropy.

D. Emulating chaotic attractors using quantum dynamics

To see the performance of QRC, here we demonstrate an emulation of chaotic attractors. Suppose {x_k}_k^L is a discretized time sequence subject to a complex nonlinear equation, which might have chaotic behavior. In this task, the target, which the network is to output, is defined to be

    ȳ_k = x_{k+1} = f({x_j}_{j=1}^{k}).   (84)

That is, the system learns the input of the next step. Once the system successfully learns ȳ_k, by feeding the output into the input of the next step, the system evolves autonomously. Here we employ the following target time series from chaotic attractors: (i) the Lorenz attractor,

    dx/dt = a(y − x),   (85)
    dy/dt = x(b − z) − y,   (86)
    dz/dt = xy − cz,   (87)

with (a, b, c) = (10, 28, 8/3); (ii) the chaotic attractor of the Mackey-Glass equation,

    d x(t)/dt = β x(t − τ) / [1 + x(t − τ)ⁿ] − γ x(t),   (88)

with (β, γ, n) = (0.2, 0.1, 10) and τ = 17; (iii) the Rössler attractor,

    dx/dt = −y − z,   (89)
    dy/dt = x + ay,   (90)
    dz/dt = b + z(x − c),   (91)

with (a, b, c) = (0.2, 0.2, 5.7); and (iv) the Hénon map,

    x_{t+1} = 1 − 1.4x_t² + 0.3x_{t−1}.   (92)

Regarding (i)-(iii), the time series is obtained by using the fourth-order Runge-Kutta method with step size 0.02, and only x(t) is employed as a target. For the time evolution of the quantum reservoir, we employ a fully connected transverse-field Ising model,

    H = Σ_{ij} J_{ij} X_i X_j + h Σ_i Z_i,   (93)

where the coupling strengths J_{ij} are randomly drawn from [−0.5, 0.5] and h = 1.0. The time interval and the number of virtual nodes are chosen to be τ = 4.0 and V = 10 so as to obtain the best performance. The first 10⁴ steps are used for training. After the linear readout weights are determined, several 10³ steps are predicted by autonomously evolving the quantum reservoir. The results are shown in Fig. 4 for (a) the Lorenz attractor, (b) the chaotic attractor of the Mackey-Glass system, (c) the Rössler attractor, and (d) the Hénon map. All these results show that training is done well and that the prediction is successful for several hundred steps. Moreover, the output from the quantum reservoir also successfully reconstructs the structures of these chaotic attractors, as can be seen from the delayed phase diagrams.
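The training and closed-loop prediction protocol of this section can be summarized independently of the reservoir details. In the hedged sketch below, a small classical echo state network stands in for the quantum reservoir, so only the teacher-forcing, pseudo-inverse readout, and autonomous feedback steps described above are illustrated; all sizes and seeds are arbitrary.

```python
import numpy as np

# Hedged sketch of the training / autonomous-prediction protocol of Sec. IV D.
# A small classical echo state network stands in for the quantum reservoir.
rng = np.random.default_rng(3)

# Target: Hénon map, Eq. (92), rescaled to [0, 1].
steps = 3000
x = np.zeros(steps)
x[0], x[1] = 0.1, 0.1
for t in range(1, steps - 1):
    x[t + 1] = 1 - 1.4 * x[t] ** 2 + 0.3 * x[t - 1]
x = (x - x.min()) / (x.max() - x.min())

m = 50                                               # reservoir nodes
W = rng.normal(size=(m, m))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))      # keep the dynamics stable
W_in = rng.normal(size=m)

def step(r, u):
    return np.tanh(W @ r + W_in * u)

# Teacher forcing: drive with the true series and fit the readout by pseudo-inverse.
r = np.zeros(m)
states = []
for t in range(2000):
    r = step(r, x[t])
    states.append(np.concatenate(([1.0], r)))        # constant bias node
w_out = np.linalg.pinv(np.array(states)) @ x[1:2001]

# Prediction phase: drive with the truth up to step 2000, then feed the prediction
# back as the next input and let the system run autonomously.
r = np.zeros(m)
u = x[0]
preds = []
for t in range(2500):
    r = step(r, u)
    y = np.concatenate(([1.0], r)) @ w_out
    preds.append(y)
    u = x[t + 1] if t < 2000 else y
err = np.abs(np.array(preds[2000:2100]) - x[2001:2101])
print("mean absolute error over the first 100 autonomous steps:", err.mean())
```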


FIG. 4. Demonstrations of chaotic attractor emulations. (a) Lorenz attractor. (b) Mackey-Glass system. (c) Rössler attractor. (d) Hénon map. The dotted line shows the time step at which the system is switched from the teacher-forced state to the autonomous state. On the right side, delayed phase diagrams of the learned dynamics are shown.

V. CONCLUSION AND DISCUSSION

Here we reviewed quantum reservoir computing and related approaches, the quantum extreme learning machine and quantum circuit learning. The idea of quantum reservoir computing comes from the spirit of reservoir computing, i.e., outsourcing information processing to natural physical systems. This idea is well suited to quantum machine learning on near-term quantum devices in the NISQ (noisy intermediate-scale quantum) era. Since reservoir computing uses complex physical systems as a feature space on which a model is constructed by simple linear regression, this approach would be a good way to understand the power of a quantum enhanced feature space.

ACKNOWLEDGEMENT

KF is supported by KAKENHI No. 16H02211, JST PRESTO JPMJPR1668, JST ERATO JPMJER1601, and JST CREST JPMJCR1673. KN is supported by JST PRESTO Grant Number JPMJPR15E7, Japan, and by JSPS KAKENHI Grant Numbers JP18H05472, JP16KT0019, and JP15K16076. KN would like to acknowledge Dr. Quoc Hoan Tran for his helpful comments. This work is supported by MEXT Quantum Leap Flagship Program (MEXT Q-LEAP) Grant No. JPMXS0118067394.

[1] R. P. Feynman, Simulating physics with computers, Int. J. Theor. Phys. 21, 467 (1982).
[2] M. A. Nielsen and I. L. Chuang, Quantum computation and quantum information (Cambridge University Press, 2010).

[3] K. Fujii, Quantum Computation with Topological Codes - From Qubit to Topological Fault-Tolerance -, SpringerBriefs in Mathematical Physics (Springer-Verlag, 2015).
[4] P. W. Shor, Algorithms for quantum computation: Discrete logarithms and factoring, in Proceedings of the 35th Annual Symposium on Foundations of Computer Science, 124 (1994).
[5] R. Barends et al., Superconducting quantum circuits at the surface code threshold for fault tolerance, Nature 508, 500 (2014).
[6] J. Kelly et al., State preservation by repetitive error detection in a superconducting quantum circuit, Nature 519, 66 (2015).
[7] J. Preskill, Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
[8] S. Boixo et al., Characterizing quantum supremacy in near-term devices, Nature Physics 14, 595 (2018).
[9] J. I. Cirac and P. Zoller, Goals and opportunities in quantum simulation, Nat. Phys. 8, 264 (2012).
[10] I. Bloch, J. Dalibard, and S. Nascimbène, Quantum simulations with ultracold quantum gases, Nat. Phys. 8, 267 (2012).
[11] I. M. Georgescu, S. Ashhab, and F. Nori, Quantum simulation, Rev. Mod. Phys. 86, 153 (2014).
[12] T. Kadowaki and H. Nishimori, Quantum annealing in the transverse Ising model, Phys. Rev. E 58, 5355 (1998).
[13] E. Farhi, J. Goldstone, S. Gutmann, J. Lapan, A. Lundgren, and D. Preda, A quantum adiabatic evolution algorithm applied to random instances of an NP-complete problem, Science 292, 472 (2001).
[14] T. F. Rønnow, Z. Wang, J. Job, S. Boixo, S. V. Isakov, D. Wecker, J. M. Martinis, D. A. Lidar, and M. Troyer, Defining and detecting quantum speedup, Science 345, 420 (2014).
[15] S. Boixo, T. F. Rønnow, S. V. Isakov, Z. Wang, D. Wecker, D. A. Lidar, J. M. Martinis, and M. Troyer, Evidence for quantum annealing with more than one hundred qubits, Nat. Phys. 10, 218 (2014).
[16] T. Morimae, K. Fujii, and J. F. Fitzsimons, Hardness of classically simulating the one-clean-qubit model, Phys. Rev. Lett. 112, 130502 (2014).
[17] K. Fujii, H. Kobayashi, T. Morimae, H. Nishimura, S. Tamate, and S. Tani, Power of Quantum Computation with Few Clean Qubits, Proceedings of the 43rd International Colloquium on Automata, Languages, and Programming (ICALP 2016), pp. 13:1-13:14.
[18] K. Fujii and S. Tamate, Computational quantum-classical boundary of complex and noisy quantum systems, Sci. Rep. 6, 25598 (2016).
[19] M. Rabinovich, R. Huerta, and G. Laurent, Transient dynamics for neural processing, Science 321, 48 (2008).
[20] J. Dambre, D. Verstraeten, B. Schrauwen, and S. Massar, Information processing capacity of dynamical systems, Sci. Rep. 2, 514 (2012).
[21] H. Jaeger and H. Haas, Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication, Science 304, 78 (2004).
[22] W. Maass, T. Natschläger, and H. Markram, Real-time computing without stable states: a new framework for neural computation based on perturbations, Neural Comput. 14, 2531 (2002).
[23] C. Fernando and S. Sojakka, Pattern recognition in a bucket, in Lecture Notes in Computer Science 2801, p. 588 (Springer, 2003).
[24] L. Appeltant, M. C. Soriano, G. Van der Sande, J. Danckaert, S. Massar, J. Dambre, B. Schrauwen, C. R. Mirasso, and I. Fischer, Information processing using a single dynamical node as complex system, Nat. Commun. 2, 468 (2011).
[25] D. Woods and T. J. Naughton, Photonic neural networks, Nat. Phys. 8, 257 (2012).
[26] L. Larger, M. C. Soriano, D. Brunner, L. Appeltant, J. M. Gutierrez, L. Pesquera, C. R. Mirasso, and I. Fischer, Photonic information processing beyond Turing: an optoelectronic implementation of reservoir computing, Optics Express 20, 3241 (2012).
[27] Y. Paquot, F. Duport, A. Smerieri, J. Dambre, B. Schrauwen, M. Haelterman, and S. Massar, Optoelectronic Reservoir Computing, Sci. Rep. 2, 287 (2012).
[28] D. Brunner, M. C. Soriano, C. R. Mirasso, and I. Fischer, Parallel photonic information processing at gigabyte per second data rates using transient states, Nat. Commun. 4, 1364 (2013).
[29] K. Vandoorne, P. Mechet, T. V. Vaerenbergh, M. Fiers, G. Morthier, D. Verstraeten, B. Schrauwen, J. Dambre, and P. Bienstman, Experimental demonstration of reservoir computing on a silicon photonics chip, Nat. Commun. 5, 3541 (2014).
[30] A. Z. Stieg, A. V. Avizienis, H. O. Sillin, C. Martin-Olmos, M. Aono, and J. K. Gimzewski, Emergent criticality in complex Turing B-type atomic switch networks, Adv. Mater. 24, 286 (2012).
[31] H. Hauser, A. J. Ijspeert, R. M. Füchslin, R. Pfeifer, and W. Maass, Towards a theoretical foundation for morphological computation with compliant bodies, Biol. Cybern. 105, 355 (2011).
[32] K. Nakajima, H. Hauser, R. Kang, E. Guglielmino, D. G. Caldwell, and R. Pfeifer, Computing with a Muscular-Hydrostat System, Proceedings of 2013 IEEE International Conference on Robotics and Automation (ICRA), 1496 (2013).
[33] K. Nakajima, H. Hauser, R. Kang, E. Guglielmino, D. G. Caldwell, and R. Pfeifer, A soft body as a reservoir: case studies in a dynamic model of octopus-inspired soft robotic arm, Front. Comput. Neurosci. 7, 1 (2013).
[34] K. Nakajima, T. Li, H. Hauser, and R. Pfeifer, Exploiting short-term memory in soft body dynamics as a computational resource, J. R. Soc. Interface 11, 20140437 (2014).
[35] K. Nakajima, H. Hauser, T. Li, and R. Pfeifer, Information processing via physical soft body, Sci. Rep. 5, 10487 (2015).
[36] K. Caluwaerts, J. Despraz, A. Işçen, A. P. Sabelhaus, J. Bruce, B. Schrauwen, and V. SunSpiral, Design and control of compliant tensegrity robots through simulations and hardware validation, J. R. Soc. Interface 11, 20140520 (2014).
[37] Y. LeCun, Y. Bengio, and G. Hinton, Deep Learning, Nature 521, 436 (2015).
[38] K. Fujii and K. Nakajima, Harnessing Disordered-Ensemble Quantum Dynamics for Machine Learning, Phys. Rev. Applied 8, 024030 (2017).
[39] M. Negoro et al., Machine learning with controllable quantum dynamics of a nuclear ensemble in a solid, arXiv:1806.10910 (2018).
[40] K. Nakajima et al., Boosting computational power through spatial multiplexing in quantum reservoir computing, Phys. Rev. Applied 11, 034021 (2019).

[41] A. Kutvonen, K. Fujii, and T. Sagawa, Optimizing a quantum reservoir computer for time series prediction, Sci. Rep. 10, 14687 (2020).
[42] Q. H. Tran and K. Nakajima, Higher-order quantum reservoir computing, arXiv:2006.08999 (2020).
[43] K. Mitarai et al., Quantum circuit learning, Phys. Rev. A 98, 032309 (2018).
[44] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70, 489 (2006).
[45] D. Verstraeten, B. Schrauwen, M. D'Haene, and D. Stroobandt, An experimental unification of reservoir computing methods, Neural Netw. 20, 391 (2007).
[46] V. Havlicek et al., Supervised learning with quantum enhanced feature spaces, Nature 567, 209 (2019).
[47] E. Farhi and H. Neven, Classification with quantum neural networks on near term processors, arXiv:1802.06002 (2018).
[48] M. Schuld et al., Circuit-centric quantum classifiers, Phys. Rev. A 101, 032308 (2020).
[49] H. Chen et al., Universal discriminative quantum neural networks, arXiv:1805.08654 (2018).
[50] W. Huggins et al., Towards Quantum Machine Learning with Tensor Networks, Quantum Science and Technology 4, 024001 (2019).
[51] I. Glasser, N. Pancotti, and J. I. Cirac, From probabilistic graphical models to generalized tensor networks for supervised learning, arXiv:1806.05964 (2018).
[52] Y. Du et al., Implementable Quantum Classifier for Nonlinear Data, arXiv:1809.06056 (2018).
[53] M. Benedetti et al., Adversarial quantum circuit learning for pure state approximation, New J. Phys. 21, 043023 (2019).
[54] M. Benedetti et al., A generative modeling approach for benchmarking and training shallow quantum circuits, npj Quantum Information 5, 45 (2019).
[55] J.-G. Liu and L. Wang, Phys. Rev. A 98, 062324 (2018).
[56] H. Situ et al., Quantum generative adversarial network for generating discrete data, Information Sciences 538, 193 (2020).
[57] J. Zeng et al., Learning and Inference on Generative Adversarial Quantum Circuits, Phys. Rev. A 99, 052306 (2019).
[58] J. Romero and A. Aspuru-Guzik, Variational quantum generators: Generative adversarial quantum machine learning for continuous distributions, arXiv:1901.00848 (2019).
[59] Y. Du et al., The expressive power of parameterized quantum circuits, Phys. Rev. Research 2, 033125 (2020).
[60] M. Schuld et al., Evaluating analytic gradients on quantum hardware, Phys. Rev. A 99, 032331 (2019).
[61] K. Mitarai and K. Fujii, Methodology for replacing indirect measurements with direct measurements, Phys. Rev. Research 1, 013006 (2019).
[62] J. G. Vidal and D. O. Theis, Calculus on parameterized quantum circuits, arXiv:1812.06323 (2018).
[63] A. Harrow and J. Napp, Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms, arXiv:1901.05374 (2019).
[64] J. R. McClean et al., Barren plateaus in quantum neural network training landscapes, Nature Communications 9, 4812 (2018).
[65] V. Bergholm et al., PennyLane: Automatic differentiation of hybrid quantum-classical computations, arXiv:1811.04968 (2018).
[66] Z.-Y. Chen et al., VQNet: Library for a Quantum-Classical Hybrid Neural Network, arXiv:1901.09133 (2019).
[67] G. R. Steinbrecher et al., Quantum optical neural networks, npj Quantum Information 5, 60 (2019).
[68] N. Killoran et al., Continuous-variable quantum neural networks, Phys. Rev. Research 1, 033063 (2019).
[69] C. M. Wilson et al., Quantum Kitchen Sinks: An algorithm for machine learning on near-term quantum computers, arXiv:1806.08321 (2018).
[70] D. Zhu et al., Training of Quantum Circuits on a Hybrid Quantum Computer, Science Advances 5, 9918 (2019).
[71] S. Ghosh et al., Quantum reservoir processing, npj Quantum Information 5, 35 (2019).
[72] S. Ghosh, T. Paterek, and T. C. H. Liew, Quantum neuromorphic platform for quantum state preparation, Phys. Rev. Lett. 123, 260404 (2019).
[73] S. Ghosh et al., Reconstructing quantum states with quantum reservoir networks, IEEE Trans. Neural Netw. Learn. Syst., pp. 1-8 (2020).
[74] J. Chen and H. I. Nurdin, Learning Nonlinear Input-Output Maps with Dissipative Quantum Systems, Quantum Information Processing 18, 198 (2019).
[75] T. Goto, Q. H. Tran, and K. Nakajima, Universal approximation property of quantum feature maps, arXiv:2009.00298 (2020).