
arXiv:1609.05884v2 [quant-ph] 2 Feb 2018

A Quantum Implementation Model for Artificial Neural Networks

Ammar Daskin

A. Daskin is with the Computer Engineering Department, Istanbul Medeniyet University, Istanbul, Turkey, email: adaskin25@gmail.com

Abstract—The learning process for multilayered neural networks with many nodes makes heavy demands on computational resources. In some neural network models, the learning formulas, such as the Widrow-Hoff formula, do not change the eigenvectors of the weight matrix while flattening the eigenvalues. In infinity, these iterative formulas result in terms formed by the principal components of the weight matrix: i.e., the eigenvectors corresponding to the non-zero eigenvalues. In quantum computing, the phase estimation algorithm is known to provide speed-ups over the conventional algorithms for eigenvalue-related problems. Combining the phase estimation algorithm with the amplitude amplification, a quantum implementation model for artificial neural networks using the Widrow-Hoff learning rule is presented. The complexity of the model is found to be linear in the size of the weight matrix. This provides a quadratic improvement over the classical algorithms.

I. INTRODUCTION AND BACKGROUND

Artificial neural networks (ANN) [1-3] are adaptive statistical models which mimic the neural structure of the human brain to find optimal solutions for multivariate problems. In the design of an ANN, the followings are determined: the structure of the network, the input-output variables, the local activation rules, and a learning algorithm. Learning algorithms are generally linked to the activities of neurons and describe a mathematical cost function; often, the learning process in these networks amounts to the minimization of this cost function, which is composed of the weights and biases of the network. Moreover, the learning rule specifies how the synaptic weights should be updated at each iteration. In general, learning rules can be categorized as supervised and unsupervised: in the supervised learning rules, the distance between the response of the neuron and a specified target response, t, is considered; such a specified response is not required in the unsupervised learning rules. The Hebbian learning rule [4], in which the weight vector at the (j+1)th iteration is updated by the following formula (we will mainly follow Ref.[2] to describe learning rules):

    w[j+1] = w[j] + η t x,    (1)

is a typical example of the unsupervised learning rules. Here, x is the input vector, η is a positive learning constant, and t is the target response. Learning is defined by getting an output closer to the target response. On the other hand, the Widrow-Hoff learning rule [5], which is the main interest of this paper, illustrates a typical supervised learning rule [2, 3, 6]:

    w[j+1] = w[j] - η σ'(v)(y - t) x,    (2)

where v = w^T x is the activation of the output cell, y = σ(v) is the output of the cell, and σ' is the derivative of the activation function σ which specifies the output of a cell in the considered network: e.g., the sigmoid function, σ(v) = 1/(1 + exp(-v)). While in the Hebbian iteration the weight vector is moved in the direction of the input vector by an amount proportional to the target, in the Widrow-Hoff iteration the change is proportional to the error (y - t). If we consider multi-neurons, the activation, the output, and the target values become vectors. When there are several input and target associations, the sets of inputs, targets, activations, and outputs can be represented by the matrices X, T, V, and Y, respectively. Then, the above equations come in matrix forms as follows:

    W[j+1] = W[j] + η T X^T,    (3)

    W[j+1] = W[j] - η (σ'(V) ⊛ (Y - T)) X^T,    (4)

where W represents the matrix of synaptic weights and ⊛ denotes the element-wise product.

It is known that the learning task for multilayered neural networks with many nodes makes heavy demands on computational resources. Algorithms in the quantum computational model provide computational speed-ups over their classical counterparts for some particular problems: e.g., Shor's factoring algorithm [7] and Grover's search algorithm [8]. Using adiabatic quantum computation [9, 10], quantum random access memory [11, 12], or mapping data sets to quantum states, speed-ups in big data analysis have been shown to be possible [13-15]. Furthermore, Lloyd et al. [16] have described a quantum version of principal component analysis.

In the recent decades, particularly by relating the neurons in neural networks with qubits [17], a few different quantum analogues of artificial neural networks have been developed: e.g. [18-23] (for a complete review, please refer to Ref.[24]). These models should not be confused with the classical algorithms inspired by quantum computing (e.g., see the list of references in Ref.[25, 26]). Furthermore, using the Grover search algorithm [8], a quantum associative memory is introduced in Ref.[27]. Despite some promising results, there is still need for further research on new models [24].

The quantum phase estimation algorithm (PEA) [28] provides computational speed-ups over the known classical algorithms in eigenvalue-related problems. The algorithm mainly finds the phase value of an eigenvalue of a given unitary operator (considered as the time evolution operator of a Hamiltonian). Because of this property, PEA is ubiquitously used as a subcomponent of other quantum algorithms. While in the general case PEA requires a good initial estimate of an eigenvector to produce the phase, in some cases it is able to find the phase by using an initial equal superposition state: e.g., Shor's factoring algorithm [7]. In Ref.[29], it is shown that a flag register can be used in the phase estimation algorithm to eliminate the ill-conditioned part of a matrix by processing the eigenvalues greater than some threshold value.
The amplitude amplification algorithm [8, 30-32] is used to amplify the amplitudes of certain chosen quantum states. In the definition of quantum reinforcement learning [33, 34], states and actions are represented as quantum states, and based on the observation of states, a reward is applied to the register representing actions. Later, the quantum amplitude amplification is applied to amplify the amplitudes of rewarded states. In addition, in a prior work [35] combining the amplitude amplification with the phase estimation algorithm, we have showed a framework to obtain the eigenvalues in a given interval and their corresponding eigenvectors from an initial equal superposition state. This framework can be used as a way of doing quantum principal component analysis (QPCA).

For a given weight matrix W in linear auto-associators using the Widrow-Hoff learning rule, the eigenvectors do not change during the learning process while the eigenvalues go to one [2, 6]: i.e., lim_{j→∞} W[j] converges to QQ^T, where Q represents the eigenvectors of W. Therefore, for a given input x, the considered network produces the output QQ^T x. In this paper, we present a quantum implementation model for the artificial neural networks by employing the algorithm in Ref.[35]. In particular, we show how to construct QQ^T|x⟩ on quantum computers in linear time. In the following section, we give the necessary description of the Widrow-Hoff learning rule and the QPCA described in Ref.[35]. In Sec.III, we shall show how to apply the QPCA to the neural networks given by the Widrow-Hoff learning rule and discuss the possible implementation issues such as the circuit implementation of W, the preparation of the input x as a quantum state, and determining the number of iterations in the algorithm. In Sec.IV, we analyze the complexity of the whole application. Finally, in Sec.V, an illustrative example is presented.

II. METHODS

In this section, we shall describe the Widrow-Hoff learning rule and the quantum algorithms used in the paper.

A. Widrow-Hoff Learning

For a linear autoassociator, i.e., Y = V, σ'(V) = I, and T = X, the Widrow-Hoff learning rule given in Eq.(4), also known as the LMS algorithm, in matrix form can be described as follows [2, 3]:

    W[j] = W[j-1] + η (X - W[j-1]X) X^T.    (5)

This can be also expressed by using the eigendecomposition of W = QΛQ^T: i.e., W[j] = QΦ[j]Q^T, where Φ[j] = [I - (I - ηΛ)^j]. Φ[j] is called the eigenvalue matrix at the epoch j. Based on this formulation, the Widrow-Hoff error correction rule only affects the eigenvalues and flattens them when η ≤ 2λ_max^{-1} (λ_max is the largest eigenvalue of W): i.e., lim_{j→∞} Φ[j] = I. Thus, in infinity, the learning process ends up as: W[∞] = QQ^T.

B. Quantum Algorithms Used in the Model

In the following, we shall first explain two well-known quantum algorithms and then describe how they are used in Ref.[35] to obtain the linear combination of the eigenvectors.

1) Quantum Phase Estimation Algorithm: The phase estimation algorithm (PEA) [28, 36] finds an estimation for the phase of an eigenvalue of a given operator. In mathematical terms, the algorithm seen in Fig.1 as a circuit works as follows:

• An estimated eigenvector |ϕ_j⟩ associated to the jth eigenvalue e^{iφ_j} of a unitary matrix U of order N is assumed given. U is considered as a time evolution operator of the Hamiltonian (H) representing the dynamics of the quantum system:

    U = e^{itH/ℏ},    (6)

where t represents the time and ℏ is the Planck constant. As a result, the eigenvalues of U and H are related: while e^{iφ_j} is the eigenvalue of U, its phase φ_j is the eigenvalue of H.

• The algorithm uses two quantum registers dedicated to the eigenvalue and the eigenvector, respectively |reg_1⟩ and |reg_2⟩, with m and n (= log_2 N) number of qubits. The initial state of the system is set to |reg_1⟩|reg_2⟩ = |0⟩|ϕ_j⟩, where |0⟩ is the first standard basis vector.

• Then, the quantum Fourier transform is applied to |reg_1⟩, which produces the following equal superposition state:

    U_QFT |reg_1⟩|reg_2⟩ = (1/√M) Σ_{k=0}^{M-1} |k⟩|ϕ_j⟩,    (7)

where M = 2^m and |k⟩ is the kth standard basis vector.

• For the kth qubit in the first register, a quantum operator U^{2^{k-1}}, controlled by this qubit, is applied to the second register. This operation leads the first register to hold the discrete Fourier transform of the phase φ_j.

• The inverse quantum Fourier transform on the first register produces the binary digits of φ_j.

• Finally, the phase is obtained by measuring the first register.

Fig. 1: The phase estimation part of the algorithm.

2) Quantum Amplitude Amplification Algorithm: If a given quantum state |ψ⟩ in an N-dimensional Hilbert space can be rewritten in terms of some orthonormal states considered as the "good" and the "bad" parts of |ψ⟩ as:

    |ψ⟩ = sin(θ)|ψ_good⟩ + cos(θ)|ψ_bad⟩,    (8)

then the amplitude amplification technique [8, 37, 38] can be used to increase the amplitude of |ψ_good⟩ in magnitude while decreasing the amplitude of |ψ_bad⟩. The technique mainly consists of two parts, the marking and the amplifying, implemented by two operators, respectively U_f and U_ψ. Here, U_f marks (flips the sign of) the amplitudes of |ψ_good⟩ and does nothing to |ψ_bad⟩. U_f can be implemented as a reflection operator when |ψ_good⟩ and |ψ_bad⟩ are known:

    U_f = I - 2|ψ_good⟩⟨ψ_good|,    (9)

where I is an identity matrix. In the amplification part, the marked amplitudes are amplified by the application of the operator U_ψ:

    U_ψ = I - 2|ψ⟩⟨ψ|.    (10)

To maximize the probability of |ψ_good⟩, the iteration operator G = U_ψ U_f is applied O(√N) times to the resulting state.
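The limiting behavior W[∞] = QQ^T described in Sec. II-A is easy to check numerically. The following MATLAB sketch (our own illustration; the sizes, the scaling of X, and the learning rate are arbitrary choices, not values from the paper) iterates Eq.(5) and compares the fixed point with QQ^T computed from an orthonormal basis of the columns of X:

% Sketch: the Widrow-Hoff iteration of Eq.(5) converges to Q*Q'
% (illustrative sizes and learning rate; not values from the paper)
N = 8; ns = 4;                        % input dimension, number of inputs
X = randn(N, ns); X = X/(2*norm(X));  % scale so eigenvalues of W are < 1
W = zeros(N); eta = 1;                % eta <= 2/lambda_max holds here
for j = 1:5000
    W = W + eta*(X - W*X)*X';         % Widrow-Hoff update, Eq.(5)
end
Q = orth(X);                          % eigenvectors of W = X*X' with
                                      % nonzero eigenvalues
fprintf('||W - Q*Q''|| = %g\n', norm(W - Q*Q'));

Since the eigenvalue matrix at epoch j is Φ[j] = I - (I - ηΛ)^j, the printed error decays geometrically with the number of epochs.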

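Likewise, the marking and amplifying operators of Eqs.(9) and (10) can be tried directly at the matrix level. The toy sketch below (ours; using a single marked basis state is a simplifying assumption) shows the probability of |ψ_good⟩ growing over O(√N) applications of G = U_ψ U_f:

% Sketch of amplitude amplification, Eqs.(8)-(10), one marked state
N = 64;
psi  = ones(N,1)/sqrt(N);          % initial state: equal superposition
good = zeros(N,1); good(13) = 1;   % |psi_good>: an arbitrary basis state
Uf   = eye(N) - 2*(good*good');    % marking operator, Eq.(9)
Upsi = eye(N) - 2*(psi*psi');      % reflection about |psi>, Eq.(10)
s = psi;
for i = 1:round(pi/4*sqrt(N))      % O(sqrt(N)) iterations
    s = Upsi*(Uf*s);
    fprintf('iteration %2d: P(good) = %.4f\n', i, abs(good'*s)^2);
end

Note that U_ψ here equals the usual Grover diffusion operator up to a global phase, which does not affect the measured probabilities.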
C. Quantum principal component analysis

In Ref.[35], we have shown that by combining PEA with the amplitude amplification, one can obtain the eigenvalues in certain intervals. In the phase estimation part, the initial state of the registers is set to |0⟩|0⟩. Then, the second register is put into the equal superposition state (1/√N)(1, ..., 1)^T. The phase estimation process on this input generates the superposition of the eigenvalues on the first and the eigenvectors on the second register. In this final superposition state, the amplitudes for the eigenpairs are proportional to the norm of the projection of the input vector onto the eigenvector: i.e., the normalized sum of the eigenvector elements. This part is represented by U_PEA and also involves the input preparation circuit, U_input, on the second register.

In the amplification part, first, U_f is applied to the first register to mark the eigenvalues determined by their binary values: for instance, if we want to mark an eigenvalue equal to 0.25 in |reg_1⟩ with 3 qubits, we use U_f = I - 2|010⟩⟨010|, since the binary form of 0.25 is (010) (the left most bit represents the most significant bit). The amplitudes of the marked eigenvalues are then amplified by the application of U_ψ with |ψ⟩ representing the output of the phase estimation:

    |ψ⟩ = U_PEA |reg_1⟩|reg_2⟩ = U_PEA |0⟩|0⟩.    (11)

Using the above equation, U_ψ can be implemented as:

    U_ψ = I - 2|ψ⟩⟨ψ| = U_PEA U_0 U_PEA†,    (12)

where U_0 = I - 2|0⟩⟨0|. The amplitudes of the eigenvalues in the desired region are further amplified by the iterative application of the operator G = U_ψ U_f. At the end of this process, a linear combination of the eigenvectors, with the coefficients determined by the normalized sum of the vector elements of the eigenvectors, is produced. In the following section, we shall show how to apply this process to model the implementation of the neural networks based on the Widrow-Hoff learning rule.

III. APPLICATION TO THE NEURAL NETWORKS

Since the weight matrix in the Widrow-Hoff learning rule converges to the principal components in infinity [6], i.e., W[∞] = QQ^T, the behavior of the trained network on some input |x⟩ can be concluded as:

    W[∞]|x⟩ = QQ^T|x⟩.    (13)

Our main purpose is to find an efficient way to implement this behavior on quantum computers by using the quantum principal component analysis. For this purpose, we form U_f in a way that marks only the non-zero eigenvalues and their corresponding eigenvectors: for zero eigenvalues (in binary form (0...0)), the first register is in the |0⟩ = (1, 0, 0, ..., 0, 0)^T state. Therefore, we need to construct a U_f which "marks" the nonzero eigenvalues and does nothing to |0⟩. This can be done by using a vector |f⟩ in the standard basis which has the same non-zero coefficients for all the basis states except the first one:

    U_f = I - 2|f⟩⟨f|, with |f⟩ = µ (0, 1, 1, ..., 1)^T.    (14)

Here, µ is a normalization constant equal to 1/√(M-1). U_f does nothing when the first register is in the |0⟩ state; however, it does not only flip the signs but also changes the amplitudes of the other states. Then, U_ψ is applied for the amplification of the marked amplitudes. The iterative application of U_ψ U_f results in a quantum state where the amplitude of |0⟩ becomes almost zero and the amplitudes of the other states become almost equal. At this point, the second register holds QQ^T|x⟩, which is the expected output from the neural network. This is explained in more mathematical terms below.
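Before going into the algebraic details below, the action of the marking operator in Eq.(14) can be seen with a few lines of MATLAB (our own check; the register size is an arbitrary choice):

% Sketch: the marking operator U_f of Eq.(14) on the first register
m = 3; M = 2^m;                   % number of qubits and dimension
mu = 1/sqrt(M-1);                 % normalization constant
f  = mu*[0; ones(M-1,1)];         % |f> = mu*(0,1,...,1)^T
Uf = eye(M) - 2*(f*f');           % U_f = I - 2|f><f|
e0 = [1; zeros(M-1,1)];           % |0>, labeling zero eigenvalues
fprintf('||Uf|0> - |0>|| = %g\n', norm(Uf*e0 - e0));   % exactly 0
a = rand(M,1); a = a/norm(a);     % amplitudes of a PEA-like output
disp([a, Uf*a]);                  % all other amplitudes are modified

As the text notes, U_f is not a plain sign flip: the second column printed above shows that the amplitudes of the nonzero-eigenvalue states change in magnitude as well.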

A. Details of the Algorithm

Here, we assume that U = e^{iWt} is given; later, in Sec.III-D, we shall also discuss how U may be obtained as a quantum circuit from a given W matrix.

Fig.2 shows the algorithm as a quantum circuit, where the dashed box indicates an iteration of the amplitude amplification. At the beginning, U_PEA is applied to the initial state |0⟩|0⟩. Note that U_PEA also includes an input preparation circuit, U_input, bringing the second register from the |0⟩ state to the input |x⟩. U_PEA generates a superposition of the eigenvalues and the associated eigenvectors, respectively, on the first and the second registers, with the amplitudes defined by the overlap of the eigenvector and the input |x⟩:

    |ψ⟩ = U_PEA |0⟩|0⟩ = Σ_{j=0}^{N-1} α_j |λ_j⟩|ϕ_j⟩,    (15)

where α_j = ⟨ϕ_j|x⟩.

Fig. 2: The general quantum circuit to find the principal components of W. The dashed box indicates an iteration of the amplitude amplification.

In the second part, the operator G = U_ψ U_f is applied to |ψ⟩ iteratively until QQ^T|x⟩ can be obtained on the second register. The action of U_f applied to |ψ⟩ is as follows:

    |ψ_1⟩ = U_f |ψ⟩ = (I - 2|f⟩⟨f|) Σ_{j=0}^{N-1} α_j |λ_j⟩|ϕ_j⟩
          = |ψ⟩ - 2µ|f⟩|ϕ̄⟩.    (16)

Here, assuming the first k number of eigenvalues are zero, the unnormalized state |ϕ̄⟩ is defined as:

    |ϕ̄⟩ = Σ_{j=k}^{N-1} α_j |ϕ_j⟩.    (17)

It is easy to see that |ϕ̄⟩ = QQ^T|x⟩, which is our target output. When U_ψ is applied to the output in Eq.(16), we simply change the amplitudes of |ψ⟩:

    U_ψ|ψ_1⟩ = U_ψ (|ψ⟩ - 2µ|f⟩|ϕ̄⟩)
             = (I - 2|ψ⟩⟨ψ|)|ψ⟩ - 2µ (I - 2|ψ⟩⟨ψ|)|f⟩|ϕ̄⟩
             = -|ψ⟩ - 2µ (I - 2|ψ⟩ Σ_{j=0}^{N-1} α_j ⟨λ_j|⟨ϕ_j|) |f⟩|ϕ̄⟩
             = -|ψ⟩ - 2µ|f⟩|ϕ̄⟩ + 4µ²P_f |ψ⟩
             = (4µ²P_f - 1)|ψ⟩ - 2µ|f⟩|ϕ̄⟩.    (18)

Here, P_f is the initial success probability, equal to Σ_{j=k}^{N-1} α_j². The repetitive applications of G only change the amplitudes of |ψ⟩ and |f⟩|ϕ̄⟩: e.g.,

    G²|ψ⟩ = (c² - 3c + 1)|ψ⟩ - (c - 2) 2µ|f⟩|ϕ̄⟩,
    G³|ψ⟩ = (c³ - 5c² + 6c - 1)|ψ⟩ - (c² - 4c + 3) 2µ|f⟩|ϕ̄⟩,    (19)

where c = (4µ²P_f - 1). The normalized probability of (2µ|f⟩|ϕ̄⟩) is presented in Fig.3 for different values of c (the amplitudes of |ψ⟩ and (2µ|f⟩) are normalized). The amplitude of |ψ⟩ through the iterations of the amplitude amplification oscillates with a frequency depending on the overlaps of the input with the eigenvectors. When the amplitude of |ψ⟩ becomes close to zero, the second register in the remaining part |f⟩|ϕ̄⟩ is exactly QQ^T|x⟩ and the first register is equal to |f⟩.

Fig. 3: The normalized probability of (2µ|f⟩|ϕ̄⟩) through the iterations for different values of c.
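The identities in Eqs.(16) and (18) can be confirmed numerically by modeling the ideal PEA output directly. In the sketch below (our own toy model, not the paper's circuit: the jth eigenpair is simply assigned to the jth basis state of the first register, with the zero eigenvalues assigned to |0⟩), both equations hold to machine precision:

% Sketch verifying Eqs.(15)-(18) at the state-vector level (toy model)
N = 8; M = N; k = 3;                  % first k eigenvalues are zero
[Q,~] = qr(randn(N));                 % random orthonormal eigenvectors
x = rand(N,1); x = x/norm(x);         % random input
alpha = Q'*x;                         % alpha_j = <phi_j|x>, Eq.(15)
psi = zeros(M*N,1);
for j = 1:N
    ej = zeros(M,1);
    if j <= k, ej(1) = 1; else, ej(j) = 1; end  % zero eigenvalues -> |0>
    psi = psi + alpha(j)*kron(ej, Q(:,j));
end
mu = 1/sqrt(M-1); f = mu*[0; ones(M-1,1)];
Uf   = kron(eye(M) - 2*(f*f'), eye(N));         % Eq.(14) on register 1
Upsi = eye(M*N) - 2*(psi*psi');                 % Eq.(12)
phibar = Q(:,k+1:N)*alpha(k+1:N);               % Eq.(17), equals QQ^T x
Pf = sum(alpha(k+1:N).^2); c = 4*mu^2*Pf - 1;
fprintf('Eq.(16): %g\n', norm(Uf*psi - (psi - 2*mu*kron(f,phibar))));
fprintf('Eq.(18): %g\n', norm(Upsi*Uf*psi - (c*psi - 2*mu*kron(f,phibar))));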

Fig.5 represents the iterations of the algorithm for a random 2^7 × 2^7 matrix with 2^7/2 number of zero eigenvalues and a random input |x⟩ (MATLAB code for the random generation is given in Appendix A). In each subfigure, we have used a different number of qubits for the first register to see the effect on the results. The bar graphs in the subfigures show the probability change for each state |j⟩ of the first register (a different color tone indicates a different state). When the probability for |0⟩ becomes close to zero, the probabilities for the rest of the states become equal, and so the total of these probabilities, shown in the bottom plot of each subfigure, becomes almost one. At that point, the fidelity found by ⟨reg_2|QQ^T|x⟩ also comes closer to one.

B. Number of Iterations

Through the iterations, while the probability for the |0⟩ state goes to zero, the probabilities for the rest of the states become almost equal. This indicates that the individual state of each qubit turns into the equal superposition state. Therefore, if the state of a qubit in the first register is in the almost equal superposition state, then the success probability is very likely to be at its maximum level. In the Hadamard basis, |0⟩ and |1⟩ are represented by the equal superposition states:

    H|0⟩ = (|0⟩ + |1⟩)/√2 and H|1⟩ = (|0⟩ - |1⟩)/√2.    (20)

Therefore, using the Hadamard basis, if the probability of measuring |0⟩ is close to one, in other words, if |1⟩ is not seen in the measurement, then the second register likely holds QQ^T|x⟩ with a maximum possible fidelity. Fig.4 shows the comparison of the individual qubit probabilities (i.e., the probability to see a qubit in the first register in |0⟩ in the Hadamard basis) with the total probability observed in Fig.5f for the random case: as seen in the figure, the individual probabilities exhibit the same behavior as the total probability.

Generally, obtaining a possible probability density of an unknown quantum state is a difficult task. However, since we are dealing with only a single qubit and do not require the exact density, this can be done efficiently. For instance, if |0⟩ is seen a times in ten measurements, then the success probability is expected to be a/10. Here, the number of measurements obviously determines the precision of the obtained probability, which may also affect the fidelity.

Fig. 4: The probability to see a qubit in the first register in the |0⟩ state after applying a Hadamard gate to the qubit, and its comparison with the total probability and the fidelity given in Fig.5f. Note that the separate curve above the others is the fidelity. Since there are only small differences between the probabilities of the individual qubits and the total probability, the curves for the probabilities mostly overlap.
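To see the behavior plotted in Fig.5 in an idealized form, the toy model of the previous sketch can be iterated; the sketch below (ours) does this. Since the PEA phases are modeled as exact, the finite-precision effects of m discussed in Sec. III-C do not appear here: the fidelity stays near one while the success probability oscillates.

% Sketch: iterating G = Upsi*Uf in the ideal-PEA toy model and
% tracking 1 - P(|0>) on register 1 and the fidelity with QQ^T|x>
N = 8; M = N; k = 3;
[Q,~] = qr(randn(N)); x = rand(N,1); x = x/norm(x);
alpha = Q'*x; psi = zeros(M*N,1);
for j = 1:N
    ej = zeros(M,1);
    if j <= k, ej(1) = 1; else, ej(j) = 1; end
    psi = psi + alpha(j)*kron(ej, Q(:,j));
end
mu = 1/sqrt(M-1); f = mu*[0; ones(M-1,1)];
Uf   = kron(eye(M) - 2*(f*f'), eye(N));
Upsi = eye(M*N) - 2*(psi*psi');
target = Q(:,k+1:N)*alpha(k+1:N); target = target/norm(target);
s = psi;
for i = 1:15
    s = Upsi*(Uf*s);
    S  = reshape(s, N, M);            % columns: first-register states
    r2 = S(:,2:end)*ones(M-1,1);      % proxy for reg_2 away from |0>
    fprintf('i=%2d  1-P0=%.4f  fidelity=%.4f\n', ...
            i, 1 - norm(S(:,1))^2, abs(target'*r2)/norm(r2));
end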

C. Error-Precision (Number of Qubits in |reg_1⟩)

The number of qubits, m, in the first register should be sufficient to distinguish very small nonzero eigenvalues from the ones which are zero. In our numerical random experiments, we have observed that choosing only five or six qubits is enough to get very high fidelity while not requiring a high number of iterations. The impact of the number of qubits on the fidelity and the probability is shown in Fig.5, in which each sub-figure is drawn by using different register sizes for the same random case. As seen in the figure, the number of qubits also affects the required number of iterations: e.g., while for m = 3 the highest fidelity and probability are seen at the fourth iteration, for m = 6 it happens around the ninth iteration.

D. Circuit Implementation of W

The circuit implementation of W requires forming a quantum circuit representing the time evolution of W: i.e., U = e^{i2πWt}. When W is a sparse matrix, the circuit can be formed by following the method in Ref.[39]. However, when it is not sparse but in the form W = Σ_j x_j x_j^T, then the exponential becomes equal to:

    U = e^{i2πWt} = e^{i2πt Σ_j x_j x_j^T}.    (21)

To approximate the above exponential, we apply the Trotter-Suzuki formula [40-43] to decompose Eq.(21) into the terms U_j = e^{i2πt x_j x_j^T} = U_{x_j} Ī U_{x_j}†, where Ī is a kind of identity matrix with the first element set to e^{i2πt}, and U_{x_j} is a unitary matrix with the first row and column equal to x_j. For instance, if the second order Trotter-Suzuki decomposition is applied to Eq.(21) (note that the order of the decomposition impacts the accuracy of the approximation), the following is obtained:

    e^{i2πt Σ_{j=1}^{κ} x_j x_j^T} ≈ e^{i2π(t/2) Σ_{j=2}^{κ} x_j x_j^T} U_1 e^{i2π(t/2) Σ_{j=2}^{κ} x_j x_j^T}.    (22)

Then, the same decomposition is applied to the term e^{i2π(t/2) Σ_{j=2}^{κ} x_j x_j^T} in the above equation. This recursive decomposition yields an approximation composed of (4κ) number of U_{x_j} matrices. Any U_{x_j} can be implemented as a Householder matrix by using O(2^n) quantum operations, which is linear in the size of x_j [44-47].
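A small numerical check of Eqs.(21) and (22) and of the Householder construction of U_{x_j} is given below (our own sketch; it uses MATLAB's expm as the reference, κ = 2, and normalized vectors x_j, an assumption made for the rank-one exponentials):

% Sketch: second-order Trotter-Suzuki product for Eq.(21), kappa = 2
n = 4; N = 2^n; t = 0.3;
x1 = randn(N,1); x1 = x1/norm(x1);
x2 = randn(N,1); x2 = x2/norm(x2);
term = @(x,tau) expm(1i*2*pi*tau*(x*x'));        % U_j with time tau
Uex = expm(1i*2*pi*t*(x1*x1' + x2*x2'));         % exact U of Eq.(21)
Utr = term(x2,t/2)*term(x1,t)*term(x2,t/2);      % Eq.(22)
fprintf('Trotter error: %g\n', norm(Utr - Uex));
% U_{x_j} as a Householder matrix: Ux maps |0> to x1, so that
% exp(i*2*pi*t*x1*x1') = Ux * Ibar * Ux' with Ibar as in Sec. III-D
e1 = [1; zeros(N-1,1)];
v  = (x1 - e1)/norm(x1 - e1); Ux = eye(N) - 2*(v*v');
Ibar = diag([exp(1i*2*pi*t); ones(N-1,1)]);
fprintf('Householder check: %g\n', norm(Ux*Ibar*Ux' - term(x1,t)));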

E. Obtaining a solution from the output

Generally, the amplitudes of the output vector (the final state of the second register) encode the information needed for the solution of the considered problem. Since obtaining the full density of a quantum state is known to be very inefficient for larger systems, one needs to derive efficient measurement schemes specific to the problem. For instance, for some problems, comparisons of the peak values instead of the whole vectors may be enough to gauge a conclusion: in this case, since a possible outcome of a measurement would be the one with an amplitude likely to be greater than most of the states in magnitude, the peak values can be obtained efficiently. However, this alone may not be enough for some applications.

Moreover, in some applications, such as the spectral clustering problem, a superposition of vectors that form a solution space for the problem can be used as an input state. In that case, the measurement of the output in the solution space yields the solution for the problem. This method can be used efficiently (with polynomial time complexity in the number of qubits) when the vectors describing the solution space are tensor products of Pauli matrices.
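The peak-value idea can be illustrated with classical sampling from a synthetic output distribution (our own sketch; the output amplitudes and the number of shots are arbitrary):

% Sketch: locating the peak of an output state by repeated measurement
N = 16;
out = rand(N,1).^3; out = out/norm(out);   % synthetic output amplitudes
p = out.^2; cdf = cumsum(p);               % measurement distribution
shots = 200;
samples = arrayfun(@(u) find(cdf >= u, 1), rand(shots,1));
counts = accumarray(samples, 1, [N 1]);    % histogram of the outcomes
[~, peakEst] = max(counts); [~, peakTrue] = max(p);
fprintf('true peak: %d, estimate from %d shots: %d\n', ...
        peakTrue, shots, peakEst);

For a sharply peaked distribution, only the heaviest outcomes need to be resolved, so the number of shots does not have to grow with the full dimension of the state.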
UPEA if the second order Trotter-Suzuki decomposition is applied to involves the Fourier transforms, input preparation circuit, and Eq.(21) (Note that the order of the decomposition impacts the the controlled U = eitW with different t values: accuracy of the approximation.), the following is obtained: The circuits for the quantum Fourier transform and its • κ T t κ T i2πt P =1 xjxj i2π 2 P =2 xjxj inverse are well known [36] and can be implemented on e j Uj e j Uj. (22) ≈   the first register in O(m2). Then, the same decomposition is applied to the term The input preparation circuit on the second register, κ T • i2πt/2 P =2 xjxj e j in the above equation. This recursive de- Uinput, can be implemented again as a Householder

If we combine all the above terms, the total complexity can be concluded as:

    O(κN + M).    (23)

This is linear in the system size; however, it is exponential in the number of qubits involved in either one of the registers. In comparison, any classical method applied to obtain QQ^T x requires at least O(N²) time complexity because of the matrix vector multiplication. Therefore, the quantum model presented here may provide a quadratic speed-up over the classical methods for some applications.

When the weight matrix is sparse or the data is given as quantum states, U can be implemented in O(poly(n)). In that case, the whole complexity becomes polynomial in the number of qubits, which may provide an exponential speed-up over the classical algorithms. However, when the weight matrix is not sparse, the complexity remains exponential in the number of qubits. The current experimental research by big companies such as Google and IBM aims to build 50-qubit operational quantum computers [51]. Because of the limitations of the current quantum computer technology, when the required number of qubits goes beyond 50, the applications of the algorithm become infeasible.

V. AN ILLUSTRATIVE EXAMPLE

Here, we give a simple example to show how the algorithm works. Let us assume we are given weights represented by the columns of the following matrix [52]:

    X = (1/10) × [ -1  +1
                   -1  -1
                   +1  -1
                   -1  +1 ],    (24)

where we scale the vectors by 1/10 so as to make sure that the eigenvalues of W are less than one. To validate the simulation results, first, W[∞] is classically computed by following the singular value decomposition of X:

    QΦP^T = [ +.5774   0
                0      1
              -.5774   0
              +.5774   0 ] [ .24495    0
                               0    .14142 ] [ -.7071  +.7071
                                               -.7071  -.7071 ].    (25)

Therefore,

    W[∞] = QQ^T = [ +.333   0  -.333  +.333
                      0     1    0      0
                    -.333   0  +.333  -.333
                    +.333   0  -.333  +.333 ].    (26)

We use the following Trotter-Suzuki formula [40-43] to compute the exponential of W = XX^T:

    U = e^{i2πW} ≈ e^{iπ x_2 x_2^T} e^{i2π x_1 x_1^T} e^{iπ x_2 x_2^T}.    (27)

In the simulation for a random input |x⟩, the comparison of W[∞]|x⟩ = QQ^T|x⟩ with the output of the second register in the quantum model yields the fidelity. For two different random inputs, the simulation results in each iteration are shown in Fig.6a and Fig.6b for |x⟩ = (.3517 .3058 .6136 .6374)^T and |x⟩ = (.7730 .1919 .1404 .5881)^T, respectively.
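The numbers in Eqs.(24)-(27) can be reproduced with a few lines of MATLAB (a verification sketch of ours, not part of the original simulation code):

% Sketch: verifying the example of Sec. V
X = (1/10)*[-1 +1; -1 -1; +1 -1; -1 +1];   % Eq.(24)
W = X*X';                                  % weight matrix
[Q,S,P] = svd(X, 'econ');                  % Eq.(25): X = Q*S*P'
disp(diag(S)');                            % singular values .24495, .14142
disp(Q*Q');                                % Eq.(26): W at infinity
x1 = X(:,1); x2 = X(:,2);
Uex = expm(1i*2*pi*W);
Utr = expm(1i*pi*(x2*x2'))*expm(1i*2*pi*(x1*x1'))*expm(1i*pi*(x2*x2'));
fprintf('error of Eq.(27): %g\n', norm(Utr - Uex));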
VI. CONCLUSION

The weight matrix of the networks based on the Widrow-Hoff learning rule converges to QQ^T, where Q represents the eigenvectors of the matrix corresponding to the nonzero eigenvalues. In this paper, we have shown how to apply the quantum principal component analysis method described in Ref.[35] to artificial neural networks using the Widrow-Hoff learning rule. In particular, we have shown that one can implement an equivalent quantum circuit which produces the output QQ^T x for a given input x in linear time. The implementation details are discussed by using random cases, and the computational complexity is analyzed based on the number of quantum gates. In addition, a simple numerical example is presented. The model is general and requires only linear time computational complexity in the size of the weight matrix.

APPENDIX

The random matrix used in the numerical example is generated by the following MATLAB code snippet:

N = 2^7;                 % matrix size (value used for Fig.5)
%number of non-zero eigenvalues
npc = ceil(N/2);
d = rand(N,1);           %random eigenvalues
d(npc+1:end) = 0;
%random eigenvectors
[Qfull,~] = qr(randn(N));
%the unitary matrix in PEA
U = Qfull*diag(exp(1i*2*pi*d))*Qfull';
%normalized input vector
x = rand(N,1); x = x/norm(x);

REFERENCES

[1] Simon S. Haykin. Neural Networks and Learning Machines, volume 3. Pearson, Upper Saddle River, NJ, USA, 2009.
[2] Herve Abdi. Linear algebra for neural networks. In International Encyclopedia of the Social and Behavioral Sciences. Elsevier, Oxford, UK, 2001.
[3] Herve Abdi, Dominique Valentin, Betty Edelman, and Alice J. O'Toole. More about the difference between men and women: evidence from linear neural networks and the principal-component approach. Perception, 24(5):539-562, 1995.

[4] R.G.M. Morris. D.O. Hebb: The Organization of Behavior, Wiley: New York; 1949. Brain Research Bulletin, 50(5-6):437, 1999. doi: 10.1016/S0361-9230(99)00182-3.
[5] Bernard Widrow, Marcian E. Hoff, et al. Adaptive switching circuits. In IRE WESCON Convention Record, volume 4, pages 96-104. New York, 1960.
[6] Herve Abdi, Dominique Valentin, Betty Edelman, and Alice J. O'Toole. A Widrow-Hoff learning rule for a generalization of the linear auto-associator. Journal of Mathematical Psychology, 40(2):175-182, 1996.
[7] Peter W. Shor. Algorithms for quantum computation: discrete logarithms and factoring. In Foundations of Computer Science, 1994. Proceedings, 35th Annual Symposium on, pages 124-134. IEEE, 1994.
[8] Lov K. Grover. A fast quantum mechanical algorithm for database search. In Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing, pages 212-219. ACM, 1996.
[9] Hartmut Neven, Vasil S. Denchev, Geordie Rose, and William G. Macready. Training a binary classifier with the quantum adiabatic algorithm. arXiv preprint arXiv:0811.0416, 2008.
[10] Hartmut Neven, Vasil S. Denchev, Geordie Rose, and William G. Macready. Training a large scale classifier with the quantum adiabatic algorithm. arXiv preprint arXiv:0912.0779, 2009.
[11] Esma Aïmeur, Gilles Brassard, and Sébastien Gambs. Quantum speed-up for unsupervised learning. Machine Learning, 90(2):261-287, 2013.
[12] Seth Lloyd, Silvano Garnerone, and Paolo Zanardi. Quantum algorithms for topological and geometric analysis of big data. arXiv preprint arXiv:1408.3106, 2014.
[13] Patrick Rebentrost, Masoud Mohseni, and Seth Lloyd. Quantum support vector machine for big data classification. Physical Review Letters, 113(13):130503, 2014.
[14] Peter Wittek. Quantum Machine Learning: What Quantum Computing Means to Data Mining. Academic Press, 2014.
[15] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. Quantum computing for pattern classification. In PRICAI 2014: Trends in Artificial Intelligence, pages 208-220. Springer, 2014.
[16] Seth Lloyd, Masoud Mohseni, and Patrick Rebentrost. Quantum principal component analysis. Nature Physics, 10(9):631-633, 2014.
[17] A. Manju and M. J. Nigam. Applications of quantum inspired computational intelligence: a survey. Artificial Intelligence Review, 42(1):79-156, 2014.
[18] Adenilton José da Silva, Teresa Bernarda Ludermir, and Wilson Rosa de Oliveira. Quantum perceptron over a field and neural network architecture selection in a quantum computer. Neural Networks, 76:55-64, 2016.
[19] Rigui Zhou, Huian Wang, Qian Wu, and Yang Shi. Quantum associative neural network with nonlinear search algorithm. International Journal of Theoretical Physics, 51(3):705-723, 2012. doi: 10.1007/s10773-011-0950-4.
[20] Sanjay Gupta and R.K.P. Zia. Quantum neural networks. Journal of Computer and System Sciences, 63(3):355-383, 2001.
[21] M. Andrecut and M.K. Ali. A quantum neural network model. International Journal of Modern Physics C, 13(01):75-88, 2002.
[22] M.V. Altaisky. Quantum neural network. arXiv preprint quant-ph/0107012, 2001.
[23] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. Quantum walks on graphs representing the firing patterns of a quantum neural network. Physical Review A, 89:032333, Mar 2014.
[24] Maria Schuld, Ilya Sinayskiy, and Francesco Petruccione. The quest for a quantum neural network. Quantum Information Processing, 13(11):2567-2586, 2014.
[25] Noriaki Kouda, Nobuyuki Matsui, Haruhiko Nishimura, and Ferdinand Peper. Qubit neural network and its learning efficiency. Neural Computing & Applications, 14(2):114-121, 2005.
[26] Panchi Li, Hong Xiao, Fuhua Shang, Xifeng Tong, Xin Li, and Maojun Cao. A hybrid quantum-inspired neural networks with sequence inputs. Neurocomputing, 117:81-90, 2013.
[27] Dan Ventura and Tony Martinez. A Quantum Associative Memory Based on Grover's Algorithm, pages 22-27. Springer Vienna, Vienna, 1999.
[28] Alexei Kitaev. Quantum measurements and the abelian stabilizer problem. Electronic Colloquium on Computational Complexity (ECCC), 3(3), 1996.
[29] Aram W. Harrow, Avinatan Hassidim, and Seth Lloyd. Quantum algorithm for linear systems of equations. Physical Review Letters, 103(15):150502, 2009.
[30] Lov K. Grover. Quantum computers can search rapidly by using almost any transformation. Physical Review Letters, 80(19):4329, 1998.
[31] Michele Mosca et al. Quantum searching, counting and amplitude amplification by eigenvector analysis. In MFCS'98 Workshop on Randomized Algorithms, pages 90-100, 1998.
[32] Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. Contemporary Mathematics, 305:53-74, 2002.
[33] C.L. Chen, D.Y. Dong, and Z.H. Chen. Quantum computation for action selection using reinforcement learning. International Journal of Quantum Information, 4(06):1071-1083, 2006.
[34] D. Dong, C. Chen, H. Li, and T.J. Tarn. Quantum reinforcement learning. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(5):1207-1220, Oct 2008. doi: 10.1109/TSMCB.2008.925743.
[35] Ammar Daskin. Obtaining a linear combination of the principal components of a matrix on quantum computers. Quantum Information Processing, pages 1-15, 2016. doi: 10.1007/s11128-016-1388-7.

[36] Michael A. Nielsen and Isaac L. Chuang. Quantum Computation and Quantum Information. Cambridge University Press, 2010.
[37] Gilles Brassard and Peter Hoyer. An exact quantum polynomial-time algorithm for Simon's problem. In Theory of Computing and Systems, 1997. Proceedings of the Fifth Israeli Symposium on, pages 12-23. IEEE, 1997.
[38] Gilles Brassard, Peter Hoyer, Michele Mosca, and Alain Tapp. Quantum amplitude amplification and estimation. Contemporary Mathematics, 305:53-74, 2002.
[39] Dominic W. Berry, Graeme Ahokas, Richard Cleve, and Barry C. Sanders. Efficient quantum algorithms for simulating sparse Hamiltonians. Communications in Mathematical Physics, 270(2):359-371, 2007. doi: 10.1007/s00220-006-0150-x.
[40] Hale F. Trotter. On the product of semi-groups of operators. Proceedings of the American Mathematical Society, 10(4):545-551, 1959.
[41] Masuo Suzuki. Generalized Trotter's formula and systematic approximants of exponential operators and inner derivations with applications to many-body problems. Communications in Mathematical Physics, 51(2):183-190, 1976.
[42] Naomichi Hatano and Masuo Suzuki. Finding exponential product formulas of higher orders. In Quantum Annealing and Other Optimization Methods, pages 37-68. Springer, 2005.
[43] David Poulin, Matthew B. Hastings, Dave Wecker, Nathan Wiebe, Andrew C. Doherty, and Matthias Troyer. The Trotter step size required for accurate quantum simulation of quantum chemistry. Quantum Information & Computation, 15(5-6):361-384, April 2015.
[44] Peter A. Ivanov, E.S. Kyoseva, and N.V. Vitanov. Engineering of arbitrary U(N) transformations by quantum Householder reflections. Physical Review A, 74(2):022323, 2006.
[45] Jesús Urías and Diego A. Quiñones. Householder methods for quantum circuit design. Canadian Journal of Physics, 93(999):1-8, 2015.
[46] Peter A. Ivanov and Nikolay V. Vitanov. Synthesis of arbitrary unitary transformations of collective states of trapped ions by quantum Householder reflections. Physical Review A, 77(1):012335, 2008.
[47] Stephen S. Bullock, Dianne P. O'Leary, and Gavin K. Brennen. Asymptotically optimal quantum circuits for d-level systems. Physical Review Letters, 94(23):230502, 2005.
[48] Anmer Daskin, Ananth Grama, Giorgos Kollias, and Sabre Kais. Universal programmable quantum circuit schemes to emulate an operator. The Journal of Chemical Physics, 137(23):234112, 2012.
[49] Dorit Aharonov and Amnon Ta-Shma. Adiabatic quantum state generation and statistical zero knowledge. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing, STOC '03, pages 20-29, New York, NY, USA, 2003. ACM. doi: 10.1145/780542.780546.
[50] Andrew M. Childs and Robin Kothari. Simulating sparse Hamiltonians with star decompositions. In Theory of Quantum Computation, Communication, and Cryptography, volume 6519 of Lecture Notes in Computer Science, pages 94-103. Springer Berlin Heidelberg, 2011.
[51] D. Castelvecchi. Quantum computers ready to leap out of the lab in 2017. Nature, 541(7635):9, 2017.
[52] Herve Abdi, D. Valentin, and B. Edelman. Neural Networks: Quantitative Applications in the Social Sciences. Sage University Papers Series, 1999.

[Figure 5 occupies this page: six subfigures, each with a bar graph of the state probabilities of the first register (top) and the total probability and fidelity curves (bottom); only the caption is recoverable from the extracted plot data.]

Fig. 5: The probability changes in the iterations of the amplitude amplification for a random 2^7 × 2^7 matrix with 2^7/2 number of zero eigenvalues and a random input |x⟩ (MATLAB code for the random generation is given in Appendix A). In each subfigure ((a) m is 1, (b) m is 2, (c) m is 3, (d) m is 4, (e) m is 5, (f) m is 6), we have used a different number of qubits, m, for the first register to see the effect on the results. The bar graphs in the subfigures show the probability change for each state |j⟩ of the first register; for each state, a different color tone is used.

[Figure 6: two subfigures with the same layout as the subfigures of Fig. 5.]

Fig. 6: The simulation results of the quantum model for the example in Sec.V with two different input vectors: (a) for the generated random input |x⟩ = (.3517 .3058 .6136 .6374)^T; (b) for the generated random input |x⟩ = (.7730 .1919 .1404 .5881)^T.
