
arXiv:1804.07633v3 [quant-ph] 17 Jul 2018

A Simple Quantum Neural Net with a Periodic Activation Function

Ammar Daskin
Department of Computer Engineering
Istanbul Medeniyet University
Kadikoy, Istanbul, Turkey
Email: adaskin25-at-gmail-dot-com

Abstract—In this paper, we propose a simple neural net that requires only O(n log2(k)) number of qubits and O(nk) quantum gates: Here, n is the number of input parameters, and k is the number of weights applied to these parameters in the proposed neural net. We describe the network in terms of a quantum circuit, and then draw its equivalent classical neural net, which involves O(k^n) nodes in the hidden layer. Then, we show that the network uses a periodic activation function of the cosine values of the linear combinations of the inputs and weights. The backpropagation is described through the gradient descent, and the iris and breast cancer datasets are used for the numerical simulations. The numerical results indicate that the network can be used in machine learning problems and that it may provide an exponential speedup over the same structured classical neural net.

Index Terms—quantum machine learning, quantum neural networks

Neural networks are composed of many non-linear components that mimic the learning mechanism of a human brain. The training in these networks is done by adjusting the weight constants applied to the input parameters. However, the numbers of input parameters and layers considered in the networks may increase the computational cost dramatically. Quantum computers are believed to be more powerful computational machines, which may allow us to solve many intractable problems in science and engineering. Although building useful quantum computers is the main focus of many recent experimental research efforts [1], the complete use of these computers is only possible through novel quantum algorithms that provide a computational speed-up over the classical algorithms.

Although there are many early efforts to describe quantum neural networks (e.g. [2]) and general discussions on quantum neural network models (e.g. [3], [4], [5], [6], [7]), quantum data analysis and learning algorithms (e.g. [8], [9], [10], [11], [12]) have gained big momentum in recent years. Various quantum learning algorithms are proposed (see the recent review articles [8], [9], [10] on quantum machine learning and the survey [13] on the general quantum learning theory): While many recent algorithms are based on variational quantum circuits [14], [15], [16], [17], [18], some of them employ different approaches: For instance, Ref. [19] uses Grover's search algorithm [20] to extract the solution directly from a superposition state prepared by mapping the inputs with some weights to the hidden layers; the measurement on the output of a layer is used to decide the inputs to the subsequent layers. In addition, Ref. [21] has used the phase estimation algorithm to imitate the output of a classical perceptron, where the binary input is mapped to the second register of the algorithm and the weights are implemented by phase gates. The main problem in the current quantum learning algorithms is to tap the full power of artificial neural networks into the quantum realm by providing robust data mapping algorithms from the classical realm to the quantum realm and by processing the data in a nonlinear way similar to the classical neural networks. It is shown that a repeat-until-success circuit can create a quantum perceptron with nonlinear behavior as a main building block of quantum neural nets [22]. It is also explained in Ref. [23] how data mapping into a Hilbert space can help kernel based learning algorithms.

The superposition is one of the physical phenomena that allows us to design computationally more efficient algorithms by fully utilizing the quantum realm. In this paper, we present a simple neural net as a quantum circuit. After describing the network, we analyze its input-output relation and show that it involves a periodic activation function of the cosine values, which relates the weighted sum of the input parameters to the output. We then present the complexity of the network and show the numerical simulations for two different data sets.

I. QUANTUM NEURAL NET

In classical neural networks, linear combinations of input parameters with different weights are fed into multiple neurons. The output of each neuron is determined by an activation function such as the following one (see Ref. [24] for a smooth introduction):

    output = 0 if Σ_j w_j x_j ≤ threshold, and 1 if Σ_j w_j x_j > threshold.    (1)

Nonlinear activation functions such as the hyperbolic and sigmoid functions are more commonly used to make the output of a neuron smoother: i.e., a small change in any weight causes a small change in the output. It has also been argued that periodic activation functions may improve the general performance of neural nets in certain applications [25], [26], [27].

Here, let us first assume that an input parameter x_j is expected to be seen with k number of different weights {w_{j1}, ..., w_{jk}} in the network. For each input, we will construct the following U_{x_j} to represent the behavior of the parameter x_j:

    U_{x_j} = diag( e^{i w_{j1} x_j}, e^{i w_{j2} x_j}, ..., e^{i w_{jk} x_j} ).    (2)

Since U_{x_j} is a k-dimensional diagonal matrix, for each input x_j we employ log2(k) number of qubits.
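As a concrete illustration of Eq. (2), the short NumPy sketch below builds U_{x_j} for a single input with k = 4 weights. This is our own illustrative code, not code from the paper; the names w_j, x_j, and U_xj are ours.

```python
import numpy as np

# Build U_{x_j} of Eq. (2) for one input parameter with k weights
# (illustrative sketch; the variable names are not from the paper).
k = 4
rng = np.random.default_rng(0)
w_j = rng.uniform(-1, 1, size=k)        # the k weights {w_{j1}, ..., w_{jk}}
x_j = -0.5                              # an input value mapped into [-1, 0]

U_xj = np.diag(np.exp(1j * w_j * x_j))  # k x k diagonal phase matrix

# U_{x_j} is unitary, so it can be applied on log2(k) qubits.
assert np.allclose(U_xj.conj().T @ U_xj, np.eye(k))
print("qubits needed for this input:", int(np.log2(k)))  # -> 2
```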

Therefore, n input parameters lead to n operators U_{x_j} and require n log2(k) number of qubits in total. This is depicted by the following circuit:

    [Circuit: n registers, each of log2(k) qubits, acted on in parallel by U_{x_1}, U_{x_2}, ..., U_{x_n}.]

We can also describe the above circuit by the following tensor product:

    U(ω, x) = U_{x_1} ⊗ U_{x_2} ⊗ ··· ⊗ U_{x_n}.    (3)

In matrix form, this is equal to:

    U(ω, x) = diag( e^{i Σ_j w_{j1} x_j}, e^{i (Σ_{j<n} w_{j1} x_j + w_{n2} x_n)}, ..., e^{i Σ_j w_{jk} x_j} ).    (4)

The diagonal elements of the above matrix describe an input with different weight-parameter combinations. Here, each combination is able to describe a path (or a neuron in the hidden layer) we may have in a neural net. The proposed network with 1-output and n-inputs is constructed by plugging this matrix into the circuit drawn in Fig. 1.

    [Fig. 1: The proposed quantum neural net with 1-output and n-input parameters: the output qubit |0⟩, placed between two Hadamard gates, controls U(ω, x) applied to |ψ⟩; the measurement on this qubit gives the output z.]

In the circuit, initializing |ψ⟩ as an equal superposition state allows the system qubits to equally impact the first qubit, which yields the output. In order to understand how this might work as a neural net, we will go through the circuit step by step. At the beginning, the initial input to the circuit is defined by:

    |ψ_0⟩ = |0⟩ ⊗ (1/√N) Σ_j |j⟩,    (5)

where N = k^n describes the matrix dimension and |j⟩ is the jth vector in the standard basis. After applying the Hadamard gate and the controlled U(ω, x) to the first qubit, the state becomes

    (1/√(2N)) ( |0⟩ Σ_j |j⟩ + |1⟩ Σ_j e^{iα_j} |j⟩ ).    (6)

Here, α_j describes the phase value of the jth eigenvalue of U(ω, x). After the second Hadamard gate, the final state reads the following:

    (1/(2√N)) ( |0⟩ Σ_j (1 + e^{iα_j}) |j⟩ + |1⟩ Σ_j (1 − e^{iα_j}) |j⟩ ).    (7)

If we measure the first qubit, the probabilities of seeing |0⟩ and |1⟩, respectively P_0 and P_1, can be obtained from the above equation as:

    P_0 = (1/(4N)) Σ_j |1 + e^{iα_j}|² = (1/(2N)) Σ_j (1 + cos α_j),    (8)

    P_1 = (1/(4N)) Σ_j |1 − e^{iα_j}|² = (1/(2N)) Σ_j (1 − cos α_j).    (9)

If a threshold function is applied to the output, then

    z = 0 if P_1 ≤ P_0, and z = 1 if P_1 > P_0.    (11)

Here, applying the measurement a few times, we can also obtain enough statistics for P_0 and P_1 and therefore describe z as the success probability of the desired output, i.e. z = P_d. The whole circuit can also be represented as an equivalent neural net shown in Fig. 2. In the figure, f is the activation function described by:

    f(α) = 1 − cos(α).    (12)

    [Fig. 2: The equivalent representation of the quantum neural net for two input parameters and two weights for each input, i.e. n = 2 and k = 2: the inputs x_1 and x_2 feed the hidden nodes f(Σ) through the weights ω_11, ω_12, ω_21, ω_22, and the hidden nodes are summed into the output z.]

A. The Cost Function

We will use the following to describe the cost of the network:

    C = (1/(2s)) Σ_j (d_j − z_j)²,    (13)

where d_j is the desired output for the jth sample and s is the size of the training dataset.
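To make Eqs. (2)-(12) concrete, the following NumPy sketch simulates the forward pass of the circuit in Fig. 1 on a classical computer (with the exponential k^n overhead discussed in the complexity analysis). It is our own illustrative code rather than an implementation from the paper; the function name forward and the choice of returning both probabilities are assumptions.

```python
import numpy as np

def forward(W, x):
    """Classically simulate one forward pass of the circuit in Fig. 1.

    W: (n, k) array, k weights per input parameter (Eq. (2)).
    x: (n,) array of input parameters.
    Returns the measurement probabilities (P0, P1) of the output qubit,
    Eqs. (8)-(9); P1 plays the role of the smooth output z.
    """
    # Diagonal of U(w, x) = U_{x_1} (x) ... (x) U_{x_n}, Eqs. (2)-(4)
    u = np.array([1.0 + 0j])
    for j in range(W.shape[0]):
        u = np.kron(u, np.exp(1j * W[j] * x[j]))
    alphas = np.angle(u)                        # phases of the N = k^n eigenvalues
    N = u.size
    P0 = np.sum(1 + np.cos(alphas)) / (2 * N)   # Eq. (8)
    P1 = np.sum(1 - np.cos(alphas)) / (2 * N)   # Eq. (9): the f(alpha) = 1 - cos(alpha) terms
    return P0, P1

# Toy usage with n = 2 inputs and k = 2 weights per input, as in Fig. 2
rng = np.random.default_rng(1)
W = rng.uniform(-1, 1, size=(2, 2))
x = np.array([-0.3, -0.7])
P0, P1 = forward(W, x)
z = 1 if P1 > P0 else 0                         # hard threshold of Eq. (11)
print(P0, P1, z)
```

On a quantum computer the same quantities would be estimated from repeated measurements of the single output qubit, without ever forming the k^n-dimensional vector explicitly.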

B. Backpropagation with Gradient Descent

The update rule for the weights is described by the following:

    ω_i = ω_i − η ∂C/∂ω_i.    (14)

Here, the partial derivative can be found via the chain rule: For instance, from Fig. 2 with an input {x_1, x_2}, we can obtain the gradient for the weight ω_11 as (the constant coefficients are omitted):

    ∂C_j/∂ω_11 = (∂C_j/∂z_j)(∂z_j/∂α)(∂α/∂ω_11) ≈ (d_j − z_j) P_{d_j} x_1.    (15)

II. COMPLEXITY ANALYSIS

The computational complexity of any quantum circuit is determined by the number of necessary single-qubit and CNOT gates and by the number of qubits. The proposed network in Fig. 1 uses only n log2(k) + 1 qubits. In addition, it only has nk controlled phase gates (k gates for each input) and two Hadamard gates. Therefore, the complexity is bounded by O(nk).

A simulation of the same network on classical computers would require an exponential overhead since the size of U(ω, x) is k^n and the equivalent classical network involves k^n neurons in the hidden layer. Therefore, the proposed quantum model may provide an exponential speed-up for certain structured networks.

III. SIMULATION OF THE NETWORKS FOR PATTERN RECOGNITION

The circuit given in Fig. 1 is run for two different simple data sets: the breast cancer (699 samples) and the iris flowers (100 samples for two flowers) datasets (see [28] for the datasets); for the iris dataset we only use the samples for two of the flowers. The input parameters are mapped into the range [−1, 0]. Then, for each η value and each dataset, 80% of the whole sample dataset is randomly chosen for training and the remaining 20% is used for testing.

Fig. 4 and Fig. 5 show the value of the cost function at each epoch (batch learning is used). Since (d_j − z_j) in Eq. (15) is always positive and because of the periodicity of the activation function, as expected the cost function oscillates between maximum and minimum points and finally settles (if the iteration number is large enough) at some middle point.
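The following sketch puts Eqs. (13)-(15) together into a batch gradient-descent loop; a batch version of the forward pass is included so the snippet is self-contained. It is a minimal illustration under our own assumptions: the synthetic two-class data, the use of z = P_1 as the smooth output, and the scaling of the approximate gradient (Eq. (15) omits the constant coefficients) are our choices, not the paper's exact experimental setup, which uses the UCI iris and breast cancer data [28].

```python
import numpy as np

def forward_batch(W, X):
    """z = P1 of Eq. (9) for every sample (rows of X)."""
    zs = []
    for x in X:
        u = np.array([1.0 + 0j])
        for j in range(W.shape[0]):
            u = np.kron(u, np.exp(1j * W[j] * x[j]))     # Eqs. (2)-(4)
        zs.append(np.sum(1 - np.cos(np.angle(u))) / (2 * u.size))
    return np.array(zs)

def train(X, d, k=2, eta=0.1, epochs=1000, seed=0):
    """Batch gradient descent with the approximate gradient of Eq. (15)."""
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1, 1, size=(X.shape[1], k))
    costs = []
    for _ in range(epochs):
        z = forward_batch(W, X)
        costs.append(0.5 * np.mean((d - z) ** 2))        # Eq. (13)
        Pd = np.where(d == 1, z, 1 - z)                  # probability of the desired output
        g = (d - z) * Pd                                 # per-sample factor of Eq. (15)
        grad = (g[:, None] * X).mean(axis=0)             # approximate dC/dw for each input
        W -= eta * grad[:, None]                         # Eq. (14): same update for all k weights of x_j
    return W, np.array(costs)

# Synthetic two-class data with features mapped into [-1, 0]
rng = np.random.default_rng(3)
X = np.vstack([rng.uniform(-1.0, -0.6, (50, 2)), rng.uniform(-0.4, 0.0, (50, 2))])
d = np.concatenate([np.zeros(50), np.ones(50)])
W, costs = train(X, d, eta=0.1)
print("cost at first epoch:", costs[0], " cost at final epoch:", costs[-1])
```

As noted above for Figs. 4 and 5, the cost recorded this way tends to oscillate between maximum and minimum values before settling, because of the periodicity of the activation function.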

The accuracy of each trained network is also listed in Table I. As seen from the table, the network is able to almost completely differentiate the inputs belonging to the two different classes.

TABLE I: Accuracy of the network trained with different learning rates

    η                0.05    0.1     0.25    0.5     0.75    1
    Iris (test)      99%     100%    99%     99%     100%    99%
    Iris (whole)     91%     98%     95%     95%     95%     95%
    Cancer (test)    97.8%   98.9%   96.4%   97.1%   95%     96.4%
    Cancer (whole)   95.4%   96.7%   96.9%   95.9%   95.3%   96.1%

IV. DISCUSSION

A. Adding Biases

Biases can be added at a few different places in Fig. 1. As an example, for an input x_j, we can apply a gate U_{b_j}, whose diagonal phases represent the biases, together with U_{x_j}. One can also add a bias gate to the output qubit before the measurement.

B. Generalization to Multiple Outputs

Different means may be considered to generalize the network for multiple outputs. As shown in Fig. 3, one can generalize the network by sequential applications of the U_j s. Here, U_j represents a generalized multi-qubit phase gate controlled by the jth qubit, which represents the jth output. In the application of phase gates, the phases are kicked back to the control qubit. Therefore, although all the U_j s operate on the same system qubits, the gradient for the parameters of each U_j is independent since the phases are kicked back to the different control qubits.

    [Fig. 3: The generalized neural net with m-output and n-input parameters: each output qubit |0⟩ is placed between two Hadamard gates and controls its own U_j applied to |ψ⟩; measuring the jth output qubit gives z_j.]
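As a rough classical illustration of the multi-output generalization in Fig. 3 (our own sketch, not code from the paper): since each U_j is diagonal and kicks its phases back only to its own control qubit, the marginal statistics of the jth output qubit reduce to an independent single-output pass with the jth weight set, which is what the snippet below simulates.

```python
import numpy as np

def multi_output_forward(Ws, x):
    """Outputs z_1, ..., z_m of the m-output net sketched in Fig. 3.

    Ws: list of m weight matrices, each of shape (n, k) -- one per U_j.
    Each z_j is the P1 statistic of the jth output qubit (Eq. (9));
    the phase-kickback argument makes the m outputs independent of each other.
    """
    zs = []
    for W in Ws:                            # one controlled U_j per output qubit
        u = np.array([1.0 + 0j])
        for j in range(W.shape[0]):
            u = np.kron(u, np.exp(1j * W[j] * x[j]))
        zs.append(np.sum(1 - np.cos(np.angle(u))) / (2 * u.size))
    return np.array(zs)

# Toy usage: m = 3 outputs, n = 2 inputs, k = 2 weights per input
rng = np.random.default_rng(5)
Ws = [rng.uniform(-1, 1, size=(2, 2)) for _ in range(3)]
print(multi_output_forward(Ws, np.array([-0.3, -0.7])))
```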

V. CONCLUSION

In this paper, we have presented a quantum circuit which can be used to efficiently represent certain structured neural networks. While the circuit involves only O(nk) quantum gates, the numerical results show that it can be used in machine learning problems successfully. We showed that, since the simulation of the equivalent classical neural net involves k^n neurons, the presented quantum neural net may provide an exponential speed-up in the simulation of certain neural network models.

REFERENCES

[1] M. Mohseni, P. Read, H. Neven, S. Boixo, V. Denchev, R. Babbush, A. Fowler, V. Smelyanskiy, and J. Martinis, "Commercialize quantum technologies in five years," Nature News, vol. 543, no. 7644, p. 171, 2017.
[2] M. Lewenstein, "Quantum perceptrons," Journal of Modern Optics, vol. 41, no. 12, pp. 2491–2501, 1994.
[3] M. Altaisky, "Quantum neural network," arXiv preprint quant-ph/0107012, 2001.
[4] A. Narayanan and T. Menneer, "Quantum artificial neural network architectures and components," Information Sciences, vol. 128, no. 3, pp. 231–255, 2000.
[5] E. C. Behrman, L. Nash, J. E. Steck, V. Chandrashekar, and S. R. Skinner, "Simulations of quantum neural networks," Information Sciences, vol. 128, no. 3-4, pp. 257–269, 2000.
[6] S. Kak, "On quantum neural computing," Information Sciences, vol. 83, no. 3-4, pp. 143–160, 1995.
[7] R. L. Chrisley, "Learning in non-superpositional quantum neurocomputers," Brain, Mind and Physics, IOS Press, Amsterdam, pp. 126–139, 1997.
[8] M. Schuld, I. Sinayskiy, and F. Petruccione, "The quest for a quantum neural network," Quantum Information Processing, vol. 13, no. 11, pp. 2567–2586, 2014.
[9] P. Wittek, Quantum Machine Learning: What Quantum Computing Means to Data Mining. Academic Press, 2014.
[10] J. Biamonte, P. Wittek, N. Pancotti, P. Rebentrost, N. Wiebe, and S. Lloyd, "Quantum machine learning," Nature, vol. 549, no. 7671, p. 195, 2017.
[11] S. Lloyd, M. Mohseni, and P. Rebentrost, "Quantum principal component analysis," Nature Physics, vol. 10, no. 9, p. 631, 2014.
[12] P. Rebentrost, M. Mohseni, and S. Lloyd, "Quantum support vector machine for big data classification," Physical Review Letters, vol. 113, no. 13, p. 130503, 2014.
[13] S. Arunachalam and R. de Wolf, "Guest column: a survey of quantum learning theory," ACM SIGACT News, vol. 48, no. 2, pp. 41–67, 2017.
[14] A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O'Brien, "A variational eigenvalue solver on a photonic quantum processor," Nature Communications, vol. 5, p. 4213, 2014.
[15] M. Schuld, A. Bocharov, K. Svore, and N. Wiebe, "Circuit-centric quantum classifiers," arXiv preprint arXiv:1804.00633, 2018.
[16] E. Grant, M. Benedetti, S. Cao, A. Hallam, J. Lockhart, V. Stojevic, A. G. Green, and S. Severini, "Hierarchical quantum classifiers," arXiv preprint arXiv:1804.03680, 2018.
[17] K. Mitarai, M. Negoro, M. Kitagawa, and K. Fujii, "Quantum circuit learning," arXiv preprint arXiv:1803.00745, 2018.
[18] R. Xia and S. Kais, "Quantum machine learning for electronic structure calculations," arXiv preprint arXiv:1803.10296, 2018.
[19] B. Ricks and D. Ventura, "Training a quantum neural network," in Advances in Neural Information Processing Systems, 2004, pp. 1019–1026.
[20] L. K. Grover, "Quantum mechanics helps in searching for a needle in a haystack," Physical Review Letters, vol. 79, no. 2, p. 325, 1997.
[21] M. Schuld, I. Sinayskiy, and F. Petruccione, "Simulating a perceptron on a quantum computer," Physics Letters A, vol. 379, no. 7, pp. 660–663, 2015.
[22] Y. Cao, G. G. Guerreschi, and A. Aspuru-Guzik, "Quantum neuron: an elementary building block for machine learning on quantum computers," arXiv preprint arXiv:1711.11240, 2017.
[23] M. Schuld and N. Killoran, "Quantum machine learning in feature Hilbert spaces," arXiv preprint arXiv:1803.07128, 2018.
[24] M. Nielsen, Neural Networks and Deep Learning. Determination Press, 2015. [Online]. Available: http://neuralnetworksanddeeplearning.com
[25] J. Sopena, E. Romero, and R. Alquezar, "Neural networks with periodic and monotonic activation functions: a comparative study in classification problems," IET Conference Proceedings, pp. 323–328, January 1999.
[26] M. Nakagawa, "An artificial neuron model with a periodic activation function," Journal of the Physical Society of Japan, vol. 64, no. 3, pp. 1023–1031, 1995.
[27] M. Morita, "Memory and learning of sequential patterns by nonmonotone neural networks," Neural Networks, vol. 9, no. 8, pp. 1477–1489, 1996.
[28] D. Dheeru and E. Karra Taniskidou, "UCI machine learning repository," 2017. [Online]. Available: http://archive.ics.uci.edu/ml

[Fig. 4: Evaluations of the cost function with different learning rates (η = 0.05, 0.1, 0.25, 0.5, 0.75, 1) for the iris flowers dataset; y-axis: cost function, x-axis: iteration (epoch).]

[Fig. 5: Evaluations of the cost function with different learning rates (η = 0.05, 0.1, 0.25, 0.5, 0.75, 1) for the breast cancer dataset; y-axis: cost function, x-axis: iteration (epoch).]
