
Quantum reservoir computation utilising scale-free networks

Akitada Sakurai,1,2 Marta P. Estarellas,2,∗ William J. Munro,3,2 and Kae Nemoto2,1,†

1 School of Multidisciplinary Science, Department of Informatics, SOKENDAI (the Graduate University for Advanced Studies), 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
2 National Institute of Informatics, 2-1-2 Hitotsubashi, Chiyoda-ku, Tokyo 101-8430, Japan
3 NTT Basic Research Laboratories & Research Center for Theoretical Quantum Physics, 3-1 Morinosato-Wakamiya, Atsugi, Kanagawa, 243-0198, Japan

Today's quantum processors composed of fifty or more qubits have allowed us to enter a computational era where the output results are not easily simulatable on the world's biggest supercomputers. What we have not yet seen, however, is whether such quantum complexity can ever be useful for practical applications. A fundamental question behind this lies in the non-trivial relation between complexity and computational power. If we find a clue for how and what quantum complexity could boost computational power, we might be able to directly utilize quantum complexity to design quantum computation, even in the presence of noise and errors. In this work we introduce a new reservoir computational model for pattern recognition that shows a quantum advantage by utilizing scale-free networks. This new scheme allows us to utilize the complexity inherent in scale-free networks, meaning we require neither programming nor optimization of the quantum layer, even for other computational tasks. The simplicity of our approach illustrates the computational power of quantum complexity and provides new applications for such processors.

∗ Current affiliation: Qilimanjaro Quantum Tech, Carrer dels Comtes de Bell-Lloc, 161, 08014 Barcelona, Spain
† Corresponding author: [email protected]
arXiv:2108.12131v2 [quant-ph] 31 Aug 2021

I. INTRODUCTION

The recent realization of quantum processors with fifty-plus qubits is undoubtedly a key milestone for the nascent quantum technology field [1–3]. The quantum advantage of these processors has been estimated through the complexity they generate, benchmarked against the run time a classical computer would need to simulate them. Such complexity is believed to be deeply connected to quantum computational power, yet it remains unclear how we could extract usable computational power for real applications from the quantum complexity itself.

Quantum neural networks (QNNs) have been considered one of the potential directions for exploiting the quantum complexity that current quantum processors, including Noisy Intermediate-Scale Quantum (NISQ) devices, can generate. There have been a number of approaches to, and definitions of, QNNs [4, 5]. In the early days of QNNs, the core ideas behind the perceptron (a model of artificial neural networks) were extended to QNNs [6], where the perceptron activation function was replaced by an operator. Despite the simple notion of the quantum perceptron given by Altaisky, its implementation remains highly nontrivial. The main focus of the early works on the quantum perceptron was to integrate the nonlinearity of the classical model into a quantum setting [7–10]. More recently, QNNs have been extensively investigated as a subclass of variational quantum algorithms (VQAs) [11, 12]. VQAs can be modeled with a feature map and a variational model with classical feedforward, some of which have been demonstrated on gate-based quantum computers [13, 14]. It has been suggested that some quantum neural networks may offer an advantage over classical neural networks [11]; however, the optimization of a parameterized quantum circuit with a classical feedback loop would be difficult to scale [15]. Quantum reservoir computation (QRC) can also be considered a type of QNN; it was introduced mainly to analyse temporal/sequential data, with both qubits [16, 17] and continuous variables [18]. In those works the quantum neural networks are considered in the Hilbert space, whereas another approach uses a network of physical qubits [19], which has been used to construct a universal quantum reservoir computer [20].

In this paper, we propose a new QRC model based on the scale-free networks that emerge during the melting of a discrete time crystal [21], and apply it to a pattern recognition task. These networks represent the effective Hamiltonian of a periodically driven system, obtained through the application of tools from Floquet theory [22]. In our model, the classical reservoir, that is, a large nonlinear system, is replaced by a quantum hidden layer characterized by the non-trivial complexity of scale-free networks. Non-trivial complexity, as well as randomness, has played an important role in both quantum and classical neural networks [17, 23].

Our QRC model presents several advantages. First and foremost, the network grows exponentially with the number of qubits $N$ (its nodes are the $2^N$ computational basis states); 50 qubits could generate a network as large as the neural network in a human brain. Second, despite this rapid growth in complexity, the parameter setting for the quantum hidden layer remains simple. In fact, no programming or optimization is necessary for the quantum hidden layer once it has been set, even for different computational tasks. Third, by eliminating the costly parameter optimization over the variational model in VQAs, we can expect our QRC model to be highly feasible and to give a significant speed-up in the training process. This advantage in learning speed is similar to what is expected of the extreme learning machine in classical machine learning [24].

[Figure 1 panels: (a) scale-free network; (b) quantum system; (c) MNIST images → input layer (PCA map) → quantum hidden layer (quantum dynamics) → M-layer (measurement) → output layer (ONN + data processing), digits 0 to 9; (d) dropout in the M-layer.]

FIG. 1. A schematic illustration of the structure of the quantum reservoir computer. (a) An example of a scale-free network represented in the Hilbert space. Each dot indicates a computational basis state, while each edge is weighted with the hopping strength between the two states. When the hopping strength is smaller than a threshold given by the percolation rule, there is no edge between the states. (b) The quantum system used to implement the quantum hidden layer; here the model of a discrete time crystal is used. (c) The overall scheme of our QRC model. There are four layers: the input layer, the quantum hidden layer, the M-layer and the output layer. The input layer encodes the input data onto the quantum hidden layer, while the M-layer measures the quantum system in the computational basis, converting the quantum data (state) into classical data. The one-layer neural network (ONN) is responsible for the training.

II. THE MODEL

Our QRC model consists of four layers, as illustrated in Fig. 1. The computation starts by encoding the classical data $I_{ij}^{(k)}$ of the $k$-th image onto the initial state of the quantum hidden layer, which is the reservoir of this computational model. Although not strictly necessary, it is desirable to preprocess the classical input data so that the encoding can be done easily and efficiently; for this we chose Principal Component Analysis (PCA), as indicated in Fig. 1(c). After some time evolving under the dynamics of the quantum hidden layer, the output quantum state $|\psi_f^{(k)}\rangle$ is measured in the computational basis at the M-layer, converting the quantum data $|\psi_f^{(k)}\rangle$ into the classical data $\vec{x}^{(k)}$. The classical data is then renormalized for the one-layer neural network (ONN), that is, a classical neural network with only input and output layers. Training is performed only at the output layer.

Since the quantum hidden layer is the sole quantum component in our scheme, its design is key to achieving a quantum advantage. The quantum hidden layer (Fig. 1(b)) and its network (Fig. 1(a)) are connected through the visualization of the Hamiltonian [22]. The Hamiltonian of an $N$-qubit system in the computational basis $\{|i\rangle\}$, where $i$ is a binary number from $0$ to $2^N - 1$, may be represented as

$$H = \sum_i E_i |i\rangle\langle i| + \sum_{i,j\,(i \neq j)} W_{ij} |i\rangle\langle j|,$$

where $E_i$ and $W_{ij}$ correspond to the energy of the computational basis state $|i\rangle$ and the transition energy between $|i\rangle$ and $|j\rangle$, respectively. The network has an edge between the $i$-th node (state) and the $j$-th node (state), with weight $W_{ij}$, when the percolation condition $|E_j - E_i| < |W_{ij}|$ is satisfied [22]. The percolation rule eliminates off-resonant transitions from the network. In the case of a periodically driven system, the network is generated from an effective Hamiltonian using Floquet theory.

There are two approaches to setting the quantum hidden layer in the targeted regime. Using a network with the appropriate complexity (Fig. 1(a)) as a recipe, we can generate a Hamiltonian with the scale-free nature by assigning each computational basis state to a node of the network. The Hamiltonian $H$ can then be implemented as the time evolution $e^{-iH\tau/\hbar}$ of the system. It is, however, likely that such a Hamiltonian involves many-body interactions, or that the gate decomposition of $e^{-iH\tau/\hbar}$ is computationally costly. The other approach is to find a quantum system with such a Hamiltonian; the melting of a discrete time crystal is a known example of such a system [21].

The Hamiltonian $\hat{H}(t)$ of our discrete time crystal is given by [25, 26]

$$\hat{H}(t) = \begin{cases} \hat{H}_1 \equiv \hbar g (1 - \epsilon) \sum_l \sigma_l^x, & 0 \le t < T_1, \\ \hat{H}_2 \equiv \hbar \sum_{lm} J_{lm}^z \sigma_l^z \sigma_m^z, & T_1 \le t < T. \end{cases} \tag{1}$$

Here $\{\sigma_l^x, \sigma_l^y, \sigma_l^z\}$ are the Pauli operators on the $l$-th qubit, while $J_{lm}^z \equiv J_0/|l - m|^\alpha$ represents the long-range interaction between the $l$-th and $m$-th qubits, which takes the form of an approximate power-law decay with a constant exponent $\alpha$. The parameter $g$ satisfies the condition $2gT_1 = \pi$, so that $\hat{U}_1 = \exp(-i\hat{H}_1 T_1/\hbar)$ becomes a global $\pi$ pulse around the $x$-axis when $\epsilon = 0$. We assume $T_1 = T/2$ for convenience. The parameter $\epsilon$ is our rotation error [21]. Because this is a periodic system, the Floquet operator is given by

$$\hat{F} = \hat{U}(T) = \exp\left(-\frac{i}{\hbar}\hat{H}_2 T_2\right) \exp\left(-\frac{i}{\hbar}\hat{H}_1 T_1\right), \tag{2}$$

with $T_2 = T - T_1$. The network representation can be obtained from the effective Hamiltonian of the Floquet operator, $\hat{H}^{\mathrm{eff}} = (i\hbar/T)\log\hat{F}$, together with the percolation rule. When $\epsilon = 0$, a period-2 discrete time crystal (2T-DTC) appears, and the corresponding network is a locally connected network (a set of dimers). As $\epsilon$ increases, the network starts to grow following a preferential attachment mechanism and typically forms a scale-free network; when $\epsilon$ reaches a critical region, the network goes through the transition from scale-free to random [21]. In our QRC, we set $\epsilon$ to 0.03 for near-optimal computation, which corresponds to the transition regime.

The discrete time crystal has been experimentally demonstrated [25, 27, 28], and the computational time we set for the quantum hidden layer is shorter than half of the typical time of an ion-trap experiment with 10 qubits [25]. The discrete time crystal can also be implemented on a gate-based quantum processor by a gate decomposition of the Floquet operator or of the entire unitary map for the quantum dynamics [28]. Hence the model can be adapted to various quantum computational systems.

III. PATTERN RECOGNITION WITH QRC

Here we illustrate how the QRC model performs pattern recognition. We use the publicly available MNIST handwritten digit data set [29] to benchmark the performance. The data set contains 70000 images (28 × 28 pixels) of handwritten digits between 0 and 9. The classical data for each image is first processed by PCA, as shown in Fig. 2, which allows us to select the elements with the largest contribution. With PCA, the $k$-th image is represented as a vector $\vec{I}^{(k)} = \sum_{j=1}^{784} \tilde{c}_j \vec{v}_j$, where the coefficients are ordered by contribution, $\tilde{c}_j$ being the $j$-th largest of the 784. Although it is possible to encode all 784 values into the quantum input state, this may require a complicated quantum circuit; instead we select the $2N$ largest-contribution elements to encode into the quantum input state. These $2N$ values are encoded onto the qubits by single-qubit rotations only: for the $l$-th qubit, the mapping is $\tilde{c}_l \to \theta_l$ and $\tilde{c}_{N+l} \to \phi_l$, and the quantum input state is $|\psi_l\rangle = \cos\frac{\theta_l}{2}|0\rangle + e^{i\phi_l}\sin\frac{\theta_l}{2}|1\rangle$.

[Figure 2 panels: MNIST Image; Input layer (qubits 1, 2, …, N); PCA Map; axes x, y, z.]

FIG. 2. Schematic diagram detailing how the $2N$ parameters are extracted from a sample image and encoded onto $N$ qubits. The top-left box shows that the $k$-th image from the MNIST image set requires 784 parameters to represent the data in full. In the PCA map (bottom box), PCA is used to select the $2N$ most influential parameters, and each pair of parameters $(\tilde{c}_l, \tilde{c}_{l+N})$ is encoded on one qubit by single-qubit rotations. Panel (c) shows the input state of the quantum hidden layer.

Our previous work [21] identified the value of $\epsilon$ that marks the transition regime from scale-free to random networks, and we choose $\epsilon$ from that parameter regime. For our numerical simulations we set $J_0 T = 0.06$ and $\alpha = 1.51$.
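To make Eqs. (1) and (2) and the percolation rule concrete, the following is a minimal sketch of the quantum hidden layer for a small register. It assumes NumPy and SciPy, $\hbar = 1$, $T = 1$ and $T_1 = T_2 = T/2$; it takes the pair sum in $\hat{H}_2$ over $l < m$ (an assumption, since the text writes $\sum_{lm}$), and it uses exact Born probabilities in place of repeated measurements. The function names are hypothetical, so this is an illustration of the construction rather than the authors' simulation code. It builds $\hat{F}$, extracts $\hat{H}^{\mathrm{eff}} = (i/T)\log\hat{F}$ from a complex Schur decomposition, applies $|E_j - E_i| < |W_{ij}|$, and standardizes the output distribution after $n$ periods.

```python
import numpy as np
from scipy.linalg import expm, schur

# Single-qubit Pauli matrices and identity.
X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)


def site_op(op, l, N):
    """Embed a single-qubit operator on qubit l (qubit 0 leftmost) into 2**N dims."""
    out = np.array([[1.0 + 0j]])
    for k in range(N):
        out = np.kron(out, op if k == l else I2)
    return out


def floquet_operator(N, eps, J0T=0.06, alpha=1.51, T=1.0):
    """F = exp(-i H2 T2) exp(-i H1 T1) of Eqs. (1)-(2), with hbar = 1, T1 = T2 = T/2."""
    T1 = T2 = T / 2
    g = np.pi / (2 * T1)  # 2 g T1 = pi: a global pi pulse about x when eps = 0
    J0 = J0T / T
    H1 = g * (1 - eps) * sum(site_op(X, l, N) for l in range(N))
    # Assumption: each pair (l, m) is counted once, l < m.
    H2 = sum(J0 / (m - l) ** alpha * site_op(Z, l, N) @ site_op(Z, m, N)
             for l in range(N) for m in range(l + 1, N))
    return expm(-1j * H2 * T2) @ expm(-1j * H1 * T1)


def effective_hamiltonian(F, T=1.0):
    """H_eff = (i/T) log F; the complex Schur form of a unitary F is diagonal."""
    Tmat, Zs = schur(F, output="complex")
    phases = np.angle(np.diag(Tmat))  # eigenphases in (-pi, pi]
    H = Zs @ np.diag(-phases / T) @ Zs.conj().T
    return (H + H.conj().T) / 2  # symmetrize away round-off


def percolation_network(Heff):
    """Boolean adjacency: edge i-j when |E_j - E_i| < |W_ij|."""
    E = np.real(np.diag(Heff))
    W = np.abs(Heff)
    A = np.abs(E[:, None] - E[None, :]) < W
    np.fill_diagonal(A, False)
    return A


def reservoir_features(F, theta, phi, n=50):
    """Encode 2N angles as a product state, apply n periods, standardize probabilities."""
    psi = np.array([1.0 + 0j])
    for t, p in zip(theta, phi):
        psi = np.kron(psi, [np.cos(t / 2), np.exp(1j * p) * np.sin(t / 2)])
    prob = np.abs(np.linalg.matrix_power(F, n) @ psi) ** 2  # infinite-shot limit
    return (prob - prob.mean()) / prob.std()
```

For example, `percolation_network(effective_hamiltonian(floquet_operator(7, 0.03))).sum(axis=1)` gives node degrees from which a histogram in the spirit of Fig. 3(b) could be drawn; the dense $2^N \times 2^N$ matrices keep this sketch practical only for small $N$.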
The coupling strength $J_0 T = 0.06$ corresponds to the weak-coupling regime.

Next, in the M-layer, we measure the output state $|\psi_f^{(k)}\rangle$ in the computational basis and extract the classical data. As a single measurement does not provide the full information on the quantum state, we build up the measurement distribution at the M-layer by repeating the process. We note that there is no feedforward in these repeated processes. The distribution is then renormalized to have mean 0 and variance 1 for the convenience of the classical data analysis. The renormalized distribution is the output $\vec{x}^{(k)}$ of the M-layer.

The output $\vec{x}^{(k)}$ is then weighted with the weight matrix $W$ of the ONN. We follow a method widely used in pattern recognition (see the appendices for details) to obtain the output of the computation, $\vec{y}^{(k)}$.

[Figure 3 panels: (a) accuracy rate vs. epoch (0 to 300), ONN only and ε = 0.01, 0.03, 0.05, 0.07, 0.09; (b) degree distribution P(k) vs. k (1 to 100) for the same ε; (c) average and best accuracy rate vs. epsilon (0 to 0.1) for training and testing, with the ONN reference.]

FIG. 3. Performance of the QRC. Plot (a) shows the accuracy rate versus the number of epochs for each value of $\epsilon$. The number of periods $n$ for the time evolution of the quantum hidden layer is 50, and the coupling strength is $J_0 T = 0.06$. The degree distribution of the network of the effective Hamiltonian for each $\epsilon$ is plotted in (b), where $k$ is the degree (the number of links) of each node and $P(k)$ is the number of nodes with degree $k$. Plot (c) shows the average and best accuracy rates, with their standard deviations, for training and testing respectively. The average and the standard deviation are taken over epochs 200 to 300. The number of qubits is $N = 11$.

IV. PERFORMANCE AND PROPERTIES

We begin by training our ONN on 60,000 samples using gradient descent, backpropagation and the mini-batch method, and we evaluate the accuracy rate on the test samples. To find the optimal value of $\epsilon$, we calculate the $\epsilon$-dependence of the performance for both training and testing. Fig. 3 shows the testing accuracy rate for various values of $\epsilon$. The system size of the quantum hidden layer is $N = 11$.

In Fig. 3, we compare the performance of our QRC model for various values of $\epsilon$. The black line in Fig. 3(a) shows the accuracy rate of the model without the quantum hidden layer (that is, the ONN only) with the full classical input data (784 PCA elements). The gray line is for $\epsilon = 0$; in this case we do not necessarily expect a quantum advantage from the quantum dynamics. It is nevertheless interesting that reducing the classical information at the input layer does not significantly affect the accuracy rate, which indicates that the mapping between the classical data and the quantum state, together with the application of PCA, helps to push up the accuracy rate. In Fig. 3(a), the testing accuracy rate rises quickly as $\epsilon$ increases from 0 to 0.03. The performance peaks around $\epsilon = 0.03$ and gradually decreases as $\epsilon$ grows further. The degree distributions of the network of the effective Hamiltonian for $\epsilon = 0.01$ to $0.03$ exhibit the scale-free nature, as shown in Fig. 3(b). The non-trivial complexity of the quantum hidden layer contributes to the computational power of this model, in both the speed of learning and the accuracy rate. This comparison between the network property and the accuracy rate suggests that we need to design complex behavior in the quantum hidden layer to extract computational power for the task at hand, and that network analysis could be a useful tool for designing the quantum hidden layer. Fig. 3(c) shows the $\epsilon$-dependence of the accuracy rate for training and testing, respectively.

In the above evaluation, the number of periods $n$ is set to 50, corresponding to a time duration $nT$. If $n$ is too small, the quantum hidden layer cannot exploit its large Hilbert space. Fig. 4(a) shows the $n$-dependence of the accuracy rate for both training (a-1) and testing (a-2). The accuracy rates almost saturate around $n = 50$, which is feasible with current ion-trap technology [25].

Next, we analyze the effect of the size of the quantum hidden layer on the accuracy rates. Increasing $N$ enlarges the dimension of the Hilbert space of the quantum hidden layer ($2^N$) as well as the number of PCA elements at the input ($2N$). Fig. 4(b) shows the accuracy rate for training (b-1) and testing (b-2). In both cases the accuracy rate goes up with a larger system size $N$. In fact, adding only two qubits, from $N = 7$ to $N = 9$, gives a nearly 4% improvement in the testing accuracy.

The scale-free nature of the Hamiltonian investigated previously is statistical, defined over an ensemble of disorder distributions on the qubits, and hence each instance could deviate from the typical power-law degree distribution (a feature of scale-free networks). In our system, the degree distribution shows the power-law nature when $N$ is odd, whereas for even $N$ it tends to be less typical. This difference arises because the system with the flat disorder distribution has additional symmetries when $N$ is even. Fortunately, our numerical results show that the performance of our QRC model is not sensitive to the actual degree distribution as long as $\epsilon$ stays in the regime, which eliminates the need to check the degree distribution of each network.

[Figure 4 panels: (a-1) training and (a-2) testing accuracy rate vs. period n (0 to 100); (b-1) training and (b-2) testing accuracy rate vs. size N (6 to 12); each showing ONN, average and best.]

FIG. 4. Performance of the QRC with time ($nT$) and size ($N$). The top two figures plot the average and best accuracy rates, with their standard deviations, for training (a-1) and testing (a-2) for different numbers of periods with $N = 11$. The bottom two figures show the size dependence of the accuracy rate for training (b-1) and testing (b-2). The parameters are set as $J_0 T = 0.06$ and $\epsilon = 0.03$. The average and the standard deviation are taken over epochs 200 to 300.

V. OVERFITTING AND DROPOUT

One of the most common problems in machine learning is overfitting, and our QRC model is no exception. The gap between the training and testing accuracy rates in Fig. 4(a) grows as the learning progresses, and the training accuracy rate in Fig. 4(b-1) almost reaches unity at $N = 11$, which suggests overfitting. To capture this more clearly, we plot the difference between the training and testing accuracy rates in Fig. 5(a). Here we observe that the blue line does not saturate. To address this, we introduce dropout to the ONN in our model, as shown in Fig. 1(d). Dropout is one of the techniques developed in classical machine learning to circumvent overfitting by randomly erasing information in neural networks. It is not commonly used due to the computational cost associated with the technique [30]; in our case, however, the cost of dropout is minimal with the one-layer neural network. We plot the effect of dropout in our QRC model in Fig. 5(a) by showing the difference between the training and testing accuracy rates for four different degrees of dropout, ranging from 0 to 15%. When dropout is introduced, overfitting is significantly suppressed.

[Figure 5 panels: (a) training minus testing accuracy rate vs. epoch (0 to 300) for D = 0.00, 0.05, 0.10, 0.15; (b) average training and testing accuracy rate vs. dropout (0 to 0.15); (c) testing accuracy rate (average and best) vs. epsilon (0 to 0.10) with D = 0.05.]

FIG. 5. The effect of dropout on the accuracy of the QRC. Figure (a) shows the accuracy rate difference between training and testing for each dropout rate. The difference for all nonzero dropout rates saturates as the learning proceeds, which indicates that overfitting can be suppressed by dropout. Figure (b) shows the accuracy rates for training (top lines) and testing (bottom lines), used to choose the best dropout rate, which is D = 0.05 (5%) in this case. Overfitting can be further suppressed by increasing D; however, the overall performance starts to drop for large D. Figure (c) shows the $\epsilon$-dependence of the testing accuracy rate with 5% dropout. All other parameters are the same as in Fig. 3.

Now we re-evaluate the accuracy rates of the QRC model, which are plotted in Fig. 5(b) and (c). Fig. 5(b) shows that we achieve the best testing performance with 5% dropout in this case, while Fig. 5(c) shows the testing accuracy rate with 5% dropout for different values of $\epsilon$. The performance of the QRC model increases slightly with dropout, as shown in (b): dropout lowers the training accuracy rate and raises the testing accuracy rate for larger $N$, where overfitting is more severe.

VI. CONCLUSION

We have shown the potential for quantum advantage in the QRC model for pattern recognition. Our QRC model is the first example to directly harness quantum complexity for computation without programming or quantum/classical feedforward. The quantum hidden layer can be considered an accelerator or booster to be inserted into classical neural networks, and for the pattern recognition task in this paper we have shown that a quantum hidden layer as small as 8 qubits can successfully boost the computational power. As the dynamics of the quantum hidden layer is fixed, no reprogramming of the quantum hidden layer is required to adapt to different tasks; all we need to do is upload the new parameter setting for the ONN and the input-state preparation protocol. Similarly, the quantum hidden layer can be inserted into [...]; in particular, it might be a good candidate for the feature map in VQAs, as it has recently been observed that the complexity of the feature map plays a crucial role in VQAs showing a quantum advantage [11].

The technology requirements to implement our QRC model are modest and feasible with current technology. Several different implementations of the effective Hamiltonian used in this paper are possible: the periodically driven system is one example, and gate-model quantum computation is another. As the key to achieving the acceleration lies in the complexity of the quantum dynamics, we may also employ different quantum models. Quantum cellular automata could be a simple extension of our case, and [...], though not universal, could be a feasible candidate. The simplicity of our scheme paves the way to a new platform for practical quantum computation.

VII. ACKNOWLEDGEMENTS

We thank Victor M. Bastidas and Aoi Hayashi for valuable discussions. This work was supported by the MEXT Quantum Leap Flagship Program (MEXT Q-LEAP), Grant Number JPMXS0118069605.

Appendix A: Principal Component Analysis

In our model, PCA is used to effectively encode the classical information into the quantum hidden layer, as shown in Fig. 2. The $k$-th MNIST image is converted to the vector $\vec{I}^{(k)}$. The basis vectors $\{\vec{v}_j\}_{j=1,\dots,784}$ for this vector representation need to be optimized with the sample data. Once the basis vector set has been optimized with the sample data, the same set is used to represent the test images. The conversion $\tilde{c}_l \to \theta_l$ (or $\tilde{c}_{N+l} \to \phi_l$) is given by

$$\theta_l \ (\text{or } \phi_l) = \pi \, \frac{\tilde{c}_l - \min\big[\tilde{c}_l^{(\mathrm{train})}\big]}{\max\big[\tilde{c}_l^{(\mathrm{train})}\big] - \min\big[\tilde{c}_l^{(\mathrm{train})}\big]}, \tag{A1}$$

where $\max[\tilde{c}_l^{(\mathrm{train})}]$ and $\min[\tilde{c}_l^{(\mathrm{train})}]$ are the maximum and minimum values of $\tilde{c}_l$ across all the training samples, respectively (with $l$ replaced by $N + l$ for $\phi_l$). Note that when $\theta_l$ or $\phi_l$ falls outside the range $[0, \pi]$ for test samples, we truncate the value. Owing to the optimization of the basis vectors, the loss of information from reducing the 784 elements to the $2N$ highest-contribution elements is minimized.

Appendix B: ONN with the M-layer and the output layer

The ONN is a one-layer neural network, consisting of the M-layer with $m$ active neurons and the output layer with $n$ active neurons, which is to be optimized. The number of neurons at the M-layer can in principle be as large as $2^N$; however, when we apply dropout, we randomly eliminate active neurons in this layer, as illustrated in Fig. 1(d). The output layer has ten neurons, each corresponding to a digit to be recognized. At the output layer the effect of the weight matrix is summed and a shift $\vec{B}$ is applied:

$$u_l = \sum_{i=1}^{m} x_i \, w_{i,l} + B_l, \tag{B1}$$

where $x_i$ is the $i$-th element of the M-layer output $\vec{x}$ and $B_l$ is the $l$-th element of the shift $\vec{B}$. As our computational task is pattern recognition, we insert the activation function $f$ to convert $\vec{u}$ into $\vec{y} = f(\vec{u})$. For the activation function we employ the softmax function, which is widely used in classification problems [31, 32]:

$$y_l = f(u_l) = \frac{\exp(u_l)}{\sum_k \exp(u_k)}. \tag{B2}$$

We define the loss function $L_k$ for the $k$-th sample by the cross entropy,

$$L_k = -\sum_{l=1}^{n} t_l^{(k)} \log y_l^{(k)}, \tag{B3}$$

to evaluate the learning progress. Here $\vec{y}^{(k)}$ is the output vector, while $\vec{t}^{(k)}$ is a one-hot vector, that is, a unit basis vector of the ten-dimensional vector space. The one-hot vectors represent the correct results and hence serve as the reference for evaluating the output of the ONN. We then apply gradient descent, backpropagation, and the mini-batch method to optimize the ONN.

Appendix C: Gradient Descent and Backpropagation

The weight matrix $(w_{ij})$ and the shift vector $(b_j)$ are optimized using the gradient descent method,

$$w_{ij}^{(n+1)} = w_{ij}^{(n)} - \eta \frac{\partial L_l}{\partial w_{ij}}, \qquad b_j^{(n+1)} = b_j^{(n)} - \eta \frac{\partial L_l}{\partial b_j}, \tag{C1}$$

where $\eta$ is the learning rate. To calculate the derivatives, we use the backpropagation method. Applying the chain rule with Eqs. (B1) to (B3), we have

$$\frac{\partial L_l}{\partial w_{ij}} = x_{l,i} \big(y_j^{(l)} - t_j^{(l)}\big), \qquad \frac{\partial L_l}{\partial b_j} = y_j^{(l)} - t_j^{(l)}, \tag{C2}$$

where $x_{l,i}$ is the $i$-th element of the M-layer output for the $l$-th sample.

Appendix D: Mini-batch Method

We use the mini-batch method to reduce the computational cost and to avoid local minima. In this method, the loss function is the average of the per-sample loss functions,

$$L = \frac{1}{M} \sum_{l=1}^{M} L_l, \tag{D1}$$

where $M$ is the batch size and $L_l$ is the cost function for the $l$-th sample. The M-layer outputs for the batch can be written as the matrix

$$X = \begin{pmatrix} \vec{x}_1^{\,T} \\ \vdots \\ \vec{x}_M^{\,T} \end{pmatrix}. \tag{D2}$$

In this method, the derivative with respect to the weight matrix is given in matrix form by

$$\frac{\partial L}{\partial w} = \frac{1}{M} X^T \begin{pmatrix} \big(\vec{y}^{(1)} - \vec{t}^{(1)}\big)^T \\ \vdots \\ \big(\vec{y}^{(M)} - \vec{t}^{(M)}\big)^T \end{pmatrix}. \tag{D3}$$

The derivative with respect to the bias vector is then the average of the per-sample derivatives,

$$\frac{\partial L}{\partial b_j} = \frac{1}{M} \sum_{l=1}^{M} \frac{\partial L_l}{\partial b_j}. \tag{D4}$$
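Appendices C and D combine into a single update rule. The sketch below performs one mini-batch step in NumPy: the matrix gradient of Eq. (D3) and the bias gradient of Eq. (D4), both carrying the $1/M$ factor of Eq. (D1), followed by the update of Eq. (C1). Optional dropout on the M-layer follows the common "inverted dropout" convention (zeroing each input with probability `dropout` and rescaling the survivors); this is an assumption, as the paper does not say how the remaining neurons are rescaled. The function names are hypothetical.

```python
import numpy as np


def softmax(u):
    """Row-wise softmax of Eq. (B2), shifted by the row maximum for stability."""
    z = np.exp(u - u.max(axis=1, keepdims=True))
    return z / z.sum(axis=1, keepdims=True)


def batch_loss(X, T, W, b):
    """Batch-averaged cross entropy, Eqs. (B3) and (D1)."""
    Y = softmax(X @ W + b)
    return -np.mean(np.sum(T * np.log(Y), axis=1))


def minibatch_step(X, T, W, b, eta=0.1, dropout=0.0, rng=None):
    """One gradient-descent update of (W, b) on a mini-batch.

    X: (M, m) M-layer outputs, one row per sample (Eq. (D2))
    T: (M, n) one-hot targets; W: (m, n) weights; b: (n,) shift
    """
    M = X.shape[0]
    if dropout > 0.0:
        # Assumed inverted dropout: drop each M-layer input with probability
        # `dropout` and rescale the kept ones so the expected input is unchanged.
        rng = np.random.default_rng() if rng is None else rng
        keep = rng.random(X.shape) >= dropout
        X = X * keep / (1.0 - dropout)
    Y = softmax(X @ W + b)        # Eqs. (B1)-(B2) for every sample
    delta = (Y - T) / M           # derivative of Eq. (D1) w.r.t. u, per sample
    grad_W = X.T @ delta          # Eq. (D3)
    grad_b = delta.sum(axis=0)    # Eq. (D4)
    return W - eta * grad_W, b - eta * grad_b   # Eq. (C1)
```

A training epoch would shuffle the training vectors, split them into batches of size $M$, and call `minibatch_step` on each batch; at test time dropout is switched off (`dropout=0.0`). Because softmax with cross entropy gives $\partial L_l / \partial u_j = y_j^{(l)} - t_j^{(l)}$, no explicit softmax Jacobian is needed.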

[1] Arute, F. et al. Quantum supremacy using a programmable superconducting processor. Nature 574, 505-510 (2019).
[2] Zhong, H.-S. et al. Quantum computational advantage using photons. Science 370, 1460-1463 (2020).
[3] Gong, M. et al. Quantum walks on a programmable two-dimensional 62-qubit superconducting processor. Science 372, 948-952 (2021).
[4] Schuld, M., Sinayskiy, I. & Petruccione, F. The quest for a Quantum Neural Network. Quantum Inf. Process. 13, 2567-2586 (2014).
[5] Ezhov, A. A. & Ventura, D. Quantum Neural Networks, in Future Directions for Intelligent Systems and Information Sciences (Physica-Verlag HD, 2000).
[6] Altaisky, M. V. Quantum neural network. Preprint at https://arxiv.org/abs/quant-ph/0107012 (2001).
[7] Gupta, S. & Zia, R. K. P. Quantum Neural Networks. J. Comput. System Sci. 63, 355 (2001).
[8] Panella, M. & Martinelli, G. Neural networks with quantum architecture and quantum learning. Int. J. Circuit Theory Appl. 39, 61-77 (2011).
[9] Sagheer, A. & Zidan, M. Autonomous Quantum Perceptron Neural Network. Preprint at http://arxiv.org/abs/1312.4149 (2013).
[10] Zhou, R. & Ding, Q. Quantum M-P Neural Network. Int. J. Theor. Phys. 46, 3209-3215 (2007).
[11] Abbas, A. et al. The power of quantum neural networks. Nat. Comput. Sci. 1, 403-409 (2021).
[12] Noori, M. et al. Analog-Quantum Feature Mapping for Machine-Learning Applications. Phys. Rev. Appl. 14, 034034 (2020).
[13] Havlíček, V. et al. Supervised learning with quantum-enhanced feature spaces. Nature 567, 209-212 (2019).
[14] Xia, Y., Li, W., Zhuang, Q. & Zhang, Z. Quantum-Enhanced Data Classification with a Variational Entangled Sensor Network. Phys. Rev. X 11, 021047 (2021).
[15] McClean, J. R., Boixo, S., Smelyanskiy, V. N., Babbush, R. & Neven, H. Barren plateaus in quantum neural network training landscapes. Nat. Commun. 9, 4812 (2018).
[16] Fujii, K. & Nakajima, K. Harnessing Disordered-Ensemble Quantum Dynamics for Machine Learning. Phys. Rev. Appl. 8, 024030 (2017).
[17] Martínez-Peña, R., Giorgi, G. L., Nokkala, J., Soriano, M. C. & Zambrini, R. Dynamical phase transitions in quantum reservoir computing. Preprint at https://arxiv.org/abs/2103.05348 (2021).
[18] Govia, L. C. G., Ribeill, G. J., Rowlands, G. E., Krovi, H. K. & Ohki, T. A. Quantum reservoir computing with a single nonlinear oscillator. Phys. Rev. Res. 3, 013077 (2021).
[19] Martínez-Peña, R., Nokkala, J., Giorgi, G. L., Zambrini, R. & Soriano, M. C. Information Processing Capacity of Spin-Based Quantum Reservoir Computing Systems. Cogn. Comput. (2020). https://link.springer.com/article/10.1007/s12559-020-09772-y.

[20] Ghosh, S., Krisnanda, T., Paterek, T. & Liew, T. C. H. Universal quantum reservoir computing. Preprint at http://dx.doi.org/10.1038/s42005-021-00606-3 (2020).
[21] Estarellas, M. P. et al. Simulating complex quantum networks with time crystals. Sci. Adv. 6, 42, eaay8892 (2020).
[22] Bastidas, V. M., Renoust, B., Nemoto, K. & Munro, W. J. Ergodic-localized junctions in periodically driven systems. Phys. Rev. B 98, 224307 (2018).
[23] Kinouchi, O. & Copelli, M. Optimal dynamical range of excitable networks at criticality. Nature Phys. 2, 348-352 (2006).
[24] Huang, G. B., Zhu, Q. Y. & Siew, C. K. Extreme learning machine: Theory and applications. Neurocomputing 70, 489-501 (2006).
[25] Zhang, J. et al. Observation of a discrete time crystal. Nature 543, 217-220 (2017).
[26] Yao, N. Y., Potter, A. C., Potirniche, I.-D. & Vishwanath, A. Discrete Time Crystals: Rigidity, Criticality, and Realizations. Phys. Rev. Lett. 118, 030401 (2017).
[27] Choi, S. et al. Observation of discrete time-crystalline order in a disordered dipolar many-body system. Nature 543, 221-225 (2017).
[28] Frey, P. & Rachel, S. Simulating a discrete time crystal over 57 qubits on a quantum computer. Preprint at https://arxiv.org/abs/2105.06632 (2021).
[29] LeCun, Y., Cortes, C. & Burges, C. J. The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/index.html.
[30] Srivastava, N., Hinton, G. E., Krizhevsky, A., Sutskever, I. & Salakhutdinov, R. Dropout: A Simple Way to Prevent Neural Networks from Overfitting. J. Mach. Learn. Res. 15, 1929-1958 (2014).
[31] Haykin, S. Neural Networks and Learning Machines 3rd edn. (Pearson, London, UK, 2010).
[32] Goodfellow, I., Bengio, Y. & Courville, A. Deep Learning (MIT Press, Cambridge, MA, USA, 2016).