
Quantum machine learning with adaptive linear optics

Ulysse Chabaud1,2,3, Damian Markham3,4, and Adel Sohbi5

1Department of Computing and Mathematical Sciences, California Institute of Technology 2Université de Paris, IRIF, CNRS, France 3Laboratoire d’Informatique de Paris 6, CNRS, Sorbonne Université, 4 place Jussieu, 75005 Paris, France 4JFLI, CNRS, National Institute of Informatics, University of Tokyo, Tokyo, Japan 5School of Computational Sciences, Korea Institute for Advanced Study, Seoul 02455, Korea

arXiv:2102.04579v2 [quant-ph] 2 Jul 2021

We study supervised learning algorithms in which a quantum device is used to perform a computational subroutine—either for prediction via probability estimation, or to compute a kernel via estimation of quantum states overlap. We design implementations of these quantum subroutines using Boson Sampling architectures in linear optics, supplemented by adaptive measurements. We then challenge these quantum algorithms by deriving classical simulation algorithms for the tasks of output probability estimation and overlap estimation. We obtain different classical simulability regimes for these two computational tasks in terms of the number of adaptive measurements and input photons. In both cases, our results set explicit limits to the range of parameters for which a quantum advantage can be envisaged with adaptive linear optics compared to classical machine learning algorithms: we show that the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes. Interestingly, our analysis leaves open the possibility of a near-term quantum advantage with a single adaptive measurement.

Ulysse Chabaud: [email protected]
Adel Sohbi: [email protected]

1 Introduction

Quantum computers promise dramatic advantages over their classical counterparts [1, 2], but a fault-tolerant universal quantum computer is still far from being available [3]. The quest for near-term quantum speedup has thus led to the introduction of various subuniversal models—models that are believed to have an intermediate computational power between classical and universal quantum computing—such as Boson Sampling [4] or IQP circuits [5], recently culminating with the experimental demonstration of random circuit sampling [6] and Gaussian Boson Sampling [7].

Finding practical applications for these subuniversal models, other than the demonstration of quantum speedup, is a timely issue, as it may enable interesting quantum advantages in the era of Noisy Intermediate-Scale Quantum devices [3]. Recently, there has been an increased interest in the possibility of enhancing classical machine learning algorithms using quantum computers [8], which includes the development of quantum neural networks [9–18] and of quantum kernel methods [19–24].

In particular, recent proposals have been driven by subuniversal models such as Gaussian Boson Sampling [25, 26] or IQP circuits [5, 19]. In the latter, the authors considered supervised learning algorithms in which some computational subroutines are executed in a quantum way, namely the estimation of the output probabilities of quantum circuits, or the estimation of the overlap of the output states of quantum circuits. They showed that IQP circuits alone could not provide a quantum advantage for these subroutines and therefore considered minimal extensions of these circuits, in terms of circuit depth.

Hereafter, we study the use of Boson Sampling linear optical interferometers [4], with input photons, for similar learning tasks. Instead of extending the depth—which was shown to improve machine learning model performances [19] while making the training phase challenging [27–29]—we allow for adaptive measurements: intermediate measurements that drive the rest of the computation. These provide a natural analogy with the circuit depth in the linear optics picture [30]: by encoding qubits into single photons and using a sufficient number of adaptive measurements, one can perform universal quantum computing [31].

We give a detailed prescription for performing quantum machine learning with classical data, using adaptive linear optics for computational subroutines such as probability estimation and overlap estimation. We also examine the classical simulability of these quantum subroutines. More precisely, we give classical simulation algorithms whose runtimes are explicitly dependent on: (i) the number of modes m, (ii) the number of adaptive measurements k, (iii) the number of input photons n and (iv) the number of photons r detected during the adaptive measurements. This effectively sets a limit on the range of parameters for which adaptive linear optics may provide an advantage for machine learning over classical computers using our methods, thus identifying the regimes where a quantum advantage can be envisaged. Boson Sampling instances [4] correspond to the case with no adaptive measurement, while the Knill–Laflamme–Milburn scheme for universal quantum computing [31] corresponds to the case where the number of adaptive measurements scales linearly with the size of the computation.

For probability estimation, we show that the classical simulation is efficient whenever the number of adaptive measurements or the number of input photons is constant. Moreover, the number of input photons and the number of adaptive measurements cannot be simultaneously small compared to the number of modes (see Table 2). For overlap estimation, we show a similar behaviour, although in this case our results do not rule out the possibility of a quantum advantage with a single adaptive measurement (see Tables 3 and 4). Our main technical contribution is an expression for the inner product of the output states of two adaptive unitary interferometers which is essentially independent of the number of adaptive measurements.

The rest of the paper is organised as follows. In section 2, we provide a background on quantum machine learning with classical data and classical simulation of quantum computations. In section 3, we introduce the model of adaptive linear optics which we consider. We give a prescription for performing probability estimation and overlap estimation with instances of this model, and detail how to use these as subroutines for machine learning problems. In section 4, we derive two classical simulation algorithms, one for each of these two tasks, and we analyse the running time of these algorithms. We conclude in section 5.

2 Background

2.1 Kernel methods for quantum machine learning

2.1.1 Encoding classical data with quantum states

We consider a typical machine learning problem, such as a classification problem, where a classical dataset D = {~x_1, . . . , ~x_{|D|}} from an input set X is given. One way to use quantum computers to solve such problems is to encode classical data onto quantum states such that there exists a so-called feature map ~x_l ↦ |φ(~x_l)⟩ which can be processed by a quantum computer.

Definition 1 (Feature map [20]). Let F be a Hilbert space, called feature Hilbert space, X a non-empty set, called input set, and ~x ∈ X. A feature map is a map φ : X → F from inputs to vectors in the Hilbert space.

Many machine learning algorithms perform well in linear cases, such as the support vector machines which will be used hereafter. However, many real-world problems require non-linearity to make successful predictions. By using kernel methods one can introduce non-linearity and use estimation methods that are linear in terms of the kernel evaluations.

Definition 2 (Kernel [20]). Let X be an input set. A function κ : X × X → C is called a kernel if for any finite subset D = {~x_1, . . . , ~x_{|D|}} with |D| ≥ 2 the Gram matrix K with entries K_{l,l'} = κ(~x_l, ~x_{l'}) is positive semidefinite.

The kernel corresponds to a dot product in a feature space (here in a high-dimensional Hilbert space). In [32], it is shown that the notions of feature map in Hilbert space and kernel can be connected. A straightforward way is to define a (quantum) kernel κ from a feature map φ as follows:

κ(~x_l, ~x_{l'}) = |⟨φ(~x_l)|φ(~x_{l'})⟩_F|²,   (1)

where ~x_l ∈ X, ~x_{l'} ∈ X and ⟨·|·⟩_F is the inner product over the Hilbert space F.

2.1.2 Using Feature Hilbert Spaces for Machine Learning

There are two main ways to use feature Hilbert spaces:

• The quantum variational classification [19] or explicit approach [20]: the entire model computation is performed on a quantum device trained by a hybrid variational quantum-classical or classical algorithm [14, 15, 33]. In this case, the probability distribution over the possible outcomes is used for the classification. Hence, while the feature map is used to encode the data, the kernel is not known directly from the quantum device. In [19], the probability to obtain a certain binary output b_i for the classical data input x is given by:

Pr(b_i) = |⟨b_i| W(θ) U_{φ(x)} |0^{⊗n}⟩|²,   (2)

where U_{φ(x)} encodes the feature map and W(θ) corresponds to the quantum support vector machine.

• The quantum kernel estimation [19] or implicit approach [20]: a quantum-assisted method where the quantum device is only used to evaluate the kernel and the rest of the machine learning algorithm is carried out by a classical algorithm. In [19], the kernel between two classical data vectors ~x_l and ~x_{l'} is given by:

K_{l,l'} = |⟨0^{⊗n}| U†_{φ(x_l)} U_{φ(x_{l'})} |0^{⊗n}⟩|².   (3)

In order to challenge and justify the use of a quantum-assisted method, it is important to consider how hard it would be for a classical machine to compute the same quantities. Indeed, if the output probability or kernel could be estimated directly classically, there would be no need for a quantum computer for this task. Hence, when it comes to classical hardness, the first method requires that estimating the output probability in Eq. (2) is hard, while the second method requires that estimating the overlap in Eq. (3) is hard (for classical computers). We give more formal definitions for these classical simulation tasks in the following section.
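To make Eqs. (1) and (3) concrete, the following sketch (not part of the original text) builds a kernel Gram matrix from a toy feature map |φ(x)⟩ = (cos x, sin x)—a hypothetical single-qubit angle encoding chosen only for illustration, not the interferometer-based feature map used later—and checks that the resulting matrix is positive semidefinite, as required by Definition 2.

import numpy as np

def feature_state(x):
    # Toy feature map: a single-qubit angle encoding (illustrative assumption).
    return np.array([np.cos(x), np.sin(x)], dtype=complex)

def quantum_kernel(x1, x2):
    # Eqs. (1)/(3): squared modulus of the inner product of feature states.
    return abs(np.vdot(feature_state(x1), feature_state(x2))) ** 2

data = [0.1, 0.7, 1.3, 2.0]
K = np.array([[quantum_kernel(a, b) for b in data] for a in data])

# Definition 2: the Gram matrix of a kernel is positive semidefinite.
eigenvalues = np.linalg.eigvalsh(K)
print(K)
print("smallest eigenvalue:", eigenvalues.min())  # >= 0 up to numerical noise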

2.2 Classical simulation of quantum computations

Depending on the approach used for simulating classically the functioning of quantum devices, several notions of simulability are commonly used. One could ask the classical simulation algorithm to mimic the output of the quantum computation [34, 35]: informally, a quantum computation is weakly simulable if there exists a classical algorithm which outputs samples from its output probability distribution in time polynomial in the size of the quantum computation. Various relaxations of this definition are possible, allowing the classical sampling to be approximate rather than exact, or to abort with a small probability. The existence of such an efficient classical simulation has been ruled out for various subuniversal models of quantum computing, such as IQP circuits [5] or Boson Sampling [4], under complexity-theoretic conjectures—both in the exact case and in the approximate case, up to additional conjectures.

While weak simulation of quantum computations is arguably the most commonly studied notion, other notions of classical simulation may be useful: if the output samples of a quantum computation are used to compute a quantity which may be computed efficiently classically by other means, it is no longer necessary to simulate the whole quantum device. We consider two concrete examples which are prominent for variational quantum algorithms in quantum machine learning: probability estimation and overlap estimation [19, 20].

Definition 3 (Probability estimation). Let P be a probability distribution over M outcomes and let ε, δ > 0. Given any outcome x in the sample space of P, probability estimation refers to the computational task of outputting an estimate P̃[x] such that

|P̃[x] − P[x]| ≤ ε,   (4)

with probability greater than 1 − δ, in time O(poly(M/ε) log(1/δ)).

Efficient probability estimation thus amounts to outputting an estimate of the probability of a fixed outcome with polynomially small additive error, with exponentially small probability of failure, in polynomial time in the size of the computation. One may use the samples from a quantum computation in order to perform probability estimation efficiently for any given outcome: given a quantum device of size M which outputs samples from some probability distribution and a fixed outcome x in the sample space, one may run the device poly(M) times, recording the value 1 whenever the outcome x is obtained and the value 0 otherwise. Then, the frequency of the outcome x over the poly(M) uses of the quantum device is a polynomially precise additive estimate of the probability of the outcome x with exponentially small probability of failure, by virtue of Hoeffding's inequality [36].
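A minimal sketch of this frequency-based estimator (not from the paper; the callable sample_outcome is a stand-in for one run of the quantum device): Hoeffding's inequality implies that T = ⌈ln(2/δ)/(2ε²)⌉ shots suffice for additive error ε with failure probability at most δ.

import math
import random

def estimate_probability(sample_outcome, target, eps, delta, rng=random.Random(0)):
    # Additive estimate of Pr[outcome == target] from T shots, with
    # T = ceil(ln(2/delta) / (2 eps^2)) chosen via Hoeffding's inequality.
    shots = math.ceil(math.log(2 / delta) / (2 * eps ** 2))
    hits = sum(1 for _ in range(shots) if sample_outcome(rng) == target)
    return hits / shots

# Toy device: three outcomes with probabilities 0.5, 0.3, 0.2.
def toy_device(rng):
    return rng.choices(["a", "b", "c"], weights=[0.5, 0.3, 0.2])[0]

print(estimate_probability(toy_device, "b", eps=0.01, delta=1e-3))  # close to 0.3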

Weak simulation is at least as hard as probability estimation, since by the previous reasoning one may obtain polynomially precise additive estimates of probabilities from samples of the probability distribution. Moreover, there are some quantum computations for which weak simulation is hard for classical computers (assuming widely believed conjectures from complexity theory), but probability estimation can be done efficiently classically. This is the case for IQP circuits [5, 19], Boson Sampling interferometers [4] and even the circuit implementing the period-finding subroutine of Shor's factoring algorithm [2]. We detail the latter case in Appendix A, for the sake of clarifying the relations between these different notions of classical simulation. Note also that computing exactly the output probabilities of quantum circuits or interferometers is a #P-hard problem, therefore expected to be hard both classically and quantumly [37].

A more general computational task than probability estimation in the context of quantum computing is the following:

Definition 4 (Overlap estimation). Let |φ⟩ and |ψ⟩ be output states of two efficiently describable quantum computations of size M and let ε, δ > 0. Overlap estimation refers to the computational task of outputting an estimate Õ such that

|Õ − |⟨φ|ψ⟩|²| ≤ ε,   (5)

with probability greater than 1 − δ, in time O(poly(M/ε) log(1/δ)).

The overlap between two quantum states is a measure of their distinguishability [38] and overlap estimation is thus related to quantum state discrimination. Several techniques exist to perform the overlap estimation of two states |φ⟩ and |ψ⟩ quantumly [39]. One of them is to perform the swap test [40] with various copies of both states.

Overlap estimation can also be done efficiently classically for IQP circuits. Nonetheless, this family of circuits has been identified as a promising venue for implementing quantum machine learning algorithms [19], when enlarged to contain similar circuits with bigger depth. Motivated by this approach, we consider hereafter the case of another subuniversal model: Boson Sampling [4] with input photons, supplemented with adaptive measurements.

3 Quantum machine learning with adaptive linear optics

In what follows, we study Boson Sampling architectures [4], supplemented with a given number of adaptive measurements—that is, some of the modes are measured throughout the computation and the rest of the computation can depend on their outcome—which we refer to as adaptive linear optics. In this section, we derive quantum algorithms for performing probability and overlap estimation with adaptive linear optics, together with a prescription for utilising these algorithms as subroutines in supervised learning algorithms.

3.1 Adaptive linear optics

Hereafter, we detail the computational model of adaptive linear optics which we consider. We first introduce a few notations.

In the linear optics picture, the input states we consider are multimode photon-number Fock states over m modes (we use bold math for multi-index notations, see Table 1):

|s⟩ = (1/√(s!)) â_1^{†s_1} · · · â_m^{†s_m} |0⟩^{⊗m},   (6)

where s_i and â_i^† are respectively the number of photons and the creation operator for the i-th mode [41]. We identify these states with m-tuples of integers s = (s_1, . . . , s_m) ∈ N^m. Following [4], let us define, for all n ∈ N,

Φ_{m,n} := {s = (s_1, . . . , s_m) ∈ N^m | |s| = n}.   (7)

This set corresponds to the m-mode Fock states with total number of photons equal to n, and we have |Φ_{m,n}| = (m+n−1 choose n).
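As an aside (not in the original text), Φ_{m,n} is easy to enumerate programmatically, and this enumeration is also the basic building block of the classical simulation algorithms of section 4; a minimal sketch:

from math import comb

def fock_states(m, n):
    # Enumerate Phi_{m,n}: all m-tuples of non-negative integers summing to n.
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in fock_states(m - 1, n - first):
            yield (first,) + rest

m, n = 4, 3
states = list(fock_states(m, n))
print(len(states), comb(m + n - 1, n))  # both equal |Phi_{m,n}| = C(m+n-1, n)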

Table 1: Multi-index notations. For m ∈ N*, we write s = (s_1, . . . , s_m) ∈ N^m and t = (t_1, . . . , t_m) ∈ N^m.

  |s|    := s_1 + · · · + s_m
  s!     := s_1! · · · s_m!
  |s⟩    := |s_1 . . . s_m⟩
  s + t  := (s_1 + t_1, . . . , s_m + t_m)
  0^m    := (0, . . . , 0)
  1^m    := (1, . . . , 1)

We consider a unitary interferometer of size m, described by an m × m unitary matrix U = (u_{ij})_{1≤i,j≤m}. Unlike in the circuit picture, the matrix U does not act on the computational basis, which is in this case the infinite multimode Fock basis, but rather describes the linear evolution of the creation operator of each mode. More precisely,

(â_1^†, . . . , â_m^†)^T ↦ U (â_1^†, . . . , â_m^†)^T = (∑_{k=1}^m u_{1k} â_k^†, . . . , ∑_{k=1}^m u_{mk} â_k^†)^T.   (8)

We write Û for the unitary action of the interferometer on the multimode Fock basis. Because the interferometer conserves the total number of photons, for all p, q ∈ N, all s ∈ Φ_{m,p} and all t ∈ Φ_{m,q},

⟨s|Û|t⟩ = 0   (9)

whenever p ≠ q. Let n ∈ N, s = (s_1, . . . , s_m) ∈ Φ_{m,n} and t = (t_1, . . . , t_m) ∈ Φ_{m,n}. Combining Eq. (6) and Eq. (8) we obtain [4]

⟨s|Û|t⟩ = Per(U_{s,t}) / (√(s!) √(t!)),   (10)

where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row and t_j times its j-th column for i, j = 1, . . . , m, and where the permanent of an r × r matrix A = (a_{ij})_{1≤i,j≤r} is defined as

Per A = ∑_{σ∈S_r} ∏_{i=1}^r a_{iσ(i)},   (11)

where S_r is the symmetric group over {1, . . . , r}. We write Pr_{m,n}[·|t] the probability distribution of the outputs over Φ_{m,n} of the unitary interferometer U acting on an input |t⟩. With the previous notations we obtain, for all p, q ∈ N, all s ∈ Φ_{m,p} and all t ∈ Φ_{m,q},

Pr_{m,n}[s|t] = (|Per(U_{s,t})|² / (s! t!)) δ_{pq},   (12)

where δ_{pq} is the Kronecker symbol.

In what follows, we fix the input state |t⟩ = |1^n 0^{m−n}⟩, with single-photon states in the first n modes and vacuum states in all other modes, where the superscript indicates the size of the string (0, . . . , 0) or (1, . . . , 1) when there is a possible ambiguity. We consider the case of linear optical quantum computing with adaptive photon-number measurements, which we refer to as adaptive linear optics (see Fig. 1). We denote by k ∈ {0, . . . , m} the number of single-mode adaptive measurements. Without loss of generality, we assume that the first k modes are measured adaptively throughout the computation and we write p = (p_1, . . . , p_k) the adaptive measurement outcomes. For r ∈ N and p ∈ Φ_{k,r}, let us define

U^p := [1_k ⊕ U_k(p_1, . . . , p_k)] · · · [1_1 ⊕ U_1(p_1)] U_0,   (13)

where 1_j is the identity matrix of size j and where the unitary matrices U_j depend on the measurement outcomes p_1, . . . , p_j for all j ∈ {1, . . . , k}. An adaptive interferometer U over m modes with n input photons and k adaptive measurements is then represented as a family of nonadaptive unitary interferometers

U := {U^p | p ∈ Φ_{k,r}, 0 ≤ r ≤ n},   (14)

one for each possible adaptive measurement outcome p. The matrix U^p describes the interferometer in Fig. 1, when the adaptive measurement outcome p = (p_1, . . . , p_k) ∈ Φ_{k,r} has been obtained, where r is the number of photons detected during the adaptive measurements. In this case, the output state is a pure state which reads

Tr_k[(|p⟩⟨p| ⊗ 1_{m−k}) Û^p |t⟩⟨t| Û^{p†}],   (15)

where the partial trace is over the first k modes and where |p⟩ denotes the k-mode Fock state |p_1 . . . p_k⟩. At the end of the computation, all the remaining m − k modes are measured with photon-number detection, yielding the final outcome s = (s_1, . . . , s_{m−k}) ∈ Φ_{m−k,n−r}.
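For illustration (not part of the paper), the sketch below evaluates Eqs. (10)–(12) directly for a small nonadaptive interferometer: it builds U_{s,t} by repeating rows and columns, computes its permanent from the definition in Eq. (11), and checks that the resulting probabilities over Φ_{m,n} sum to one. The brute-force permanent and the QR-generated random unitary are assumptions made only for the example.

import numpy as np
from itertools import permutations
from math import factorial, prod

def permanent(a):
    # Eq. (11), by direct summation over the symmetric group (fine for small sizes).
    r = a.shape[0]
    return sum(prod(a[i, sigma[i]] for i in range(r)) for sigma in permutations(range(r)))

def submatrix(u, s, t):
    # U_{s,t}: repeat row i s_i times and column j t_j times.
    rows = [i for i, si in enumerate(s) for _ in range(si)]
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]
    return u[np.ix_(rows, cols)]

def fock_states(m, n):
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in fock_states(m - 1, n - first):
            yield (first,) + rest

def output_probability(u, s, t):
    # Eq. (12) with |s| = |t|.
    num = abs(permanent(submatrix(u, s, t))) ** 2
    return num / (prod(map(factorial, s)) * prod(map(factorial, t)))

m, n = 4, 2
u = np.linalg.qr(np.random.randn(m, m) + 1j * np.random.randn(m, m))[0]  # random unitary
t = (1, 1, 0, 0)  # |t> = |1^n 0^{m-n}>
total = sum(output_probability(u, s, t) for s in fock_states(m, n))
print(total)  # ~1.0: the probabilities over Phi_{m,n} sum to one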

Figure 1: Adaptive linear optical model with n input photons over m modes and k adaptive measurements. For all j ∈ {1, . . . , k}, the unitary interferometer U_j depends on the adaptive measurement outcomes p_1, . . . , p_j, which are used to drive the rest of the computation. The output modes are measured using photon-number detection, yielding the final outcome s = (s_1, . . . , s_{m−k}).

3.2 Quantum probability and overlap estimation

We now detail how to perform probability estimation or overlap estimation with an adaptive linear optical interferometer as described in the previous section.

For estimating an output probability with a quantum circuit, one may run the circuit a polynomial number of times, obtaining classical outcomes, for which the frequency gives a polynomially precise additive estimate of the circuit probability, which can be computed efficiently. In the case of a circuit with adaptive measurements, one only looks at the final measurement outcomes, and the same holds for adaptive linear optical computations.

For estimating the overlap of output states of two unitary quantum circuits, one may run both circuits in parallel and compare their output states, for example with the swap test [40]. Doing so a polynomial number of times provides a polynomially precise estimate of the overlap. Alternatively, writing C_1 and C_2 the unitary circuits, one may build the circuit C_2^† C_1, acting on the same input, and project the output quantum state onto the input state.

In the case of circuits with adaptive measurements, there is a different output state for each adaptive measurement outcome. In particular, if the number of possible adaptive measurement outcomes is exponential in the size of the computation, then the probability distribution for these outcomes has to be concentrated on a polynomial number of events for the quantum overlap estimation to be efficient. This is because, in order to compute a polynomially precise estimate of the overlap, say |⟨φ|ψ⟩|², the states |φ⟩ and |ψ⟩, both corresponding to specific adaptive measurement results, have to be obtained a polynomial number of times.

For adaptive linear optics over m modes with n input photons and k adaptive measurements, the number of possible adaptive measurement outcomes is given by

∑_{r=0}^{n} |Φ_{k,r}| = ∑_{r=0}^{n} (k+r−1 choose r) = (n+k choose n),   (16)

where the sum is over the total number of photons detected at the stage of the adaptive measurements. Hence, for the overlap estimation to be efficient, either the probability distribution of the adaptive measurement outcomes is concentrated on a polynomial number of outcomes, or (n+k choose n) = O(poly m), which is the case for example when k = O(1), when n = O(1), or when both k and n are O(log m).

In what follows, we do not assume concentration of the adaptive measurement outcome probability distribution and consider general interferometers with adaptive measurements. In this setting, the efficient quantum regime for overlap estimation thus corresponds to (n+k choose n) = O(poly m).

Let U = {U^p | p ∈ Φ_{k,r}, 0 ≤ r ≤ n} be an adaptive linear interferometer with k adaptive measurements and input Fock state |t⟩, and let |φ⟩ and |ψ⟩ be two output states. Let p and q denote the outcomes of the adaptive measurements for |φ⟩ and |ψ⟩, respectively, so that U^p is the interferometer for |φ⟩ and U^q is the interferometer for |ψ⟩. With Eq. (15) we obtain

|⟨φ|ψ⟩|² = Tr[ Tr_k[(|p⟩⟨p| ⊗ 1_{m−k}) Û^p |t⟩⟨t| Û^{p†}] |ψ⟩⟨ψ| ]
         = Tr[ (|p⟩⟨p| ⊗ 1_{m−k}) Û^p |t⟩⟨t| Û^{p†} (1_k ⊗ |ψ⟩⟨ψ|) ]
         = Tr[ Û^p |t⟩⟨t| Û^{p†} (|p⟩⟨p| ⊗ |ψ⟩⟨ψ|) ]
         = Tr[ |t⟩⟨t| Û^{p†} (|p⟩⟨p| ⊗ |ψ⟩⟨ψ|) Û^p ].   (17)

Because of the conservation of the total number of photons, the overlap between the states |φ⟩ and |ψ⟩ is zero if |p| ≠ |q|. Otherwise, it can be estimated using a polynomial number of copies of the state |ψ⟩ as follows: send the input |p⟩ ⊗ |ψ⟩ into the interferometer with unitary matrix U^{p†} and measure the photon number in each output mode; record the value 1 if the measurement pattern matches the Fock state |t⟩ and the value 0 otherwise. Then, the mean of the obtained values yields a polynomially precise estimate of the overlap |⟨φ|ψ⟩|², by Eq. (17) and Hoeffding's inequality. A similar procedure can be followed when |φ⟩ and |ψ⟩ are output states of two different adaptive interferometers.

Note that this overlap estimation requires the preparation of the Fock state |p⟩ but no adaptive measurements. By symmetry, one could estimate the overlap alternatively using a polynomial number of copies of the state |φ⟩ and preparing the Fock state |q⟩, or by mixing copies of |φ⟩ and |ψ⟩. In practice, one may run the adaptive interferometer U and apply the above procedure for |φ⟩ or |ψ⟩ on the fly, depending on whether the adaptive measurement outcome obtained is equal to q or p (see Algorithm 1 for overlap estimation including this state preparation step).

Algorithm 1: Overlap estimation for adaptive linear optics
  Input: Adaptive interferometer U = {U^p | p ∈ Φ_{k,r}, 0 ≤ r ≤ n}, adaptive measurement outcomes p, q.
  Parameters: Input |t⟩ and number of shots T.
  Set the state preparation counter c_sp = 0.
  Set the overlap counter c_over = 0.
  while c_sp < T do
    Run U on input |t⟩, obtaining adaptive measurement outcome r and output state |χ⟩.
    if r = p then
      c_sp → c_sp + 1.
      Run U^{q†} on input |q⟩ ⊗ |χ⟩.
      Measure in the photon number basis.
      Record measurement outcome s.
      if s = t then c_over → c_over + 1, else c_over → c_over.
    else if r = q then
      c_sp → c_sp + 1.
      Run U^{p†} on input |p⟩ ⊗ |χ⟩.
      Measure in the photon number basis.
      Record measurement outcome s.
      if s = t then c_over → c_over + 1, else c_over → c_over.
    end if
  end while
  return c_over/T.
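A minimal classical skeleton of the estimator in Algorithm 1 (added for illustration; run_adaptive_interferometer and run_inverse_and_measure are hypothetical stand-ins for the quantum hardware calls):

def estimate_overlap(run_adaptive_interferometer, run_inverse_and_measure,
                     p, q, t, shots):
    # Skeleton of Algorithm 1: keep only runs whose adaptive outcome is p or q,
    # apply the other branch's inverse interferometer, and count projections onto |t>.
    # Both callables are placeholders for the quantum device.
    c_sp = 0      # state preparation counter
    c_over = 0    # overlap counter
    while c_sp < shots:
        r, chi = run_adaptive_interferometer(t)
        if r == p:
            c_sp += 1
            s = run_inverse_and_measure(q, chi)   # run U^{q†} on |q> ⊗ |χ>
        elif r == q:
            c_sp += 1
            s = run_inverse_and_measure(p, chi)   # run U^{p†} on |p> ⊗ |χ>
        else:
            continue
        if s == t:
            c_over += 1
    return c_over / shots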

These tasks of probability or overlap estimation may be performed in particular as subroutines for machine learning algorithms, as we detail in the next section.

3.3 Support vector machine with adaptive linear optics

We consider a training dataset T, with |T| points of the form {(~x_1, y_1), . . . , (~x_{|T|}, y_{|T|})}, where ~x_l ∈ R^d, y_l ∈ C, ∀l ∈ {1, . . . , |T|}, and where C = {−1, 1} in the case of binary classification. We also consider a feature map that is given by a subset of the family introduced in Eq. (14): U_φ = {U_l^{p_l}}_l, in which for each classical data point x_l there is a unitary operator U_l^{p_l}.

3.3.1 Support vector machine with quantum kernel methods

The goal of a support vector machine [42–44] is to find the maximum-margin hyperplane that divides the set of points by their y values. Such a hyperplane is defined by a vector ~w ∈ R^d and b ∈ R. With the hard-margin condition we have

y_l (~w · ~x_l + b) ≥ 1, ∀l ∈ {1, . . . , |T|}.   (18)

In this case, a decision function over a new point ~x can be constructed directly from the hyperplane as follows:

f(~x) = sign(~w · ~x + b).   (19)

When we have access to a feature map φ, the vector ~w can be decomposed as

~w = ∑_{l=1}^{|T|} α_l y_l φ(x_l),   (20)

for some α_l ∈ C, and the decision function becomes

f(~x) = sign( ∑_{l=1}^{|T|} α_l y_l κ(~x_l, ~x) + b ),   (21)

where the kernel κ(~x_l, ~x) has an interpretation in terms of the feature map φ as given in Eq. (1).

In order to find the best separating hyperplane, one may use a quadratic programming solver. In section 3.3.2, we consider the case where the programming solver is composed of a hybrid quantum-classical algorithm, whereas in section 3.3.3, such a solver will be external to the quantum device. The optimisation program that needs to be solved is:

maximize over α_l:   ∑_{l=1}^{|T|} α_l − (1/2) ∑_{l,l'} α_l y_l α_{l'} y_{l'} κ(~x_l, ~x_{l'})
subject to:          ∑_{l=1}^{|T|} α_l y_l = 0,
                     0 ≤ α_l ≤ 1/(2nλ), ∀l ∈ {1, . . . , |T|},   (22)

where λ is a parameter that is introduced for the soft-margin condition. By solving such a problem, one obtains the optimal coefficients {α_l*} and b* of the hyperplane (see Eq. (20)). This formulation is written for the quantum case, but similar methods can be applied in the classical case [43].
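For illustration (not from the paper), the dual problem in Eq. (22) is the standard soft-margin SVM dual with a precomputed Gram matrix, so once the kernel entries κ(~x_l, ~x_{l'}) have been estimated, an off-the-shelf solver can be used; a sketch using scikit-learn's precomputed-kernel interface, with the parameter C playing the role of the bound 1/(2nλ):

import numpy as np
from sklearn.svm import SVC

# Placeholder Gram matrix of kernel values kappa(x_l, x_l'); in the implicit
# method these entries would instead be estimated on the quantum device.
rng = np.random.default_rng(0)
features = rng.normal(size=(6, 3))
K_train = features @ features.T          # any positive semidefinite Gram matrix
y_train = np.array([1, 1, 1, -1, -1, -1])

# C corresponds to the upper bound 1/(2*n*lambda) on alpha_l in Eq. (22).
clf = SVC(kernel="precomputed", C=1.0)
clf.fit(K_train, y_train)

# Prediction for new points: kernel values against the training set, as in Eq. (21).
K_test = features[:2] @ features.T       # shape (num_test, num_train)
print(clf.predict(K_test))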

3.3.2 Explicit method: probability estimation

The explicit method corresponds to the case where the prediction—assigning a classification label to the data—is obtained from the probability distribution when using a quantum device. Hereafter, we consider using adaptive linear optics for the feature map and use its output state as an input state of a second quantum device for the support vector machine [19]. The choice of the feature map ultimately depends on the architecture at hand: one may use, e.g., reconfigurable interferometers [45] in the linear optics picture. We show that Boson Sampling can be used to realize a support vector machine algorithm by using a hybrid variational quantum-classical algorithm for the training phase [14, 15, 33]. We write the Boson Sampling interferometer operation BS(~θ), where ~θ are the parameters of the beam splitters and phase shifters of the interferometer, since any interferometer can be implemented efficiently with only phase shifters and beam splitters [46].

In order to make a prediction we need to bin the outcomes. For a given function g : Φ_{m,n} → {−1, +1}, we can write the following for the binning:

N = ∑_{s∈Φ_{m,n}} g(s) |s⟩⟨s|.   (23)
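A small sketch (not from the paper) of how the binning in Eq. (23) combines with the frequency estimates used later in Eq. (25): final outcomes s are mapped to labels by a hypothetical binning function g, and the conditional label probabilities are estimated from the shots whose adaptive outcome matched p_x.

from collections import Counter

def predict_label(shots, p_x, g):
    # Estimate Pr(y|p_x) as a frequency and return the most likely label.
    # `shots` is a list of (adaptive_outcome, final_outcome) pairs produced by
    # the quantum device (stubbed here); `g` maps a final outcome to {-1, +1}.
    counts = Counter(g(s) for r, s in shots if r == p_x)
    total = sum(counts.values())                               # T_x
    probabilities = {y: c / total for y, c in counts.items()}  # T_{y|x} / T_x
    return max(probabilities, key=probabilities.get), probabilities

# Toy data: adaptive outcome (1,) or (0,), final outcomes over the remaining modes.
g = lambda s: +1 if s[0] % 2 == 0 else -1      # illustrative binning function
shots = [((1,), (2, 0)), ((1,), (1, 1)), ((0,), (2, 0)), ((1,), (0, 2)), ((1,), (1, 1))]
print(predict_label(shots, (1,), g))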

The observable N can also be decomposed in terms of projectors as N = Π_{+1} − Π_{−1}, where Π_y = (1 + yN)/2. For a given data point ~x with the associated operation Û_x^{p_x}, the probability to obtain the outcome y is:

Pr(y|p_x) = Tr[ Π_y BS(~θ) Û_x^{p_x} |t⟩⟨t| Û_x^{p_x†} BS(~θ)† ]
          = (1/2) ( 1 + y ⟨t| Û_x^{p_x†} BS(~θ)† N BS(~θ) Û_x^{p_x} |t⟩ ).   (24)

In the qubit model, it has been explicitly shown in [19] that the equivalent of Eq. (24) in the circuit picture is related to the decision function of a support vector machine. This proof relies on decomposing the variational quantum circuit in the Pauli basis. Moreover, in [47], the authors provide a quantum circuit that simulates Boson Sampling interferometers with input photons, using the quantum Schur transform. Hence, the unitary BS(~θ) can be decomposed in the Pauli basis (for qudits), thus providing the same relation between Eq. (24) and the decision function of a support vector machine.

Experimentally, the probability Pr(y|p_x) can be estimated using the following approximation:

Pr(y|p_x) ≈ T_{y|x} / T_x,   (25)

where T_x is the number of times where the value p_x has been recorded and T_{y|x} is the number of times where the value y has been recorded after the value p_x has also been recorded.

In order to train the Boson Sampling device, it is necessary to use a cost function J(~θ) and an optimisation routine. For the optimisation routine, standard variational optimisation methods can be used, such as hybrid variational quantum-classical or classical algorithms [14, 15, 33]. Different cost functions can be used, such as the empirical risk function [19] or the square-loss function [20]. However, other types of cost functions may not be suitable for this kind of problem, such as the ones usually used in gate synthesis or quantum state preparation [48, 49].

The algorithm for the training phase is described in Algorithm 2 and the prediction phase is described in Algorithm 3.

Algorithm 2: Explicit method: training phase
  Input: Training dataset T, feature maps U_φ, optimisation routine with cost function J(~θ).
  Parameters: Input |t⟩, number of shots T, initial parameters ~θ_0.
  Set ~θ = ~θ_0.
  while J(~θ) not converged do
    for l = 1 to |T| do
      Set c_{p_l} = 0. Set c_y = 0 ∀y ∈ C.
      while c_{p_l} < T do
        Run BS(~θ)U_l^{p_l} on |t⟩, obtaining adaptive measurement outcome r and outcome label y.
        if r = p_l then
          c_{p_l} → c_{p_l} + 1.
          c_y → c_y + 1.
        end if
      end while
      Compute Pr(y|p_l) = c_y/T.
    end for
    Compute J(~θ) and update ~θ.
  end while
  return Value J(~θ*) and final ~θ*.

Algorithm 3: Explicit method: prediction phase
  Input: Unlabeled data x, feature map U_φ(x), optimal parameters ~θ*.
  Parameters: Input |t⟩, number of shots T_x.
  Set c_{p_x} = 0. Set c_y = 0 ∀y ∈ C.
  while c_{p_x} < T_x do
    Run BS(~θ*)U_φ(x) on |t⟩, obtaining adaptive measurement outcome r and outcome label y.
    if r = p_x then
      c_{p_x} → c_{p_x} + 1.
      c_y → c_y + 1.
    end if
  end while
  Compute Pr(y|p_x) = c_y/T_x.
  return argmax_y {Pr(y|p_x)}.

3.3.3 Implicit method: overlap estimation

The implicit method corresponds to the case where the kernel is computed by a quantum device, while the rest of the machine learning algorithm is performed on a classical machine. In adaptive linear optics, the kernel can be estimated using Algorithm 1. It is possible to proceed to both the training and the prediction phase using this hybrid quantum-classical method. The algorithm for the training phase is described in Algorithm 4 and the prediction phase is described in Algorithm 5.

Algorithm 4: Implicit method: training phase
  Input: Dataset T, feature maps U, quadratic programming solver.
  Parameters: Input |t⟩ and number of shots T.
  for l = 1 to |T| do
    for l' = 1 to |T| do
      Run (U_l^{p_l})^{⊗T} on input |t⟩^{⊗T}.
      Run Algorithm 1 with input U_l^{p_l}, U_{l'}^{p_{l'}}.
      Store the output at kernel matrix entry K_{l,l'} = κ(~x_l, ~x_{l'}).
    end for
  end for
  Solve the optimisation problem in Eq. (22) with T and K.
  return {α_l*} and b*.

Algorithm 5: Implicit method: prediction phase
  Input: Dataset T, unlabeled data ~x, optimal parameters {α_l*} and b.
  Parameters: Input |t⟩ and number of shots T.
  Set the variable f = b.
  for l = 1 to |T| do
    Run (U_x^{p_x})^{⊗T} on input |t⟩^{⊗T}.
    Run Algorithm 1 with parameters (U_x^{p_x})^{⊗T} |t⟩^{⊗T}, U_l^{p_l}.
    f → f + α_l y_l κ(~x_l, ~x).
  end for
  return sign(f).

4 Classical simulation of adaptive linear optics

The complexity of probability estimation and overlap estimation of quantum computations has been well studied in the circuit model [50, 51]. In this section, we challenge the quantum algorithms introduced in the previous section by deriving classical algorithms for probability estimation and overlap estimation for adaptive linear optics over m modes. We identify various complexity regimes for different numbers of input photons n ≤ m, different numbers of adaptive measurements k ≤ m, and different numbers of photons r ≤ n detected during the adaptive measurements. To do so, we derive generic expressions for the output probabilities and for the overlap of output states of linear optical interferometers with adaptive measurements.

4.1 Classical probability estimation

Firstly, we obtain a classical algorithm for probability estimation of adaptive linear optics over m modes with n input photons and k adaptive measurements.

We first consider the case k = 0, i.e., Boson Sampling. The probability of an outcome s ∈ Φ_{m,n} for a unitary interferometer U given the input t = (1^n, 0^{m−n}) ∈ Φ_{m,n} is given by Eq. (12):

Pr_{m,n}[s] = (1/s!) |Per(U_{s,t})|²,   (26)

where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row for i ∈ {1, . . . , m} and removing its j-th column for j ∈ {n + 1, . . . , m}. When |s| ≠ n however, the probability is 0, since the input t has n photons and the linear interferometer does not change the total number of photons.

The permanent of a square matrix of size n can be computed exactly in time O(n 2^n), thanks to Ryser's formula [52]. However, polynomially precise estimates of the permanent of a square unitary matrix can be obtained in polynomial time in the size of the matrix using an algorithm due to Gurvits [53], later generalised to matrices with repeated lines or columns [54], so probability estimation can be done classically efficiently, which was already noted in [4].
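For illustration (not part of the paper), the following sketch contrasts an exact permanent computation with a Gurvits-style randomized estimator based on Glynn's formula, whose sample average converges to the permanent with additive error that is well behaved for unitary matrices; the sample count and the QR-generated unitary are assumptions of the example.

import numpy as np
from itertools import permutations
from math import prod

def permanent_exact(a):
    n = a.shape[0]
    return sum(prod(a[i, sigma[i]] for i in range(n)) for sigma in permutations(range(n)))

def gurvits_estimate(a, samples, rng):
    # Randomized additive estimate of Per(a) via Glynn/Gurvits:
    # E_delta[(prod_k delta_k) * prod_j <delta, column_j>] = Per(a),
    # with delta drawn uniformly from {-1, +1}^n.
    n = a.shape[0]
    total = 0.0 + 0.0j
    for _ in range(samples):
        delta = rng.choice([-1.0, 1.0], size=n)
        total += np.prod(delta) * np.prod(delta @ a)
    return total / samples

rng = np.random.default_rng(1)
n = 4
u = np.linalg.qr(rng.normal(size=(n, n)) + 1j * rng.normal(size=(n, n)))[0]
print(permanent_exact(u))
print(gurvits_estimate(u, samples=200000, rng=rng))  # close, up to the additive error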

We now turn to the case k > 0—which to our knowledge has not been treated elsewhere—using the notations of section 3.1. This case is a direct extension of the case k = 0. With Eq. (12), for r ∈ N, p ∈ Φ_{k,r} and s ∈ Φ_{m−k,n−r}, the probability of a total outcome (p, s) ∈ Φ_{m,n} (adaptive measurement and final outcome) is given by

Pr^{total}_{m,n}[p, s] = (1/(p! s!)) |Per(U^p_{(p,s),t})|².   (27)

Let r ∈ {0, . . . , n} and let s ∈ Φ_{m−k,n−r}. Then, the probability of obtaining the final outcome s after the adaptive measurements reads

Pr^{final}_{m,n}[s] = ∑_{p∈Φ_{k,r}} Pr^{total}_{m,n}[p, s] = (1/s!) ∑_{p∈Φ_{k,r}} (1/p!) |Per(U^p_{(p,s),t})|².   (28)

The sum is taken over the elements of Φ_{k,r}, which has (k+r−1 choose r) elements, where r ≤ n is the total number of photons detected during the adaptive measurements. Each permanent in the sum can be estimated additively with polynomial precision in time O(poly m), using the generalised algorithm from [54] (see Appendix B for details). Hence, probability estimation can be done classically efficiently whenever the sum has a polynomial number of terms. In particular, as long as both k and r are O(log m), the output probability can be estimated efficiently. The simulability regimes are summarised in Table 2.

Table 2: Simulability regimes for probability estimation as a function of the parameters r (the total number of photons detected during the adaptive measurements) and k (the number of single-mode adaptive measurements). In green is the parameter regime for which the classical probability estimation is efficient, i.e., takes polynomial time in m, while in red is the regime where it is no longer efficient.

   r \ k       O(1)         O(log m)        O(m)
  O(1)         efficient    efficient       efficient
  O(log m)     efficient    efficient       not efficient
  O(m)         efficient    not efficient   not efficient

The universal quantum computing regime corresponds to n = O(m) and k = O(m) [31]. The time complexity of the classical simulation is O((k+r−1 choose r) poly m) and there is a possibility of subuniversal quantum advantage for probability estimation for n = O(log m) and k = O(m), or n = O(m) and k = O(log m). Moreover, the fraction r/n of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation.

4.2 Classical overlap estimation

In this section, we obtain a classical algorithm for overlap estimation of output states of adaptive linear optical interferometers over m modes with n input photons and k adaptive measurements.

Once again, we start with k = 0. With Eq. (10), the output state of an m-mode interferometer U with input state t ∈ Φ_{m,n} reads

|φ⟩ = ∑_{s∈Φ_{m,n}} ⟨s|Û|t⟩ |s⟩ = ∑_{s∈Φ_{m,n}} (Per(U_{s,t}) / √(s! t!)) |s⟩,   (29)

where U_{s,t} is the n × n matrix obtained from U by repeating s_i times its i-th row for i ∈ {1, . . . , m} and repeating t_j times its j-th column for j ∈ {1, . . . , m}. The composition of two interferometers is another interferometer whose unitary representation is the product of the unitary representations of the composed interferometers. Hence, the inner product of the output states |φ⟩ and |ψ⟩ of two m-mode interferometers U and V with the same input state t ∈ Φ_{m,n} is equal to the matrix element t, t of Û†V̂:

⟨φ|ψ⟩ = ∑_{u,v∈Φ_{m,n}} ⟨t|Û†|u⟩ ⟨v|V̂|t⟩ ⟨u|v⟩
      = ∑_{s∈Φ_{m,n}} ⟨t|Û†|s⟩ ⟨s|V̂|t⟩
      = ⟨t|Û†V̂|t⟩
      = Per((U†V)_{t,t}) / t!,   (30)

where we used in the third line t ∈ Φ_{m,n} and the fact that Û†V̂ conserves the space Φ_{m,n}. With the input t = (1^n, 0^{m−n}) with n photons in m modes, this reduces to

⟨φ|ψ⟩ = Per((U†V)_n),   (31)

where (U†V)_n is the n × n top left submatrix of U†V. Hence, the inner product and the overlap may be approximated to a polynomial precision efficiently, since this is the case for the permanent [53, 54], as detailed in the previous section.

We now consider the case k > 0. Let r ∈ N and let p ∈ Φ_{k,r}. With Eq. (10) and Eq. (15), writing Pr^{adap}_{m,n}[p] the probability of obtaining the adaptive measurement outcome p, the output state of the interferometer U^p with k adaptive measurements and input t = (1^n, 0^{m−n}) in Fig. 1, when the adaptive measurement outcome p is obtained, reads

(1/√(Pr^{adap}_{m,n}[p])) |ψ_p⟩,   (32)

where

|ψ_p⟩ := ∑_{s∈Φ_{m−k,n−r}} (Per(U^p_{(p,s),t}) / √(p! s!)) |s⟩,   (33)

and where Pr^{adap}_{m,n}[p] = ⟨ψ_p|ψ_p⟩. The inner product of two (not normalised) output states |ψ_p⟩ and |ψ_q⟩ of m-mode interferometers U^p and V^q with k adaptive measurements is zero if |p| ≠ |q|, and when |p| = |q| = r, it is thus given by

⟨ψ_p|ψ_q⟩ = (1/√(p! q!)) ∑_{s∈Φ_{m−k,n−r}} (1/s!) Per(U^{p†}_{t,(p,s)}) Per(V^q_{(q,s),t}).   (34)

This expression is a sum of |Φ_{m−k,n−r}| terms, which is exponential in m whenever n is not constant.
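The identity in Eq. (31) is easy to check numerically for small systems; the sketch below (added for illustration) builds the output amplitudes of Eq. (29) from permanents and compares the resulting inner product with Per((U†V)_n). The brute-force permanent and the QR-generated random unitaries are assumptions of the example.

import numpy as np
from itertools import permutations
from math import factorial, prod, sqrt

def permanent(a):
    n = a.shape[0]
    return sum(prod(a[i, s[i]] for i in range(n)) for s in permutations(range(n)))

def fock_states(m, n):
    if m == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in fock_states(m - 1, n - first):
            yield (first,) + rest

def output_amplitudes(u, t):
    # Eq. (29): amplitude of |s> is Per(U_{s,t}) / sqrt(s! t!).
    m, n = u.shape[0], sum(t)
    cols = [j for j, tj in enumerate(t) for _ in range(tj)]
    amps = {}
    for s in fock_states(m, n):
        rows = [i for i, si in enumerate(s) for _ in range(si)]
        norm = sqrt(prod(map(factorial, s)) * prod(map(factorial, t)))
        amps[s] = permanent(u[np.ix_(rows, cols)]) / norm
    return amps

def random_unitary(m, rng):
    return np.linalg.qr(rng.normal(size=(m, m)) + 1j * rng.normal(size=(m, m)))[0]

rng = np.random.default_rng(2)
m, n = 4, 2
t = (1, 1, 0, 0)
U, V = random_unitary(m, rng), random_unitary(m, rng)
phi, psi = output_amplitudes(U, t), output_amplitudes(V, t)

inner_direct = sum(np.conj(phi[s]) * psi[s] for s in phi)   # <phi|psi>
inner_permanent = permanent((U.conj().T @ V)[:n, :n])       # Eq. (31)
print(inner_direct, inner_permanent)                        # should agree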

Using properties of the permanent [55], we show that the expression in Eq. (34) may actually be rewritten as a sum over fewer terms, which constitutes our main technical result:

Lemma 1. Let r ∈ N. The inner product of two (not normalised) output states |ψ_p⟩ and |ψ_q⟩ of m-mode interferometers U^p and V^q with adaptive measurement outcomes p, q ∈ Φ_{k,r} is given by

⟨ψ_p|ψ_q⟩ = (1/√(p! q!)) ∑_{i,j∈{0,1}^n, |i|=|j|=r} Per(A^i) Per(B^j) Per(C^{i,j}),   (35)

where for all i, j ∈ {0,1}^n such that |i| = |j| = r,

A^i = U^{p†}_{(i,0^{m−n}),(p,0^{m−k})}   (36)

is an r × r matrix which can be obtained efficiently from U^p,

B^j = V^q_{(q,0^{m−k}),(j,0^{m−n})}   (37)

is an r × r matrix which can be obtained efficiently from V^q, and

C^{i,j} = U^{p†}_{(1^n−i,0^{m−n}),(0^k,1^{m−k})} V^q_{(0^k,1^{m−k}),(1^n−j,0^{m−n})}   (38)

is an (n−r) × (n−r) matrix which can be obtained efficiently from U^p and V^q.

We give a detailed proof in Appendix C. By Lemma 1, the inner product is expressed as a sum over (n choose r)² products of three permanents, of square matrices of sizes |p| = r, |q| = r and (n − r), respectively, and the overlap as its modulus squared. In the worst case, when r = n/2, the sum has at most O(4^n) terms, up to a polynomial factor in n. In particular, when n = O(log m) or r = O(1), the inner product reduces to a sum of a polynomial number of terms, which can all be computed in time O(poly m) with Ryser's formula [52]. An interesting fact is that the cost of computing the inner product does not depend explicitly on the number k > 0 of adaptive measurements. However, it does depend explicitly on r, the total number of photons detected during the adaptive measurements, and a larger number of adaptive measurements k favors the detection of a larger number of photons r.

The overlap of normalised output states is given by

|⟨ψ_p|ψ_q⟩|² / (⟨ψ_p|ψ_p⟩ ⟨ψ_q|ψ_q⟩),   (39)

which may also be computed efficiently when n = O(log m). In this case, the classical algorithm for overlap estimation simply computes the above expression, using Lemma 1 for each of the inner products. The running time of this classical algorithm thus is O((n choose r)² poly m) and its efficiency is summarised as a function of n and k in Table 3 and as a function of n and r in Table 4.

Table 3: Simulability regimes for classical overlap estimation as a function of the parameters n and k. In green is the parameter regime for which the classical overlap estimation may be efficient (i.e., may take polynomial time in m) depending on the value of r, and in red is the regime where the classical algorithm is no longer efficient. The cells containing a symbol "···" correspond to parameter regimes for which the quantum algorithm of section 3.2 for estimating the overlap is no longer efficient.

   n \ k       O(1)    O(log m)    O(m)
  O(1)
  O(log m)                         ···
  O(m)                 ···         ···

Table 4: Simulability regimes for classical overlap estimation as a function of the parameters n (total number of input photons) and r (total number of photons detected in the adaptive measurements). The regimes are obtained from the running time of the classical algorithm using Stirling's equivalent n! ∼ √(2πn) (n/e)^n.

   n \ r       O(1)         O(log n)        O(n)
  O(1)         efficient    efficient       efficient
  O(log m)     efficient    efficient       efficient
  O(m)         efficient    not efficient   not efficient

Since the quantum efficient regime corresponds to (k+n choose n) = O(poly m), there is a possibility of quantum advantage for overlap estimation when k = O(1) and n = O(m). However, like for probability estimation, the fraction r/n of input photons detected during the adaptive measurements has to be sufficiently large to prevent efficient classical simulation. In this case, when the number of adaptive measurements satisfies k = O(1), the interferometers used must concentrate many photons on the adaptive measurements.
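As a quick numerical illustration of how Tables 2–4 follow from these counting arguments (added here; the specific mode number m and the polynomial budget m³ are arbitrary choices of the example), one can compare the relevant binomial factors against a polynomial threshold in m:

from math import comb, log2

m = 10**6                     # illustrative number of modes
poly_budget = m**3            # stand-in for "polynomial in m"
log_m = int(log2(m))

def regime(terms):
    return "efficient" if terms <= poly_budget else "not efficient"

# Classical probability estimation: ~ C(k+r-1, r) permanents to estimate (Table 2).
for k, r in [(1, m), (log_m, log_m), (log_m, m), (m, log_m)]:
    print(f"probability  k={k:>7}, r={r:>7}: {regime(comb(k + r - 1, r))}")

# Classical overlap estimation: ~ C(n, r)^2 terms in Lemma 1 (Tables 3 and 4).
for n, r in [(log_m, log_m // 2), (m, 1), (m, log_m)]:
    print(f"overlap      n={n:>7}, r={r:>7}: {regime(comb(n, r) ** 2)}")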

5 Conclusion

In this work, we have given a roadmap for performing quantum variational classification and quantum kernel estimation using adaptive linear optical interferometers. We have investigated the transition of classical simulation between Boson Sampling [4] and the Knill–Laflamme–Milburn scheme for universal quantum computing [31], in terms of the number of adaptive measurements performed. In particular, we have derived classical algorithms for simulating the quantum computational subroutines involved: output probability estimation and output state overlap estimation.

In the case of probability estimation, the possible regimes for quantum advantage are incompatible with near-term implementations: both the number of adaptive measurements k and the number of input photons n must be greater than log m, where m is the number of modes. On the other hand, for overlap estimation, there is a possibility of near-term computational advantage using adaptive linear optics with a single adaptive measurement, requiring the preparation of photon number states. Note that the interferometer should be concentrating many photons r at the stage of the adaptive measurements in order to obtain overlaps that are possibly hard to estimate. Using more adaptive measurements does not increase the complexity (apart from polynomial factors in m), but may ease the detection of a larger number of photons during the adaptive measurements.

Our results suggest regimes where quantum advantage for machine learning with adaptive linear optics is possible, in the parameter regimes where our classical simulation algorithms fail to be efficient: it is an interesting open question whether better classical simulation algorithms for the computational subroutines involved can be found, or even classical algorithms solving directly the machine learning problems efficiently. In any case, our results restrict the parameter regimes for which such a quantum advantage may be possible: support vector machine with adaptive linear optics has to take place either in a bunching regime—concentrating many photons in the adaptive measurements—or using a number of adaptive measurements scaling with the size of the problem. In both cases, this imposes strong experimental requirements.

Our results have identified a sweet spot for quantum kernel estimation with adaptive linear optics using a single adaptive measurement, which would be interesting to demonstrate experimentally. In the longer term, variational classification with adaptive linear optics could also be interesting, since it may enable quantum advantage in a regime where the quantum algorithm for overlap estimation is no longer efficient.

As previously mentioned, it is a pressing question whether more efficient classical algorithms may be derived. In practical settings, taking into account photon losses could help providing more efficient classical simulation algorithms [56]. We leave these considerations for future work.

Acknowledgements

AS has been supported by a KIAS individual grant (CG070301) at Korea Institute for Advanced Study. DM acknowledges support from the ANR through project ANR-17-CE24-0035 VanQuTe.

References

[1] R. P. Feynman, "Simulating physics with computers," Int. J. Theor. Phys. 21, (1982).
[2] P. W. Shor, "Algorithms for quantum computation: discrete logarithms and factoring," in Proceedings 35th Annual Symposium on Foundations of Computer Science, pp. 124–134, IEEE, 1994.
[3] J. Preskill, "Quantum Computing in the NISQ era and beyond," Quantum 2, 79 (2018).
[4] S. Aaronson and A. Arkhipov, "The Computational Complexity of Linear Optics," Theory of Computing 9, 143 (2013).
[5] M. J. Bremner, R. Jozsa, and D. Shepherd, "Classical simulation of commuting quantum computations implies collapse of the polynomial hierarchy," Proc. R. Soc. A 459, 459 (2010).

[6] F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. Brandao, D. A. Buell, et al., "Quantum supremacy using a programmable superconducting processor," Nature 574, 505–510 (2019).
[7] H.-S. Zhong, H. Wang, Y.-H. Deng, M.-C. Chen, L.-C. Peng, Y.-H. Luo, J. Qin, D. Wu, X. Ding, Y. Hu, et al., "Quantum computational advantage using photons," Science 370, 1460–1463 (2020).
[8] A. W. Harrow, A. Hassidim, and S. Lloyd, "Quantum algorithm for linear systems of equations," Physical Review Letters 103, 150502 (2009).
[9] M. Schuld, I. Sinayskiy, and F. Petruccione, "The quest for a quantum neural network," Quantum Information Processing 13, 2567–2586 (2014).
[10] J. Romero, J. P. Olson, and A. Aspuru-Guzik, "Quantum autoencoders for efficient compression of quantum data," Quantum Science and Technology 2, 045001 (2017).
[11] C. Ciliberto, M. Herbster, A. D. Ialongo, M. Pontil, A. Rocchetto, S. Severini, and L. Wossnig, "Quantum machine learning: a classical perspective," Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences 474, 20170551 (2018).
[12] V. Dunjko and H. J. Briegel, "Machine learning & artificial intelligence in the quantum domain: a review of recent progress," Reports on Progress in Physics 81, 074001 (2018).
[13] C. Zoufal, A. Lucchi, and S. Woerner, "Quantum Generative Adversarial Networks for learning and loading random distributions," npj Quantum Information 5, 103 (2019).
[14] N. Killoran, T. R. Bromley, J. M. Arrazola, M. Schuld, N. Quesada, and S. Lloyd, "Continuous-variable quantum neural networks," Phys. Rev. Research 1, 033063 (2019).
[15] E. Farhi and H. Neven, "Classification with Quantum Neural Networks on Near Term Processors," arXiv:1802.06002.
[16] K. Beer, D. Bondarenko, T. Farrelly, T. J. Osborne, R. Salzmann, D. Scheiermann, and R. Wolf, "Training deep quantum neural networks," Nature Communications 11, 808 (2020).
[17] A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner, "The power of quantum neural networks," arXiv:2011.00027.
[18] M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles, "Variational Quantum Algorithms," arXiv:2012.09265.
[19] V. Havlíček, A. D. Córcoles, K. Temme, A. W. Harrow, A. Kandala, J. M. Chow, and J. M. Gambetta, "Supervised learning with quantum-enhanced feature spaces," Nature 567, 209 (2019).
[20] M. Schuld and N. Killoran, "Quantum machine learning in feature Hilbert spaces," Physical Review Letters 122, 040504 (2019).
[21] C. Blank, D. K. Park, J.-K. K. Rhee, and F. Petruccione, "Quantum classifier with tailored quantum kernel," npj Quantum Information 6, 41 (2020).
[22] K. Bartkiewicz, C. Gneiting, A. Černoch, K. Jiráková, K. Lemr, and F. Nori, "Experimental kernel-based quantum machine learning in finite feature space," Scientific Reports 10, 12356 (2020).
[23] Y. Liu, S. Arunachalam, and K. Temme, "A rigorous and robust quantum speed-up in supervised machine learning," arXiv:2010.02174.
[24] H.-Y. Huang, M. Broughton, M. Mohseni, R. Babbush, S. Boixo, H. Neven, and J. R. McClean, "Power of data in quantum machine learning," arXiv:2011.01938.
[25] C. S. Hamilton, R. Kruse, L. Sansoni, S. Barkhofen, C. Silberhorn, and I. Jex, "Gaussian boson sampling," Physical Review Letters 119, 170501 (2017).
[26] M. Schuld, K. Brádler, R. Israel, D. Su, and B. Gupt, "Measuring the similarity of graphs with a Gaussian boson sampler," Physical Review A 101, 032314 (2020).
[27] J. R. McClean, S. Boixo, V. N. Smelyanskiy, R. Babbush, and H. Neven, "Barren plateaus in quantum neural network training landscapes," Nature Communications 9, (2018).

Accepted in Quantum 2021-07-02, click title to verify. Published under CC-BY 4.0. 14 network training landscapes,” Nature [40] H. Buhrman, R. Cleve, J. Watrous, and Communications 9, (2018). R. De Wolf, “Quantum fingerprinting,” [28] M. Cerezo, A. Sone, T. Volkoff, L. Cincio, Physical Review Letters 87, 167902 (2001). and P. J. Coles, “Cost-Function-Dependent [41] U. Leonhardt, Essential . Barren Plateaus in Shallow Quantum Cambridge University Press, Cambridge, Neural Networks,” arXiv:2001.00550. UK, 1st ed., 2010. [29] S. Wang, E. Fontana, M. Cerezo, [42] V. N. Vapnik, The Nature of Statistical K. Sharma, A. Sone, L. Cincio, and P. J. Learning Theory. Springer-Verlag, Berlin, Coles, “Noise-Induced Barren Plateaus in Heidelberg, 1995. Variational Quantum Algorithms,” [43] C. J. Burges, “A Tutorial on Support arXiv:2007.14384. Vector Machines for Pattern Recognition,” [30] H. J. Briegel, D. E. Browne, W. Dür, Data Mining and Knowledge Discovery 2, R. Raussendorf, and M. Van den Nest, 121–167 (1998). “Measurement-based quantum [44] G. James, D. Witten, T. Hastie, and computation,” Nature Physics 5, 19–26 R. Tibshirani, An Introduction to Statistical (2009). Learning: With Applications in R. Springer [31] E. Knill, R. Laflamme, and G. J. Milburn, Publishing Company, Incorporated, 2014. “A scheme for efficient quantum [45] A. Politi, J. C. Matthews, M. G. computation with linear optics,” Nature Thompson, and J. L. O’Brien, “Integrated 409, 46–52 (2001). quantum ,” IEEE Journal of [32] M. Schuld, “Supervised quantum machine Selected Topics in Quantum Electronics 15, learning models are kernel methods,” 1673–1684 (2009). arXiv:2101.11020. [46] M. Reck, A. Zeilinger, H. J. Bernstein, and [33] K. Mitarai, M. Negoro, M. Kitagawa, and P. Bertani, “Experimental realization of any K. Fujii, “Quantum circuit learning,” Phys. discrete unitary operator,” Physical review Rev. A 98, 032309 (2018). letters 73, 58 (1994). [34] B. M. Terhal and D. P. DiVincenzo, [47] A. E. Moylett and P. S. Turner, “Quantum “Classical simulation of simulation of partially distinguishable noninteracting-fermion quantum circuits,” boson sampling,” Phys. Rev. A 97, 062329 Physical Review A 65, 032325 (2002). (2018). [35] H. Pashayan, S. D. Bartlett, and D. Gross, [48] J. M. Arrazola, T. R. Bromley, J. Izaac, “From estimation of quantum probabilities C. R. Myers, K. Brádler, and N. Killoran, to simulation of quantum circuits,” “Machine learning method for state Quantum 4, 223 (2020). preparation and gate synthesis on photonic [36] W. Hoeffding, “Probability inequalities for quantum computers,” arXiv:1807.10781. sums of bounded random variables,” [49] K. Heya, Y. Suzuki, Y. Nakamura, and Journal of the American statistical K. Fujii, “Variational Quantum Gate association 58, 13–30 (1963). Optimization,” arXiv:1810.12745. [37] M. Van Den Nest, “Classical simulation of [50] H. Pashayan, J. J. Wallman, and S. D. quantum computation, the Gottesman-Knill Bartlett, “Estimating outcome probabilities theorem, and slightly beyond,” Quantum of quantum circuits using Information & Computation 10, 258–271 quasiprobabilities,” Physical review letters (2010), arXiv:0811.0898. 115, 070501 (2015). [38] D. Dieks, “Overlap and distinguishability of [51] S. Bravyi, D. Gosset, and R. Movassagh, quantum states,” Physics Letters A 126, “Classical algorithms for quantum mean 303–306 (1988). values,” arXiv:1909.11485. [39] M. Fanizza, M. Rosati, M. Skotiniotis, [52] N. Albert and S. Wilf Herbert, J. Calsamiglia, and V. 
Giovannetti, Combinatorial algorithms: for computers “Beyond the swap test: optimal estimation and calculators. Academic Press, 1978. of quantum state overlap,” Physical Review [53] L. Gurvits, “On the complexity of mixed Letters 124, 060503 (2020). discriminants and related problems,” in

Accepted in Quantum 2021-07-02, click title to verify. Published under CC-BY 4.0. 15 International Symposium on Mathematical permanent,” arXiv:1212.0025. Foundations of Computer Science, [55] J. K. Percus, Combinatorial methods, vol. 4. pp. 447–458, Springer. 2005. Springer Science & Business Media, 2012. [56] R. García-Patrón, J. J. Renema, and [54] S. Aaronson and T. Hance, “Generalizing V. Shchesnovich, “Simulating boson and derandomizing Gurvits’s sampling in lossy architectures,” Quantum approximation algorithm for the 3, 169 (2019).

Appendix

A Classical probability estimation for Shor’s period-finding algorithm

In this section, we detail the classical probability estimation for Shor’s period-finding algorithm [2]. If $N$ is an $n$-bit integer to factor, the period-finding subroutine measures the output state

\[
\frac{1}{N} \sum_x \sum_y e^{\frac{2i\pi x y}{N}} \, |y\rangle\, |f(x)\rangle \tag{40}
\]
in the computational basis, where $f$ is a periodic function over $\{0,\dots,N-1\}$ which can be evaluated efficiently. The probability of obtaining an outcome $(y_0, f(x_0))$ is given by

\[
\Pr[y_0, f(x_0)] = \left| \frac{1}{N} \sum_{f(x)=f(x_0)} e^{\frac{2i\pi x y_0}{N}} \right|^2. \tag{41}
\]

Now let
\[
g_{x_0,y_0} : x \mapsto \begin{cases} e^{\frac{2i\pi x y_0}{N}} & \text{if } f(x) = f(x_0), \\ 0 & \text{otherwise.} \end{cases} \tag{42}
\]

The function $g_{x_0,y_0}$ can be evaluated efficiently and we have

\[
\Pr[y_0, f(x_0)] = \left| \operatorname*{\mathbb{E}}_{x \leftarrow N} \left[ g_{x_0,y_0}(x) \right] \right|^2, \tag{43}
\]
where $\mathbb{E}_{x \leftarrow N}$ denotes the expected value for $x$ drawn uniformly at random from $\{0,\dots,N-1\}$. By virtue of Hoeffding’s inequality, this quantity may be estimated efficiently (in $n$, the number of bits of $N$) classically, by sampling uniformly a polynomial number of values in $\{0,\dots,N-1\}$ and computing the modulus squared of the mean of $g_{x_0,y_0}$ over these values.

However, note that—the decision version of—probability estimation of quantum circuits is a BQP-complete problem almost by definition, since given a polynomially precise estimate of the probability of acceptance of an input $x$ to a quantum circuit, one may determine whether it is accepted or rejected by the circuit. In particular, unless factoring is in P, probability estimation for the quantum circuit corresponding to Shor’s algorithm as a whole is hard for classical computers, and weak simulation of the period-finding subroutine is also hard, since in Shor’s algorithm the output samples from the period-finding subroutine are used for a different classical computation than probability estimation, namely obtaining promising candidates for the period.
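To make the sampling procedure concrete, here is a minimal Python sketch of the estimator in Eq. (43); the helper name estimate_probability and the toy periodic function f (modular exponentiation with illustrative values of a and N) are ours and are not taken from the paper.

```python
import cmath
import random

def estimate_probability(N, f, x0, y0, num_samples=10_000):
    """Monte-Carlo estimate of Pr[y0, f(x0)] = |E_x[g_{x0,y0}(x)]|^2 (Eq. (43)),
    with x drawn uniformly from {0, ..., N-1}."""
    target = f(x0)
    total = 0j
    for _ in range(num_samples):
        x = random.randrange(N)
        if f(x) == target:
            total += cmath.exp(2j * cmath.pi * x * y0 / N)  # g_{x0,y0}(x), Eq. (42)
    return abs(total / num_samples) ** 2

if __name__ == "__main__":
    # Toy example: f is modular exponentiation, the periodic function used
    # in Shor's period-finding subroutine (the values of a and N are illustrative).
    N, a = 21, 2
    f = lambda x: pow(a, x, N)
    print(estimate_probability(N, f, x0=3, y0=7))
```

By Hoeffding’s inequality, the number of samples needed for a fixed additive precision is polynomial in $n$, independently of $N$.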

B Efficiency of classical output probability estimation

For an interferometer $U$ over $m$ modes with $n$ input photons and $k$ adaptive measurements with outcomes $p \in \Phi_{k,r}$, the expression obtained in the main text for the probability of an outcome $s$ reads:

\[
\Pr\nolimits^{\mathrm{final}}_{m,n}[s] = \frac{1}{s!} \sum_{p \in \Phi_{k,r}} \frac{1}{p!} \left| \operatorname{Per}\left( U^p_{(p,s),t} \right) \right|^2, \tag{44}
\]

where $t = (1^n, 0^{m-n})$. This is a sum over $|\Phi_{k,r}| = \binom{k+r-1}{r}$ moduli squared of permanents of square matrices of size $n \le m$. Permanents can be approximated efficiently using a simple randomized algorithm due to Gurvits:

Lemma 2 ([53]). Let $A$ be an $m \times m$ matrix. Then, $\operatorname{Per} A$ may be estimated classically with additive precision $\pm \epsilon \|A\|^m$ with high probability, in time $O(m^2/\epsilon^2)$, where $\|A\|$ is the largest singular value of $A$.
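Lemma 2 does not spell out the estimator, so the following Python sketch uses the Glynn-type sign estimator commonly associated with Gurvits’s algorithm; the function name and sample count are ours, and this is an illustration rather than the exact procedure of [53].

```python
import numpy as np

def gurvits_permanent_estimate(A, num_samples=100_000, rng=None):
    """Randomized permanent estimate in the spirit of Lemma 2: average the
    Glynn-type estimator (prod_i x_i) * prod_j (sum_i x_i A_ij) over uniform
    sign vectors x in {-1, +1}^m. Each sample costs O(m^2), is bounded by
    ||A||^m in modulus, and has expectation Per(A), so averaging O(1/eps^2)
    samples gives additive error about eps * ||A||^m."""
    rng = np.random.default_rng() if rng is None else rng
    A = np.asarray(A, dtype=complex)
    m = A.shape[0]
    total = 0j
    for _ in range(num_samples):
        x = rng.choice([-1.0, 1.0], size=m)
        total += np.prod(x) * np.prod(x @ A)  # one Glynn sample
    return total / num_samples

if __name__ == "__main__":
    # The permanent of the 3x3 all-ones matrix is 3! = 6.
    print(gurvits_permanent_estimate(np.ones((3, 3))))
```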

This algorithm has been refined in the case of matrices with repeated rows or repeated columns:

Lemma 3 ([54]). Let $B$ be an $m \times n$ matrix. Let $q = (q_1, \dots, q_m) \in \Phi_{m,n}$ and let $A = B_{q,1^n}$ be the $n \times n$ matrix obtained from $B$ by repeating $q_i$ times its $i$th row. Then, $\operatorname{Per} A$ may be estimated classically with additive precision $\pm\,\epsilon \sqrt{\frac{q_1! \cdots q_m!}{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m$ with high probability, in time $O(mn/\epsilon^2)$, where $\|B\|$ is the largest singular value of $B$. Moreover,
\[
|\operatorname{Per} A| \le \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m. \tag{45}
\]
This lemma also allows one to approximate $|\operatorname{Per} A|^2$ efficiently. Indeed, let $0 < \epsilon < 1$ and let $z$ be an estimate of $\operatorname{Per} A$ with additive error $\pm\,\epsilon \sqrt{\frac{q_1! \cdots q_m!}{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m$; then $|z|^2$ is a good estimate of $|\operatorname{Per} A|^2$:

\[
\begin{aligned}
\left| |z|^2 - |\operatorname{Per} A|^2 \right| &= \big| |z| - |\operatorname{Per} A| \big| \cdot \big( |z| + |\operatorname{Per} A| \big) \\
&\le \epsilon \cdot \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m \cdot \left( \epsilon \cdot \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m + 2\, |\operatorname{Per} A| \right) \\
&\le \epsilon \cdot \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m \cdot \left( \epsilon \cdot \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m + 2\, \frac{q_1! \cdots q_m!}{\sqrt{q_1^{q_1} \cdots q_m^{q_m}}}\, \|B\|^m \right) \\
&\le 3\epsilon \cdot \frac{q_1!^2 \cdots q_m!^2}{q_1^{q_1} \cdots q_m^{q_m}}\, \|B\|^{2m},
\end{aligned}
\tag{46}
\]
where we used $q_i! \ge 1$ in the second line and Eq. (45) in the third line. In particular, when $B$ is an $m \times n$ submatrix of a unitary matrix (implying $\|B\| \le 1$), we may compute an estimate of $|\operatorname{Per} A|^2$, where $A = B_{q,1^n}$ is the $n \times n$ matrix obtained from $B$ by repeating $q_i$ times its $i$th row, with additive precision $\pm\,\epsilon \cdot \frac{q_1!^2 \cdots q_m!^2}{q_1^{q_1} \cdots q_m^{q_m}}$ with high probability, in time $O(mn/\epsilon^2)$. The matrices in Eq. (44) are submatrices of unitary matrices, with repeated rows. Hence, estimating independently all the terms in the sum in Eq. (44), we obtain, in time $O(mn/\epsilon^2 \cdot |\Phi_{k,r}|)$ and with high probability, an estimate $\tilde{P}$ of the probability $\Pr^{\mathrm{final}}_{m,n}[s]$ such that
\[
\begin{aligned}
\left| \tilde{P} - \Pr\nolimits^{\mathrm{final}}_{m,n}[s] \right| &\le \epsilon \cdot \sum_{p \in \Phi_{k,r}} \frac{1}{s!} \cdot \frac{1}{p!} \cdot \frac{p_1!^2 \cdots p_k!^2\, s_1!^2 \cdots s_{m-k}!^2}{p_1^{p_1} \cdots p_k^{p_k}\, s_1^{s_1} \cdots s_{m-k}^{s_{m-k}}} \\
&\le \epsilon \cdot \sum_{p \in \Phi_{k,r}} \frac{s_1! \cdots s_{m-k}!}{s_1^{s_1} \cdots s_{m-k}^{s_{m-k}}} \cdot \frac{p_1! \cdots p_k!}{p_1^{p_1} \cdots p_k^{p_k}} \\
&\le \epsilon \cdot |\Phi_{k,r}|,
\end{aligned}
\tag{47}
\]
where we used that for all $n \in \mathbb{N}$, $n! \le n^n$. Note that a tighter bound may be obtained, but the above one is sufficient for our needs. In particular, when $|\Phi_{k,r}| = O(\mathrm{poly}\, m)$, the above procedure provides a polynomially precise additive estimate of the probability $\Pr^{\mathrm{final}}_{m,n}[s]$ in time $O(\mathrm{poly}\, m)$, with high probability. We have
\[
\begin{aligned}
|\Phi_{k,r}| &= \binom{k+r-1}{r} \\
&= \frac{k}{k+r} \cdot \frac{(k+r)!}{k!\, r!}.
\end{aligned}
\tag{48}
\]
This quantity is polynomial in $k$ (resp. in $r$) when $r = O(1)$ (resp. $k = O(1)$). Moreover,
\[
\begin{aligned}
\binom{k+r-1}{r} &\le \sum_{j=0}^{k+r-1} \binom{k+r-1}{j} \\
&= 2^{k+r-1},
\end{aligned}
\tag{49}
\]

so for $k = O(\log m)$ and $r = O(\log m)$, we have $|\Phi_{k,r}| = O(\mathrm{poly}\, m)$. For all other cases, i.e., $k = O(\log m)$ and $r = O(m)$, or $k = O(m)$ and $r = O(\log m)$, or $k = O(m)$ and $r = O(m)$, $|\Phi_{k,r}|$ is superpolynomial in $m$, using Stirling’s equivalent $n! \sim \sqrt{2\pi n}\,\left(\frac{n}{e}\right)^n$.
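As a quick illustration of these counting estimates, the following sketch evaluates $|\Phi_{k,r}|$ in the different regimes discussed above; the helper name is ours.

```python
import math

def num_adaptive_outcomes(k: int, r: int) -> int:
    """|Phi_{k,r}| = C(k+r-1, r), the number of terms in the sum of Eq. (44)."""
    return math.comb(k + r - 1, r)

if __name__ == "__main__":
    # k, r = O(log m): |Phi_{k,r}| <= 2^(k+r-1) stays polynomial in m (Eq. (49)).
    # k = O(m) with r = O(log m): the count grows superpolynomially in m.
    for m in (16, 64, 256, 1024):
        log_m = max(1, round(math.log2(m)))
        print(m, num_adaptive_outcomes(log_m, log_m), num_adaptive_outcomes(m, log_m))
```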

C Proof of Lemma 1

We recall Lemma 1 from the main text:

Lemma 1. Let $r \in \mathbb{N}$. The inner product of two (not normalised) output states $|\psi_p\rangle$ and $|\psi_q\rangle$ of $m$-mode interferometers $U^p$ and $V^q$ with adaptive measurement outcomes $p, q \in \Phi_{k,r}$ is given by

\[
\langle \psi_p | \psi_q \rangle = \frac{1}{\sqrt{p!\,q!}} \sum_{\substack{i,j \in \{0,1\}^n \\ |i|=|j|=r}} \operatorname{Per}\left(A^i\right) \operatorname{Per}\left(B^j\right) \operatorname{Per}\left(C^{i,j}\right), \tag{50}
\]
where for all $i, j \in \{0,1\}^n$ such that $|i| = |j| = r$,

\[
A^i = U^{p\dagger}_{(i,0^{m-n}),(p,0^{m-k})} \tag{51}
\]
is an $r \times r$ matrix which can be obtained efficiently from $U^p$,

\[
B^j = V^q_{(q,0^{m-k}),(j,0^{m-n})} \tag{52}
\]
is an $r \times r$ matrix which can be obtained efficiently from $V^q$, and

\[
C^{i,j} = U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,1^{m-k})}\, V^q_{(0^k,1^{m-k}),(1^n-j,0^{m-n})} \tag{53}
\]
is an $(n-r) \times (n-r)$ matrix which can be obtained efficiently from $U^p$ and $V^q$.

Proof. Consider the expression for the inner product obtained in Eq. (34):

\[
\langle \psi_p | \psi_q \rangle = \frac{1}{\sqrt{p!\,q!}} \sum_{s \in \Phi_{m-k,n-r}} \frac{1}{s!} \operatorname{Per}\left(U^{p\dagger}_{t,(p,s)}\right) \operatorname{Per}\left(V^q_{(q,s),t}\right). \tag{54}
\]

It is reminiscent of the permanent composition formula [55]: for all $m, n, c \in \mathbb{N}^*$, all $\sigma \in \mathbb{N}$, all $u \in \Phi_{m,\sigma}$ and all $v \in \Phi_{n,\sigma}$,
\[
\operatorname{Per}\left[(MN)_{u,v}\right] = \sum_{s \in \Phi_{c,\sigma}} \frac{1}{s!} \operatorname{Per}(M_{u,s}) \operatorname{Per}(N_{s,v}), \tag{55}
\]
where $M$ is an $m \times c$ matrix and $N$ is a $c \times n$ matrix. However, this formula is not directly applicable to the expression in Eq. (54). In order to obtain a suitable expression, we first make use of the Laplace formula for the permanent: we expand the permanent of $U^{p\dagger}_{t,(p,s)}$ along the columns that are repeated according to $p$, and we expand the permanent of $V^q_{(q,s),t}$ along the rows that are repeated according to $q$. The generalised Laplace column expansion formula for the permanent reads: let $n \in \mathbb{N}^*$, let $W$ be an $n \times n$ matrix, and let $j \in \{0,1\}^n$. Then,
\[
\operatorname{Per}(W) = \sum_{\substack{i \in \{0,1\}^n \\ |i|=|j|}} \operatorname{Per}(W_{i,j}) \operatorname{Per}(W_{1^n-i,\,1^n-j}), \tag{56}
\]

th th where Wi,j is the matrix obtained from W by keeping only the k rows and l columns such that th ik = 1 and jl = 1, respectively, and W1n−i,1n−j is the matrix obtained from W by keeping only the k th rows and l columns such that ik = 0 and jl = 0, respectively. This formula is obtained by applying the Laplace expansion formula for one column various times, for each column with index l such that jl = 1, and the same formula holds for rows.

We first apply the general column expansion formula in Eq. (56) to the matrix $U^{p\dagger}_{t,(p,s)}$ with $j = (1^r, 0^{n-r}) \in \{0,1\}^n$, obtaining
\[
\operatorname{Per}\left(U^{p\dagger}_{t,(p,s)}\right) = \sum_{\substack{i \in \{0,1\}^n \\ |i|=r}} \operatorname{Per}\left[\left(U^{p\dagger}_{t,(p,s)}\right)_{i,j}\right] \operatorname{Per}\left[\left(U^{p\dagger}_{t,(p,s)}\right)_{1^n-i,\,1^n-j}\right]. \tag{57}
\]
Let us consider the matrix $\left(U^{p\dagger}_{t,(p,s)}\right)_{i,j}$ appearing in this last expression, for $i \in \{0,1\}^n$. Its rows are obtained by keeping the first $n$ rows of $U^{p\dagger}$, since $t = (1^n, 0^{m-n})$, and then keeping only the $l$th rows such that $i_l = 1$. Its columns are obtained by repeating the $l$th column $p_l$ times for $l \in \{1,\dots,k\}$ and $s_l$ times for $l \in \{k+1,\dots,m\}$, and then keeping only the first $r$ columns, since $j = (1^r, 0^{n-r})$. However, since $|p| = |j| = r$, these are precisely the columns repeated according to $p$. Hence,

\[
\left(U^{p\dagger}_{t,(p,s)}\right)_{i,j} = U^{p\dagger}_{(i,0^{m-n}),(p,0^{m-k})}, \tag{58}
\]

where $U^{p\dagger}_{(i,0^{m-n}),(p,0^{m-k})}$ is the matrix obtained from $U^{p\dagger}$ by keeping only the $l$th rows such that $i_l = 1$ and removing the others, and by repeating $p_l$ times the $l$th column for $l \in \{1,\dots,k\}$ and removing the others. Similarly, with $|s| = |1^n - j| = n - r$,

\[
\left(U^{p\dagger}_{t,(p,s)}\right)_{1^n-i,\,1^n-j} = U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}, \tag{59}
\]

where $U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}$ is the matrix obtained from $U^{p\dagger}$ by keeping only the $l$th rows such that $i_l = 0$ and removing the others, and by repeating $s_l$ times the $l$th column for $l \in \{k+1,\dots,m\}$ and removing the others. With Eqs. (57), (58) and (59) we obtain

\[
\begin{aligned}
\operatorname{Per}\left(U^{p\dagger}_{t,(p,s)}\right) &= \sum_{\substack{i \in \{0,1\}^n \\ |i|=r}} \operatorname{Per}\left(U^{p\dagger}_{(i,0^{m-n}),(p,0^{m-k})}\right) \operatorname{Per}\left(U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}\right) \\
&= \sum_{\substack{i \in \{0,1\}^n \\ |i|=r}} \operatorname{Per}\left(A^i\right) \operatorname{Per}\left(U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}\right),
\end{aligned}
\tag{60}
\]
where we have defined, for all $i \in \{0,1\}^n$ such that $|i| = r$,

\[
A^i := U^{p\dagger}_{(i,0^{m-n}),(p,0^{m-k})}, \tag{61}
\]
which is an $r \times r$ matrix independent of $s$ that can be obtained efficiently from $U^p$.
The same reasoning with the general row expansion formula for the matrix $V^q_{(q,s),t}$ and the rows $i = (1^r, 0^{n-r})$ gives
\[
\begin{aligned}
\operatorname{Per}\left(V^q_{(q,s),t}\right) &= \sum_{\substack{j \in \{0,1\}^n \\ |j|=r}} \operatorname{Per}\left[\left(V^q_{(q,s),t}\right)_{i,j}\right] \operatorname{Per}\left[\left(V^q_{(q,s),t}\right)_{1^n-i,\,1^n-j}\right] \\
&= \sum_{\substack{j \in \{0,1\}^n \\ |j|=r}} \operatorname{Per}\left(V^q_{(q,0^{m-k}),(j,0^{m-n})}\right) \operatorname{Per}\left(V^q_{(0^k,s),(1^n-j,0^{m-n})}\right),
\end{aligned}
\tag{62}
\]

where $V^q_{(q,0^{m-k}),(j,0^{m-n})}$ is the matrix obtained from $V^q$ by repeating $q_l$ times the $l$th row for $l \in \{1,\dots,k\}$ and removing the others, and by keeping only the $l$th columns such that $j_l = 1$, and where $V^q_{(0^k,s),(1^n-j,0^{m-n})}$ is the matrix obtained from $V^q$ by repeating $s_l$ times the $l$th row for $l \in \{k+1,\dots,m\}$ and removing the others, and by keeping only the $l$th columns such that $j_l = 0$. Defining, for all $j \in \{0,1\}^n$ such that $|j| = r$,
\[
B^j := V^q_{(q,0^{m-k}),(j,0^{m-n})}, \tag{63}
\]

the expression in Eq. (62) rewrites as

\[
\operatorname{Per}\left(V^q_{(q,s),t}\right) = \sum_{\substack{j \in \{0,1\}^n \\ |j|=r}} \operatorname{Per}\left(B^j\right) \operatorname{Per}\left(V^q_{(0^k,s),(1^n-j,0^{m-n})}\right), \tag{64}
\]
where the $B^j$ are $r \times r$ matrices independent of $s$ which can be obtained efficiently from $V^q$. Plugging Eqs. (60) and (64) into Eq. (54), we obtain

" 1     hψ |ψ i = √ X Per Ai Per Bj p q p!q! i,j∈{0,1}n |i|=|j|=r (65) # 1     × X Per U p† Per V q . s! (1n−i,0m−n),(0k,s) (0k,s),(1n−j,0m−n) s∈Φm−k,n−r

The sum appearing in the second line may now be expressed as a single permanent using the permanent composition formula: for all $i, j \in \{0,1\}^n$ such that $|i| = |j| = r$, let us define the $(n-r) \times (m-k)$ matrix
\[
\tilde{U}^{p,i} := U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,1^{m-k})}, \tag{66}
\]
and the $(m-k) \times (n-r)$ matrix

\[
\tilde{V}^{q,j} := V^q_{(0^k,1^{m-k}),(1^n-j,0^{m-n})}, \tag{67}
\]
so that
\[
U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)} = \tilde{U}^{p,i}_{1^{n-r},s} \quad \text{and} \quad V^q_{(0^k,s),(1^n-j,0^{m-n})} = \tilde{V}^{q,j}_{s,1^{n-r}}. \tag{68}
\]
With the permanent composition formula in Eq. (55) we obtain
\[
\sum_{s \in \Phi_{m-k,n-r}} \frac{1}{s!} \operatorname{Per}\left(U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}\right) \operatorname{Per}\left(V^q_{(0^k,s),(1^n-j,0^{m-n})}\right) = \operatorname{Per}\left[\left(\tilde{U}^{p,i} \tilde{V}^{q,j}\right)_{1^{n-r},1^{n-r}}\right]. \tag{69}
\]

Since $\tilde{U}^{p,i} \tilde{V}^{q,j}$ is an $(n-r) \times (n-r)$ matrix, we thus have

\[
\sum_{s \in \Phi_{m-k,n-r}} \frac{1}{s!} \operatorname{Per}\left(U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,s)}\right) \operatorname{Per}\left(V^q_{(0^k,s),(1^n-j,0^{m-n})}\right) = \operatorname{Per}\left(\tilde{U}^{p,i} \tilde{V}^{q,j}\right). \tag{70}
\]

Then, Eq. (65) rewrites as

\[
\langle \psi_p | \psi_q \rangle = \frac{1}{\sqrt{p!\,q!}} \sum_{\substack{i,j \in \{0,1\}^n \\ |i|=|j|=r}} \operatorname{Per}\left(A^i\right) \operatorname{Per}\left(B^j\right) \operatorname{Per}\left(C^{i,j}\right), \tag{71}
\]
where we have defined

\[
\begin{aligned}
C^{i,j} &:= \tilde{U}^{p,i} \tilde{V}^{q,j} \\
&= U^{p\dagger}_{(1^n-i,0^{m-n}),(0^k,1^{m-k})}\, V^q_{(0^k,1^{m-k}),(1^n-j,0^{m-n})},
\end{aligned}
\tag{72}
\]
which is an $(n-r) \times (n-r)$ matrix that can be computed efficiently from $U^p$ and $V^q$.
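To close, here is a brute-force numerical check of the permanent composition formula of Eq. (55), the main tool used in the proof; the helper names and the small test dimensions are ours.

```python
import itertools
import math
import numpy as np

def permanent(M):
    """Brute-force permanent; the empty 0x0 matrix has permanent 1."""
    n = M.shape[0]
    return sum(np.prod([M[i, sigma[i]] for i in range(n)])
               for sigma in itertools.permutations(range(n)))

def repeat(M, u, v):
    """M_{u,v}: repeat row i of M u_i times and column j of M v_j times."""
    rows = [i for i, ui in enumerate(u) for _ in range(ui)]
    cols = [j for j, vj in enumerate(v) for _ in range(vj)]
    return M[np.ix_(rows, cols)]

def compositions(length, total):
    """Phi_{length,total}: tuples of `length` non-negative integers summing to `total`."""
    if length == 1:
        yield (total,)
        return
    for head in range(total + 1):
        for tail in compositions(length - 1, total - head):
            yield (head,) + tail

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    m, n, c, sigma = 2, 2, 3, 2              # small sizes keep brute force cheap
    M, N = rng.normal(size=(m, c)), rng.normal(size=(c, n))
    u, v = (1, 1), (2, 0)                    # u in Phi_{m,sigma}, v in Phi_{n,sigma}
    lhs = permanent(repeat(M @ N, u, v))
    rhs = sum(permanent(repeat(M, u, s)) * permanent(repeat(N, s, v))
              / math.prod(math.factorial(si) for si in s)
              for s in compositions(c, sigma))
    print(lhs, rhs)                          # both sides of Eq. (55) agree
```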
