Quantum algorithms for jet clustering

The MIT Faculty has made this article openly available. Please share how this access benefits you. Your story matters.

Citation Wei, Annie Y. et al. “Quantum algorithms for jet clustering.” Physical Review D, 101, 9 (May 2020): 094015 © 2020 The Author(s)

As Published 10.1103/PHYSREVD.101.094015

Publisher American Physical Society (APS)

Version Final published version

Citable link https://hdl.handle.net/1721.1/128245

Terms of Use Creative Commons Attribution 4.0 International license

Detailed Terms https://creativecommons.org/licenses/by/4.0/ PHYSICAL REVIEW D 101, 094015 (2020)

Quantum algorithms for jet clustering

† ‡ Annie Y. Wei,1,* Preksha Naik ,1, Aram W. Harrow ,1, and Jesse Thaler 1,2,§ 1Center for Theoretical Physics, Massachusetts Institute of Technology, Cambridge, Massachusetts 02139, USA 2Department of Physics, Harvard University, Cambridge, Massachusetts 02138, USA

(Received 10 September 2019; revised manuscript received 28 February 2020; accepted 15 April 2020; published 14 May 2020)

Identifying jets formed in high-energy particle collisions requires solving optimization problems over potentially large numbers of final-state particles. In this work, we consider the possibility of using quantum computers to speed up jet clustering algorithms. Focusing on the case of electron-positron collisions, we consider a well-known event shape called thrust whose optimum corresponds to the most jetlike separating plane among a set of particles, thereby defining two hemisphere jets. We show how to formulate thrust both as a problem and as a Grover search problem. A key component of our analysis is the consideration of realistic models for interfacing classical data with a . With a sequential computing model, we show how to speed up the well-known OðN3Þ classical algorithm to an OðN2Þ quantum algorithm, including the OðNÞ overhead of loading classical data from N final-state particles. Along the way, we also identify a way to speed up the classical algorithm to OðN2 log NÞ using a sorting strategy inspired by the SISCone jet algorithm, which has no natural quantum counterpart. With a parallel computing model, we achieve OðN log NÞ scaling in both the classical and quantum cases. Finally, we consider the generalization of these quantum methods to other jet algorithms more closely related to those used for proton-proton collisions at the Large Hadron Collider.

DOI: 10.1103/PhysRevD.101.094015

I. INTRODUCTION high-energy physics. Our main results are summarized in Table I, where the computational scaling is given for N Jets are collections of collimated, energetic hadrons particles in the final state. We show how to improve the well- formed in high-energy particle collisions. With an appro- O N3 O N2 priate choice of jet clustering algorithm [1], jets are a robust known ð Þ classical algorithm [2] to an ð Þ quantum probe of quantum chromodynamics and a useful proxy for algorithm, which includes the cost of loading the classical determining the kinematics of the underlying hard scatter- data into a sequential architecture. ing process. The problem of identifying jets from collision On the other hand, we also show how to speed up the O N2 N data is a nontrivial task, however, since the jet clustering classical algorithm to ð log Þ, using a clever must be matched to the physics question of strategy from Ref. [3], which matches the quantum perfor- N interest. Moreover, it is a computationally intensive task, as mance up to log factors. Finally, using parallel computing O N N it often involves performing optimizations over potentially architectures, we achieve ð log Þ scaling in both the large numbers of final-state particles. classical and quantum cases, albeit for very different com- In this paper, we consider the possibility of using quantum putational reasons. computers to speed up jet identification. We focus on the Quantum algorithms have been shown to achieve speed- well-known problem of partitioning an electron-positron ups over classical algorithms [4], resulting, in theory, in collision event into two hemisphere jets, though our time savings which are even more pronounced over large results are relevant for other optimization problems beyond datasets. That said, many proposed quantum algorithms for machine learning tasks often omit considerations that would be needed to actually implement them in practice, *[email protected] † such as a strategy to interface classical data with a quantum [email protected][email protected] computing architecture. One solution is to assume the §[email protected] availability of qRAM [5], which would let our quantum computer access a classical dataset in superposition; Published by the American Physical Society under the terms of however, this additional hardware requirement may not the Creative Commons Attribution 4.0 International license. Further distribution of this work must maintain attribution to be easy to implement in practice. Here, we consider the author(s) and the published article’s title, journal citation, realistic applications of both quantum annealing [6–8] and DOI. Funded by SCOAP3. and Grover search [9–11] to jet finding, including the

2470-0010=2020=101(9)=094015(20) 094015-1 Published by the American Physical Society WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020)

TABLE I. Summary of classical and quantum thrust algorithms, where the asymptotic scaling is for a single collision event with N particles. All strategies have a classical space overhead of OðNÞ bits for read access to the classical data. The classical sorting strategies also require write access to OðN log NÞ bits. For ease of exposition throughout, we treat each real number as being specified to a constant Oð1Þ bits of precision.

Implementation Time usage usage Sections Classical [2] OðN3Þ … Sec. III A Classical with sort (using [3]) OðN2 log NÞ … Sec. III C Classical with parallel sort OðN log NÞ … Sec. III D Quantum annealing Gap Dependent OðNÞ Sec. IV Quantum search: sequential model OðN2Þ Oðlog NÞ Sec. VC Quantum search: parallel model OðN log NÞ OðN log NÞ Sec. VE

OðNÞ overhead of loading classical collision data into the Because collider data are classical (and will likely remain quantum computer. so for the foreseeable future), understanding the limitations The specific jet finding algorithm we use is based on imposed by data loading is essential to evaluate the thrust [12–14]. Thrust is an event shape widely measured in potential of quantum algorithms to speed up or improve electron-positron collisions [15–27]. The optimum value of data analysis pipelines. At the same time, it is important to thrust defines the most jetlike separating plane among a set assess potential classical improvements to existing collider of final-state particles, thereby partitioning the event into algorithms, and the sorting strategy of Ref. [3] is an two hemisphere jets. Algorithmically, it poses an interest- important example of how new classical strategies can ing problem because it can be viewed in various equivalent sometimes match the gains from quantum computation. ways—such as a partitioning problem or as an axis-finding Turning now to an extended outline of this paper, our problem—which in turn lead to different algorithmic quantum algorithms build on existing classical strategies to strategies. compute thrust. In Sec. II, we define thrust in its various We note that practical thrust computations typically equivalent manifestations, as both a partitioning problem involve only 10–1000 particles per event, so the current and an axis-finding problem. Then, in Sec. III, we review OðN3Þ classical algorithm [2] is certainly adequate to the classical algorithms for computing thrust based on a search task. That said, more efficient jet algorithms are of general over reference axes. As already mentioned, the best known 3 interest, for example, in the context of active area calcu- result in the literature requires OðN Þ time [2]. We show lations [28], which can involve up to millions of ghost how to improve it to OðN2 log NÞ using a sorting strategy particles. We also note that the current default jet algorithm inspired by SISCone [3], which appears to have no quantum at the Large Hadron Collider (LHC) is anti-kt [29], which analog (see Sec. VD). already runs in OðN log NÞ time [30,31], and it is unlikely The first quantum method we consider in Sec. IV that any quantum algorithm can yield a sublinear improve- involves formulating thrust as a quadratic unconstrained ment. On the other hand, anti-kt is a hierarchical clustering binary optimization (QUBO) problem, which can then be algorithm (i.e., a heuristic), whereas thrust is a global solved via quantum annealing [6,7]. This comes from optimization problem, and there are phenomenological viewing thrust as a partitioning problem and then consid- contexts where global jet optimization could potentially ering the brute force enumeration of all candidate parti- yield superior physics performance [32,33]; see also tions. See Refs. [55,56] for other studies of quantum Refs. [34–47]. Jet finding via global optimization has annealing for clustering with unique assignment. not seen widespread adoption, in part because of the The core results of this paper are in Sec. V, where we computational overhead, and we hope the quantum and describe quantum algorithms for computing thrust based on improved classical algorithms developed here spur more Grover search [9]. Although naively Grover search offers a research on alternative jet finding strategies. square root speedup over any classical search algorithm, in Beyond the specific applications to jet finding, we practice Grover search cannot yield sublinear algorithms. believe that the broader question of identifying realistic The reason is that data loading over a classical database of quantum algorithms for optimization problems should be of size N requires OðNÞ time, which limits the achievable interest to both the particle physics and quantum computing gains. That said, if the classical search space scales like communities. Indeed, we regard thrust as a warm-up OðNαÞ, we can still use the Grover strategy to reduce the problem for the more general development of quantum search loop to OðNα=2Þ, though there will be an additional algorithms for collider data analysis. (For other quantum additive (multiplicative) factor of OðNÞ if data loading has algorithms for collider physics, see Refs. [48,49] for Higgs to happen outside (inside) of the loop. Using the formu- boson identification, Refs. [50,51] for parton shower lation of thrust as a search over reference axes, we show generation, and Refs. [52–54] for track reconstruction.) that α ¼ 2 in the thrust case. Thus, we can attribute our

094015-2 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020) speedup to the fact that data loading is performed in [32,33]. We conclude in Sec. VIII with some broader superposition, which means that it still requires only lessons about quantum algorithms for collider physics. OðNÞ time despite working over a search space of O N2 size ð Þ. II. DEFINITION OF THRUST The precise speedup achievable in our Grover search strategy depends on the assumed quantum computing We start by defining thrust [12–14], noting that it has paradigm. We implement two models for retrieving and multiple equivalent definitions that suggest different algo- processing the classical data, based on the abstract oper- rithmic strategies, as shown in Fig. 1. Thrust can be viewed ations LOOKUP and SUM. The sequential computing as a partitioning problem, which lends itself naturally to model requires O˜ ð1Þ and results in an OðN2Þ thrust quantum annealing. Thrust can alternatively be viewed as algorithm. Here we use O˜ ð·Þ to mean that we neglect factors an axis-finding problem, which we can frame as a quantum that are polylog in N. The parallel computing model search problem. Both definitions of thrust can be stated in requires O˜ ðNÞ qubits and results in an OðN log NÞ thrust terms of operator norms, and through this lens, they are in algorithm. Both computing models are applicable to any fact dual to each other. general problem where the size of the search space scales like OðNαÞ with α ≥ 2, which are precisely the problems A. Thrust as a partitioning problem that can typically be sped up with a realistic application of Consider a set of N three-momenta fp⃗ig in their center- Grover search. x y z of-momentum frame, where p⃗i ¼fpi ;pi ;pi g, In Sec. VI, we assess whether or not there is any quantum advantage for hemisphere jet finding. Formally, XN if one has read access to OðNÞ classical bits but only write p⃗i ¼ 0: ð1Þ access to Oðlog NÞ bits, then one cannot implement the i¼1 classical sorting strategy in Sec, III C. In that case, there is a quantum advantage for both sequential and parallel com- An intuitive formulation of thrust (though not exactly the puting models. With write access to OðN log NÞ classical original one [12,13]) is to separate the particles into a H ∪ H bits, though, classical sorting is possible, and the asymp- partition L R such that momenta on each side are as “ totic performance of our classical and quantum algorithms pencil-like" as possible. That is, we seek to maximize the is identical (up to log N factors) in both the sequential and quantity parallel cases. This equivalence appears to be special to P P 2 p⃗ 2 p⃗ j i∈HL ij j i∈HR ij algorithms like thrust where the search space scales like TðHLÞ¼ P ¼ P ; ð2Þ 2 N p⃗ N p⃗ OðN Þ, and we speculate that larger search spaces might i¼1 j ij i¼1 j ij benefit from Grover speedups even if classical sorting is possible. where the second equality follows from momentum con- “ ” Finally, in Sec. VII, we briefly consider generalizations servation. The quantity known as thrust corresponds to of our results to jet algorithms more closely related to those the maximum obtainable value, used at the LHC. We consider jet function maximization – T ¼ maxTðHLÞ: ð3Þ [43 45], showing that, with suitable modifications, it can HL be written in QUBO form for quantum annealing. We consider stable cone finding in the spirit of SISCone [3], The factor of 2 in Eq. (2) is conventional such that showing how a single-jet variant we dub SINGLECONE is 1=2 ≤ T ≤ 1, where T ¼ 1 corresponds to a perfectly amenable to quantum search. We also comment on quan- pencillike back-to-back configuration and T ¼ 1=2 is an tum multijet finding motivated by the XCone algorithm isotropic event.

FIG. 1. Two equivalent definitions of thrust as (left) a partitioning problem and (right) an axis-finding problem. The best known classical algorithm is based on plane partitioning via a reference axis rˆ (which in general differs from the thrust axis).

094015-3 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020)

There is an equivalent geometric formulation of Eq. (2) where the Heaviside theta function picks out particles in due to Ref. [57]. Consider sequentially summing the three- just one hemisphere. momenta fp⃗ig to form a closed polygon. Each sequence Note that the optimal partitioning plane is not unique, yields a different polygon, and computing thrust is equiv- since there can be multiple planes that yield the same alent to maximizing twice the diagonal of the polygon over partition. We can exploit this fact to find a computationally all possible polygons, normalized by the circumference of convenient partitioning plane, defined by a normal refer- the polygon. The diagonal splits the polygon into two ence axis rˆ. This reference axis will in general be different nˆ halves, which yield the partition HL ∪ HR. The particles in from the thrust axis opt but nevertheless yield the same HL are said to be in the “left hemisphere jet" and the value of thrust via Eq. (2). Specifically, once the optimal particles in HR are said to be in the “right hemisphere jet.” partition is known via a reference axis, the thrust axis can be This definition immediately suggests a naive, brute-force determined from the total three-momentum in the left classical strategy for computing thrust. We can enumerate hemisphere, all Oð2NÞ possible partitions (which can be reduced to P N−1 p⃗i Oð2 Þ using momentum conservation), and then we sum nˆ Pi∈HL : opt ¼ p⃗ ð8Þ the momenta in each to determine the maximum, resulting j i∈HL ij in an OðN2NÞ algorithm. This is the version of thrust we will use for the quantum annealing formulation in Sec. IV, We will use this reference axis approach for the classical which corresponds to attacking the problem using quantum thrust algorithms in Sec. III and for the quantum search brute force. strategies in Sec. V. C. Duality of thrust definitions B. Thrust as an axis-finding problem Using the formalism of operator norms, we can show that An alternative definition of thrust is as an axis-finding these two definitions of thrust are in fact dual to each other. problem, which is a bit closer to the historical definition m Let M∶V → W be a map from V ¼ R with norm k · kα [12,13]. Let nˆ be a unit norm vector and define n to W ¼ R with norm k · kβ. The operator norm of M, P N nˆ p⃗ known as the induced α-to-β norm, is defined as T nˆ Pi¼1 j · ij : ð Þ¼ N p⃗ ð4Þ i¼1 j ij kMkα→β ≡ max kMvkβ: ð9Þ kvkα¼1 Thrust can then be determined by the maximum value of That is, we search over all vectors v in V with norm 1 and TðnˆÞ over nˆ, find the maximum norm for the vector Mv in W. The case when α and β are both the usual L2 norm corresponds to the T ¼ maxTðnˆÞ: ð5Þ jnˆj¼1 largest singular value of M, but in general kMkα→β can be NP hard to estimate [58]. (Here NP is the class of problems The optimal nˆ is known as the thrust axis, whose solution can be verified in polynomial time, so an NP hard problem is one that is at least as difficult as the nˆ ≡ T nˆ : opt argmax ð Þ ð6Þ hardest problem in NP.) By duality, we can rewrite this as jnˆj¼1 T T max kMvkβ ¼ max kM ykα ¼kM kβ→α; ð10Þ v 1 y 1 To gain some intuition for why Eqs. (3) and (5) are k kα¼ k kβ¼ nˆ equivalent, note that once we find the thrust axis opt,we y W W nˆ p⃗ > 0 where is in , the vector space dual to , defined as can partition the particles into those with opt · i and W Rn α β ¼ with dual norm k · kβ. Thus, the -to- norm of those with nˆ · p⃗i < 0. (It is an interesting bit of computa- opt M is the same as the β-to-α norm of MT. tional geometry to show that nˆ · p⃗i can never be exactly opt In the context of thrust, we are interested in the following zero for a finite number of particles.) Said another way, the v ∈ Rn nˆ norms for a vector : plane normal to opt partitions the event into left and right X hemispheres. Starting from a nonhemisphere partition, it is kvk1 ¼ jvij; ð11Þ always possible to increase the value of thrust in Eq. (2) by i flipping a particle from one side to the other, so the optimal rXffiffiffiffiffiffiffiffiffiffiffiffi 2 partition will be defined by a plane. Because of this kvk2 ¼ vi ; ð12Þ equivalence between axis finding and plane partitioning, i the thrust objective is sometimes written as kvk∞ ¼ maxjvij: ð13Þ P i N 2 Θ nˆ · p⃗i nˆ · p⃗i T nˆ i¼1Pð Þð Þ ; These are known, respectively, as the one-norm, two-norm, ð Þ¼ N p⃗ ð7Þ i¼1 j ij and sup-norm. By Hölder’s inequality, the space of vectors

094015-4 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020) endowed with the p norm is dual to the space of vectors OðP;⃗ nˆÞ¼nˆ · P⃗þ λðnˆ 2 − 1Þ; ð17Þ 1 1 endowed with the q norm, where p þ q ¼ 1. In particular, the one-norm is dual to the sup-norm, and the two-norm is where λ is a Lagrange multiplier to enforce that the axis nˆ dual to itself. has unit norm. At this point, P⃗ and nˆ are completely Now consider the matrix Mij ¼ðp⃗iÞj, whose rows are independent entities, and nˆ does not play any role in the N three-momenta and whose columns are the px, py, determining the partition H. and pz components. This is a map from R3 to RN. Letting For fixed P⃗, we can optimize OðP;⃗ nˆÞ over nˆ, α ¼ 2 and β ¼ 1, the induced 2-to-1 norm of M is P⃗ nˆ : XN opt ¼ ð18Þ jP⃗j kMk2→1 ¼ max kMnˆk1 ¼ max jnˆ · p⃗ij: ð14Þ nˆ 1 2 k k2¼ nˆ ¼1 i 1 ¼ Plugging this into Eq. (17) yields We recognize the last term as the numerator of TðnˆÞ in O P⃗ ≡ O P;⃗ nˆ P⃗; Eq. (4). Since the denominator of TðnˆÞ is independent of nˆ, ð Þ ð optÞ¼j j ð19Þ this is equivalent to the definition of thrust via axis finding in Sec. II B. Thus, thrust takes the form of an induced 2-to-1 which is (half) of the thrust numerator in Eq. (2). nˆ O P;⃗ nˆ P⃗ norm problem. For fixed , we can optimize ð Þ over (or H By duality, with β ¼ ∞, thrust can alternatively be equivalently, over the partition ), viewed as a sup-to-2 norm problem, XN ⃗ P ¼ Θðnˆ · p⃗iÞp⃗i: ð20Þ XN opt T i¼1 kM k∞→2 ¼ max ksMk2 ¼ max sip⃗i : ð15Þ ksk∞¼1 si∈f−1;þ1g i¼1 2 Plugging this into Eq. (17) yields

This corresponds to the definition of thrust via partitioning XN si −1 O nˆ ≡ O P⃗ ; nˆ Θ nˆ p⃗ nˆ p⃗ ; in Sec. II A, since setting ¼ denotes flipping the ð Þ ð opt Þ¼ ð · iÞð · iÞ ð21Þ orientation of vector p⃗i relative to the partitioning plane, i¼1 while setting si ¼ 1 retains the orientation of p⃗i. Therefore, we see that the problem of computing thrust which is (half) of the thrust numerator in Eq. (7). in particle physics is in fact a special instance of the more Since the order of optimization is irrelevant to the final general problem of computing induced matrix norms. optimum, this again shows that the two thrust definitions While there exist choices of α and β for which efficient are dual. Either way, the maximum value of the objective algorithms for computing kMkα→β exist for arbitrary M,it function will be is believed that the general problem of computing the OðP⃗ ; nˆ Þ¼jP⃗ j; ð22Þ induced 2-to-1 norm and that of computing the induced opt opt opt ∞-to-2 norm are both NP hard [59–61]. This suggests that which, following Ref. [57], is just the maximum achievable thrust is an excellent test bed to explore possible gains from polygon diagonal. quantum computation. III. CLASSICAL ALGORITHMS D. Alternative duality derivation We now describe the best known classical algorithm O N3 There is alternative language to understand this thrust for thrust in the literature, which requires ð Þ time, and O N2 N duality that will be useful for the generalizations in then show how it can be improved to ð log Þ using a Sec. VII. This approach is based on Ref. [62], which sorting technique from Ref. [3]. We start by assuming a showed that different jet finding strategies can sometimes sequential classical computing model in this section and be derived from a common metaoptimization problem. end with a brief discussion of parallel classical computing. Consider a partition H (not necessarily defined by a plane) with total three-momentum, A. Plane partitioning via a reference axis X The best known classical thrust algorithm [2] uses the ⃗ 1 P ¼ p⃗i: ð16Þ reference axis approach discussed at the end of Sec. II B. i∈H This is the thrust algorithm implemented in PYTHIA as of

Our analysis is based on the following objective function 1Strangely, Ref. [2] claims OðN2Þ usage, which only includes that depends on both a choice of partition and a choice the number of partitions to check, not the computation of of axis: thrust itself.

094015-5 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020) version 8 [63].2 The key realization is that, because of partition, leading to the OðN3Þ scaling. In Sec. III C, we can ⃗ Eq. (8), one only needs to search over inequivalent plane improve on this runtime by iteratively updating Pij in a partitions. Two particles are sufficient to determine a special order. separating plane, so there are OðN2Þ inequivalent plane partitions to consider. For each partition, determining 3 B. Doubling trick TðHLÞ takes OðNÞ, leading to an OðN Þ algorithm. To simplify the thrust algorithm, it is convenient to More specifically, for each pair of particles p⃗i and p⃗j, artificially double the number of particles. Starting from N one determines a reference axis rˆij normal to the plane 2N spanned by them, three-momenta, we create a list of length by including both p⃗k and its negative −p⃗k. Because p⃗k and −p⃗k can p⃗i × p⃗j never be in the same hemisphere, and because of the rˆij ≡ : ð23Þ momentum conservation relation in Eq. (2), this doubling jp⃗i × p⃗jj trick has no effect on the value of thrust. It does, however, Then, each particle p⃗k is either assigned to the hemisphere provide us with a convenient way to deal with the fourfold Hij if rˆij · p⃗k > 0 or ignored if rˆij · p⃗k < 0. Cases where ambiguity above, since we can now define the hemisphere H i j rˆij · p⃗k ¼ 0 are ambiguous, and we provide a general ij to always include particle and particle . strategy to deal with this in Sec. III B below. At minimum, To deal with cases where rˆij · p⃗k ¼ 0 (i.e., any time three we have to treat the cases where k ¼ i or j, which requires or more particles are coplanar), we offset the reference testing 2 × 2 ¼ 4 possibilities for whether or not p⃗i and/or axis by p⃗j should be included in Hij, for a total of 4NðN − 1Þ p⃗ p⃗ partitions. (This can be reduced by a factor of 2 using rˆ → rˆ ϵq⃗ ; q⃗ ≡ i j ; ij ij þ ij ij ⃗ þ ⃗ ð26Þ momentum conservation, since rˆij and rˆji define the same jpij jpjj hemispheres.) The final hemisphere jets are determined by the partition that maximizes and then take the formal ϵ → 0 limit. Specifically, if rˆij ·p⃗k ¼ 0, then particle p⃗k is included in Hij if q⃗ij ·p⃗k > 0 Tij ≡ TðHijÞ: ð24Þ and ignored otherwise. Crucially, Eq. (26) ensures that p⃗i and p⃗j are always in the 2 Note that, in general, none of the OðN Þ reference axes hemisphere Hij,but−p⃗i and −p⃗j are not. (One has to be considered will align with the actual thrust axis. mindful of the pathological situation where p⃗i and p⃗j are rˆ nˆ Nevertheless, the partitions defined by opt and opt will exactly antiparallel, though in this case, thrust is determined be identical. (In the idealized case of infinitesimal radiation by one of the other hemisphere partitions.) The hemisphere everywhere in the event, all possible separating planes three-momentum is now would be considered, so rˆ would then equal nˆ .) Once opt opt X the optimal partition is known, the thrust axis itself is ⃗ 1 Pij ¼ p⃗k; ð27Þ determined by Eq. (8). 2 k∈Hij In terms of computational complexity, for a fixed hemi- sphere Hij, it takes OðNÞ time to compute the hemisphere 1 where the factor of 2 compensates for the artificial doubling. three-momentumP in Eq. (16). The thrust denominator T N p⃗ O N We will use this doubling trick repeatedly in this paper, denom ¼ i¼1 j ij also takes ð Þ time, but it can be though not for quantum annealing in Sec. IV where it is precomputed since it is independent of the partition. Once counterproductive. To simplify the description of the P⃗ T O 1 ij and denom are known, though, it only takes ð Þ time algorithms, we will leave implicit the treatment of all to determine the value of Tij, rˆij · p⃗k ¼ 0 cases via Eq. (26). It is worth mentioning that 2 P⃗ an alternative way to deal with coplanar configurations is to T j ijj ; offset the momenta by a small random amount, but we find ij ¼ T ð25Þ denom the doubling trick to be more convenient in practice since it where we used Eq. (8) to derive this expression. For the best avoids the fourfold ambiguity automatically. known classical algorithm, there are OðN2Þ partitions, and we have to do an OðNÞ computation of TðHijÞ for each C. Improvements via sort The OðN3Þ algorithm can be further improved to run in 2 O N2 N 3 Version 6 of PYTHIA [64] uses a heuristic to approximate time ð log Þ. This can be achieved using a strategy thrust, via an iterative procedure that updates the partition starting from SISCone [3] which uses a clever choice of traversal from seed axes. While this method converges very quickly, it only order. Note that SISCone is intended for proton-proton finds a local maximum, not the global one [57], though this may be sufficient for practical applications. See related discussion in Ref. [32]. 3We thank Gregory Soyez for discussions related to this point.

094015-6 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

To see this, it is convenient to transform to a coordinate system where p⃗i points in the z direction, i.e., p⃗i ¼ jp⃗ijð0; 0; 1Þ. For any j ≠ i,wecanwritep⃗j in spherical coordinates as p⃗j ¼jp⃗jjðsin θj cos ϕj; sin θj sin ϕj; cos θjÞ, where θj is the polar angle and ϕj is the azimuthal angle. Then rˆij ¼ð− sin ϕj; cos ϕj; 0Þ, so the partition is indeed determined by the single parameter ϕj, independent of θj. Specifically, particle k is in hemisphere Hij if

rˆij · pˆ k ¼ sin θk sinðϕk − ϕjÞð28Þ

is positive. This implies that 0 < ϕk − ϕj < π, where azi- muthal angle differences are calculated modulo 2π. Furthermore, because of the doubling trick, there is a simple way to determine which particles are in the partition. With the doubling, there are 2N possible choices for i, and FIG. 2. Illustration of the sorting algorithm around the p⃗i axis −p⃗ (seen from the top down). The dashed vectors correspond to the by Eq. (26) we know that the doubler i cannot be in the doubling trick. As the blue partitioning plane sweeps in azimuth, same partition as p⃗i. Using the above coordinate system, the hemisphere momentum is updated according to Eq. (30). we can sort the remaining 2N − 2 vectors according to their ϕ 0 ≤ ϕ ≤ ϕ ≤ ≤ ϕ < 2π coordinates, so that j1 j2 j2N−2 . (In cases where two particles happen to have identical collisions, whereas our interest here is in electron-positron values of ϕj, their relative ordering does not matter for the collisions, but the same basic strategy still applies. argument below, as long as the doublers are also put in the The goal of SISCone is to find conical jet configurations J a R same order.) Crucially, for a particle at position in this where the enclosed particles are within a distance from sorted list, its doubler (which is π away in azimuth) must be the cone axis nˆ J. Moreover, these cone jets must be stable, P at position a þ N − 1. To see why, note that a hemisphere P⃗ p⃗ meaning that the jet three-momentum J ¼ i∈J i is either contains a particle or contains its doubler, so there ˆ aligned with the cone axis nJ. Like thrust, SISCone involves must be exactly N particles in each hemisphere. Particle i is solving a partitioning problem where the naive brute-force N already accounted for, meaning that any candidate partition approach requires OðN2 Þ time. Like for thrust, one can must contain N − 1 entries from the sorted list. Since the 3 reduce the naive runtime for SISCone to OðN Þ using the fact sorted list is ordered by azimuth, and since the partitioning that two points lying on the circumference of a circle are is determined by azimuth alone via Eq. (28), the N − 1 sufficient to determine the cone constituents. There is an elements from position a to position a þ N − 2 inclusive eightfold ambiguity in the cone assignments, which we must be in a common partition, and the doubler must be the discuss further in Sec. VII C. next one on the list. Therefore, candidate thrust partitions The key insight of Ref. [3] is that one need not always take the form ⃗ 2 recompute PJ for all OðN Þ candidate cones. Ignoring the eightfold ambiguity, let the candidate cones be labeled Hi;j ¼fi; ja;ja 1; …;ja N−2g: ð29Þ by i and j. For fixed i, one can define a special traversal a þ þ order for j such that only one particle enters or leaves the cone at a time. There are N particles labeled by i, and for (Note that, as in Eq. (26), both particle i and particle ja are i j O N N H fixed , sorting over takes ð log Þ time. After the always contained in i;ja .) ⃗ 2 initial OðNÞ determination of PJ for the first j values in the These observations allow us to construct an OðN log NÞ ⃗ sorted list, updating the value of PJ for each j iteration only algorithm for thrust. The outer loop involves iterating over all 2N i requires Oð1Þ time, since you need to only add the choices for . The inner loop involves the following O N N momentum of a point entering the cone or subtract the ð log Þ algorithm. We perform the sorting procedure i O N N momentum of a point leaving the cone. Thus, the final above for fixed , which takes ð log Þ time. For the first 2 element in the sorted list, we determine the partition Hi;j algorithm is OðN log NÞ. 1 a 1 P⃗ We can apply exactly the same sorting strategy to the using Eq. (29) with ¼ . We can readily compute i;j1 via computation of thrust, as shown in Fig. 2. The reason is that Eq. (27) in time OðNÞ and then compute the associated thrust the reference axis rˆij depends only on the cross product value via Eq. (25) in Oð1Þ. For the subsequent 2N − 3 p⃗i × p⃗j. This means that for fixed p⃗i, we can choose an elements of the sorted list, we step through them one by one, p⃗ H i; j ;j ; …;j ordering of the j such that the partitions induced by updating the partition from i;ja ¼f a aþ1 aþN−2g rˆi1; …; rˆiN Hi;j i; ja 1;ja 2; …;ja N−1 f g are specified by a single sortable parameter. to aþ1 ¼f þ þ þ g. In doing so, we need

094015-7 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020)

p⃗j p⃗j −p⃗j Equivalently, under the transformation si 2xi − 1,we to subtract a and add aþN−1 (which is the same as a by ¼ the doubling trick), leading to the update step, can frame the optimization problem as a QUBO problem, where the objective function takes the form ⃗ ⃗ Pi;j ¼ Pi;j − p⃗j ; ð30Þ aþ1 a a XN 1 OðfxigÞ ¼ Qijxixj ð32Þ where one has to remember the factor of in Eq. (27).From 2 i;j¼1 the updated momentum, we recompute the associated thrust O 1 value via Eq. (25) in ð Þ time. The total time from stepping for xi ∈ f0; 1g. Note that the fact that i, j are now summed 2N − 3 O N 2 through the partition momenta is ð Þ, so the inner with repeated indices and the fact that x ¼ xi allow us to O N N i loop is dominated just by the initial ð log Þ sorting step. absorb the linear terms into the quadratic terms. T i j The maximum ij over all and sorted determines the final For the thrust problem, it is convenient to first define the hemisphere jets. three-momentum of a candidate partition as

XN D. Parallel classical algorithm ⃗ P xi p⃗ixi; 33 O N2 N ðf gÞ ¼ ð Þ The sorting algorithm above requires ð log Þ oper- i¼1 ations. In a model with a single CPU and random-access 2 memory, this corresponds to time OðN log NÞ as well. We where xi ¼ 1 if particle p⃗i is in the partition and xi ¼ 0 can also consider parallel computing models in which the N otherwise. Following Eq. (25), the thrust of this partition is words of memory are accompanied by N parallel process- given by ors; see Ref. [65] for more discussion of these models. In vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi u this case, we will see that a runtime of OðN log NÞ can be XN 2jP⃗j 2 u achieved. For simplicity, we do not consider the general T x t p⃗ p⃗x x : ðf igÞ ¼ T ¼ T i · j i j ð34Þ case in which the number of parallel CPUs and the amount denom denom i;j¼1 of memory can be varied independently, nor will we discuss the varying models of parallel computing in Ref. [65]. Because of the square root factor, this is not a QUBO We briefly sketch here how the sorting strategy in problem, but since the optimal partition is the same for Sec. III C can be sped up with parallel processors. There any monotonic rescaling of TðfxigÞ, we can optimize the are three main computational bottlenecks: iterating over squared relation i C j i all particles ( iter), sorting over particles for fixed C XN ( sort), and determining the hemisphere constituents over 2 4 j i C TðfxigÞ ¼ p⃗i · p⃗jxixj; ð35Þ each for fixed ( hemi), leading to a runtime of T2 O C C C denom i;j¼1 ð iterð sort þ hemiÞÞ. For sequential classical computing, we found C ¼ OðNÞ, C ¼ OðN log NÞ, and C ¼ iter sort hemi which now takes the form of the QUBO problem in OðNÞ. A parallel computer cannot improve on C ,but iter Eq. (32), as desired. Finding the ground state of there are parallel computing algorithms for sorting [66] −T x 2 and partial sums [67] that would allow us to achieve ðf igÞ (note the minus sign) is the same as determining C ¼ C ¼ Oðlog NÞ, leading to a OðN log NÞ run- thrust. sort hemi The space usage of a quantum annealing algorithm is time. We will compare the quantum and classical parallel O N x architectures in Sec. VI. ð Þ, corresponding to one qubit for each i. The annealing time required depends on the spectral gap of the particular Hamiltonian, and we leave the question of IV. THRUST VIA QUANTUM ANNEALING determining the spectral gap of the thrust objective function The first quantum algorithm we describe is based on to future work. quantum annealing [6,7]. In a quantum annealer such as the D-Wave system [8], the solution to an optimization V. THRUST VIA QUANTUM SEARCH problem is encoded in the ground state of a target Hamiltonian. Such a Hamiltonian takes the form of an We now describe a quantum algorithm for thrust based Ising model, on Grover search. We first describe the algorithm in terms of two abstract operations, LOOKUP and SUM, both of XN XN H s h s J s s ; which perform data loading in superposition. Then, we ðf igÞ ¼ i i þ ij i j ð31Þ describe two computing models for loading the classical i 1 i

094015-8 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

FIG. 3. Quantum search algorithm due to Dürr-Høyer to find the index corresponding to the maximum entry of an array A½i with K elements [11]. The number of Grover steps is chosen at random, since this is a search over an unknown number of marked times [10]. take OðNÞ in both the classical and quantum models, so demanding than simply having A stored on disk, as we will we gain from decreasing the effective search space from discuss below. OðN2Þ to OðNÞ. The sequential model results in an Recall that Grover search finds one marked item out of algorithm that requires OðN2Þ time and Oðlog NÞ qubits. an array of K items, assuming the ability to reflect about The parallel model requires OðN log NÞ time and the marked item. Reference [10] further extends Ref. [9] to OðN log NÞ qubits. We also assess how the resource find one marked item when there are t>1 marked items, requirements of these algorithms scale with the precision assuming the ability to reflect about the multiple marked of the computation. items. Generic Grover search then consists of the follow- ing steps: P 1ffiffiffi K (1) Prepare thep initialffiffiffiffiffiffiffiffi state jψ 0i¼p i 1 jii. A. Algorithm overview K ¼ (2) Repeat Oð K=tÞ times: Our quantum thrust algorithm is based on the quantum (a) Reflect about the marked states. maximum finding algorithm of Dürr and Høyer [11], which (b) Reflect about the initial state jψ 0i. returns the maximum element of an unsorted array with K t pffiffiffiffi When the number of marked items is unknown, elements in Oð KÞ time, assuming quantum query access Ref. [10] employs an exponential searching algorithm that to the array. This algorithm is itself a generalization of guesses the number of marked items, increasing the guess Grover search [9]. by a constant factor each time. This is a probabilistic In this context, quantum query access means that for an algorithm that performs a measurement for eachpffiffiffiffiffiffiffiffi guess, array A½1; …;A½K, we can efficiently perform a unitary finding a solution in overall expected time Oð K=tÞ. operation U such that The maximum finding algorithm of Ref. [11], summa- rized in Fig. 3, is based on this probabilistic exponential searching algorithm. It keeps track of the current best UAjiij0i¼jiijA½ii; ð36Þ maximum seen so far and considers marked states to be those that have a larger array entry value than the current along with its inverse U†. The first register, containing jii, maximum. It employs the Grover-based exponential should have dimension at least K, so that j1i; …; jKi are searching algorithm of Ref. [10] for an unknown number each orthogonal states of the register, and the second of marked states, performing measurements to obtain the register should be large enough to store the values A½i. maximum with probability at least 1=2. If desired, we can Note that Eq. (36) does not fully specify the unitary UA improve the success probability to 1 − η with η > 0,at since it does not specify its action when the second register the cost of an extra Oðlog 1=ηÞ factor, by performing is not initially in the state j0i. One possible way to define Oðlog 1=ηÞ rounds of the algorithm. UA fully is to have Ujiijxi¼jiijx þ A½ii with addition Our quantum thrust algorithms are then a direct appli- n defined over an appropriately sized finite ring such as Z2, cation of quantum maximum finding, but now to an array but this is not necessary for applications such as in with K ¼ OðN2Þ entries corresponding to the choice of Refs. [9,11]. Quantum query access to an array A is more separating plane. To deal with the fourfold ambiguity, we

094015-9 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020) use the doubling trick of Sec. III C, including each original Oðlog NÞ for parallel. Notably, a wide class of collider vector p⃗k and its negative −p⃗k in the list of three-momenta observables can be computed in linear runtime [68],even to obtain a search space of size K ¼ 4N2. Our problem, those that would naively scale like a high polynomial power. now, is to find the maximum value of Tij with i and j each Using LOOKUP and SUM, our quantum thrust algo- ranging over 2N possible indices. This requires us to be rithm is described in Fig. 4. As with standard Grover able to load the momentum vectors corresponding to each search, we need to be able to reflect about the initial state array index, which means that the quantum algorithm must and the marked states, namely, those whose corresponding have some means of accessing the classical data. values of thrust are larger than the best maximum seen so far. To identify the marked states, we compute thrust for each choice of separating plane, using LOOKUP and SUM B. Data loading considerations to interface the quantum algorithm with the classical data. We can describe our quantum thrust algorithms in terms of We uncompute intermediate steps of our calculations using two abstract operations, LOOKUP and SUM. Their imple- standard methods (e.g., Sec. 3.2 of Ref. [4]) to make sure mentation will be described in Sec. VC for the sequential that, after computing Tij, the system can be reflected about model and Sec. VEfor the parallel model. Beyond thrust, the initial state. C these operations are quite general in their application to Let LOOKUP be the asymptotic cost of LOOKUP and C loading classical data into quantum algorithms. SUM be the asymptotic cost of SUM. The runtime of this O N2 O N C C Note that our search space is of size ð Þ, while data algorithm is ð ð LOOKUP þ SUMÞÞ since there is an loading over N items in a classical database takes time OðNÞ outer Grover search loop, while the inner loop is OðNÞ. Therefore, we can conceptualize our quantum dominated by one application of LOOKUP and one speedup as resulting from being able to perform data application of SUM. Note that the computation of the loading over the superposition of search space items. It initial guess for the maximum, Tmn, can be performed is important here that the set of search space items is not the in OðNÞ time classically, while preparation of the initial same as the set of data points. In general, any application of state and reflection about the initial state can each be α Grover search over a search space of size OðN Þ with α ≥ 2 performed in Oðlog NÞ time, the time required to perform a will result in a square root speedup, whereas for α < 2, the Hadamard gate. cost of the algorithm will be dominated by the OðNÞ data loading cost. C. Sequential computing model The LOOKUP operation is queried with one index The first computing model we consider is one in which corresponding to a given particle, returning the momentum one gate, classical or quantum, can be executed per time corresponding to that index, step. We should think of the classical computer as con- ⃗ trolling the overall computation. In a single time step, it can U jiij0i¼jiijp⃗ii: ð37Þ LOOKUP either (a) perform a classical logic gate, (b) choose a Note that the second register, initialized as j0⃗i, has to be quantum gate or measurement, or (c) read a word from the large enough to store the three-momenta to the desired input (e.g., a single momentum). Another way to think U (qu)bit accuracy. To make LOOKUP unitary, we define about this model is that we measure cost by the circuit size, U i q⃗ i q⃗ p⃗ q⃗ LOOKUPj ij i¼jij þ ii for general vectors , where i.e., the total number of gates. the addition is done modulo some value larger than the While fault-tolerant quantum computers are expected to maximum momentum encountered in the problem. To deal require parallel control to perform error correction, there U with pairs of particles, we can call LOOKUP twice on are still plausible models in which the cost of the compu- ⃗ ⃗ different registers to map jiijjij0ij0i → jiijjijp⃗iijp⃗ji. This tation will be proportional to the number of logical gates. 2 One possibility is that the cost is dominated by generating LOOKUP operation will be used to determine all OðN Þ magic states or by long-range interactions. Another pos- reference axes rˆij, taking OðNÞ time in the sequential sibility is that we are using a small quantum computer model and Oðlog NÞ time in the parallel model. without fault tolerance, but in an architecture such as a one- The SUM operation returns the sum over all momenta, dimensional ion trap, where the available gates are long possibly with a transformation fðp⃗; cÞ applied to each range and cannot be parallelized. momentum vector, Under this sequential model, the operations LOOKUP N and SUM each take OðNÞ time and require Oðlog NÞ U c 0 c Σ f p⃗; c ; SUMj ij i¼j ij ð k Þi ð38Þ k¼1 qubits. Specifically, LOOKUP requires a register of size Oðlog NÞ to store the query index i, along with a register to c where represents possible control qubits. From a given store the requested three-momentum p⃗i. It operates by reference axis rˆij, SUM will be used to calculate the value of performing a sequential scan through all N items in the Tij. It is crucial that calculating Tij for fixed i and j takes the classical database to fetch and return p⃗i. More concretely, O N O 1 U same runtime as LOOKUP, i.e., ð Þ for sequential and in ð Þ time, we can perform LOOKUP;i, defined by

094015-10 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

FIG. 4. Our Grover-based quantum thrust algorithm, written in terms of the abstract LOOKUP and SUM operations. The symbols j0⃗i, j0ˆi, and j0i refer to initial states for a three-momentum, normalized axis, and real number, respectively. Note that we have applied the doubling trick from Sec. III B, such that each p⃗k has its negative −p⃗k in the set of three-momenta. Cases where rˆij · p⃗k ¼ 0 are treated implicitly via Eq. (26). A key difference compared to Fig. 3 is that the quantity to maximize, Tij, is calculated quantumly via the COMP_T subroutine.

U i !0 i !p ; O N LOOKUP;ij ij i¼jij ii ð39aÞ SUM takes time ð Þ because it also performs one pass through all N items in theP classical database while computing U j !0 i !0 ; j ≠ i: N f p⃗; c LOOKUP;ij ij i¼jij i if ð39bÞ and returning the sum i¼1 ð i Þ. With this implementation of LOOKUP and SUM, with U C C O N Then we implement LOOKUP in Eq. (37) by performing LOOKUP ¼ SUM ¼ ð Þ, the Grover-search based thrust U U U O N O N2 O N LOOKUP;1 LOOKUP;2 LOOKUP;N in time ð Þ. Similarly, algorithm in Fig. 4 requires ð Þ time and ðlog Þ qubits.

094015-11 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020)

D. Quantum improvements via sort? connectivity but not all-to-all connectivity. For example, One might wonder whether the runtime of the quantum Brierley [72] describes how connecting each qubit to four thrust algorithm could be reduced from OðN2Þ to other qubits is enough to simulate full connectivity with O N3=2 N Oðlog NÞ time overhead. In what follows, we neglect any ð log Þ, using the same strategy that we used in O N Sec. III C to reduce the classical thrust algorithm time from ðlog Þ or other factors from converting the abstract 3 2 circuit model to a concrete architecture. OðN Þ to OðN log NÞ. The answer is yes, in principle, Parallel data retrieval requires first preloading all N but it would require a computing model beyond the database items into the OðNÞ qubits. This can be done in sequential one. Oð1Þ time, since it requires only parallel copy (or CNOT) Recall that two points define the partitioning plane, and operations from the classical bits onto the qubits. (Even a after selecting the first point, we could sort the second point cost of OðNÞ at this stage would not change the asymptotic according to a special traversal order. This allowed us to runtime, so one could also consider input models in which avoid the OðNÞ cost of resumming the momenta for each the data could only be accessed sequentially, such as tape candidate plane. Quantum algorithms require ΩðN log NÞ storage.) This results in the state time for sort [69], which means that they cannot be used to speed up this part of the classical algorithm. In principle, 1 0⃗ 2 0⃗… N 0⃗ ↦ 1 p⃗ 2 p⃗ … N p⃗ : though, we could still obtain a Grover square root speedup j ij ij ij i j ij i j ij 1ij ij 2i j ij Ni ð40Þ when searching over the OðNÞ candidates for the first point pffiffiffiffi Note that this is not the same as qRAM [5], since we are determining the partitioning plane. Combining the Oð NÞ O N N loading the classical data into a product state once, and not Grover search over the first point with the ð log Þ sort assuming any kind of query access to the data. O N3=2 N over the second point would then yield an ð log Þ Now, given our preloaded data, we can perform overall algorithm. LOOKUP in time Oðlog NÞ by performing binary search The challenge here is that to perform quantum sort, all of on the query index i to locate qubits jiijpii. The binary the data need to be stored somehow in quantum memory, search can be made unitary using a series of OðNÞ SWAP which goes beyond the sequential computing model above gates. Letting i ¼ i1i2…iM in binary, if i1 ¼ 1 we swap the where only one data point is ever accessed in a given time first N=2 ði; piÞ pairs with the last N=2 ði; piÞ pairs, if step. We leave to future work the design of a quantum i2 ¼ 1 we swap the first N=4 ði; piÞ pairs with the next N=4 computing architecture suitable for loading and sorting data ði; piÞ pairs, and so on. After Oðlog NÞ swaps, we end up from a classical database. with qubits jiijp⃗ii in the first position. We can then copy Assuming that such a sort-friendly architecture exists, p⃗ 2 j ii into a blank register and uncompute the swaps. one might ask about the origin of the OðN log NÞ to O N 3=2 Similarly, we can perform SUM in time ðlog Þ by OðN log NÞ speed up. Such an improvement is only combining the entries level by level up a binary search tree possible since the strategy in Sec. III C converts thrust into indexed by i,withOðNÞ additional registers to store the a structured search problem [70,71], which evades the intermediate steps. That is, we first add all pairs of entries 0 0 0 naive bounds on quantum search performance. Of course, corresponding to indices i, i where i1 ¼ i1;i2 ¼ i2; …; 0 0 no matter the degree of structure, we can never do better iM−1 ¼ i and iM ≠ i .ThenwehaveN=2 entries O N M−1 M than the ð Þ cost to examine each data point once. indexed by j ¼ j1j2…jM−1, and again we add all pairs of 0 0 entries corresponding to indices j, j where j1 ¼ j1;j2 ¼ 0 0 0 E. Parallel computing model j2; …;jM−2 ¼ jM−2 and jM−1 ≠ jM−1. Repeating this proc- The parallel computing model reduces the time usage ess Oðlog NÞ times allows us to sum all the entries in parallel. of the sequential model at the expense of additional Thus, the quantum thrust algorithm for the parallel 4 C C O N space usage. Under this model, the operations LOOKUP data loading model, with LOOKUP ¼ SUM ¼ ðlog Þ, O N N O N N and SUM each take Oðlog NÞ time but require OðN log NÞ requires ð log Þ time and ð log Þ qubits. qubits. An abstract version of this model is the standard F. Resource requirements model, in which on N qubits we can In the above discussion, we focused on the scaling of our perform up to N=2 two-qubit gates on as many disjoint Grover-based quantum thrust algorithm in terms of the pairs of qubits as we like. A controlling classical computer number of particles N. Here, we want to provide more with the same parallelism can also be used to process the information on the practical resource requirements for this measurement outcomes and feed the results back in to algorithm in terms of the required precision of the thrust the quantum computer. To implement this in an actual computation. quantum computer, we would need to assume long-range Thus far, we have been working with data in the form of three-vectors p⃗i, where we assumed that the register 4We thank Iordanis Kerenidis for discussions related to this holding p⃗i is of constant size. Just how large is this point. constant, given that using a finite number of qubits would

094015-12 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020) result in digitization error? For typical collider physics applications, such as anticipated for a future eþe− collider, we would want a dynamic range on momenta from the MeV scale (i.e., per-mille accuracy on GeV-scale hadrons) to the TeV scale (i.e., the rough energy scale for CLIC), or 6 around 6 orders of magnitude. This means b ¼ log2 10 ≈ 20 bits of accuracy. Since we are keeping track of d ¼ 3 FIG. 5. An example loading circuit mapping jiij0ij0i ↦ dimensions, the register holding the p⃗i must be of size i 0 xi . In our example, i i1i0 is two bits and x0;x1; db 3b j ij ij i ¼ ð ¼ . Thus, the total number of qubits required x2;x3Þ¼ð3; 2; 3; 1Þ. The CCNOT gates are drawn with open is Oðlog2 N þ dbÞ for the sequential algorithm and (closed) circles if they are controlled on the source bit being OðNðlog2 N þ dbÞÞ for the parallel algorithm. zero (one). To be more specific, the sequential version of the algorithm in Fig. 4 requires two registers with log2 N qubits, four registers with db qubits, and one register with b O N N C b C qubits, apart from any ancillas used in arithmetic oper- ð ðlog2 þ COMP T þ ÞÞ, where COMP T is the gate cost of the COMP_T subroutine. ations, for a total of 2 log2 N þð4d þ 1Þb qubits. For N ¼ 128 particles (after the doubling trick), which is reasonable What is COMP_T? To estimate this, we consider the steps for most eþe− applications, this is around 300 qubits. Such in COMP_T from Fig. 4, noting that these steps consist of a device is not far beyond current ≈50-qubit computers, so either data loading operations like LOOKUP and SUM, or it is naively plausible that the first quantum computer able elementary arithmetic operations like addition, multiplica- tion, and division. to run the sequential quantum thrust algorithm (without p⃗ p⃗ error correction) could be ready in time to compute realistic In step 1, we load i, j using LOOKUP. Note that the thrust distributions at a future eþe− collider. Of course, this circuit that implements this looks like the following: first, depends on the gate connectivity of such a device as well as we have an ancilla bit controlled on each bit in the index log2 N register i i N…i2i1i0 ; that is, we have a C NOT the achievable coherence time, and as discussed below, j i¼jlog2 i circuit depth may be more constraining than the number of gate connecting the ancilla to each index register jiki. This N qubits. For the parallel architecture, we need Nðlog2 N þ requires a total of log2 CNOT gates [73]. Then, controlled dbÞ additional qubits for initial data loading [see Eq. (40)], on whether or not the ancilla bit is set, we want to transform ⃗ though more qubits would most likely be required to the blank register j0i into the register jp⃗ii. We set each bit simulate full connectivity and to store intermediate steps of p⃗i controlled on whether or not the ancilla bit is set, so in 4 of the SUM operation. This points to an Oð10 Þ qubit total we require db CNOT gates. Finally, we uncompute the N device, which is rather optimistic on the 20 year timescale, ancilla bit by again applying the Clog2 NOT gate connect- though this could be made more realistic by preclustering ing the ancilla to the jii register, again requiring log2 N particles to reduce N or by using a smaller value of b. gates. In Fig. 5, we give an example circuit for i ¼ i1i0, Next, we consider the number of gates required by the indexing two bits corresponding to items 0, 1, 2, 3 with Grover-based thrust algorithm. We first apply 2 log2ð2NÞ example values. We have such a circuit for all indices i, Hadamard gates to obtain the initial state, a uniform requiring OðNðlog2 N þ dbÞÞ gates in total. For fault- superposition over the indices i, j. We then apply OðNÞ tolerant quantum computers, this procedure can be further iterations of the Grover operator G, where G consists of two optimized [74], but this does not significantly change the reflections: the reflection over all states with a thrust value resource scaling. greater than the current maximum, an operation requiring The remaining steps in COMP_T involve performing basic the subroutine COMP_T, and the reflection about the initial arithmetic operations like addition, multiplication, and state. Note that the reflection about the initial state can be division. Circuits for elementary operations like addition effected with an application of H⊗2N, followed by a and multiplication can be found in Ref. [75], while fault- reflection about the all-zeros state, followed by an appli- tolerant versions can also be found in the literature [76,77]. ⊗2N cation of H . The Hadamards require 4 log2 2N gates Note that for an input of n bits, addition requires OðnÞ total, while the reflection about the all-zeros state can gates, while multiplication and division require Oðn2Þ 5 be obtained using a controlled-Z operator controlled on gates. Steps 2 and 4 in COMP_T involve a series of having the state j0i in the first log2 N registers, which multiplications and divisions with n ¼ db bits, thus requir- 2 2 requires log2 N CNOT gates. Similarly, after performing ing Oðd b Þ gates. In step 3, we apply SUM controlled on COMP_T, we can perform the reflection over all states with a the sign of each rˆij · p⃗k. Here, we first compute each rˆij · p⃗k thrust value greater than the current maximum using a Z b controlled- operator controlled on the bits representing 5Asymptotically faster multiplication circuits exist, but they do the thrust value, an operation requiring b CNOT gates. not yet outperform the Oðn2Þ algorithm until n ∼ 103−4; we thank Thus, the total gate usage of the algorithm scales like Craig Gidney for pointing this out.

094015-13 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020) and then set an ancilla bit depending on the sign of the dot VI. IS THERE A QUANTUM ADVANTAGE? 2 2 product, requiring OðNd b Þ gates total. Next, for each p⃗k, 3 Starting from the previously best known OðN Þ classical we need to both load the value (using a circuit similar to the algorithm on a sequential computer, we found an improved one from step 1, requiring OðNðlog2 N þ dbÞÞ gates total), 2 2 OðN log NÞ classical algorithm and an OðN Þ quantum and then add it to a running sum using an adder circuit O N N db algorithm. Because these scalings are identical up to a if the ancilla bit is set, requiring ð ðlog2 þ ÞÞ gates log N factor, one might wonder if there is any real quantum C O N N d2b2 total. Thus, COMP T ¼ ð ðlog2 þ ÞÞ, and the total advantage for the task of hemisphere jet finding. gate usage of the entire algorithm will scale like Formally, there is a quantum advantage if we make a O N2 N d2b2 ð ðlog2 þ ÞÞ. rather restricted assumption about the computing model. Finally, we consider circuit depth, which involves con- The sequential quantum computing model in Sec. VConly sidering which gates can be run in parallel. Note that the requires read access to the OðNÞ classical dataset, whereas OðNÞ Grover iterations G must come one after the other. the sorting strategy in Sec. III C requires write access to Likewise, within each Grover iteration G, the two reflec- OðN log NÞ classical bits. Thus, if one restricts the com- tions must come after each other. The parallelization puting model to have write access to only Oðlog NÞ happens within the subroutine COMP_T, where we can classical bits, then the classical sorting strategy cannot parallelize LOAD and SUM in the parallel computing be implemented. In that case, the best classical algorithm model via preloading; that is, we execute the loading would be the OðN3Þ one from Ref. [2], which would be p⃗ 2 circuits in parallel so that all the i are in memory, and bested by our OðN Þ quantum algorithm. then we process the p⃗i in parallel. For any realistic application of thrust, this computing First, we perform all N preloads in parallel, resulting in a model is overly limited, since data from a single collider gate depth of 2 log2 N þ db gates; this involves performing event can easily be read into random-access classical all N operations in the sequential LOAD operation at once. memory. On the other hand, it is not possible to read in After everything has been preloaded in parallel memory, we the entire LHC dataset into memory, and indeed some can perform either LOAD or SUM. To perform LOAD, we collider datasets are only stored on tape drives. For this want to execute a series of log2 N swaps and then a copy, reason, there may be interesting quantum advantages for which requires Oðlog2 N þ dbÞ CNOT gates, so that the clustering algorithms that act on ensembles of events whole LOAD operation has a depth of Oðlog2 N þ dbÞ. (instead of on ensembles of particles in a single event). Meanwhile, to perform the SUM operation after everything See Ref. [79] for recent developments along these lines. has been preloaded in parallel memory, we note that we For the parallel computing models, there is no formal ˜ must execute the parallel LOAD operation for each p⃗k, then limit with a quantum advantage, since we need OðNÞ (qu) calculate and control on the quantity rˆij · p⃗k for each p⃗k, bits with read-write access in both the quantum and and then we must finally sum all the N vectors. The parallel classical cases. Note that the speed up in the classical load requires a gate depth of Oðlog2 N þ dbÞ, while the dot and quantum cases come from rather different sources. 2 product calculation requires a gate depth of Oðd2b2Þ. Classical sorting splits the OðN Þ search space into an O N O N Finally, we need to perform a series of log2 N additions, ð Þ outer loop and an ðlog Þ inner loop. By contrast, O N2 which requires db log2 N gates. Thus, the SUM operation the quantum algorithmffiffiffiffiffiffi searches the ð Þ search space as 2 2 p 2 requires a total circuit depth of Oðdb log2 N þ d b Þ. Then a whole in Oð N Þ runtime. C O db N d2b2 This last observation suggests that for even larger search COMPT ¼ ð log2 þ Þ, and the circuit depth of the entire parallel algorithm scales like OðNðdb log2 Nþ spaces, there might be a quantum advantage even if there d2b2ÞÞ. Note that for the sequential model, the circuit exist classical sorting strategies. If classical sorting can only 2 s O Nα depth is just the same as the gate count of OðN ðlog2 N þ sort of the search dimensions, then for an ð Þ search d2b2 space, the classical runtime would scale proportional to ÞÞ since we are not running operations in parallel. O Nα−s s N Thus, again taking an example with N ¼ 128 particles ð log Þ. The quantum runtime would scale propor- O Nα=2 (after the doubling trick), we would expect a circuit depth tional to ð Þ, which would be faster than the classical α > 2s M of around 107 gates for the sequential model and 105 for the case for . This might be relevant for the -jet O N2M parallel model. On a noisy device, we currently do not finding problem mentioned in Sec. VII D with an ð Þ expect to be able to execute an algorithm requiring more search space. than 103 gates [78], so again we believe that preclustering particles to reduce N or using a small value of b could make VII. GENERALIZATIONS these algorithms more realistic on a NISQ device. We note In this section, we discuss how to apply the quantum that because circuit depth and qubit usage come at a algorithms from Secs. IV and V to jet identification tradeoff, circuit depth is the limiting factor for the sequen- methods that generalize thrust. These algorithms are more tial model, while qubit usage is the limiting factor for the closely related to the ones used at the LHC, since they parallel model. involve a jet radius parameter R.

094015-14 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

n Pμ μ μ μ μ OðP ;n Þ¼E − þ λðnμn Þ; 1 − cos R nˆ · P⃗− E cos R ¼ þ λðnˆ 2 − 1Þ; ð45Þ 1 − cos R where λ is again a Lagrange multiplier, and we maximize over both the choice of partition and the choice of axis. The second line makes it clear that R ¼ π=2 returns the thrust FIG. 6. Partitioning an event into a stable cone jet of radius R objective function in Eq. (17). and an unclustered region. This is the same as Fig. 1 Performing the same manipulations as in Sec. II D, the when R ¼ π=2. optimum axis (for fixed partition) is   ⃗ μ P We start with algorithms that divide the event into n ¼ 1; : ð46Þ opt P⃗ a single jet and an unclustered region, as in Fig. 6. j j (For the thrust problem, R ¼ π=2 and the unclustered region is the opposite hemisphere.) We then mention Since the optimum axis is aligned with the jet three- strategies to identify multiple jets. To simplify the dis- momentum, this is an example of a stable cone algorithm; cussion, we continue to use the ðpx;py;pzÞ coordinate see Sec. VII C below. The reduced SINGLECONE objective system for electron-positron collisions, noting that the function is methods below can be adapted to the standard proton- P⃗ − E R proton coordinate system of transverse momentum (pT), μ μ μ j j cos OðP Þ ≡ OðP ;n Þ¼ ; ð47Þ rapidity (y), and azimuth (ϕ). opt 1 − cos R which is an example of a jet function maximization A. SingleCone algorithm [43–45]. The optimum solution partitions the The generalizations we consider are all based on or event into a clustered region H and an unclustered region inspired by the analysis of Ref. [62], which showed that the (the complement of H). This definition of the problem thrust duality in Sec. II D holds for a one-parameter family naturally lends itself to quantum annealing in Sec. VII B. of jet finding algorithms. No matter which dual formulation Doing the dual manipulation, the optimum partition (for is used, we refer to this jet finding strategy as SINGLECONE, fixed axis) is since it finds a single stable cone jet of radius R. To match the literature, we use four-vector notation in XN Pμ pμΘ E 1 − R − n pμ : this section. The four-momentum of a particle is opt ¼ i ð ið cos Þ μ i Þ ð48Þ i¼1 μ pi ¼ðEi; p⃗iÞ; ð41Þ pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi Writing the Heaviside theta function requirement in three- 0 2 2 momentum language, where the energy Ei ≡ pi ¼ p⃗i þ mi depends on the mass mi of particle i. The four-momentum of a candidate H nˆ · p⃗i partition is > cos R; ð49Þ X Ei μ μ ⃗ P ¼ pi ≡ ðE; PÞ; ð42Þ i∈H we see that for massless particles (Ei ¼jp⃗ij), the jet constituents are those within an angular distance R of where E ≡ P0 is the total energy of the partition. A lightlike the jet axis. For R ¼ π=2, this yields the thrust hemisphere axis is given by regions. The reduced SINGLECONE objective function is now nμ ¼ð1; nˆÞ; ð43Þ O nμ ≡ O Pμ ;nμ ð Þ ð opt Þ 2   with nˆ ¼ 1. We contract indices with the mostly minus XN XN n · pi metric, ¼ Ei − min Ei; ; ð50Þ 1 − cos R i¼1 i¼1 μ 0 0 pμq ¼ p q − p⃗· q:⃗ ð44Þ where we dropped the Lagrange multiplier term for The SINGLECONE jet finder is based on maximizing the compactness. The second term in Eq. (50) is an example following objective function [62]: of an N-jettiness measure [80–82] with N ¼ 1, whose

094015-15 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020) minimum yields the XCone jet algorithm [32,33]. This however, generally yield a different solution compared to definition of the problem naturally lends itself to quantum SINGLECONE. Unlike SINGLECONE, which yields perfectly search in Sec. VII C. conical jets for massless particles via Eq. (49), this QUBO jet finder has an effective jet radius that depends on the mass B. Jet function maximization of the jet [44,45]. Quadratic objective functions are also explored in Ref. [47] for jet clustering at the LHC. In future In the jet function maximization approach of Refs. [43– work, we plan to characterize the general phenomenological 45], the goal is to optimize Pμ for a global jet function. The properties of jets identified using QUBO objectives. original jet function from Ref. [43] can be written as

1 M2 C. Stable cone finding O ðPμÞ¼E − ; ð51Þ Georgi 2ð1 − cos RÞ E Stable cone algorithms search over candidate jet regions of radius R and select ones that are stable [34,83], where the jet mass is meaning that the center of the jet region aligns with the jet momentum. As shown in Eqs. (46) and (49), SINGLECONE 2 μ 2 ⃗2 M ≡ PμP ¼ E − P : ð52Þ is an example of a stable cone algorithm, which is closely related to SISCone [3]. In the limit M ≪ E, this matches the reduced SINGLECONE It is worth emphasizing two key differences between objective function of Eq. (47), though they yield different SINGLECONE and SISCone. First, SINGLECONE finds a single optimal jet regions for finite-mass jets. jet, whereas SISCone finds all stable cones, and a separate Since jet function maximization is a kind of partitioning split/merge step is needed to determine the final jet regions. problem, it is natural to try to write these objective That said, it is possible to run SISCone in progressive functions in QUBO form. However, the original jet removal (PR) mode, where one finds the most energetic function from Eq. (51) is not quadratic since it involves stable cone, removes the found jet constituents, and repeats a 1=E factor, and the SINGLECONE function in Eq. (47) is the SISCone procedure on the unclustered particles. In this way, SISCone-PR acts like an iterated application of not quadratic since jP⃗j involves a square root. Thus, these SINGLECONE. Second, SINGLECONE finds the jet region cannot be rewritten as QUBO problems without some kind E − O M2=E of modification. with the largest value of Eq. (47) (¼ ð Þ), whereas SISCone-PR would typically take the stable cone In the analysis of Sec. IV for thrust, we got around this E issue by squaring the thrust objective function, which with the largest plain energy . As we will see below, nevertheless yielded the same partitioning solution. This though, it is still possible to develop quantum algorithms approach does not work in this more general case because for stable cones with alternative jet hardness sorting of nonquadratic cross terms. schemes. It is straightforward to implement the SINGLECONE What we can do, however, is square the SINGLECONE algorithm (a.k.a. SISCone-PR with Eq. (47) ordering) via objective in Eq. (47) but only keep the lowest nontrivial 6 quantum search. Just as two points define a partitioning term in the M ≪ E limit. (Squaring and expanding plane, two points are enough to determine a cone region of Eq. (51) yields the same result.) This gives the following radius R [3]. (This is true up to an eightfold ambiguity, QUBO objective function: which is twice that of the thrust case because the two M2 candidate cones are not complements of each other as they O P E2 − QUBOð μÞ¼ 1 − R are for hemispheres.) We can use the LOOKUP operation cos O N2 2 to determine all ð Þ candidate reference axes (which are P⃗ − E2 cos R ¼ not the same as the jet axes, but yield the same partitions). 1 − cos R We can then use SUM to calculate Eq. (47) for a fixed   XN Pμ p⃗i · p⃗j − EiEj cos R reference axis, since finding for the particles in the ¼ xixj; ð53Þ candidate jet region is a linear operation. We finally use 1 − cos R i;j¼1 Grover search to find the partition that maximizes Eq. (47), and we are guaranteed that the found cone jet will be stable x ∈ 0; 1 R π=2 where again i f g. Taking ¼ in Eq. (53) then via Eq. (46). This algorithm now has the identical structure recovers the thrust (squared) problem. It is interesting that to thrust, with the same asymptotic scaling as in Table I, Eq. (53) has the same form as the generalized jet functions taking us from a classical OðN3Þ algorithm (without sort) n 2 α 2 in Ref. [44] (Ref. [45]) with ¼ ( ¼ ). to a quantum OðN log NÞ algorithm (with parallel data This objective function corresponds to a QUBO problem loading). and can thus be solved on a quantum annealer. It will, Note that the quantum maximum finding algorithm only returns one maximum element of an array, so we cannot use 6We thank Eric Metodiev for discussions related to this point. it to speed up an algorithm for identifying all stable cones.

094015-16 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

We can, however, use it to find one stable cone with a well-separated jets of comparable energies. This objective different objective function from Eq. (47). For example, to function does not penalize empty jet regions, so it might be implement SISCone-PR with standard energy ordering, we interesting to run this algorithm with a large value of M to can use a subroutine consisting of two SUM operations in let the number of nonempty jet regions be determined series. The first SUM determines Pμ for the candidate jet dynamically. region, while the second SUM finds P˜ μ for all particles Compared to the single-jet case, the multijet case will within a radius R of Pμ. This subroutine would return P0 if likely be more difficult to implement on currently available Pμ ¼ P˜ μ, while it would return 0 if Pμ ≠ P˜ μ. One would quantum annealing hardware. Previous numerical studies then use Grover search to find the maximum subroutine [55] have shown that clustering problems that use multiple output, with the same asymptotic quantum gains as in the qubits to implement one-hot encoding are prone to errors. SINGLECONE case. The reason is that on annealing hardware, qubit couplings have a maximum dynamic range, which in turn limits the Λ D. Multiregion optimization effectiveness of the penalty term. In practice, this means that annealers often output a fuzzy assignment rather than a Typical collider studies involve more than one jet per hard assignment to one cluster. We would also like to argue event, so it is interesting to ask whether these quantum that this problem is conceptual in origin. The search space methods can be adapted to the multijet case. As already of the single-jet QUBO problem is 2N, whereas the search mentioned, one can use a PR strategy to identify multiple MN space of the multijet QUBO problem is 2 . However, the jet regions, so finding M jets just requires M iterations of QUBO quantum search space contains many extra unphys- the algorithms above. Except in specialized circumstances, N ical states, since the actual (non-QUBO) search space is the number of desired jet regions does not grow with and N N log M 2 size M ¼ 2 . While the most natural way to address is at most Oð1=R Þ, so the runtime of SINGLECONE-PR this would be to use qudits with d ¼ M instead of qubits, would scale linearly with M. That said, we are interested such hardware is not currently available. in simultaneously optimizing the jet regions as in XCone Turning to the quantum search case, finding M conical algorithm [32,33], in order to treat the overlapping jet jet regions naively requires searching a space of O N2M , regions in a more sophisticated way than just PR. ð Þ with the added complication of needing to treat overlapping The QUBO objective in Eq. (53) can be easily gener- jet regions. We are unaware of any classical approach to alized to the M-jet case using OðNðM þ 1ÞÞ qubits, suitable for quantum annealing. Instead of a binary assign- this problem apart from brute force, though one expects O N2Mþ1 ment of each particle to the clustered or unclustered region, an ð Þ algorithm for the XCone objective should be we can do a one-hot encoding with M þ 1 qubits per feasible, though it likely requires a more sophisticated particle to indicate their assignment to one of the M jet treatment of reference axes. (The current implementation of XCone in FASTJET CONTRIB 1.041 [31,84] only finds a regions or to the unclustered region. Specifically, let xir ∈ f0; 1g for i ∈ f1; …;Ng and r ∈ f0; 1; …;Mg. We assign local minimum starting from suitable seed axes.) Using quantum search with sequential (parallel) data loading, xi0 ¼ 1 if particle i is in the unclustered region, xir ¼ 1 O NMþ1 if particle i is in jet region r for r ∈ f1; …;Mg, and one might hope that this could be improved to ð Þ O NM N xir ¼ 0 otherwise. We then add a penalty term to the ( ð log Þ), though one would have to generalize the objective function such that, for fixed i, xir ¼ 1 for only LOOKUP and SUM operations to deal with the multijet one value of r. case. At minimum, LOOKUP would have to load the The multijet QUBO objective function is momenta into 2M registers (to label the candidate parti- tions), and SUM would have to have M distinct outputs   XM XN M p⃗i · p⃗j − EiEj cos R (for each of the jet regions). Even with quantum gains, O ðfxijgÞ ¼ xirxjr this is computationally daunting, motivating future studies QUBO 1 − cos R r¼1 i;j¼1 of multijet algorithm whose computational complexity   XN XM 2 grows only polynomially with M. 2 þ Λ 1 − xir : ð54Þ i 1 r 0 ¼ ¼ VIII. CONCLUSIONS Here, there is a copy of Eq. (53) for each ofP the M jet In this work, we demonstrated how quantum computers O − Q x x regions, taking the schematic form of ¼ i;j ij i j. could be applied to a realistic collider physics problem, The coefficient of the penalty term must be taken to be which requires interfacing a classical dataset with a 2 Λ >Nmaxij Qij to ensure that it is never favorable for a quantum algorithm. We focused on maximizing thrust to particle to be assigned to more than one jet region. Because identify hemisphere jets, but the quantum methods devel- Eq. (54) is quadratic in the momentum, it will not have the oped here are relevant to a broader range of optimization same behavior as XCone (which has a linear objective and cluster-finding problems. The asymptotic performance function), though we expect the results to be similar for of our quantum annealing and quantum search algorithms

094015-17 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020) is summarized in Table I. We found a way to improve the typical collider tasks like filling a histogram involve O N3 O N previously best known ð Þ classical thrust algorithm to ð eventsÞ operations, such that data loading is already an OðN2Þ sequential quantum algorithm. Along the way, we the limiting factor. On the flip side, this motivates further 2 O N2 found an improved OðN log NÞ classical algorithm, based quantum investigations into classically ð eventsÞ data on the sorting strategy of Ref. [3]. Both the quantum manipulation strategies, such as the metric space approach and improved classical algorithms can be implemented recently proposed in Ref. [79], since they might be O N on parallel computing architectures with asymptotic reducible to ð eventsÞ quantum algorithms under suitable OðN log NÞ runtime. Formally, we found a quantum advan- circumstances. We also note that Grover search is limited to tage, but only when assuming a computing model with read a square-root speedup on unstructured search, whereas access to OðNÞ (qu)bits but write access to only Oðlog NÞ collider data have additional structures like symmetries and (qu)bits. heuristics which might lead to further quantum gains. Going beyond thrust, we briefly generalized our quantum methods to handle structurally similar jet clustering algo- ACKNOWLEDGMENTS rithms. These involve maximizing an objective function with a radius parameter R, which partitions the event into a conical We would like to thank Howard Georgi, Craig Gidney, jet region and an unclustered region. While we focused on Iordanis Kerenidis, Patrick Komiske, Andrew Larkoski, electron-positron collisions, it is known how to adapt these Eric Metodiev, Stephen Mrenna, Benjamin Nachman, methods to proton-proton collisions [45,47,62]. In future Gavin Salam, Matthew Schwartz, Gregory Soyez, and work, we plan to investigate the phenomenological perfor- Jean-Roch Vlimant for helpful conversations and sugges- mance of these “quantum friendly” jet algorithms at the LHC, tions. This work was supported by the Office of High to assess whether they offer improved physics performance Energy Physics of the U.S. Department of Energy (DOE) relative to hierarchical clustering schemes like anti-kt. under Grants No. DE-SC0012567 and No. DE-SC0019128 The main take home message from this work is that the (QuantISED). A. Y. W is additionally supported by a overhead of data loading must be carefully accounted for Computational Science Graduate Fellowship from the when evaluating the potential for quantum speedups on DOE. A. W. H is additionally supported by NSF Grants classical datasets. In many ways, optimization-based jet No. CCF-1452616, No. CCF-1729369, and No. PHY- algorithms are an ideal platform to think about quantum 1818914; ARO Contract No. W911NF-17-1-0433; and algorithms for collider physics, since these problems tend the MIT-IBM Watson AI Lab. J. T. is additionally sup- to involve searching over a large space of possibilities, ported by the Simons Foundation through a Simons OðNαÞ with α ≥ 2, and therefore benefit from Grover Fellowship in Theoretical Physics, and he benefited from search methods. By contrast, even though the number of the hospitality of the Harvard Center for the Fundamental N events in a collider data sample ( events) is usually much Laws of Nature and the Fermilab Distinguished Scholars larger than the number of final-state particles in a jet, program.

[1] G. P. Salam, Towards jetography, Eur. Phys. J. C 67, 637 [8] S. Boixo, T. Ronnow, S. Isakov, Z. Wang, D. Wecker, D. (2010). Lidar, J. Martinis, and M. Troyer, Evidence for quantum [2] H. Yamamoto, An efficient algorithm for calculating thrust annealing with more than one hundred qubits, Nat. Phys. 10, in high multiplicity reactions, J. Comput. Phys. 52, 597 218 (2014). (1983). [9] L. K. Grover, A fast quantum mechanical algorithm for [3] G. P. Salam and G. Soyez, A practical seedless infrared-safe database search, in Proceedings of the Twenty-Eighth cone jet algorithm, J. High Energy Phys. 05 (2007) 086. Annual ACM Symposium on Theory of Computing, STOC [4] M. A. Nielsen and I. L. Chuang, Quantum Computation ’96 (ACM, New York, 1996), pp. 212–219. and : 10th Anniversary Edition [10] M. Boyer, G. Brassard, P. Hoyer, and A. Tapp, Tight bounds (Cambridge University Press, New York, 2011), 10th ed. on quantum searching, Fortschr. Phys. 46, 493 (1998). [5] V.Giovannetti, S. Lloyd, and L. Maccone, Quantum Random [11] C. Durr and P. Hoyer, A quantum algorithm for finding the Access Memory, Phys. Rev. Lett. 100, 160501 (2008). minimum, arXiv:quant-ph/9607014. [6] T. Kadowaki and H. Nishimori, Quantum annealing in the [12] S. Brandt, C. Peyrou, R. Sosnowski, and A. Wroblewski, transverse Ising model, Phys. Rev. E 58, 5355 (1998). The principal axis of jets. An attempt to analyze high-energy [7] E. Farhi, J. Goldstone, S. Gutmann, and M. Sipser, collisions as two-body processes, Phys. Lett. 12, 57 (1964). Quantum computation by adiabatic evolution, arXiv:quant- [13] E. Farhi, A QCD Test for Jets, Phys. Rev. Lett. 39, 1587 ph/0001106. (1977).

094015-18 QUANTUM ALGORITHMS FOR JET CLUSTERING PHYS. REV. D 101, 094015 (2020)

[14] A. De Rujula, J. R. Ellis, E. G. Floratos, and M. K. Gaillard, [35] C. F. Berger et al., Snowmass 2001: Jet energy flow project, QCD predictions for hadronic final states in eþe− annihi- eConf C010630, P512 (2001). lation, Nucl. Phys. B138, 387 (1978). [36] L. Angelini, P. De Felice, M. Maggi, G. Nardulli, L. Nitti, [15] D. P. Barber et al., Tests of quantum chromodynamics and a M. Pellicoro, and S. Stramaglia, Jet analysis by determin- direct measurement of the strong coupling constant αS at istic annealing, Phys. Lett. B 545, 315 (2002). pffiffiffi s ¼ 30-GeV, Phys. Lett. 89B, 139 (1979). [37] L. Angelini, G. Nardulli, L. Nitti, M. Pellicoro, D. Perrino, [16] W. Bartel et al. (JADE Collaboration), Observation of planar and S. Stramaglia, Deterministic annealing as a jet cluster- three jet events in eþe− annihilation and evidence for gluon ing algorithm in hadronic collisions, Phys. Lett. B 601,56 bremsstrahlung, Phys. Lett. 91B, 142 (1980). (2004). [17] M. Althoff et al. (TASSO Collaboration), Jet production and [38] D. Y. Grigoriev, E. Jankowski, and F. V. Tkachov, Towards a fragmentation in eþe− annihilation at 12-GeV to 43-GeV, Standard Jet Definition, Phys. Rev. Lett. 91, 061801 (2003). Z. Phys. C 22, 307 (1984). [39] D. Y. Grigoriev, E. Jankowski, and F. V. Tkachov, Optimal [18] D. Bender et al., Study of quark fragmentation at 29-GeV: jet finder, Comput. Phys. Commun. 155, 42 (2003). Global jet parameters and single particle distributions, [40] S. Chekanov, A new jet algorithm based on the k-means Phys. Rev. D 31, 1 (1985). clustering for the reconstruction of heavy states from jets, [19] G. S. Abrams et al. (MARK-II Collaboration), First Eur. Phys. J. C 47, 611 (2006). Measurements of Hadronic Decays of the Z Boson, Phys. [41] Y.-S. Lai and B. A. Cole, Jet reconstruction in hadronic Rev. Lett. 63, 1558 (1989). collisions by Gaussian filtering, arXiv:0806.1499. et al. [42] I. Volobouev, FFTJet: A package for multiresolution particle [20] Y. K. Li (AMY Collaboration),pffiffiffi Multi-hadron event properties in eþe− annihilation at s ¼ 52 GeV to 57-GeV, jet reconstruction in the Fourier domain, arXiv:0907.0270. Phys. Rev. D 41, 2675 (1990). [43] H. Georgi, A simple alternative to jet-clustering algorithms, [21] D. Decamp et al. (ALEPH Collaboration), Measurement of arXiv:1408.1161. αs from the structure of particle clusters produced in [44] S.-F. Ge, The Georgi algorithms of jet clustering, J. High hadronic Z decays, Phys. Lett. B 257, 479 (1991). Energy Phys. 05 (2015) 066. J [22] W. Braunschweig et al. (TASSO Collaboration), Global jet [45] Y. Bai, Z. Han, and R. Lu, ET : A global jet finding properties at 14-GeV to 44-GeV center-of-mass energy in algorithm, J. High Energy Phys. 03 (2015) 102. eþe− annihilation, Z. Phys. C 47, 187 (1990). [46] L. Mackey, B. Nachman, A. Schwartzman, and C. Stansbury, 2 [23] K. Abe et al. (SLD Collaboration), Measurement of αs ðMZÞ Fuzzy jets, J. High Energy Phys. 06 (2016) 010. 0 JII from hadronic event observables at the Z resonance, [47] Y. Bai, Z. Han, and R. Lu, ET : A two-prong jet finding Phys. Rev. D 51, 962 (1995). algorithm, arXiv:1509.07522. [24] A. Heister et al. (ALEPH Collaboration), Studies of QCD at [48] A. Mott, J. Job, J. R. Vlimant, D. Lidar, and M. Spiropulu, Z0 centre-of-mass energies between 91-GeV and 209-GeV, Solving a Higgs optimization problem with quantum Eur. Phys. J. C 35, 457 (2004). annealing for machine learning, Nature (London) 550, [25] J. Abdallah et al. (DELPHI Collaboration), A study of the 375 (2017). energy evolution of event shape distributions and their [49] A. Zlokapa, A. Mott, J. Job, J.-R. Vlimant, D. Lidar, and means with the DELPHI detector at LEP, Eur. Phys. J. C M. Spiropulu, Quantum adiabatic machine learning with 29, 285 (2003). zooming, arXiv:1908.04480. [26] P. Achard et al. (L3 Collaboration), Studies of hadronic [50] D. Provasoli, B. Nachman, W. A. de Jong, and C. W. Bauer, event structure in eþe− annihilation from 30-GeV to 209- A quantum algorithm to efficiently sample from interfering GeV with the L3 detector, Phys. Rep. 399, 71 (2004). binary trees, arXiv:1901.08148. [27] G. Abbiendi et al. (OPAL Collaboration), Measurement of [51] C. W. Bauer, B. Nachman, D. Provasoli, and W. A. De Jong, event shape distributions and moments in e þ e → hadrons A quantum algorithm for high energy physics simulations, at 91-GeV—209-GeV and a determination of αs, Eur. Phys. arXiv:1904.03196. J. C 40, 287 (2005). [52] I. Shapoval and P. Calafiura, Quantum associative memory [28] M. Cacciari, G. P. Salam, and G. Soyez, The catchment area in HEP track pattern recognition, EPJ Web Conf. 214, of jets, J. High Energy Phys. 04 (2008) 005. 01012 (2019). [29] M. Cacciari, G. P. Salam, and G. Soyez, The anti-kt jet [53] F. Bapst, W. Bhimji, P. Calafiura, H. Gray, W. Lavrijsen, and clustering algorithm, J. High Energy Phys. 04 (2008) 063. L. Linder, A pattern recognition algorithm for quantum [30] M. Cacciari and G. P. Salam, Dispelling the N3 myth for the annealers, arXiv:1902.08324. kt jet-finder, Phys. Lett. B 641, 57 (2006). [54] A. Zlokapa, A. Anand, J.-R. Vlimant, J. M. Duarte, J. Job, D. [31] M. Cacciari, G. P. Salam, and G. Soyez, FastJet user manual, Lidar, and M. Spiropulu, Charged particle tracking with Eur. Phys. J. C 72, 1896 (2012). quantum annealing-inspired optimization, arXiv:1908.04475. [32] I. W. Stewart, F. J. Tackmann, J. Thaler, C. K. Vermilion, [55] V. Kumar, G. Bass, C. Tomlin, and J. Dulny, Quantum and T. F. Wilkason, XCone: N-jettiness as an exclusive cone annealing for combinatorial clustering, Quantum Inf. Proc- jet algorithm, J. High Energy Phys. 11 (2015) 072. ess. 17, 39 (2018). [33] J. Thaler and T. F. Wilkason, Resolving boosted jets with [56] F. Neukart, D. Von Dollen, and C. Seidel, Quantum-assisted XCone, J. High Energy Phys. 12 (2015) 051. cluster analysis, arXiv:1803.02886. [34] S. D. Ellis, J. Huston, and M. Tonnesmann, On building [57] S. Brandt and H. D. Dahmen, Axes and scalar measures of better cone jet algorithms, eConf C010630, 513 (2001). two-jet and three-jet events, Z. Phys. C 1, 61 (1979).

094015-19 WEI, NAIK, HARROW, and THALER PHYS. REV. D 101, 094015 (2020)

[58] V. Bhattiprolu, M. Ghosh, V. Guruswami, E. Lee, and [72] S. Brierley, Efficient implementation of quantum circuits M. Tulsiani, Inapproximability of matrix pð→Þq norms, with limited qubit interactions, arXiv:1507.04263. arXiv:1802.07425. [73] A. Barenco, C. H. Bennett, R. Cleve, D. P. DiVincenzo, [59] D. Steinberg, Computation of matrix norms with applications N. Margolus, P. Shor, T. Sleator, J. A. Smolin, and H. to robust optimization, Master’s thesis, Technion, 2005. Weinfurter, Elementary gates for quantum computation, [60] A. Bhaskara and A. Vijayaraghavan, Approximating matrix Phys. Rev. A 52, 3457 (1995). p-norms, arXiv:1001.2613. [74] G. H. Low, V. Kliuchnikov, and L. Schaeffer, Trading [61] J. M. Hendrickx and A. Olshevsky, Matrix P-norms are T-gates for dirty qubits in state preparation and unitary NP-hard to approximate if p ≠ 1; 2; ∞, arXiv:0908.1397. synthesis, arXiv:1812.00954. [62] J. Thaler, Jet maximization, axis minimization, and stable [75] V. Vedral, A. Barenco, and A. Ekert, Quantum networks for cone finding, Phys. Rev. D 92, 074001 (2015). elementary arithmetic operations, Phys. Rev. A 54, 147 [63] T. Sjstrand, S. Ask, J. R. Christiansen, R. Corke, N. Desai, (1996). P. Ilten, .S. Mrenna, S. Prestel, C. O. Rasmussen, and P. Z. [76] C. Gidney, Halving the cost of quantum addition, Quantum Skands, An introduction to PYTHIA 8.2, Comput. Phys. 2, 74 (2018). Commun. 191, 159 (2015). [77] H. Thapliyal, E. Muoz-Coreas, T. S. S. Varun, and T. S. [64] T. Sjostrand, S. Mrenna, and P. Z. Skands, PYTHIA 6.4 Humble, Quantum circuit designs of integer division opti- physics and manual, J. High Energy Phys. 05 (2006) 026. mizing T-count and T-depth, arXiv:1809.09732. [65] U. Vishkin, Thinking in parallel: Some basic data-parallel [78] J. Preskill, Quantum computing in the NISQ era and algorithms and techniques (2010), online class notes. beyond, Quantum 2, 79 (2018). [66] D. M. W. Powers, Parallelized with optimal [79] P. T. Komiske, E. M. Metodiev, and J. Thaler, The Metric speedup, in Proceedings of the International Conference Space of Collider Events, Phys. Rev. Lett. 123, 041801 on Parallel Computing Technologies (World Scientific, (2019). Novosibirsk, 1991). [80] I. W. Stewart, F. J. Tackmann, and W. J. Waalewijn, [67] R. E. Ladner and M. J. Fischer, Parallel prefix computation, N-Jettiness: An Inclusive Event Shape to Veto Jets, Phys. J. ACM 27, 831 (1980). Rev. Lett. 105, 092002 (2010). [68] P. T. Komiske, E. M. Metodiev, and J. Thaler, Cutting [81] J. Thaler and K. Van Tilburg, Identifying boosted objects multiparticle correlators down to size, to be published. with n-subjettiness, J. High Energy Phys. 03 (2011) 015. [69] P. Hoyer, J. Neerbek, and Y. Shi, Quantum complexities [82] J. Thaler and K. Van Tilburg, Maximizing boosted top of ordered searching, sorting, and element distinctness, identification by minimizing n-subjettiness, J. High Energy Algorithmica 34, 429 (2002). Phys. 02 (2012) 093. [70] A. Montanaro, Quantum walk speedup of backtracking [83] G. C. Blazey et al., Run II jet physics, in Proceedings of the algorithms, arXiv:1509.02374. QCD and Weak Boson Physics in Run II. Batavia, USA, [71] A. Montanaro, Quantum speedup of branch-and-bound 1999 (Fermilab, Batavia, 2000), pp. 47–77. algorithms, Phys. Rev. Research 2, 013056 (2020). [84] FastJet Contrib, http://fastjet.hepforge.org/contrib/.

094015-20