<<

Quantum Clustering

Esma A¨ımeur [email protected] Gilles Brassard [email protected] S´ebastien Gambs [email protected] Universit´ede Montr´eal, D´epartement d’informatique et de recherche op´erationnelle C.P. 6128, Succursale Centre-Ville, Montr´eal (Qu´ebec), H3C 3J7 Canada

Abstract Multidisciplinary by nature, By the term “quantization”, we refer to the Processing (QIP) is at the crossroads of process of using in order science, mathematics, physics and engineering. It con- to improve a classical , usually by cerns the implications of quantum mechanics for infor- making it go faster. In this paper, we initiate mation processing purposes (Nielsen & Chuang, 2000). the idea of quantizing clustering algorithms Quantum information is very different from its classi- by using variations on a celebrated quantum cal counterpart: it cannot be measured reliably and it algorithm due to Grover. After having intro- is disturbed by observation, but it can exist in a super- duced this novel approach to unsupervised position of classical states. Classical and quantum learning, we illustrate it with a quantized information can be used together to realize wonders version of three standard algorithms: divisive that are out of reach of classical information processing clustering, k-medians and an algorithm for alone, such as being able to factorize efficiently large the construction of a neighbourhood graph. numbers, with dramatic cryptographic consequences We obtain a significant speedup compared to (Shor, 1997), search in a unstructured database with a the classical approach. quadratic speedup compared to the best possible clas- sical algorithms (Grover, 1997) and allow two people to communicate in perfect secrecy under the nose of an 1. Introduction eavesdropper having at her disposal unlimited comput- ing power and technology (Bennett & Brassard, 1984). Unsupervised learning is the part of machine learning Machine learning and QIP may seem a priori to have whose purpose is to give to machines the ability to find little to do with one another. Nevertheless, they have some structure hidden within data. Typical tasks in already met in a fruitful manner (see the survey of unsupervised learning include the discovery of “natu- Bonner & Freivalds, 2002, for instance). In this paper, ral” clusters present in the data (clustering), finding we seek to speed-up some classical clustering algo- a meaningful low dimensional representation of the rithms by drawing on QIP techniques. It is impor- data (dimensionality ) or learning explicitly a tant to have efficient clustering algorithms in domains probability function (also called density function) that for which the amount of data is huge such as bioinfor- represents the true distribution of the data (density matics, astronomy and Web mining. Therefore, it is estimation). Given a training data set, the goal of a natural to investigate what could be gained in perform- clustering algorithm is to group similar datapoints in ing these clustering tasks if we had the availability of the same cluster while putting dissimilar datapoints a quantum computer. in different clusters. Some possible applications of clustering algorithms include: discovering sociological The outline of the paper is as follows. In Section 2, groups existing within a population, grouping auto- we review some basic concepts of QIP, in particular matically molecules according to their structures, clus- Grover’s algorithm and its variations, which are at the tering stars according to their galaxies, and gathering core of our clustering algorithm quantizations. In Sec- news or papers according to their topic. tion 3, we introduce the concept of quantization as well

th as the model we are using. We also briefly explain Appearing in Proceedings of the 24 International Confer- in that section the quantum subroutines based on ence on Machine Learning, Corvallis, OR, 2007. Copyright 2007 by the author(s)/owner(s). Grover’s algorithm that we are exploiting in order to Quantum Clustering Algorithms speed-up clustering algorithms. Then, we give a quan- tum information, whereas a double line carries classical tized version of divisive clustering, k-medians and the information; M denotes a . construction of a c-neighbourhood graph, respectively,  1 in Sections 4, 5 and 6. Finally, we conclude in Section 7 0 with probability /2 |0i H M 1 with a discussion of the issues that we have raised. 1 with probability /2

Figure 1. Example"%#$ of a simple . 2. Quantum Information Processing In this very simple example, we apply a Walsh– Quantum information processing draws its uncanny √1 √1 Hadamard gate to state |0i, which yields 2 |0i+ 2 |1i. power from three quantum resources that have no clas- The subsequent measurement produces either 0 or 1, sical counterpart. Quantum parallelism harnesses the √1 2 1 each with probability | 2 | = /2, and the state col- superposition principle and the linearity of quantum lapses to the observed classical value. This circuit can mechanics in order to compute a function simulta- be seen as a perfect random bit generator. neously on arbitrarily many inputs. Quantum inter- ference makes it possible for the logical paths of a The notion of has a natural extension, which is computation to interfere in a constructive or destruc- the quantum register. A quantum register |ψi, com- n tive manner. As a result of interference, computa- posed of n , lives in a 2 -dimensional Hilbert P2n−1 tional paths leading to desired results can reinforce space. Register |ψi = i=0 αi|ii is specified by com- one another, whereas other computational paths that plex amplitudes α0, α1,. . . , α2n−1 subject to normal- P 2 would yield an undesired result cancel each other out. ization condition |αi| = 1. Here, basis state |ii Finally, there exist multi-particle quantum states that denotes the binary encoding of integer i. Unitary oper- cannot be described by an independent state for each ations can also be applied to two or more qubits. For- particle (Einstein, Podolsky & Rosen, 1935). The cor- tunately (for implementation considerations), any uni- relations offered by these states cannot be reproduced tary operation can always be decomposed in terms of classically (Bell, 1964) and constitute an essential unary and binary gates. However, doing so efficiently resource of QIP called entanglement. (by a polynomial-size circuit) is often nontrivial. Figure 2 illustrates the process by which a function f 2.1. Basic Concepts is computed by a quantum circuit C. Because unitary In this section, we briefly review some essential notions operations must be reversible, we cannot in general of QIP. A detailed account of the field can be found simply go from |xi to |f(x)i. Instead, we must map in the book of Nielsen and Chuang (2000). A qubit (or |x, bi to |x, b + f(x)i, where the addition is performed quantum bit) is the quantum analogue of the classical in an appropriate finite group and the second input is a bit. In contrast with its classical counterpart, a qubit quantum register of sufficient size. In case of a Boolean can exist in a superposition of states. For instance, an function, |bi is a single qubit and we use the sum mod- can be simultaneously on two different orbits ulo 2, also known as the exclusive-or and denoted “⊕”. of the same atom. Formally, using the Dirac notation, In all cases, it suffices to set b to zero at the input of a qubit can be described as |ψi = α|0i + β|1i where α the circuit in order to obtain f(x). and β are complex numbers called the amplitudes of |xi |xi classical states |0i and |1i, respectively, subject to the C normalization condition that |α|2 + |β|2 = 1. When |bi |b + f(x)i state |ψi is measured, either |0i or |1i is observed, with probability |α|2 or |β|2, respectively. Further- Figure 2. Unitary computation of function f. more, are irreversible because the state When f is a Boolean function, it is often more con- of the system collapses to whichever value (|0i or |1i) venient to compute f in a manner that would have has been observed, thus losing all memory of former no classical counterpart: if x is the classical input, we amplitudes α and β. flip its quantum phase from + |xi to − |xi (or vice All other operations allowed by quantum mechan- versa) precisely when f(x) = 1. This process, which is ics are reversible (and even unitary). They are rep- achieved by the circuit given in Fig. 3, is particularly resented by gates, much as in a classical circuit. |xi (−1)f(x) |xi For instance, the Walsh–Hadamard gate H maps |0i to C √1 |0i + √1 |1i and |1i to √1 |0i − √1 |1i. Figure 1 illus- 2 2 2 2 |1i H H |1i trates the notions seen so far, where time flows from left to right. Note that a single line carries quan- Figure 3. a function by phase flipping. Quantum Clustering Algorithms interesting when it is computed on a superposition of 3. Quantization of Clustering all (or some) inputs. That operation plays a key role Algorithms in Grover’s algorithm (Section 2.2). As a motivating example, consider the following sce- 2.2. Grover’s Algorithm and Variations nario, which corresponds to a highly challenging clus- tering task. Imagine that you are an employee of the In the original version of Grover’s algorithm (Grover, Department of Statistics of the United Nations. Your 1997), we are given a Boolean function f as a black boss comes to you with the complete demographic data box and we are promised that there exists a unique of all the inhabitants of Earth and asks you to analyse x0 such that f(x0) = 1. Classically, finding that x0 this data with a clustering algorithm in the hope of would require an average of n/2 queries of the black discovering meaningful clusters. Seeing how reluctant box, where n is the number of points in the domain you seem to be in front of all this data, he tells you of f. Grover’s√ algorithm solves the same problem after not to worry because, in order to help you achieve roughly n accesses to the black box, but of course this task, he was able to “borrow” the prototype of those accesses are made in . a full-size quantum computer from the National Secu- Grover’s algorithm starts by using a stack of Walsh– rity Agency. Can this quantum computer be used to Hadamard gates on the all-zero state in order to create speed-up the clustering process? an equal superposition of all possible inputs. It then By the term quantization, we refer to the process of proceeds by repeating the so-called Grover iteration, starting from a classical algorithm and converting it which is composed of two steps: a call to the quantum into a quantum algorithm in order to improve it 1, circuit given in Fig. 3, which flips the phase of the generally by making it go faster. The first quantized unknown x such that f(x) = 1 (the “target state”) clustering algorithm, although it was not developed and an inversion about the average, which is indepen- for this purpose, is due to D¨urr, Heiligman, Høyer and dent of f. This iteration has to be repeated roughly Mhalla (2004). They have studied the quantum query π √ 4 n times. The effect of a single Grover iteration is complexity of graph problems and developed among to slightly increase the amplitude of the target state, other things a quantized version of Bor˚uvka’s algo- while decreasing the amplitudes of the other states. rithm (1926), capable of finding the minimum span- After the right number of Grover iterations, the ampli- ning tree of a graph in a time in Θ(n3/2), where n is tude of the target state is very close to 1, so that we are the number of vertices in the graph 2. Suppose that almost certain to obtain it if we measure the register each datapoint xi of the training set is represented at that time. by a vertex and that each pair of vertices (xi, xj) is Following Grover’s original idea, generalizations of his linked by an edge whose weight is proportional to some algorithm have been developed that deal with the distance measure Dist(xi, xj). Once the minimal span- case in which there are more than a single x so that ning tree of this graph has been computed, it is easy π p to group the datapoints into k clusters by removing f(x) = 1. In that case, roughly 4 n/t Grover iter- ations should be applied before measuring (Boyer, the k − 1 longest edges of this tree. Brassard, Høyer & Tapp, 1998), where t is the num- Although related, the task of quantizing clustering ber of solutions. In case the number t of solutions is algorithms should not be confused with the design of unknown, the same paper shows that it remains pos- classical clustering algorithms inspired from quantum sible to find one of them in a time proportional to mechanics (Horn and Gottlieb 2001; 2002) or the task p n/t. Other extensions of Grover’s algorithm have of performing clustering directly on quantum states been developed, in which it is possible to count (either (A¨ımeur,Brassard & Gambs, 2006). exactly or approximately) the number of solutions (Brassard, Høyer, Mosca & Tapp, 2002). 3.1. The Model Several applications of Grover’s algorithm have been In traditional clustering, the assumption is made that developed to find the minimum of a function (D¨urr the training data set Dn is composed of n points, & Høyer, 1996) and the c smallest values in its image √ denoted Dn = {x1, . . . , xn}. Each datapoint x corre- (D¨urr, Heiligman,√ Høyer & Mhalla, 2004) after Θ( n ) and Θ( cn ) calls on the function, respectively. Other 1 Not to be confused with an alternative meaning of applications can approximate the median or related quantization, which is to divide a continuous space into discrete pieces. statistics (Nayak & Wu, 1999) with a quadratic gain 2 In the case of a complete graph, all possible classical compared to the best possible classical algorithms. algorithms require a time in Ω(n2). Quantum Clustering Algorithms

d |ii |ii sponds to a vector of attributes. For instance, x ∈ R if points are described by d real attributes. The goal of |ji O |ji a clustering algorithm is to partition the set Dn in sub- sets of points called clusters, such that similar objects |bi |b + Dist(xi, xj)i are grouped together within the same cluster (intra- Figure 4. Illustration of the distance oracle: i and j are the similarity) and dissimilar objects are put in differ- indexes of two points from D and Dist(x , x ) represents ent clusters (inter-dissimilarity). A notion of distance n i j the distance between them. The addition b + Dist(xi, xj ) (or a similarity measure) between each pair of points is performed in an appropriate finite group between the is assumed to exist and is used by the algorithm to ancillary register b and the distance Dist(xi, xj ). decide on how to form the clusters. The explicit construction of O from a particular train- In this paper, we depart from this traditional setting ing set Dn is a fundamental issue, which we discuss in by adopting instead the framework of the black box Section 7. For now, we simply assume that the clus- model. Specifically, we assume that our knowledge tering instance to be solved is given as a black box, concerning the distance between points of the training which is the usual paradigm in quantum information data set is available solely through a black box, also processing as well as in Angluin’s classical model. known as an “oracle”. We make no a priori assump- tions on the properties of this distance, except that it 3.2. Quantum Subroutines is symmetric 3 and non-negative. (In particular, the triangle inequality need not hold.) This model is close In this section, we present three quantum subroutines, in spirit to the one imagined by Angluin (1988), which which we are going to use in order to accelerate classi- is used in computational learning theory to study the cal clustering algorithms. All these subroutines are query complexity of learning a function given by a variations on Grover’s algorithm. In fact, the first black box. A quantum analogue of Angluin’s model two are straightforward applications of former work by has been defined by Servedio (2001). The main differ- D¨urr et al. (1996; 2004), although they are fine-tuned ence between Angluin’s model and ours is that we are for our clustering purposes. The third subroutine is a not interested in learning a function but rather in per- novel, albeit simple, application of Grover’s algorithm. 4 forming clustering . The quant find max algorithm described below (Algo- In the classical black-box setting, a query corresponds rithm 1) is directly inspired by the algorithm of D¨urr to asking for the distance between two points xi and xj and Høyer (1996). It serves to find the pair of points by providing indexes i and j to the black box. In accor- that are farthest apart in the data set (the distance dance with the general schema given in Fig. 2, the cor- between those two points is called the “diameter” of responding quantum black box is illustrated in Fig. 4; the data set). A similar algorithm, which we do not we call it O (for “oracle”). In particular, it is pos- need in this paper, would find the datapoint that is sible to query the quantum black box in superposi- most distant from one specific point. tion of entries. For instance, if we apply the Walsh– Hadamard gate to all the input qubits initially set Algorithm 1 quant find max(Dn) to |0i (but leave the |bi part to |0i), we can set the Choose at random two initial indexes i and j entry to be a superposition of all the pairs of indexes Set dmax = Dist(xi, xj) of datapoints. In that case, the resulting output is a repeat 5 superposition of all the triples |i, j, Dist(xi, xj)i. Using Grover’s algorithm, find new indexes i and j

3 such that Dist(xi, xj) > dmax provided they exist; If the distance is not symmetric, the algorithms pre- Set d = Dist(x , x ) sented here can easily be modified at no significant increase max i j in the running time. until no new i, j are found 4 We are not aware of prior work in the study of clus- return i, j tering complexity in Angluin’s model, be it in the classical or quantum setting. However, a similar problem has been considered in the classical PAC (Probably Approximately The algorithm starts by choosing uniformly at random Correct) learning setting (Mishra, Oblinger & Pitt, 2001). two indexes i and j. A first guess for the diameter is The issue was to study the number of queries that are obtained simply as dmax = Dist(xi, xj). By virtue of necessary to learn (in the PAC learning sense) a specific the phase-flipping circuit described in Figures 5 and 6, clustering from a class of possible clusterings. Grover’s algorithm is then used to find a new pair (i, j) 5 Not to be confused with simply a superposition of all the distances between pairs of points, which would make of points, if it exists, such that Dist(xi, xj) > dmax. no quantum sense in general. If no such pair exists, we have found the diameter and the algorithm terminates. Otherwise, the tenta- Quantum Clustering Algorithms

|ii  points is minimum. This notion of median is partic- ± |ii|ji |ji O O ularly intuitive in the L1–norm sense but can be gen- eralized to other situations (see the survey of Small, |0i |0i 1990, for instance).

|dmaxi P |dmaxi Finding the median can be done classically by com- puting for each point inside the set its sum distance |1i H H |1i to all the other points and taking the minimum. This process requires a time in Θ(m2), again when m is the Figure 5. Phase-flipping component of Grover’s algorithm, number of points considered. In the general case in in which the output is identical to the input, except that the global phase of |ii|ji is flipped if and only if which there are no restrictions on the distance func- tion, we are not aware of a more efficient classical Dist(xi, xj ) > dmax. See Fig. 6 for definition of P. approach. Quantum mechanically, we can easily build the quantum circuit illustrated in Fig. 7, which takes |di |di |ii as input, 1 ≤ i ≤ m, and computes the sum of the distances between z and all the other points in Q . |dmaxi P |dmaxi i m For this, it suffices to apply the black box of Fig. 4 suc- |bi |b ⊕ [d > dmax]i cessively with each value of j, 1 ≤ j ≤ m. (We assume that Dist(z , z ) = 0.) This takes a time in Θ(m), but Figure 6. Sub-circuit P for use in Fig. 5, where [x] is the i i Iverson bracket defined by [x] = 0 if x is false and [x] = 1 see Section 7 for possible improvements. otherwise, and “⊕” denotes the exclusive-or. |ii |ii O Pm tive distance dmax is updated to be Dist(xi, xj) and |bi |b + j=1 Dist(zi, zj)i the procedure is repeated. It follows from the analysis Figure 7. Computing the sum of distances between z and of D¨urr and Høyer (1996) that convergence happens i all the other points in the set Q = {z , . . . , z }. after an expected number of queries in the order of m 1 m √ p, where p = n2 is the number of pairs of datapoints, The minimum-finding algorithm of D¨urr and Høyer hence the total number of queries is in O(n). (1996) can then be used to find√ the minimum such sum over all possible zi with Θ( m ) applications of the The second subroutine we are going to use for circuit of Fig. 7. Since each application of the circuit the quantization of classical clustering algorithms is takes a time in Θ(√m), the overall time to compute the directly inspired by the algorithm for finding the c median is in Θ(m m ) = Θ(m3/2). smallest values of a function, due to D¨urr, Heilig- man, Høyer and Mhalla (2004). We call this sub- routine quant find c smallest values. Finding the min- 4. Divisive Clustering imum of a function can be seen as a special case of One of the simplest ways to build a hierarchy of clus- the application of this algorithm for the case c = 1. ters is by starting with all the points belonging to Using the approach that we have just explained for a single cluster. The next step is to split this clus- quant find max, it is possible to adapt this algorithm ter into two subclusters. For this purpose, the two to search for√ the c closest neighbours of a point in a datapoints that are the farthest apart are chosen as time in Θ( cn ). seeds. Afterwards, all the other points are attached Our third and last subroutine is a novel algorithm, to their nearest seed. This division technique is then which we call quant cluster median, for computing the applied recursively on the resulting subclusters until all the points contained inside a cluster are sufficiently median of a set of m points Qm = {z1, . . . , zm}. When similar. See Algorithm 2 for details. the zi’s are simply numbers or, more generally, when all the points are colinear, the quantum algorithm of The part of this algorithm that is the most costly is to Nayak and Wu√ (1999) can be used to find the median find the two farthest points within the initial data set in a time in Θ( m ). However, we shall need to find d of n points. If the datapoints are given as vectors in R medians in the more general case in which all we know for an arbitrarily high dimension d, this process gener- about the points is the distance between each pair ally requires Θ(n2) comparisons 6. Quantum mechan- (the triangle inequality need not hold), when the algo- ically, however, we can use quant find max as a sub- rithm of Nayak and Wu (1999) does not apply. 6 However, if d is small (such as d = 1, 2 or 3) and we By definition, the median of Qm is a point within the are using a metric such as the Euclidean distance, linear or set whose sum (or average) distance to all the other subquadratic algorithms are known to exist. Quantum Clustering Algorithms

Algorithm 2 Div clustering(D) the points is the distance between them, in which case if points in D are sufficiently similar then it may be impossible to compute averages, hence to return D as a cluster apply the k-means algorithm. else Find the two farthest points xa and xb in D Algorithm 3 k-medians(D,k) using quant find max Choose k points uniformly at random to be the for each x ∈ D do initial centres of the clusters Attach x to the closest between xa and xb repeat end for for each datapoint in D do Set Da to be all the points attached to xa Attach it to its closest centre Set Db to be all the points attached to xb end for Call Div clustering(Da) for each cluster Q do Call Div clustering(Db) Compute the median of the cluster and make end if it its new centre end for until (quasi-)stabilization of the clusters routine to this algorithm, which finds the two farthest return the clusters found and their centres points in a time in Θ(n), as we have seen. For the sake of simplicity, let us analyse the situa- In order to analyse the efficiency of one iteration of tion if the algorithm splits the data set in two sub- this algorithm, let us assume for simplicity that the clusters of roughly the same size. 7 This leads to the clusters have roughly the same size n/k. (If not, construction of a balanced tree and the algorithm has the advantage of our quantum algorithm compared to a global running time T (n) given by asymptotic recur- the classical approach will only be more pronounced.) rence T (n) = 2T (n/2) + Θ(n), which is in Θ(n log n). If the medians were computed classically, each of them n 2 1 2 would need a time in Θ(( k ) ), for a total of Θ( k n ) for finding the centres of all k clusters. Quantum 5. k-medians mechanically, we have seen that it is possible to com- The k-medians algorithm, also called k-medoids, pute the median of a cluster of size n/k in a time in n p n (Kaufman & Rousseeuw, 1987) is a cousin of the Θ( k k ) using the quant cluster median subroutine. This yields a running time in Θ( √1 n3/2) for one itera- better-known k-means clustering algorithm. It is an k tion of the quantum k-medians algorithm, which is iterative algorithm, in which an iteration consists of p two steps. During the first step, each datapoint is n/k times faster than the classical approach, every- attached to its closest cluster centre. During the sec- thing else being the same in terms of convergence rate. ond step, the centre of each cluster is updated by choosing among all the points composing this cluster 6. Construction of a c-neighbourhood the one that is its median. The algorithm stops when Graph the centre of the clusters have stabilized (or quasi- stabilized). The algorithm is initialized with k random The construction of a neighbourhood graph is an points chosen as starting centres, where k is a param- important part in several unsupervised learning algo- eter supplied to the algorithm, which corresponds to rithms such as ISOMAP (Tenenbaum, de Silva & Lang- the desired number of clusters. ford, 2000) or the clustering method by (Harel & Koren, 2001). Suppose that the points of the The main difference between k-means and k-medians training set are the vertices of a complete graph, where is that k-means is allowed to use a virtual centroid that an edge between two vertices is weighted according to is simply the average of all the points inside the cluster. the distance between these two datapoints. A c-neigh- In contrast, for k-medians we restrict the centre of the bourhood graph can be obtained by keeping for each cluster to be a “real” point of the training set. One vertex only the edges linking it to its c closest neigh- advantage of k-medians over k-means is that it can be bours. Algorithm 4 gives a quantized algorithm for applied even if the only information available about the construction of a c-neighbourhood graph. 7 Admittedly, this is not an altogether realistic assump- For each datapoint, we can find its c closest neighbours tion, especially if the data set contains outliers. However, √ in that case, we should begin by following the usual classi- in a time in Θ( cn ) using quant find c smallest values. 3/2 cal practice of detecting and removing those outliers before This leads to a total cost in Θ(n ) for computing the proceeding to divisive clustering. global c-neighbourhood graph, provided we set c to be Quantum Clustering Algorithms

Algorithm 4 c-neighbourhood graph construction(D,c) sum of a set of values instead of simply adding them

for each datapoint xi of D do one by one as we propose in Fig. 7. Currently known Use quant find c smallest values to find the c algorithms to estimate the average (Grover, 1998) can- closest neighbours of xi not be used directly because of precision issues, but for each c closest neighbours of xi do methods based on amplitude estimation (Brassard, Create an edge between xi and the current Høyer, Mosca & Tapp, 2002) are promising. neighbour, which is proportional to the In order to make a fair comparison between a classical distance between these two points clustering algorithm and its quantized counterpart, it end for is also important to consider the best possible classical end for algorithm and the advantage that can be gained if we return the computed graph have a full description of the datapoints, rather than just the distance between them. For instance, in the case of the construction of a c-neighbourhood graph, a constant. Classically, if we have to deal with an arbi- we have seen in Section 6 that classical kd-trees can trary metric and we know only the distance between be used to compute this graph so efficiently that it pairs of points, this would require a time in the order may not be possible to gain a significant improvement of Θ(n2) to find the closest neighbours for each of the by quantizing the algorithm. It is therefore important n points. However, if we have access for each data- to study also the lower bounds that can be achieved point to all the d attributes that describe it, it is pos- for different clustering settings, both classically and sible to use Bentley’s multidimensional binary search quantum mechanically. In particular, in which situa- trees, known as kd-trees 8 (Bentley, 1975), to find the tion can (or cannot) the quantized version provide a c closest neighbours of a specific datapoint in a time significant improvement? For instance, in the case of in Θ(c log n). The construction of the kd-tree requires clustering with a minimal spanning tree, D¨urr, Heilig- to sort the datapoints according to each dimension, man, Høyer and Mhalla (2004) have proved that their which can be done in a time in Θ(dn log n), where d is algorithm is close to optimal. It follows that no clus- the dimensionality of the space in which the datapoints tering algorithm based on the construction of a mini- live and n is the number of datapoints. mal spanning tree, be it quantum or classical, can do better than Ω(n3/2). 7. Discussion and Conclusion Among the possible extensions to the study initiated in this paper, we note that the quantization approach In this paper, we have seen how to speed up a selection could be applied to other clustering algorithms. More- of classical clustering algorithms by quantizing some over, this quantization does not need to be restricted of their parts. However, the approach we have used is to using only variations on Grover’s algorithm: it could not necessarily realistic because it requires the avail- also use other techniques from the quantician’s tool- ability of a quantum black box that can be used to box, such as quantum random walks (Ambainis, 2003) query the distance between pairs of points in superpo- or quantum Markov chains (Szegedy, 2004). Devel- sition. Even though this is the model commonly used oping entirely new quantum clustering algorithms in quantum information processing, we reckon that, in instead of simply quantizing some parts of classical real life, we might not be given directly such a black algorithms is a most interesting research avenue, which box. Instead, we would be more likely to be given a could lead to more spectacular savings. Finally, we be- training data set Dn that contains the description of lieve that the quantization paradigm could also be ap- n datapoints. An important issue is how to construct plied to other domains of machine learning, such as di- ourselves, from this training set, an efficient quantum mensionality reduction and the training of a classifier. circuit that has the same functionality as the black box we had assumed throughout this paper. We recognize that this is a fundamental question, but it is currently Acknowledgements beyond the scope of this paper. We are grateful to the reviewers and Senior Program We believe that our quantized version of the k-medians Committee for their numerous suggestions. We also algorithm (Section 5) can be improved even further thank Alain Tapp for enlightening discussions. This by developing a quantum algorithm to estimate the work is supported in part by the Natural Sciences and 8 Originally, “kd tree” stands for “k-dimensional tree”. Engineering Research Council of Canada, the Canada Of course those trees would be d-dimensional in our case Research Chair programme, the Canadian Institute for but it would sound funny to call them “dd trees”! Advanced Research and QuantumWorks. Quantum Clustering Algorithms

References Grover, L. K. (1998). A framework for fast quan- tum mechanical algorithms. Proceedings of the A¨ımeur, E., Brassard, G. & Gambs, S. (2006). 30th ACM Symposium on Theory of Computing: Machine learning in a quantum world. Proceedings STOC’98 (pp. 53–62). of Canadian AI 2006 (pp. 433–444). Harel, D. & Koren, Y. (2001). On clustering using ran- Ambainis, A. (2003). Quantum walks and their dom walks. Proceedings of the 21st Conference on algorithmic applications. International Journal of Foundations of Software Technology and Theoretical Quantum Information, 1, 507–518. Computer Science: FSTTCS’01 (pp. 18–41). Angluin, D. (1988). Queries and concept learning. Horn, D. & Gottlieb, A. (2001). The method of quan- Machine Learning, 2, 319–342. tum clustering. Proceedings of the Neural Informa- Bell, J. (1964). On the Einstein-Podolsky-Rosen para- tion Processing Systems: NIPS’01 (pp. 769–776). dox. Physics, 1(3), 195–200. Horn, D. & Gottlieb, A. (2002). Algorithm for data Bennett, C. H. & Brassard, G. (1984). Quantum cryp- clustering in pattern recognition problems based on tography: Public key distribution and coin toss- quantum mechanics. , 88(1). ing. Proceedings of the IEEE Conference on Com- Kaufman, L. & Rousseeuw, P. (1987). Clustering by puters, Systems and Signal Processing, Bangalore, means of medoids. in Statistical Data Analysis Based India (pp. 175–179). on the L1–Norm and Related Methods, Y. Dodge Bentley, J. L. (1975). Multidimensional binary search (editor), North-Holland, Amsterdam (pp. 405–416). trees used for associative searching. Communica- Mishra, N., Oblinger, D. & Pitt, L. (2001). Sub- tions of the ACM, 18(9), 509–517. linear time approximate clustering. Proceedings Bonner, R. & Freivalds, R. (2002). A survey of of 12th ACM-SIAM Symposium on Discrete Algo- quantum learning. Proceedings of the Workshop on rithms: SODA’01 (pp. 439–447). Quantum Computation and Learning (pp. 106–119). Nayak, A. & Wu, F. (1999). The quantum query Bor˚uvka, O. (1926). O jist´emprobl´emu minim´aln´ım. complexity of approximating the median and related Pr´ace Moravsk´e Pˇrirodovˇedeck´e Spoleˇcnosti, 3, statistics. Proceedings of the 31st ACM Symposium 37–58. on Theory of Computing: STOC’99 (pp. 384–393). Boyer, M., Brassard, G., Høyer, P. & Tapp, A. (1998). Nielsen, M. & Chuang, I. (Eds.). (2000). Quan- Tight bounds on quantum searching. Fortschritte tum Computation and Quantum Information. Cam- Der Physik, 46, 493–505. bridge University Press. Brassard, G., Høyer, P., Mosca, M. & Tapp, A. (2002). Servedio, R. (2001). Separating quantum and classi- Quantum amplitude amplification and estimation. cal learning. Proceedings of the International Con- Contemporary Mathematics, 305, 53–74. ference on Automata, Languages and Programming: ICALP’01 (pp. 1065–1080). D¨urr, C., Heiligman, M., Høyer, P. & Mhalla, M. (2004). Quantum query complexity of some graph Shor, P. W. (1997). Polynomial-time algorithms for problems. Proceedings of the International Confer- prime factorization and discrete logarithms on a ence on Automata, Languages and Programming: quantum computer. SIAM Journal of Computing, ICALP’04 (pp. 481–493). 26, 1484–1509. D¨urr, C. & Høyer, P. (1996). A quantum algorithm for Small, C. G. (1990). A survey of multidimensional finding the minimum. Available on http://arxiv. medians. International Statistical Review, 58(3), org/quant-ph/9607014. 263–277. Einstein, A., Podolsky, B. & Rosen, N. (1935). Can Szegedy, M. (2004). Quantum speed-up of Markov quantum mechanical description of physical reality chain based algorithms. Proceedings of 45th IEEE be considered complete? Physical Review, 47, Symposium on Foundations of Computer Science: 777–780. FOCS’04 (pp. 32–41). Grover, L. K. (1997). Quantum mechanics helps in Tenenbaum, J. B., de Silva, V. & Langford, J. C. searching for a needle in a haystack. Physical Review (2000). A global geometric framework for nonlinear Letters, 79(2), 325–328. dimensionality reduction. Science, 290, 2319–2323.