Order Statistics in the Farey Sequences in Sublinear Time

Jakub Pawlewicz

Institute of Informatics, Warsaw University, Banacha 2, 02-097 Warsaw, Poland
[email protected]

Abstract. The paper presents the first sublinear algorithm for computing order statistics in the Farey sequences. The algorithm runs in time O(n^{3/4} log n) and in space O(√n) for the Farey sequence of order n. This is a significant improvement over the algorithm from [1], which runs in time O(n log n).

1 Introduction

The Farey sequence of order n (denoted F_n) is the increasing sequence of all irreducible fractions from the interval [0, 1] with denominators less than or equal to n. The Farey sequences have numerous interesting properties and they are well known in number theory and in combinatorics. They are deeply investigated in [2].

In this paper we study the following algorithmic problem. For given positive integers n and k, compute the k-th element of the Farey sequence of order n. This problem is known as the order statistics problem. The solution to the order statistics problem is based on a solution to a related rank problem, i.e. the problem of finding the rank of a given fraction in the Farey sequence. Both the order statistics problem and the rank problem can be easily solved in quadratic time by listing all elements of the sequence (see [2, Problem 4-61] and Section 2.1). Faster solutions for order statistics are possible by reducing the main problem to the rank problem. A roughly linear time algorithm is presented in [1]. The authors present a solution of the rank problem working in time O(n) and in sublinear space. They also show how to reduce the order statistics problem to the rank problem by calling O(log n) instances of the rank problem. This gives an algorithm running in O(n log n) time. They remark that their solution to the rank problem could run in time O(n^{5/6+o(1)}) if it were possible to compute the sum ∑_{i=1}^{n} ⌊xi⌋, for a rational x, in this time or faster. This sum is related to counting lattice points in right triangles. A simple algorithm for that task running in logarithmic time can be found in [3]. Nevertheless, for completeness we present a simple logarithmic algorithm computing that sum in Section 3. The O(n^{5/6+o(1)}) solution is complicated. For instance, it involves summation of the Möbius function and subexponential integer factorization.

In Section 2.3 we present a simple algorithm for the rank problem with time complexity O(n^{3/4}) and space complexity O(√n). We assume the RAM as the model of computation (in the RAM model a single cell can store arbitrarily large integers, cell access and arithmetic operations on cells are performed in constant time, and memory complexity is measured in cells). What remains is to show a faster reduction in order to find order statistics in sublinear time.

In [1] the reduction was made in two stages. The first stage consists of finding the interval [j/n, (j+1)/n) containing the k-th term of the sequence. The interval is computed by a binary search calling the rank problem O(log n) times. The second stage tracks the searched term by checking all fractions in the interval. Since there are at most n such fractions, in the worst case this stage runs in O(n) time, which dominates the time complexity of the reduction. In [1] the authors proposed to take the smaller interval [j/n^2, (j+1)/n^2), since this interval contains exactly one fraction. However, there is the problem of tracking that fraction, which they do not solve. In Section 2.2 we show a solution to that problem. We also show another, more direct reduction running in logarithmic time and using O(log n) calls to the rank problem. This reduction is obtained by exploring the Stern–Brocot tree in a smart way.

2 Computing order statistics in the Farey sequences

2.1 An O(n^2) time algorithm

We show two O(n^2) time methods. The running time follows from the number of elements in the sequence F_n, which is asymptotically equal to (3/π^2) n^2. We use the following property of the Farey sequences. For two consecutive fractions a/b < c/d in a Farey sequence, the first fraction that appears between them is their mediant (a+c)/(b+d). The mediant is already reduced and it first appears in F_{b+d}. Using this property one can successively compute all fractions. That way the Stern–Brocot tree is obtained (for a description of the Stern–Brocot tree together with its properties we refer to [2]). The Farey fractions form a subtree. In-order traversal gives an O(n^2) time and O(n) space algorithm. The space complexity depends on the depth of the Farey tree. The second O(n^2) method is a straightforward application of a surprising formula. For three consecutive fractions a/b < c/d < e/f the following holds:

    e/f = (tc − a)/(td − b),   where t = ⌊(b + n)/d⌋.

That method works in optimal O(1) space.
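Both methods can be written down in a few lines. The following Python sketch is our own illustration (the function names are ours, not from the paper): farey_by_mediants performs the in-order traversal of the Farey subtree of the Stern–Brocot tree, and farey_next_term applies the three-term formula above; both list F_n in increasing order.

```python
def farey_by_mediants(n):
    """First method: in-order traversal of the Farey subtree.  Between two
    neighbours a/b < c/d the first new fraction is the mediant (a+c)/(b+d),
    already reduced, appearing first in F_{b+d}.  O(n^2) time; the recursion
    stack mirrors the tree depth, i.e. O(n) extra space."""
    def between(a, b, c, d):
        p, q = a + c, b + d
        if q > n:                      # mediant not in F_n: nothing in between
            return
        yield from between(a, b, p, q)
        yield p, q
        yield from between(p, q, c, d)

    yield 0, 1
    if n >= 1:
        yield from between(0, 1, 1, 1)
        yield 1, 1


def farey_next_term(n):
    """Second method: for consecutive a/b < c/d the next term is
    (t*c - a)/(t*d - b) with t = (n + b) // d.  O(n^2) time, O(1) space."""
    a, b, c, d = 0, 1, 1, n
    yield a, b
    while c <= n:
        t = (n + b) // d
        a, b, c, d = c, d, t * c - a, t * d - b
        yield a, b


# Both generators yield F_5 as
# (0,1), (1,5), (1,4), (1,3), (2,5), (1,2), (3,5), (2,3), (3,4), (4,5), (1,1).
```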

2.2 Reduction to the rank problem

In order to achieve a solution faster than the quadratic one we reduce the problem of finding a given term of the Farey sequence to the problem of counting the number of fractions bounded by a real number (the rank problem). To be more precise, for a given positive integer n and a real number x ∈ [0, 1] we want to find the number of fractions a/b belonging to the sequence F_n that are not larger than x. We show how to solve the original problem given an algorithm for the rank problem.
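For reference, the rank problem has an obvious quadratic brute-force solution; the sketch below (ours, not from the paper) is useful only as a test oracle for the faster routines developed later.

```python
from math import gcd

def rank_naive(n, p, q):
    """Brute force: the number of fractions a/b in F_n with a/b <= p/q,
    the fraction 0/1 included.  Quadratic time, for testing only."""
    return sum(1 for b in range(1, n + 1)
                 for a in range(0, b + 1)
                 if gcd(a, b) == 1 and a * q <= p * b)
```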

Reduction in linear time. We recall the reduction from [1]. First, the interval [j/n, (j+1)/n) containing the k-th term is found by a binary search starting from the interval [0/n, n/n], splitting an interval [l/n, r/n) into the two smaller intervals [l/n, m/n) and [m/n, r/n), where m = ⌊(l+r)/2⌋. Next, we track the fraction in [j/n, (j+1)/n). Because the size of the interval is 1/n, for each denominator b ≤ n it contains at most one fraction with that denominator. That fraction can be found in constant time, since its numerator must be ⌊((j+1)b − 1)/n⌋. We check such a candidate for every possible denominator, as in the sketch below. The total tracking time is O(n) and the whole reduction also works in time O(n). The above reduction suffices to construct a roughly linear time solution, but it is not enough if we want to create a sublinear algorithm. Therefore, we need a faster reduction. We show two reductions working in logarithmic time.
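The tracking stage of this reduction might look as follows in Python (a sketch under our own naming; the paper gives no code). It lists every fraction of F_n lying in [j/n, (j+1)/n); the k-th term is then selected among them with one additional rank query for the left endpoint.

```python
from fractions import Fraction
from math import gcd

def farey_fractions_in_slice(n, j):
    """All fractions of F_n inside [j/n, (j+1)/n), in increasing order.
    For each denominator b the only possible numerator is
    ((j+1)*b - 1) // n; keep it if it lies in the interval and is reduced."""
    found = []
    for b in range(1, n + 1):
        a = ((j + 1) * b - 1) // n       # a*n < (j+1)*b holds by this choice
        if j * b <= a * n and gcd(a, b) == 1:
            found.append((a, b))
    found.sort(key=lambda ab: Fraction(ab[0], ab[1]))
    return found
```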

Smaller interval. First, we can tune up the construction using intervals. As it was suggested in [1], we can find the smaller interval [j/n^2, (j+1)/n^2), also by a binary search. Now there is at most one fraction of F_n which belongs to such an interval. This is because the size of the interval is 1/n^2 and because the following inequality holds for every two consecutive fractions a/b < c/d in the sequence F_n:

    c/d − a/b = 1/(bd) ≥ 1/n^2.

The k-th term of the Farey sequence F_n is the only fraction from F_n in the interval [j/n^2, (j+1)/n^2). What is left is to track this fraction. We use the Stern–Brocot tree for this task. The Stern–Brocot tree allows us to explore all irreducible fractions in an organized way. We start from the two fractions 0/1 and 1/0. These fractions represent the interval where the k-th term resides. We repeatedly narrow that interval to enclose the interval [j/n^2, (j+1)/n^2) until we find the fraction. Assume we have already narrowed the interval to fractions a/b and c/d, such that

    a/b < j/n^2 < (j+1)/n^2 ≤ c/d.

Then, in a single iteration we split the interval by the mediant (a+c)/(b+d). If the mediant falls into the interval [j/n^2, (j+1)/n^2), then the k-th term is found and it is the mediant (a+c)/(b+d). Otherwise, we replace one of the fractions a/b and c/d by (a+c)/(b+d). If (a+c)/(b+d) < j/n^2, we replace the fraction a/b; otherwise (a+c)/(b+d) ≥ (j+1)/n^2 and we replace c/d. The above procedure guarantees successful tracking. However, its time complexity is O(n), since in the worst case n iterations are needed. For instance, for k = 1 we replace the right fraction n times consecutively, obtaining 1/1, 1/2, ..., 1/n. This problem can be solved by grouping successive substitutions of the left or the right fraction.

Suppose we replace the left fraction several times. After the first substitution we have the fractions (a+c)/(b+d) and c/d. After the second substitution the fractions become (a+2c)/(b+2d) and c/d, and so on. Generally, after t substitutions the left fraction is equal to (a+tc)/(b+td). In the above procedure we replace the left fraction as long as

    (a+tc)/(b+td) < j/n^2.    (1)

If t is the largest integer satisfying the above inequality, then for the next mediant we have j/n^2 ≤ (a+(t+1)c)/(b+(t+1)d). If there is also (a+(t+1)c)/(b+(t+1)d) < (j+1)/n^2, then the fraction is found and we can finish the search. Otherwise, the next mediant will substitute the right fraction. We see that we can make all successive iterations replacing the left fraction at once. We only need to determine the value of t. After rewriting (1) we get

    (n^2 c − jd) t < jb − n^2 a.    (2)

Because j/n^2 < c/d we know that n^2 c − jd > 0, so (2) is equivalent to

    t < (jb − n^2 a) / (n^2 c − jd).

The largest t satisfying that inequality is

    t = ⌈(jb − n^2 a) / (n^2 c − jd)⌉ − 1.    (3)

Analogously we analyze the situation when we replace the right fraction several times. After t substitutions the right fraction is equal to (ta+c)/(tb+d). The replacement takes place as long as

    (j+1)/n^2 ≤ (ta+c)/(tb+d).

The largest t satisfying the above inequality is

    t = ⌊((j+1)d − n^2 c) / (n^2 a − (j+1)b)⌋.    (4)

We conclude that the procedure of tracking the fraction from the interval [j/n^2, (j+1)/n^2) becomes much faster if we group steps in one direction. In the first iteration we make all steps to the left, replacing the right fraction. Then, in the next iteration, we make all steps to the right, replacing the left fraction. Next, we make all steps to the left, and so on, until we find the fraction from the given interval. In a single iteration, if we are going to the right, we replace the left fraction by (a+tc)/(b+td) where t is given by (3), and if we are going to the left, we replace the right fraction by (ta+c)/(tb+d) where t is given by (4). Excluding the first iteration we know that t is always at least one, since the next mediant has to replace the opposite fraction. It means that each denominator is replaced by at least the sum of the previous two denominators. Therefore, the sequence of successive denominators increases at least as fast as the Fibonacci numbers. Thus, the number of iterations in this procedure is O(log n).
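The grouped tracking procedure is short in code. The following Python sketch is our own illustration (names are ours); it assumes, as guaranteed by the preceding binary search, that the interval [j/n^2, (j+1)/n^2) really contains a term of F_n, and it treats the boundary terms 0/1 and 1/1 separately since they are never produced as mediants.

```python
def track_in_interval(n, j):
    """Return the unique fraction of F_n lying in [j/N, (j+1)/N), N = n*n,
    by Stern-Brocot descent with grouped steps (formulas (3) and (4)).
    O(log n) iterations, exact integer arithmetic only."""
    N = n * n
    if j == 0:
        return 0, 1                      # boundary case: the term is 0/1
    if j == N:
        return 1, 1                      # boundary case: the term is 1/1
    a, b = 0, 1                          # left endpoint:  a/b < j/N
    c, d = 1, 0                          # right endpoint: (j+1)/N <= c/d
    while True:
        p, q = a + c, b + d              # the mediant
        if j * q <= p * N < (j + 1) * q: # mediant inside the interval: done
            return p, q
        if p * N < j * q:
            # mediant left of the interval: replace the left fraction t times,
            # t = ceil((j*b - N*a)/(N*c - j*d)) - 1            -- formula (3)
            num, den = j * b - N * a, N * c - j * d
            t = -(-num // den) - 1
            a, b = a + t * c, b + t * d
        else:
            # mediant at or right of the interval: replace the right fraction
            # t times, t = floor(((j+1)*d - N*c)/(N*a - (j+1)*b)) -- formula (4)
            t = ((j + 1) * d - N * c) // (N * a - (j + 1) * b)
            c, d = t * a + c, t * b + d
```

Together with the binary search over j, this yields a reduction that calls the rank problem only O(log n) times.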

Exploring the Stern–Brocot tree directly. Suppose we are able to solve the rank problem in "reasonable" time. This means that for every fraction a/b ∈ F_n we can compare it with the k-th term of the sequence F_n. Using only comparisons we can descend the Stern–Brocot tree down to the searched fraction. We start from the interval [0/1, 1/0]. Then, we repeatedly split the interval by the mediant and choose the interval containing the searched fraction. That is, if we have the interval [a/b, c/d], we take the mediant (a+c)/(b+d) and compare it with the k-th term. If the number of fractions in F_n not larger than (a+c)/(b+d) equals k, then we get the result; if it is larger than k, then the term lies in the interval [a/b, (a+c)/(b+d)]; and if it is less than k, then the term lies in [(a+c)/(b+d), c/d].

As in the previous reduction, in the worst case we have to call the rank problem O(n) times and, as previously, we have to optimize the search by grouping moves in a single direction. However, here we cannot give an explicit formula for the number of steps t, because we can only ask on which side of a given fraction the searched term lies. Fortunately, there is a technique for finding t using at most O(log t) questions. The technique can be used when for any integer s we can ask whether s < t, s = t or s > t. First, for successive i = 0, 1, 2, ... we check whether 2^i < t. If it turns out that t = 2^i for some i, then we find t in i + 1 questions. Otherwise, we find the smallest positive integer l such that 2^{l−1} < t < 2^l after l + 1 questions. In this case we perform a binary search for t in the interval (2^{l−1}, 2^l). The binary search takes at most l − 1 questions, so the whole procedure uses at most 2l ≤ 2 log t questions. Using the above technique for grouping steps in a single direction we obtain a method which asks at most O(log n) questions.

We formalize this method for clearer analysis. Let P_0/Q_0 = 1/0 and P_1/Q_1 = 0/1, so that the starting interval is [P_1/Q_1, P_0/Q_0]. In the i-th iteration, for i = 2, 3, ..., we construct the fraction P_i/Q_i. For even i we move to the left and replace the right fraction. For odd i we move to the right and replace the left fraction. Assume i is even. In that case the interval is [P_{i−1}/Q_{i−1}, P_{i−2}/Q_{i−2}]. Here we are moving to the left, adding the left fraction to the right one as many times as possible. We search for the largest t_i such that the searched fraction is not larger than

    (t_i P_{i−1} + P_{i−2}) / (t_i Q_{i−1} + Q_{i−2}).

Then we replace the right fraction by it, thus P_i = t_i P_{i−1} + P_{i−2} and Q_i = t_i Q_{i−1} + Q_{i−2}. When i is odd we proceed analogously but in the opposite direction. We repeat calculating successive P_i/Q_i until for some i that fraction is the k-th term of the sequence F_n. The above procedure is nothing new. In fact it has a strict connection with continued fractions. One may prove that

    P_i/Q_i = 1/(t_2 + 1/(⋯ + 1/(t_{i−1} + 1/t_i))).

Let us analyze the time complexity. First, observe that every t_i is positive. For each i = 2, 3, ... we ask at most 2 l_i questions in the i-th iteration, where 2^{l_i−1} ≤ t_i < 2^{l_i}. Suppose we made h iterations, so the searched fraction is P_h/Q_h and Q_h ≤ n. From the recursive formula Q_i = t_i Q_{i−1} + Q_{i−2} we conclude that the sequence Q_i increases at least as fast as the Fibonacci numbers, thus h = O(log n). By the inequality Q_i ≥ t_i Q_{i−1} we have

    n ≥ Q_h ≥ t_h Q_{h−1} ≥ t_h t_{h−1} Q_{h−2} ≥ ⋯ ≥ t_h ⋯ t_2 Q_1 = t_h ⋯ t_2,

which, together with the inequality t_i ≥ 2^{l_i−1}, gives n ≥ 2^{l_2 + ⋯ + l_h − (h−1)}, and hence

    l_2 + ⋯ + l_h ≤ log n + (h − 1) = O(log n).

Therefore, the total number of questions is O(log n), since over all iterations we ask at most 2(l_2 + ⋯ + l_h) questions.
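A possible implementation of this reduction, driven purely by rank queries, is sketched below in Python (our own illustration; the grouping is realized with the exponential-plus-binary search described above). Any rank routine can be plugged in, e.g. the brute-force rank_naive sketched earlier or the sublinear algorithm of Section 2.3; here rank is taken to count 0/1 as well, so ranks are 1-based.

```python
def last_true(pred):
    """Largest t >= 0 satisfying a monotone predicate pred (true up to some
    point, false afterwards); assumes pred(0) holds and pred eventually fails.
    Exponential ("galloping") search plus binary search, O(log t) calls."""
    hi = 1
    while pred(hi):
        hi *= 2
    lo = hi // 2                       # pred(lo) holds (or lo == 0)
    while hi - lo > 1:
        mid = (lo + hi) // 2
        lo, hi = (mid, hi) if pred(mid) else (lo, mid)
    return lo


def kth_farey(n, k, rank):
    """The k-th term of F_n (1-based, so k = 1 gives 0/1), using only calls
    rank(p, q) = |{f in F_n : f <= p/q}|.  Assumes 1 <= k <= |F_n|;
    makes O(log n) rank queries in total."""
    if k == 1:
        return 0, 1
    a, b, c, d = 0, 1, 1, 0            # invariant: a/b < k-th term <= c/d
    while True:
        # pull the right endpoint to the left as far as the target allows
        t = last_true(lambda t: rank(t * a + c, t * b + d) >= k)
        c, d = t * a + c, t * b + d
        if rank(c, d) == k:            # endpoints stay inside F_n, so found
            return c, d
        # push the left endpoint to the right as far as the target allows
        t = last_true(lambda t: rank(a + t * c, b + t * d) < k)
        a, b = a + t * c, b + t * d

# Example with the brute-force oracle from above:
#   kth_farey(8, 6, lambda p, q: rank_naive(8, p, q))  ->  (1, 4),
#   since the sixth term of F_8 is 1/4.
```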

2.3 Solution to the rank problem

Let S_n(x) be the number we are searching for, i.e. the number of irreducible fractions a/b such that a/b ≤ x and b ≤ n. For simplicity we will sometimes write S_n instead of S_n(x), because in fact x is fixed. Playing with the symbol S_n(x) and grouping by the greatest common divisor we can get a recursive formula, which will be the starting point of our algorithm:

    S_n(x) = |{a/b : b ≤ n ∧ a/b ≤ x ∧ gcd(a, b) = 1}|
           = |{a/b : b ≤ n ∧ a/b ≤ x}| − ∑_{d≥2} |{a/b : b ≤ n ∧ a/b ≤ x ∧ gcd(a, b) = d}|
           = ∑_{b=1}^{n} ⌊bx⌋ − ∑_{d≥2} S_{⌊n/d⌋}(x).

We explain each step. For the given constraints the number of irreducible fractions is the total number of fractions minus the number of fractions with gcd of numerator and denominator equal to or larger than 2. This is written in the second line of the equation. The number of fractions with a given denominator b that are less than or equal to x is ⌊bx⌋, so the number of all fractions less than or equal to x is the sum:

    ∑_{b=1}^{n} ⌊bx⌋.

We should also explain the equality

    |{a/b : b ≤ n ∧ a/b ≤ x ∧ gcd(a, b) = d}| = S_{⌊n/d⌋}(x).

Every fraction a/b with gcd(a, b) = d has the form a′d/(b′d), where a′d = a, b′d = b and gcd(a′, b′) = 1. It means that the fraction a′/b′ is irreducible and b′ ≤ ⌊n/d⌋, since b ≤ n. The number of such irreducible fractions with a′/b′ = a/b ≤ x is exactly S_{⌊n/d⌋}(x). Let us look again at the recursive formula:

    S_n = ∑_{b=1}^{n} ⌊bx⌋ − ∑_{d≥2} S_{⌊n/d⌋}.    (5)

In fact, x is always a rational number in our algorithm. In that case the sum ∑_{b=1}^{n} ⌊bx⌋ can be calculated in O(polylog(n)) time; this is shown in Section 3. So the only problem is to calculate the sum ∑_{d≥2} S_{⌊n/d⌋}. Let us focus on how many different summands there are. For d ≤ √n all expressions S_{⌊n/d⌋}(x) are distinct. If d > √n, then n/d < √n, so for d > √n there are at most √n different summands. Therefore, on the right hand side of formula (5) there are O(√n) summands. Moreover, in deeper levels of the recursion, an occurrence of the symbol S_i is only possible if i = ⌊n/d⌋ for some positive integer d. This property follows from the equality

    ⌊⌊n/d_1⌋ / d_2⌋ = ⌊n / (d_1 d_2)⌋.

We are left with computing S_i, where i is from the set I = { ⌊n/d⌋ : d ≥ 1 }. We split this set into the two sets I_1 = {1, 2, ..., ⌊√n⌋} and I_2 = { ⌊n/⌊√n⌋⌋, ..., ⌊n/2⌋, ⌊n/1⌋ }. Each of these sets has ⌊√n⌋ elements and together they cover I. We use dynamic programming to calculate successive S_i for increasing i ∈ I. To calculate S_i we use the formula

    S_i = ∑_{b=1}^{i} ⌊bx⌋ − ∑_{d≥2} S_{⌊i/d⌋}.    (6)

As was already mentioned, the actual number of summands in ∑_{d≥2} S_{⌊i/d⌋} is O(√i). For each symbol S_j occurring in the sum we can find its multiplicity in constant time, simply by finding the interval of d values for which ⌊i/d⌋ = j. Therefore, the time complexity of calculating the right hand side of (6) is O(√i). The memory complexity is O(√n), since we have to store only S_i for i ∈ I. Surprisingly, the above algorithm for calculating all S_i works in O(n^{3/4}) time. We prove it in two parts. In the first part let us determine the time of calculating S_i for all i ∈ I_1:

    O( ∑_{1≤i≤√n} √i ) ⊆ O( ∑_{1≤i≤√n} √(√n) ) = O( √n · √(√n) ) = O(n^{3/4}).

For the second part observe that I_2 = { ⌊n/d⌋ : 1 ≤ d ≤ √n }. Thus, the time complexity of calculating S_i for all i ∈ I_2 is

    O( ∑_{1≤d≤√n} √(⌊n/d⌋) ) = O( √n · ∑_{1≤d≤√n} 1/√d ).

Using the asymptotic equality

    ∑_{1≤i≤x} 1/√i = O(√x)

we get the result

    O( √n · √(√n) ) = O(n^{3/4}).
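The dynamic programming described above fits in a few lines of Python. The sketch below is our own illustration (names are ours); it computes S_n(x) for a rational x = p/q. For brevity the inner sum ∑_{b≤m} ⌊bx⌋ is evaluated by a naive loop here; substituting the logarithmic-time routine T(m, p, q) of Section 3 gives the stated O(n^{3/4}) bound. Note that S_n(x) does not count the fraction 0/1, so the rank used in Section 2.2 is S_n(x) + 1.

```python
from math import isqrt

def rank_S(n, p, q):
    """S_n(p/q): the number of irreducible fractions a/b with a >= 1,
    b <= n and a/b <= p/q, via formula (6) evaluated over I = {n // d}.
    O(sqrt(n)) values are stored; with a fast floor-sum the time is O(n^{3/4})."""

    def floor_sum(m):
        # sum_{b=1..m} floor(b*p/q); naive here -- replace with Section 3's
        # T(m, p, q) to keep the whole algorithm sublinear
        return sum(b * p // q for b in range(1, m + 1))

    r = isqrt(n)
    # the set I = I_1 U I_2 of all distinct values floor(n/d), increasing
    values = sorted(set(range(1, r + 1)) | {n // d for d in range(1, r + 1)})
    S = {}
    for m in values:
        total = floor_sum(m)
        d = 2
        while d <= m:
            v = m // d
            d_hi = m // v              # largest d' with m // d' == v
            total -= (d_hi - d + 1) * S[v]
            d = d_hi + 1
        S[m] = total
    return S[n]

# Sanity check against the brute force of Section 2.2:
#   rank_S(100, 1, 3) + 1 == rank_naive(100, 1, 3)
```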

3 Computing ∑_{i=1}^{n} ⌊(a/b) i⌋

In this section we present a simple algorithm for computing the sum ∑_{i=1}^{n} ⌊(a/b) i⌋, where a/b is a non-negative irreducible fraction. We remark that a polynomial time algorithm (in the size of a, b and n) was previously presented in [4] and in [5]. However, the methods used in these papers are rather complicated for easy implementation. A much simpler algorithm for counting lattice points in a rational right triangle was presented in [3]. Although our algorithm is very similar, we decided to include a description for this specific task for two reasons. First, since a solution to the rank problem needs to calculate this sum, we want the whole procedure to be complete. Second, the sum is a special case of counting points in a rational right triangle. Therefore, the formulas used in the algorithm can be determined more easily and are slightly simpler than the formulas presented in [3]. A graphical representation of the given sum is shown in Fig. 1.


Fig. 1: Graphical representation of the sum ∑_{i=1}^{n} ⌊(a/b) i⌋ (lattice points below the line y = (a/b)x for x = 0, 1, 2, ..., n).

The value of the sum is the number of lattice points in the triangle bounded by the X-axis and the lines x = n and y = (a/b)x, excluding the lattice points on the X-axis. That representation will help to see properties of the sum.

Let us denote

    T(n, a, b) = ∑_{i=1}^{n} ⌊(a/b) i⌋.

We develop recursive formulas for T(n, a, b). These formulas lead to a straightforward polynomial time algorithm (in the size of n, a, b).

3.1 Case n ≥ b

If n is divisible by b, we can derive a closed form. Let n = qb and look at Fig. 2. We can easily calculate the number of lattice points in the lower right triangle.


Fig. 2: Case when n is divisible by b.

Observe that the lower right triangle is identical to the upper left triangle, so it contains the same number of lattice points. Summing up both triangles we get the rectangle with the diagonal counted twice. The number of lattice points in the rectangle is (qb + 1)(qa + 1) and the number of points on the diagonal is equal to q + 1. The sum of both values divided by two gives the number of lattice points in the triangle. Now, we subtract the number of lattice points on the X-axis, getting the result:

    T(qb, a, b) = ((qa + 1)(qb + 1) + q + 1)/2 − (qb + 1) = q(qab − b + a + 1)/2.    (7)

More generally, suppose n ≥ b and let n = qb + r, where q ≥ 1 and 0 ≤ r < b. The sum can be split into three parts:

    ∑_{i=1}^{qb+r} ⌊(a/b) i⌋ = ∑_{i=1}^{qb} ⌊(a/b) i⌋ + ∑_{i=qb+1}^{qb+r} ⌊(a/b) i⌋
                             = ∑_{i=1}^{qb} ⌊(a/b) i⌋ + ∑_{i=1}^{r} ⌊(a/b)(qb + i)⌋
                             = ∑_{i=1}^{qb} ⌊(a/b) i⌋ + r·qa + ∑_{i=1}^{r} ⌊(a/b) i⌋.


Fig. 3: Case n ≥ b.

See Fig. 3 for intuition. As a result we get the equation:

    T(qb + r, a, b) = T(qb, a, b) + rqa + T(r, a, b).    (8)

As a consequence of the above formula together with equation (7) we can reduce n below b in a single step. Therefore, in the succeeding sections we assume that n < b. Notice that it also means that there is no integral point on the line y = (a/b)x for x = 1, 2, ..., n.

3.2 Case a ≥ b

If a = qb + r for some q ≥ 1 and 0 ≤ r < b, we can rewrite:

    ∑_{i=1}^{n} ⌊(a/b) i⌋ = ∑_{i=1}^{n} ⌊((qb + r)/b) i⌋ = ∑_{i=1}^{n} ( qi + ⌊(r/b) i⌋ ) = q·n(n + 1)/2 + ∑_{i=1}^{n} ⌊(r/b) i⌋.

Thus in this case we have the formula:

    T(n, qb + r, b) = q·n(n + 1)/2 + T(n, r, b).    (9)

3.3 Inverting a/b

We use the graphical representation to relate the sums ∑ ⌊(a/b) i⌋ and ∑ ⌊(b/a) i⌋ in one equation. In Fig. 4 the area labelled S_1 represents the sum ∑_{i=1}^{n} ⌊(a/b) i⌋. The largest x and y coordinates of lattice points in this area are n and ⌊(a/b) n⌋ respectively. Consider the rectangular set R of lattice points with x coordinates spanning from 1 to n and with y coordinates spanning from 1 to ⌊(a/b) n⌋. This set has size n·⌊(a/b) n⌋. Let S_2 be the complement of S_1 in R. We assumed that n < b, so there is no element of R lying on the line y = (a/b)x. Therefore, for a given j = 1, ..., ⌊(a/b) n⌋, the number of lattice points in the area S_2 with y coordinate equal to j is ⌊(b/a) j⌋. Hence, the size of S_2 is ∑_{j=1}^{⌊(a/b)n⌋} ⌊(b/a) j⌋. Since |S_1| + |S_2| = |R| we have

    ∑_{i=1}^{n} ⌊(a/b) i⌋ + ∑_{j=1}^{⌊(a/b)n⌋} ⌊(b/a) j⌋ = n ⌊(a/b) n⌋.


Fig. 4: Graphical representation of the sums ∑ ⌊(a/b) i⌋ and ∑ ⌊(b/a) j⌋ (the areas S_1 and S_2 below and above the line, respectively).

Thus, the last recursive formula is

    T(n, a, b) = n ⌊(a/b) n⌋ − T(⌊(a/b) n⌋, b, a).    (10)

It allows us to swap a with b in T(·, a, b). It can be used to make a ≥ b. Notice that after swapping a with b our assumption that n < b still holds, since if n < b, then (a/b)n < a and ⌊(a/b)n⌋ < a.

3.4 Final algorithm

Combining the presented recursive formulas for T(n, a, b) we can design the final algorithm. The procedure is similar to the Euclidean algorithm. First, if n ≥ b, reduce n using (8), making n < b. Then repeat the following steps until n or a reaches zero. If a < b, use (10) to exchange a with b. Next, use (9) to reduce a to a mod b. The number of steps in the above procedure is O(log max(n, a, b)), as in the Euclidean algorithm. The algorithm is fairly simple and it can be written in a recursive fashion.
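A direct recursive transcription of formulas (7)-(10) might look as follows (a Python sketch of our own; it assumes, as in the text, that a/b is given in lowest terms):

```python
def T(n, a, b):
    """sum_{i=1..n} floor(a*i/b) for integers n, a >= 0, b >= 1 with
    gcd(a, b) = 1, computed with recursions (7)-(10).
    O(log max(n, a, b)) recursion depth, as in the Euclidean algorithm."""
    if n == 0 or a == 0:
        return 0
    if n >= b:
        q, r = divmod(n, b)
        # (7) for the q full blocks, (8) to add the remaining r columns
        return q * (q * a * b - b + a + 1) // 2 + r * q * a + T(r, a, b)
    if a >= b:
        q, r = divmod(a, b)
        # (9): split off the integer part q of a/b
        return q * n * (n + 1) // 2 + T(n, r, b)
    # here n < b and a < b: (10) swaps the roles of a and b
    m = a * n // b
    return n * m - T(m, b, a)

# e.g. T(10**6, 355, 113) == sum(355 * i // 113 for i in range(1, 10**6 + 1))
```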

4 Summary and remarks

We presented a simple sublinear algorithm for the rank problem. We showed that this algorithm has O(n^{3/4}) time complexity and needs O(√n) space. The order statistics problem was reduced to the rank problem. We included two reductions. Both call the rank problem O(log n) times and run in O(log n) time. Therefore, we showed that the order statistics in the Farey sequences can be computed in O(n^{3/4} log n) time.

In the reduction exploring the Stern–Brocot tree, we showed how to find a rational number if we are only allowed to compare it with fractions. We remark that this technique can be used in other fields. For instance, it can be used to expand a real number into a continued fraction. We do not need the value of this number; we only need a comparison procedure between that number and an arbitrary fraction. For instance, numbers with such a property are algebraic numbers. However, in this case there are other methods of expanding them into continued fractions [6]. The usefulness of the presented technique should be further investigated.

References

1. Pătraşcu, C.E., Pătraşcu, M.: Computing order statistics in the Farey sequence. In Buell, D.A., ed.: Algorithmic Number Theory. Volume 3076 of LNCS. Springer, Heidelberg (2004) 358–366
2. Graham, R.L., Knuth, D.E., Patashnik, O.: Concrete Mathematics. 2nd edn. Addison-Wesley, London, UK (1994)
3. Yanagisawa, H.: A simple algorithm for lattice point counting in rational polygons. Research report, IBM Research, Tokyo Research Laboratory (August 2005)
4. Barvinok, A.I.: A polynomial time algorithm for counting integral points in polyhedra when the dimension is fixed. Mathematics of Operations Research 19(4) (1994) 769–779
5. Beck, M., Robins, S.: Explicit and efficient formulas for the lattice point count in rational polygons using Dedekind–Rademacher sums. Discrete and Computational Geometry 27(4) (2002) 443–459
6. Brent, R.P., van der Poorten, A.J., te Riele, H.: A comparative study of algorithms for computing continued fractions of algebraic numbers. In Cohen, H., ed.: Algorithmic Number Theory. Volume 1122 of LNCS. Springer, Heidelberg (1996) 35–47