Generating random permutations by coin-tossing: classical algorithms, new analysis and modern implementation
Axel Bacher¹, Olivier Bodini¹, Hsien-Kuei Hwang², and Tsung-Hsi Tsai²
¹ LIPN, Université Paris 13, Villetaneuse, France; ² Institute of Statistical Science, Academia Sinica, Taipei, Taiwan
March 10, 2016
Abstract

Several simple, classical, little-known algorithms in the statistical literature for generating random permutations by coin-tossing are examined, analyzed and implemented. These algorithms are either asymptotically optimal or close to being so in terms of the expected number of times the random bits are generated. In addition to asymptotic approximations to the expected complexity, we also clarify the corresponding variances, as well as the asymptotic distributions. A brief comparative discussion with numerical computations in a multicore system is also given.
1 Introduction
Random permutations are indispensable in widespread applications ranging from cryptology to statistical testing, from data structures to experimental design, from data randomization to Monte Carlo simulations, etc. Natural examples include the block cipher (Rivest, 1994), permutation tests (Berry et al., 2014) and the generalized association plots (Chen, 2002). Random permutations are also central in the framework of Boltzmann sampling for labeled combinatorial classes (Flajolet et al., 2007) where they intervene in the labeling process of samplers. Finding simple, efficient and cryptographically secure algorithms for generating large random permutations is then of vital importance in the modern perspective. We are concerned in this paper with several simple classical algorithms for generating random permutations (each with the same probability of being
This research was partially supported by the ANR-MOST Joint Project MetAConC under the Grants ANR 2015-BLAN-0204 and MOST 105-2923-E-001-001-MY4.
generated), some having remained little known in the statistical and computer science literature, and focus mostly on their stochastic behaviors for large samples; implementation issues are also discussed.
Algorithm Laisant-Lehmer: when Unif[1, n!] is available. (Here and throughout this paper Unif[a, b] represents the discrete uniform distribution over all integers in the interval [a, b].) The earliest algorithm for generating a random permutation dates back to Laisant's work near the end of the 19th century (Laisant, 1888), which was later re-discovered in (Lehmer, 1960). It is based on the factorial representation of an integer
    k = c₁(n−1)! + c₂(n−2)! + ··· + c_{n−1}·1!   (0 ≤ k < n!),

where 0 ≤ c_j ≤ n−j for 1 ≤ j ≤ n−1. A simple algorithm implementing this representation proceeds as follows; see (Devroye, 1986, p. 648) or (Robson, 1969). Let k = Unif[0, n!−1]. The first element of the random permutation is c₁+1, which is then removed from {1, ..., n}. The next element of the random permutation will then be the (c₂+1)st of the n−1 remaining elements. Continue this procedure until the last element is removed. A direct implementation of this algorithm results in a two-loop procedure; a simple one-loop procedure was devised in (Robson, 1969) and is shown below; see also (Plackett, 1968) and Devroye's book (Devroye, 1986, §XIII.1) for variants, implementation details and references.

Algorithm 1: LL(n, c)
Input: c: an array with n elements
Output: A random permutation on c
begin
1   u := Unif[1, n!];
2   for i := n downto 2 do
3       t := ⌊u/i⌋; j := u − i·t + 1;
4       swap(c_i, c_j); u := t;

This algorithm is mostly useful when n is small, say less than 20, because n! grows very fast and the large-number arithmetic involved reduces its efficiency for large n. Also the generation of the uniform distribution is better realized by the coin-tossing algorithms described in Section 2.
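For concreteness, the one-loop procedure above can be sketched in Python as follows (our own transcription: `random.randint` stands in for Unif[1, n!], and the array is 0-based whereas the pseudocode is 1-based):

```python
import random

def laisant_lehmer(c):
    """One-loop Laisant-Lehmer shuffle (Robson-style variant), sketched in Python.

    A single draw u ~ Unif[1, n!] is decoded through the factorial
    number system: at step i, j = u - i*floor(u/i) + 1 is uniform on
    [1, i], and the quotient floor(u/i) seeds the next step.
    """
    n = len(c)
    fact = 1
    for i in range(2, n + 1):
        fact *= i
    u = random.randint(1, fact)           # u ~ Unif[1, n!]
    for i in range(n, 1, -1):
        t, r = divmod(u, i)               # t = floor(u/i), r = u - i*t
        j = r + 1                         # j ~ Unif[1, i] (1-based)
        c[i - 1], c[j - 1] = c[j - 1], c[i - 1]
        u = t
    return c
```

Python's arbitrary-precision integers make the large-number arithmetic painless, but the cost of that arithmetic is precisely why the method is only advisable for small n.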
Algorithm Fisher-Yates (FY): when Unif[1, n] is available. One of the simplest and most widely used algorithms (based on a given sequence of distinct numbers {c₁, ..., c_n}) for generating a random permutation is the Fisher-Yates or Knuth shuffle (in its modern form by Durstenfeld (Durstenfeld, 1964); see Wikipedia's page on Fisher-Yates shuffle and (Devroye, 1986; Durstenfeld, 1964; Fisher and Yates, 1948; Knuth, 1998a) for more information). The algorithm starts by swapping c_n with a randomly chosen element in {c₁, ..., c_n} (each with the same probability of being selected), and then repeats the same procedure for c_{n−1}, ..., c₂. See also the recent book (Berry et al., 2014) or the survey paper (Ritter, 1991) for a more detailed account.

Algorithm 2: FY(n, c)
Input: c: an array with n ≥ 2 elements
Output: A random permutation on c
begin
1   for i := n downto 2 by 1 do
2       j := Unif[1, i]; swap(c_i, c_j)

Such an algorithm seems to have it all: single loop, one-line description, constant extra storage, efficient and easy to code. Yet it is not optimal in situations such as (i) when only a partial subset of the permuted elements is needed (see (Black and Rogaway, 2002; Brassard and Kannan, 1988)), (ii) when implemented on a non-array type data structure such as a list (see (Ressler, 1992)), (iii) when numerical truncation errors are inherent (see (Kimble, 1989)), and (iv) when a parallel or distributed computing environment is available (see (Anderson, 1990; Langr et al., 2014)). On the other hand, at the memory access level, a direct generation of the uniform random variable results in a higher rate of cache misses (see (Andrés and Pérez, 2011)), making it less efficient than it seems, notably when n is very large; see also Section 6 for some implementation and simulation aspects. Finally, this algorithm is sequential in nature and the memory conflict problem is subtle in parallel implementations; see (Waechter et al., 2011). Note that the implementation of this algorithm strongly relies on the availability of a uniform random variate generator, and its bit-complexity (number of random bits needed) is of linearithmic order (not linear); see (Lumbroso, 2013; Sandelius, 1962) and below for a detailed analysis.
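A 0-based Python sketch of Algorithm 2 (our own transcription; `random.randint` stands in for Unif[1, i]):

```python
import random

def fisher_yates(c):
    """Fisher-Yates (Durstenfeld) shuffle, 0-based Python sketch.

    For i = n-1 down to 1, swap c[i] with c[j] for j ~ Unif[0, i];
    every permutation of c is produced with probability 1/n!.
    """
    for i in range(len(c) - 1, 0, -1):
        j = random.randint(0, i)          # j ~ Unif[0, i]
        c[i], c[j] = c[j], c[i]
    return c
```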
From unbounded uniforms to bounded uniforms? Instead of relying on uniform distributions with very large ranges, our starting question was: can we generate random permutations by bounded uniform distributions (for example, by flipping unbiased coins)? There are at least two different ways to achieve this:
- Fisher-Yates type: simulate the uniform distribution used in the Fisher-Yates shuffle by coin-tossing, which can be realized either by von Neumann's rejection method (von Neumann, 1951) or by the Knuth-Yao algorithm (for generating a discrete distribution by unbiased coins; see (Devroye, 1986; Knuth and Yao, 1976)) and Section 2, and
- divide-and-conquer type: each element flips an unbiased coin and, depending on the outcome being head or tail, the elements are divided into two groups. Continue recursively the same procedure for each of the two groups. Then a random resampling is achieved by an inorder traversal of the corresponding digital tree; see the next section for details. This realization naturally induces a binary trie (Knuth, 1998b), which is closely related to a few other binomial splitting processes that will be briefly described below; see (Fuchs et al., 2014).
It turns out that exactly the same binomial splitting idea was already developed in the early 1960's in the statistical literature in (Rao, 1961) and independently in (Sandelius, 1962), and analyzed later in (Plackett, 1968). The papers by Rao and by Sandelius also propose other variants, which are of independent algorithmic interest. However, all these algorithms have remained little known not only in computer science but also in statistics (see (Berry et al., 2014; Devroye, 1986)), partly because they rely on tables of random digits instead of more modern computer-generated random bits, although the underlying principle remains the same. Since a complete and rigorous analysis of the bit-complexity of these algorithms remains open, for historical reasons and for completeness, we will provide a detailed analysis of the algorithms proposed in (Rao, 1961) and (Sandelius, 1962) (and partially analyzed in (Plackett, 1968)) and two versions of Fisher-Yates
with different implementations of the underlying uniform Unif[0, n−1] by coin-tossing: one relying on von Neumann's rejection method (Devroye and Gravel, 2015; von Neumann, 1951) and the other on Knuth-Yao's tree method (Devroye, 1986; Knuth and Yao, 1976). As the ideas of these algorithms are very simple, it is no wonder that similar ideas also appeared in the computer science literature but in different guises; see (Barker and Kelsey, 2007; Flajolet et al., 2011; Koo et al., 2014; Ressler, 1992) and the references therein. We will comment more on this in the next section. We describe in the next section the algorithms we will analyze in this paper. Then we give a complete probabilistic analysis of the number of random bits used by each of them. Implementation aspects and benchmarks are briefly discussed in the final section. Note that the Fisher-Yates shuffle and its variants for generating cyclic permutations have been analyzed in (Louchard et al., 2008; Mahmoud, 2003; Prodinger, 2002; Wilson, 2009) but their focus is on data movements rather than on bit-complexity.
2 Generating random permutations by coin-tossing
We describe in this section three algorithms for generating random permutations, assuming that a bounded uniform Unif[0, r−1] is available for some fixed integer r ≥ 2. The first algorithm relies on the divide-and-conquer strategy and was first proposed in (Rao, 1961) and independently in (Sandelius, 1962), so we will refer to it as Algorithm RS (Rao-Sandelius). The other two we study are of Fisher-Yates type but differ in the way they simulate Unif[0, n−1] by a bounded uniform Unif[0, r−1]: the first of these two simulates Unif[0, n−1] by a rejection procedure in the spirit of von Neumann (von Neumann, 1951) and was proposed and implemented in (Sandelius, 1962), named ORP (One-stage Randomization Procedure) there, but for convenience we will refer to it as Algorithm FYvN (Fisher-Yates-von-Neumann); see also (Moses and Oakford, 1963); and the other relies on an optimized version of Lumbroso's implementation (Lumbroso, 2013) of Knuth-Yao's DDG-tree (discrete distribution generating tree) algorithm (Knuth and Yao, 1976), which will be referred to as Algorithm FYKY (Fisher-Yates-Knuth-Yao). See also (Devroye, 1986, Ch. XV) on the "bit model" and the more recent updates (Devroye, 2010; Devroye and Gravel, 2015). For simplicity of presentation and practical usefulness, we focus in what follows on the binary case r = 2. For convenience, let rand-bit denote the random variable Bernoulli(1/2), which returns zero or one with equal probability.
2.1 Algorithm RS: divide-and-conquer

We describe Algorithm RS only in the binary case, assuming an unbiased coin is available. Since we will carry out a detailed analysis of this algorithm, we give its procedure in recursive form as follows. (For practical implementation, it is more efficient to remove the recursions by standard techniques; see Section 6.)
A sequence of distinct numbers {c₁, ..., c_n} is given.

1. Each c_i generates a rand-bit, independently of the others;
2. Group them according to the outcomes being 0 or 1, and arrange the groups in increasing order of the group labels.
3. For each group of cardinality ℓ:
   (a) if ℓ ≤ 1, then stop;
   (b) if ℓ = 2, then generate a rand-bit b and reverse their relative order if b = 1;
   (c) if ℓ > 2, then repeat Steps 1–3 for each group.

Algorithm 3: RS(n, c)
Input: c: a sequence with n elements
Output: A random permutation on c
begin
1   if n ≤ 1 then
2       return c
3   if n = 2 then
4       if rand-bit = 1 then
5           return (c₂, c₁)
6       else
7           return (c₁, c₂)
8   Let A₀ and A₁ be two empty arrays
    for i := 1 to n do
9       add c_i into A_{rand-bit}
10  return RS(|A₀|, A₀), RS(|A₁|, A₁)

As an illustrative example, we begin with the sequence {c₁, ..., c₆}. Assume that the flipped binary sequence is (1, 0, 1, 1, 0, 0) for (c₁, c₂, c₃, c₄, c₅, c₆). Then we split the c_i's into the 0-group (c₂, c₅, c₆) and the 1-group (c₁, c₃, c₄), which can be written in the form (c₂ c₅ c₆)(c₁ c₃ c₄). As both groups have cardinality larger than two, we run the same coin-flipping process for both groups. Assume that further coin-flippings yield (0, 0, 1) for (c₂, c₅, c₆) and (0, 1, 0) for (c₁, c₃, c₄), respectively. Then we obtain (c₂ c₅) c₆ (c₁ c₄) c₃. If the two extra coin-flippings needed to permute the two subgroups of size two are 0 and 1, respectively, then we get the random permutation c₂ c₅ c₆ c₄ c₁ c₃. The splitting process of this algorithm is, up to the boundary conditions, essentially the same as constructing a random trie under the Bernoulli model or sorting using radixsort (see (Fuchs et al., 2014; Knuth, 1998b)), and was also briefly mentioned in (Flajolet et al., 2011). On the other hand, Ressler in (Ressler, 1992) proposed an algorithm for randomly permuting a list structure using a similar divide-and-conquer idea but performed in a rather different way.
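A minimal recursive Python sketch of Algorithm RS (our own transcription; `random.getrandbits(1)` plays the role of rand-bit, and lists stand in for the arrays A₀ and A₁):

```python
import random

def rs_shuffle(c):
    """Rao-Sandelius divide-and-conquer shuffle (Algorithm RS), sketched in Python.

    Each element flips a fair coin; the 0-group and the 1-group are then
    permuted recursively, a 2-element group needing only one extra flip.
    """
    n = len(c)
    if n <= 1:
        return list(c)
    if n == 2:
        return [c[1], c[0]] if random.getrandbits(1) else [c[0], c[1]]
    a0, a1 = [], []
    for x in c:
        (a1 if random.getrandbits(1) else a0).append(x)
    return rs_shuffle(a0) + rs_shuffle(a1)
```

Note that the recursion depth is O(log n) with high probability, and the coin flips of one level can be drawn independently for the two groups, which is what makes the algorithm easy to parallelize.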
To the best of our knowledge, except for these references, this simple algorithm seems to have remained unknown in the literature, and we believe that more attention should be paid to its practical usefulness and theoretical relevance.
Essentially identical binomial splitting processes. In addition to the above connection to tries and radixsort, the splitting process of Algorithm RS is also reminiscent of the so-called initialization problem in distributed computing (or processor identity problem), where a unique identifier is to be assigned to each processor in some distributed computing environment; see (Nakano and
Olariu, 2000; Ravelomanana, 2007). Yet another context where exactly the same coin-tossing process is used to resolve conflicts is the tree algorithm (or CTM algorithm, named after Capetanakis, Tsybakov and Mikhailov) in multi-access channels; see (Massey, 1981; Wagner, 2009). For more references on binomial splitting processes, see (Fuchs et al., 2014). Nowadays, it is well known that the stochastic behaviors of these structures can be understood through the study of the binomial recurrence

    f_n = g_n + 2^{−n} Σ_{0≤k≤n} C(n, k) (f_k + f_{n−k}),   (1)

with suitably given initial conditions. In almost all cases of interest, such a recurrence often gives rise to asymptotic approximations (for large n) that involve periodic oscillations with minute amplitudes (say, of order 10^{−5}), which may lead to inexact conjectures (see for example (Massey, 1981)) but can be well described by standard complex-analytic tools such as the Mellin transform (Flajolet et al., 1995) and the saddle-point method (Flajolet and Sedgewick, 2009) (or analytic de-Poissonization (Jacquet and Szpankowski, 1998)); see (Fuchs et al., 2014) and the references compiled there. From a historical perspective, such a clarification through analytic means was first worked out by Flajolet and his co-authors in the early 1980's; see again (Fuchs et al., 2014) for a brief account. However, the periodic oscillations had already been observed in the 1960's by Plackett in (Plackett, 1968) based on heuristic arguments and figures, which seems less expected because of the limited computer power at that time and of the proper normalization needed to visualize the fluctuations; see Figures 1 and 2 for the subtleties involved. Unlike Algorithm FY, Algorithm RS is more easily adapted to a distributed or parallel computing environment because the random bits needed can be generated simultaneously.
Furthermore, we will prove that the total number of random bits used is asymptotically optimal, namely, the expected complexity is asymptotic to n log₂ n + n F_RS(log₂ n) + O(1), where F_RS(t) is a periodic function of period 1 with very small amplitude (|F_RS(t)| ≤ 1.1 × 10^{−5}); see Figure 1. Another distinctive feature is that F_RS is very smooth (infinitely differentiable), differing from most other periodic functions arising in the analysis below. Note that the information-theoretic lower bound satisfies log₂ n! = n log₂ n − n/log 2 + O(log n). While the asymptotic optimality of such a simple algorithm was already discussed in detail in (Sandelius, 1962) and such an asymptotic pattern anticipated in (Plackett, 1968), the rigorous proof and the explicit characterization of the periodic function F_RS are new. Also we show that the variance is relatively small (being of linear order with periodic fluctuations) and that the distribution is asymptotically normal.
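As a quick numerical illustration of the information-theoretic lower bound just quoted, one can compare the exact log₂ n! (via the log-Gamma function) with the approximation n log₂ n − n/log 2; the gap indeed grows only logarithmically (a sketch, our own):

```python
from math import lgamma, log, log2

# Compare the exact lower bound log2(n!) with the two leading terms
# n*log2(n) - n/log(2); the remainder is O(log n) by Stirling's formula.
for n in (10**3, 10**6):
    exact = lgamma(n + 1) / log(2)           # log2(n!)
    approx = n * log2(n) - n / log(2)
    print(n, exact - approx)                 # small, slowly growing gap
```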
2.2 Algorithms FYvN and FYKY

We describe in this subsection the two versions of Algorithm FY: FYvN and FYKY. Both algorithms follow the same loop of the Fisher-Yates shuffle and simulate successively the discrete uniform distributions Unif[1, n], ..., Unif[1, 2] by flipping unbiased coins. To simulate Unif[1, k], both algorithms first generate ⌈log₂ k⌉ random bits. If these bits, when read as a binary representation, have a value less than k, then this value plus 1 is returned as the required random element; otherwise,
Algorithm FYvN rejects these bits and restarts the same procedure until finding a value < k. Algorithm FYKY, on the other hand, does not reject the flipped bits but uses the difference between this value and k as the "seed" of the next round and repeats the same procedure. We modified and optimized these two procedures so as to reduce the number of arithmetic operations; the two procedures differ only in the rejection step (line 9); see Algorithms 4 and 5.

Algorithm 4: Algorithm FYvN
Input: c: an array with n elements
Output: A random permutation on c
begin
1   for i := n downto 2 by 1 do
2       j := von-Neumann(i) + 1; swap(c_i, c_j);

Procedure von-Neumann(n)
Input: a positive integer n
Output: Unif[0, n−1]
begin
1   u := 1; x := 0;
2   while 1 = 1 do
3       while u < n do
4           u := 2u; x := 2x + rand-bit;
5       d := u − n;
6       if x ≥ d then
7           return x − d;
8       else
9           u := 1; x := 0;

Algorithm 5: Algorithm FYKY
Input: c: an array with n elements
Output: A random permutation on c
begin
1   for i := n downto 2 by 1 do
2       j := Knuth-Yao(i) + 1; swap(c_i, c_j);

Procedure Knuth-Yao(n)
Input: a positive integer n
Output: Unif[0, n−1]
begin
1   u := 1; x := 0;
2   while 1 = 1 do
3       while u < n do
4           u := 2u; x := 2x + rand-bit;
5       d := u − n;
6       if x ≥ d then
7           return x − d;
8       else
9           u := d;
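The two procedures can be sketched in Python as follows (our own transcription of the pseudocode above, with `random.getrandbits(1)` standing in for rand-bit; note the single differing line in the rejection branch):

```python
import random

def rand_bit():
    """One unbiased random bit (stand-in for a coin flip)."""
    return random.getrandbits(1)

def von_neumann(n):
    """Unif[0, n-1] by rejection: on failure, all bits are discarded."""
    while True:
        u, x = 1, 0
        while u < n:
            u, x = 2 * u, 2 * x + rand_bit()    # x ~ Unif[0, u-1]
        d = u - n
        if x >= d:
            return x - d                        # x - d ~ Unif[0, n-1]
        # else: reject and start from scratch

def knuth_yao(n):
    """Unif[0, n-1], Knuth-Yao style: leftover randomness is recycled.

    On failure x is uniform on [0, d-1], so the state restarts from
    (u, x) = (d, x) instead of (1, 0), saving bits on average."""
    u, x = 1, 0
    while True:
        while u < n:
            u, x = 2 * u, 2 * x + rand_bit()
        d = u - n
        if x >= d:
            return x - d
        u = d                                   # recycle x as the next seed
```

Both functions return a uniform value on [0, n−1]; they differ only in what happens after a rejection, which is exactly what drives the different bit-complexities analyzed below.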
Note that both algorithms are identical when n = 2^k and when n = 3, for which the algorithm parameters evolve as in the following diagram.
(Diagram for n = 3: starting from (u, x) = (1, 0), the flips lead to (2, 0) or (2, 1), and then to (4, 0), (4, 1), (4, 2), (4, 3); with d = 1, the states (4, 1), (4, 2), (4, 3) return x − d = 0, 1, 2, while (4, 0) fails and both procedures restart from (1, 0).)
While the difference between the two algorithms at the pseudocode level is minor, we show that the asymptotic behaviors of their bit-complexities for generating a random permutation of n elements differ significantly:
Algorithm | Mean                                   | Variance            | Method
RS        | n log₂ n + n F_RS(·)                   | n G_RS(·)           | Analytic
FYvN      | n (log n) F_vN^[1](·) + n F_vN^[2](·)  | n (log n)² G_vN(·)  | Elementary
FYKY      | n log₂ n + n F_KY(·)                   | n G_KY(·)           | Analytic
Here the F(·)'s and G(·)'s are all bounded, continuous periodic functions of the parameter log₂ n. We see that the minor difference in Algorithm FYvN results not only in a higher mean but also in a larger variance, making FYvN less competitive in modern practical applications, although it was used, for example, by Moses and Oakford to produce tables of random permutations (Moses and Oakford, 1963). Also the procedure von-Neumann in Algorithm 4, as one of the simplest and most natural ideas for simulating a uniform by coin-tossing, was independently proposed under different names in the literature; see, for example, (Granboulan and Pornin, 2007; Koo et al., 2014); in particular, it is called the "Simple Discard Method" in NIST's (Barker and Kelsey, 2007) "Recommendation for random number generation using deterministic random bit generators." Thus, we also include the analysis of FYvN in this paper although it is less efficient in bit-complexity. The mean and the variance of Algorithm FYvN were already derived in (Plackett, 1968) but only when n = 2^k. In addition to this approximation, we will also show that the variance is of the less common higher order n(log n)², and that the distribution remains asymptotically normal.
2.3 Outline of this paper

We focus in this paper on a detailed probabilistic analysis of the bit-complexity of the three algorithms RS, FYvN and FYKY. Indeed, in all three cases we will establish a very strong local limit theorem for the bit-complexity of the form (although the variances are not of the same order)
    P(W_n = ⌊E(W_n) + x √V(W_n)⌋) = (e^{−x²/2})/√(2π V(W_n)) · (1 + O((1 + |x|³)/√n)),
uniformly for x = o(n^{1/6}), where W_n represents the bit-complexity of any of the three algorithms {RS, FYvN, FYKY}. Our method of proof is mostly analytic, relying on a proper use of generating functions (including characteristic functions) and standard complex-analytic techniques (see (Flajolet and Sedgewick, 2009)). The diverse uniform estimates needed for the characteristic functions constitute the hard part of our proofs. The same method can be extended to clarify finer probabilities of moderate deviations, but for simplicity, we content ourselves with the above result in the central range. We also implemented these algorithms and tested their efficiency in terms of running time. The simulation results are given in the last section. Briefly, Algorithm FYKY is recommended when n is not very large, say n ≤ 10^7, and Algorithm RS performs better for larger n or when a multicore system is available. Finally, our analysis and simulations also suggest that the "Simple Discard Method" recommended in NIST's (Barker and Kelsey, 2007) "Recommendation for random number generation" is better replaced by the procedure Knuth-Yao in Algorithm 5, whose expected optimality (in bit-complexity) was established in (Horibe, 1981).
3 The bit-complexity of Algorithm RS
We consider the total number X_n of times the random variable rand-bit is used in Algorithm RS for generating a random permutation of n elements. We will derive precise asymptotic approximations to the mean, the variance and the distribution by applying the approaches developed in our previous papers (Fuchs et al., 2014; Hwang, 2003; Hwang et al., 2010).
Recurrences and generating functions. By construction, X_n satisfies the distributional recurrence

    X_n =_d X_{I_n} + X*_{n−I_n} + n   (n ≥ 3),

(the first term corresponding to the 1-group and the second to the 0-group)
with the initial conditions X_0 = X_1 = 0 and X_2 = 1, where I_n denotes the binomial distribution with parameters n and 1/2. Here the (X*_n)'s are independent copies of the (X_n)'s and are independent of I_n. This random variable is, up to initial conditions, identical to the external path length of random tries constructed from n random binary strings. It may also be interpreted in many different ways; see (Fuchs et al., 2014; Knuth, 1998b) and the references therein. In terms of the moment generating function P_n(t) := E(e^{X_n t}), we have the recurrence

    P_n(t) = e^{nt} 2^{−n} Σ_{0≤k≤n} C(n, k) P_k(t) P_{n−k}(t)   (n ≥ 3),   (2)
with P_0(t) = P_1(t) = 1 and P_2(t) = e^t. From these relations, we see that the bivariate Poisson generating function P̃(z, t) := e^{−z} Σ_{n≥0} P_n(t) z^n/n! satisfies the functional equation

    P̃(z, t) = e^{(e^t − 1)z} P̃(e^t z/2, t)² + Q̃(z, t),   (3)

where

    Q̃(z, t) := (1 − e^t) z e^{−z} (1 + ¼ e^t (e^t + 2) z).

Let now f̃_m(z) := m! [t^m] P̃(z, t) = e^{−z} Σ_{n≥0} E(X_n^m) z^n/n! denote the Poisson generating function of the m-th moment of X_n. From (3), we see that

    f̃₁(z) = 2 f̃₁(z/2) + g̃₁(z),
    f̃₂(z) = 2 f̃₂(z/2) + g̃₂(z),
with g̃₁(0) = g̃₂(0) = 0, where

    g̃₁(z) = z − z e^{−z} (1 + ¾ z),
    g̃₂(z) = 2 f̃₁(z/2)² + 4z f̃₁(z/2) + 2z f̃₁′(z/2) + z + z² − z e^{−z} (1 + (11/4) z).   (4)
Mean value. From the recurrence (2), we see that the mean μ_n := E(X_n) can be computed recursively by

    μ_n = n + 2^{1−n} Σ_{0≤k≤n} C(n, k) μ_k   (n ≥ 3),

with μ_0 = μ_1 = 0 and μ_2 = 1. Let H_n := Σ_{1≤j≤n} j^{−1} denote the harmonic numbers and γ denote Euler's constant.
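The recurrence above is easy to iterate numerically; the following sketch (our own, using exact rational arithmetic) solves for μ_n, whose k = n term appears on both sides:

```python
from fractions import Fraction
from math import comb, log2

def rs_mean(n_max):
    """Iterate the mean recurrence
        mu_n = n + 2^(1-n) * sum_{0<=k<=n} C(n,k) * mu_k   (n >= 3),
    with mu_0 = mu_1 = 0, mu_2 = 1, solving for the mu_n term that
    appears on both sides (the k = n summand contributes 2^(1-n) mu_n)."""
    mu = [Fraction(0), Fraction(0), Fraction(1)]
    for n in range(3, n_max + 1):
        s = sum(comb(n, k) * mu[k] for k in range(n))   # k < n part
        w = Fraction(2) ** (1 - n)
        mu.append((n + w * s) / (1 - w))
    return mu

mu = rs_mean(64)
print(mu[3], mu[4])                     # exact rational values
print(float(mu[64]) / 64 - log2(64))    # slowly approaches F_RS(log2 n) ~ 0.25
```

For instance, μ₃ = 5: three bits are flipped, and with probability 3/4 the split is (1, 2) or (2, 1) (one extra bit), while with probability 1/4 the whole group of 3 must be re-flipped.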
Theorem 1. The expected number μ_n of random bits used by Algorithm RS for generating a random permutation of n elements satisfies the identity

    μ_n/n = H_{n−1}/log 2 + 1/2 − 3/(4 log 2) − (1/log 2) Σ_{k∈Z∖{0}} Γ(χ_k) (1 + ¾ χ_k) Γ(n)/Γ(n + χ_k),   (5)

for n ≥ 3, where Γ is the Gamma function and χ_k := 2kπi/log 2. Asymptotically, μ_n satisfies
    μ_n = n log₂ n + n F_RS(log₂ n) + O(1),   (6)
where F_RS(t) is a periodic function of period 1 whose Fourier series expansion is given by

    F_RS(t) = γ/log 2 + 1/2 − 3/(4 log 2) − (1/log 2) Σ_{k∈Z∖{0}} Γ(χ_k) (1 + ¾ χ_k) e^{−2kπit},

the Fourier series being absolutely convergent.
Proof. To derive a more effective asymptotic approximation to μ_n, we begin with the expansion
    g̃₁(z) = Σ_{j≥2} ((−1)^{j−1} (3j − 7))/(4 (j − 1)!) z^j.
We then see that the sequence μ̃_n := n! [z^n] f̃₁(z), where [z^n] f(z) denotes the coefficient of z^n in the Taylor expansion of f, satisfies

    μ̃_n = (n! [z^n] g̃₁(z))/(1 − 2^{1−n})   (n ≥ 2).

It follows, by Cauchy convolution, that the coefficient μ_n = n! [z^n] e^z f̃₁(z) has the closed-form expression

    μ_n = Σ_{2≤k≤n} C(n, k) (−1)^{k−1} (k (3k − 7))/(4 (1 − 2^{1−k}))   (n ≥ 1),

which, by the standard integral representation for finite differences (see (Flajolet and Sedgewick, 1995)), can be expressed as
    μ_n/n = −(1/(2πi)) ∫_{−1/2−i∞}^{−1/2+i∞} (Γ(n) Γ(s))/(Γ(n + s)(1 − 2^s)) (1 + ¾ s) ds,
where the integration path is the vertical line ℜ(s) = −1/2. By moving the line of integration to the right and by collecting the residues at the poles χ_k (k ∈ Z), we obtain

    μ_n/n = H_{n−1}/log 2 + 1/2 − 3/(4 log 2) − (1/log 2) Σ_{k∈Z∖{0}} Γ(χ_k) (1 + ¾ χ_k) Γ(n)/Γ(n + χ_k) + R_n,

where

    R_n := −(1/(2πi)) ∫_{1/2−i∞}^{1/2+i∞} (Γ(n) Γ(s))/(Γ(n + s)(1 − 2^s)) (1 + ¾ s) ds.

Since there is no other singularity lying to the right of the imaginary axis, we first deform the integration path into a large half-circle to the right, and then prove that the integral tends to zero as the radius of the circle tends to infinity. In this way, we deduce that R_n ≡ 0 for n ≥ 3, proving the identity (5). The asymptotic approximation (6) then follows from the asymptotic expansion for the ratio of Gamma functions (see (Erdélyi et al., 1953, §1.18))
    Γ(n)/Γ(n + χ_k) = n^{−χ_k} (1 − (χ_k (χ_k − 1))/(2n) + O(|χ_k|⁴/n²))   (k ≠ 0).

Indeed, the O(1)-term in (6) can be further refined by this expansion, and be replaced by
    (1/(2 log 2)) Σ_{k∈Z} Γ(1 + χ_k) (χ_k − 1) (1 + ¾ χ_k) n^{−χ_k} + O(n^{−1}),

the series on the right-hand side defining another bounded periodic function. Finally, the Fourier series is absolutely convergent because of the estimate (see (Erdélyi et al., 1953, §1.18))