Utilizing the linear Diophantine problem of Frobenius for a faster sequence

Dr. Bharti Temkin Maximilian Berger

November 27, 2004

Abstract

TBD: Abstract of the paper

1 Introduction

TBD

2 The linear Diophantine problem of Frobenius

The linear Diophantine problem of Frobenius is equivalent to the coin exchange problem: What is the largest integer that cannot be represented as a linear combination, with non-negative integer coefficients, of positive integers $x_1, \dots, x_n > 1$ that are relatively prime?

This problem has not been solved in general yet. However, [5] has solved it for the case of $n = 2$:

Theorem 1. Assume the two positive integers $x_1$ and $x_2$ are relatively prime. Then every integer $x > x_1 x_2 - x_1 - x_2$ can be represented as a linear combination of $x_1$ and $x_2$, $x = c_2 x_2 + c_1 x_1$, with non-negative integer coefficients $c_1, c_2$. [5]

What about the integers $1 \le x \le (x_1 - 1)(x_2 - 1)$? Some of them can still be represented with non-negative integer coefficients; however, some cannot. Those that cannot be represented with non-negative integer coefficients can be represented with a negative coefficient:

Theorem 2. Assume the two positive integers $x_1$ and $x_2$ are relatively prime. Then every positive integer that cannot be represented with non-negative integer coefficients $c_1, c_2$ as a linear combination of $x_1$ and $x_2$, $x = c_2 x_2 + c_1 x_1$, can be represented with one negative coefficient $c_{11}$: $x = c_2 x_2 - c_{11} x_1$.

TBD: Where is this proven?

We may safely limit the coefficient $c_2$ to the range $[0..x_1 - 1]$. Should $c_2$ be greater, we can increase $c_1$ instead (reducing $c_2$ by $x_1$ and increasing $c_1$ by $x_2$ leaves $x$ unchanged).

Having done this, we can now represent every integer in a table, using $c_1$ as the x-axis and $c_2$ as the y-axis, as shown in Figure 1.

c2 \ c1   -5   -4   -3   -2   -1 |   0    1    2    3
   4       3    8   13   18   23 |  28   33   38   43  ...
   3            1    6   11   16 |  21   26   31   36  ...
   2                      4    9 |  14   19   24   29  ...
   1                           2 |   7   12   17   22  ...
   0                             |   0    5   10   15  ...

Figure 1: Tabular display of the integers that can be represented with $x_1 = 5$ and $x_2 = 7$. Entries to the left of the $c_1 = 0$ column need a negative $c_1$.

This representation gives us another way to prove Theorem 1. Since $x_1$ and $x_2$ are positive, the entry at $(x_a, y_b)$ must be larger than any entry at $(x_i, y_b)$ where $i < a$ and larger than any entry at $(x_a, y_j)$ where $j < b$.
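The table can be cross-checked by brute force. The following is a minimal sketch, not part of the original work; the function names `representable` and `non_representable` are chosen here for illustration. It lists every integer up to $(x_1 - 1)(x_2 - 1)$ that has no representation with non-negative coefficients, for the example values 5 and 7 used in Figure 1.

```python
from math import gcd


def representable(x, x1, x2):
    """True if x = c1*x1 + c2*x2 for some integers c1, c2 >= 0."""
    # Try every feasible c2; the remainder must be a non-negative multiple of x1.
    return any((x - c2 * x2) % x1 == 0 for c2 in range(x // x2 + 1))


def non_representable(x1, x2):
    """Positive integers up to (x1-1)(x2-1) without such a representation."""
    if gcd(x1, x2) != 1:
        raise ValueError("x1 and x2 must be relatively prime")
    limit = (x1 - 1) * (x2 - 1)
    return [x for x in range(1, limit + 1) if not representable(x, x1, x2)]


missing = non_representable(5, 7)
print(missing)        # the entries left of the c1 = 0 column in Figure 1
print(max(missing))   # 23 = 5*7 - 5 - 7
```

The printed values are exactly the table entries that require a negative $c_1$, and the largest of them agrees with the bound in Theorem 1.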

The largest number with a negative coefficient $c_1$ must therefore be the one that has the largest possible $c_2$ and the largest possible $c_1$.

The largest possible $c_2$ is $x_1 - 1$ (as we defined earlier). The largest possible negative $c_1$ is $-1$. Therefore, the largest integer that cannot be represented is $x = (x_1 - 1) x_2 - 1 \cdot x_1$. This is equivalent to $x_1 x_2 - x_1 - x_2$.

The next question is: How many integers $x \le (x_1 - 1)(x_2 - 1)$ cannot be represented with non-negative coefficients?

If we look at the tabular representation again, this question is asking how many numbers are left of the column of $c_1 = 0$.

For $c_2 = 0$ this is 0, since $0 \cdot x_2$ will always be 0 and there are no positive integers less than 0.

For any $c_2$ this is $\lfloor c_2 x_2 / x_1 \rfloor$. In particular, for $c_2 = x_1 - 1$ this is $\lfloor (x_1 - 1) x_2 / x_1 \rfloor$.

If we sum all these up for $c_2 = [0..x_1 - 1]$ we get $\frac{(x_1 - 1)(x_2 - 1)}{2}$.

TBD: How?????

Theorem 3. Assume the two positive integers $x_1$ and $x_2$ are relatively prime. Then $\frac{(x_1 - 1)(x_2 - 1)}{2}$ positive integers cannot be represented as a linear combination of $x_1$ and $x_2$, $x = c_2 x_2 + c_1 x_1$, with non-negative integer coefficients $c_1, c_2$.

Unfortunately, the current research only gives good explanations for two relatively prime numbers. There are several papers that try to find upper and lower bounds for the general case. So far, no general formula has been found. [?]

3 Shellsort

Shellsort, as suggested in TBD, is a repeated version of k-insertion sort. Insertion sort sorts a list by inserting every element into an already sorted list. k-insertion sort sorts every kth element in a list.

Definition 1. A list of $n$ elements $e_i$ is said to be k-ordered if $e_i \le e_{i+ck}$ for all $1 \le i \le n$, $1 \le c$ with $i + ck \le n$, where $c, i, k \in \mathbb{N}^+$.

Shellsort calls k-insertion sort with decreasing values of $k$. To ensure that the list is sorted, the last step is $k = 1$.

4 Shellsort performance with relatively prime numbers

We will assume that $k, l \in \mathbb{N}^+$ are two relatively prime numbers. If we sort a list of elements $e_i$ with a k- and an l-insertion sort, we will get a k,l-ordered list with the properties

    $e_i \le e_{i + c_1 k}$

and

    $e_j \le e_{j + c_2 l}$.

If we set $j = i + c_1 k$ we can combine these two:

    $e_i \le e_{i + c_1 k + c_2 l}$

From Theorem 1 we know that $c_1 k + c_2 l$ can represent every integer $x \ge (k - 1)(l - 1)$, so $e_i \le e_{i+x}$ holds for every such $x$, and in particular

    $e_i \le e_{i + (k-1)(l-1) + 1}$

which leads us to the following

Theorem 4. Every element in a k,l-ordered list is less than $(k - 1)(l - 1)$ indexes away from its sorted position in a 1-sorted list, if $1 \le k < l$ and $k$ and $l$ are relatively prime.

Also, if we look at Theorem 3 we can see that

Theorem 5. Every element in a k,l-sorted list has at most $\frac{(k-1)(l-1)}{2}$ elements appearing before it that should appear later, if $1 \le k < l$ and $k$ and $l$ are relatively prime.
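The two bounds above can be checked empirically. The sketch below is an illustration only: `h_sort` and `check_kl_ordered` are names introduced here, and the values $k = 5$, $l = 7$, the list length and the seed are arbitrary assumptions. It relies on the known fact that k-sorting an l-ordered list leaves it l-ordered, so applying both passes yields a k,l-ordered list.

```python
import random


def h_sort(a, h):
    """In-place h-insertion sort: sorts each subsequence a[r], a[r+h], a[r+2h], ..."""
    for i in range(h, len(a)):
        x = a[i]
        j = i
        while j >= h and a[j - h] > x:
            a[j] = a[j - h]
            j -= h
        a[j] = x


def check_kl_ordered(n=200, k=5, l=7, seed=1):
    """Return (far_ok, worst, limit) for a random list after an l- and a k-pass."""
    rng = random.Random(seed)
    a = [rng.randrange(1000) for _ in range(n)]
    h_sort(a, l)
    h_sort(a, k)
    bound = (k - 1) * (l - 1)
    # Every pair at distance >= (k-1)(l-1) must already be in order.
    far_ok = all(a[i] <= a[j] for i in range(n) for j in range(i + bound, n))
    # Largest number of earlier elements that are greater than a given element.
    worst = max(sum(1 for i in range(j) if a[i] > a[j]) for j in range(n))
    return far_ok, worst, bound // 2


far_ok, worst, limit = check_kl_ordered()
print(far_ok, worst, limit)
```

Here `far_ok` corresponds to the distance bound of Theorem 4 and `worst <= limit` to the count in Theorem 5; a single random list is of course an illustration, not a proof.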

5 Shellsort performance with non relatively prime numbers

If we take two numbers $k$, $l$ that are not relatively prime, only numbers that have the common factor $\gcd(k, l)$ can be represented in terms of $k$ and $l$.

Applying this to Shellsort, this means that until the very last step, there will always be values that may need to move through the whole list. Shellsort's strength, however, lies in eliminating those elements, so we arrive at the following

Assumption 1. For a Shellsort sequence to be effective, all the numbers have to be relatively prime.

6 Growth of a Shellsort sequence

The growth of the sequence is another important factor. If the sequence grows too slowly, too many Shellsort passes will be made, leading to unnecessary comparisons, which take too much time.

If the sequence grows too fast, the advantages of Shellsort are gone, and the behaviour gets closer and closer to that of straight insertion sort.

If we look at effective sequences that can be found in the literature, most of them grow approximately by a factor of 2. No rule for perfect Shell sequence growth has been found, but this seems to be good.

Assumption 2. The Shell sequence may grow neither too fast nor too slowly. A factor close to 2 seems to give the best results.

7 Constructing a Shell sequence

We now try to construct Shell sequences based on Assumptions 1 and 2.

To satisfy Assumption 2 we construct an ideal sequence $f(n)$ with the following properties:

    $f(1) = 1$                                                    (1)
    $f(n) = f(n - 1) \cdot c$ where $c > 1$, $c \in \mathbb{R}$    (2)

To satisfy Assumption 1 we define the sequence $s(n)$ as follows: $s(n)$ will be the smallest integer greater than $f(n)$ that is relatively prime to all $s(2)..s(n - 1)$.

8 Finding the best growth factor

To find the best growth factor, we will use the above method to construct Shell sequences. We will then apply these to sort arrays of different sizes that contain random data.

For $c$ we used the range 1.50..3.00 in increments of 0.05. For the array size, we used $10^3$, $10^4$, $10^5$, $10^6$. Each sort has been done 5 times to assure accuracy. The results can be found in Figures 2, 3 and 4.

As you can see from Figure 4, the faster the sequence grows, the less overhead is involved, and the actual sort runs faster. This is, however, due to the fact that we were doing integer comparisons, which are very fast on modern computers, so the management overhead dominates the runtime.

If comparisons dominated the runtime, we would need to minimize those. If we look at Figure 2 we see multiple minima, at $c = 2.2$, $2.35$, $2.45$ and $2.55$. We will therefore examine the range 2.1..2.6 more closely, in steps of 0.01. The results can be found in Figure 5.
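The construction and the measurement can be sketched as follows. This is a minimal illustration that reads the definition of $s(n)$ literally (pairwise relatively prime, checked with `gcd`); it is not the program used for the measurements and is not claimed to reproduce the sequences or counts reported in the paper. The names `growth_gaps` and `shellsort_comparisons`, the array size and the seed are assumptions made here, as is the extra guard that keeps the gaps strictly increasing.

```python
import random
from math import gcd


def growth_gaps(c, limit):
    """Gaps below `limit`: s(1) = 1, then for f(n) = c**(n-1) the smallest integer
    greater than f(n) that is coprime to every earlier gap except 1."""
    gaps = [1]
    f = 1.0
    while True:
        f *= c
        # Smallest integer strictly greater than f; the max() guard (an added
        # assumption) keeps the sequence strictly increasing.
        s = max(int(f) + 1, gaps[-1] + 1)
        while any(gcd(s, g) != 1 for g in gaps[1:]):
            s += 1
        if s >= limit:
            return gaps
        gaps.append(s)


def shellsort_comparisons(a, gaps):
    """Sort `a` in place, largest gap first; return the number of key comparisons."""
    comparisons = 0
    for h in sorted(gaps, reverse=True):
        for i in range(h, len(a)):
            x = a[i]
            j = i
            while j >= h:
                comparisons += 1
                if a[j - h] <= x:
                    break
                a[j] = a[j - h]
                j -= h
            a[j] = x
    return comparisons


rng = random.Random(0)
data = [rng.randrange(10**6) for _ in range(1000)]
gaps = growth_gaps(2.36, len(data))
count = shellsort_comparisons(data, gaps)
print(gaps, count)
```

Repeating the last four lines for each growth factor and array size, and averaging over several runs, gives the kind of table described in this section.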

c          10^6      10^5     10^4    10^3
1.5   148096157   4033761   268781   19761
1.55   94642467   3489655   261369   18845
1.6    66307896   3222457   253077   18089
1.65   51195019   3071179   243260   17250
1.7    43806001   3076849   242129   17040
1.75   38690668   2969786   230646   16381
1.8    36338939   2922294   226307   15934
1.85   34896162   2850479   219403   15491
1.9    35119277   2883186   222804   15491
1.95   34096450   2770747   213018   15108
2      38363880   2872861   213661   14823
2.05   33867497   2740231   210997   14689
2.1    33571066   2714777   206737   14647
2.15   33354981   2695285   205777   14379
2.2    33162445   2677880   203966   14326
2.25   33428464   2698651   208228   14090
2.3    32995665   2655091   199975   13956
2.35   32921442   2648227   200888   14014
2.4    33329546   2676205   201823   13889
2.45   32853407   2631833   199408   13738
2.5    33138322   2654605   200421   13845
2.55   32992964   2628471   197599   13704
2.6    33033944   2628661   198060   13643
2.65   33684086   2685786   202121   13597
2.7    33978169   2713627   206348   13810
2.75   34582299   2757018   207685   13960
2.8    33991997   2672757   201107   13824
2.85   34088609   2710268   203217   13770
2.9    34198867   2697561   203390   13733
2.95   35282875   2806847   209515   14078
3      43128969   2934564   207531   13542

Figure 2: Number of data comparisons for different values of c

c          10^6      10^5     10^4    10^3
1.5   296192315   8067523   537563   39522
1.55  189284934   6979310   522738   37691
1.6   132615792   6444915   506154   36179
1.65  102390038   6142358   486520   34501
1.7    87612002   6153698   484259   34081
1.75   77381337   5939572   461293   32762
1.8    72677879   5844589   452615   31869
1.85   69792324   5700958   438807   30983
1.9    70238555   5766372   445608   30983
1.95   68192900   5541494   426036   30216
2      76727761   5745722   427323   29646
2.05   67734994   5480462   421994   29378
2.1    67142132   5429554   413474   29295
2.15   66709963   5390571   411554   28759
2.2    66324891   5355760   407932   28653
2.25   66856929   5397303   416457   28180
2.3    65991331   5310182   399950   27912
2.35   65842884   5296454   401777   28028
2.4    66659092   5352411   403647   27778
2.45   65706815   5263666   398817   27477
2.5    66276645   5309210   400842   27691
2.55   65985928   5256942   395198   27408
2.6    66067889   5257323   396120   27286
2.65   67368172   5371573   404242   27194
2.7    67956339   5427254   412696   27620
2.75   69164598   5514036   415371   27920
2.8    67983994   5345514   402215   27648
2.85   68177219   5420536   406435   27540
2.9    68397734   5395123   406780   27466
2.95   70565750   5613694   419030   28156
3      86257939   5869128   415063   27084

Figure 3: Number of data movements for different values of c

c      10^6     10^5     10^4     10^3
1.5    0.3864   0.0382   0.002    0
1.55   0.3808   0.0342   0.002    0
1.6    0.3766   0.036    0        0
1.65   0.3786   0.036    0.002    0
1.7    0.3766   0.038    0        0
1.75   0.391    0.036    0.002    0
1.8    0.3864   0.0362   0.002    0
1.85   0.3846   0.038    0.0022   0
1.9    0.3826   0.032    0.002    0
1.95   0.3844   0.028    0.004    0
2      0.3866   0.03     0        0
2.05   0.3702   0.0282   0        0.002
2.1    0.3484   0.026    0.004    0
2.15   0.3446   0.024    0.002    0
2.2    0.3324   0.03     0.002    0
2.25   0.3224   0.024    0        0
2.3    0.3184   0.024    0.004    0
2.35   0.3066   0.0222   0.002    0
2.4    0.2984   0.02     0.002    0
2.45   0.2964   0.0242   0.004    0
2.5    0.2822   0.024    0.002    0
2.55   0.2884   0.022    0.002    0
2.6    0.2786   0.02     0        0.002
2.65   0.2684   0.02     0.002    0
2.7    0.2664   0.018    0        0
2.75   0.2682   0.02     0.004    0
2.8    0.2606   0.024    0        0
2.85   0.2504   0.02     0        0
2.9    0.2484   0.018    0        0
2.95   0.2464   0.0202   0        0
3      0.242    0.018    0        0

Figure 4: Actual time used for different values of c on an AMD Athlon XP-M 3000, Windows XP

c          10^6      10^5     10^4    10^3
2.1    33575123   2713080   207419   14493
2.11   33501922   2707440   207390   14560
2.12   33415394   2704853   206592   14333
2.13   33449975   2704185   206917   14470
2.14   33474324   2709096   207465   14563
2.15   33407843   2695299   205862   14540
2.16   33419773   2704901   206963   14339
2.17   33770344   2699969   205263   14433
2.18   34017114   2755154   211278   14754
2.19   33383283   2693135   205064   14296
2.2    33172462   2675981   204456   14257
2.21   33696887   2711675   205141   14329
2.22   35901099   2938219   216467   14273
2.23   33222245   2681024   205006   14246
2.24   33477421   2698259   206255   14006
2.25   33445329   2706222   204857   14200
2.26   32981825   2659946   202310   14120
2.27   33578549   2704620   203846   14142
2.28   33006563   2649877   200930   13996
2.29   33065820   2655830   200378   14063
2.3    33005193   2658651   201434   13887
2.31   32931701   2650098   200944   13843
2.32   33249471   2676437   202770   13896
2.33   33078478   2658218   200746   13842
2.34   33056568   2650571   200763   13855
2.35   32933741   2648072   201409   13731
2.36   32809977   2636629   200258   13771
2.37   33241054   2666198   203039   14130
2.38   32937912   2645452   201385   14068
2.39   33223200   2662427   204323   14144
2.4    33364435   2673239   203491   14052
2.41   33144837   2654708   201372   14107
2.42   34241799   2759769   207570   14037
2.43   33301732   2675290   203259   14419
2.44   33093051   2654245   200847   13791
2.45   32868232   2633840   198648   13749
2.46   33279719   2676711   203720   14160
2.47   33184021   2661495   201188   13944
2.48   33296984   2664019   198741   13645
2.49   33186545   2643143   198062   13671
2.5    33117859   2656282   200962   13917
2.51   33327188   2664920   200380   13655
2.52   34110950   2732160   208303   13818
2.53   33112391   2651522   200332   13708
2.54   33105570   2641384   198631   13584
2.55   32966759   2624270   197888   13507
2.56   32944178   2624409   197545   13561
2.57   33359233   2669948   201205   13802
2.58   33561907   2676129   202047   13807
2.59   33154492   2647119   199909   13719
2.6    33030361   2634062   199742   13627

Figure 5: Number of data comparisons for different values of c in the closeup range 2.1..2.6

As we can see from Figure 5, the minima for the best sequence are at $c = 2.36$ and $c = 2.45$. The first 20 values of these sequences are:

For 2.36: 1 3 7 16 34 74 173 409 964 2273 5365 12650 29851 70447 166255 392356 925961 2185265 5157221 12171041

For 2.45: 1 3 7 16 37 89 218 530 1300 3181 7793 19091 46775 114595 280754 687847 1685224 4128802 10115554 24783109

9 Comparing the new Shell sequences with existing Shell sequences

To see if our new sequences stand a chance against previously known Shell sequences, we need to compare them with proven sequences.

Shell

The original Shell sequence was proposed in [?]. It is generated as follows: $s(i) = 2^i$. This results in powers of two. This sequence has been proven not to be very good. However, it is included since it is the original sequence.

Orig: 16777216 8388608 4194304 2097152 1048576 524288 262144 131072 65536 32768 16384 8192 4096 2048 1024 512 256 128 64 32 16 8 4 2 1

Knuth

Knuth proposed a different sequence in [?]. This sequence is defined as follows:

    $s(0) = 1$                      (3)
    $s(i) = 3 \cdot s(i - 1) + 1$   (4)

Knuth: 1743392200 581130733 193710244 64570081 21523360 7174453 2391484 797161 265720 88573 29524 9841 3280 1093 364 121 40 13 4 1

Gonnet

Gonnet has proposed a different sequence in [?] that is defined backwards:

    $s(n) = \text{size}$                                                       (5)
    $s(i) = \lfloor 5 \cdot s(i + 1) / 11 \rfloor$ for $s(i + 1) \ge 5$        (6)
    $s(i) = 1$ for $s(i + 1) < 5$                                              (7)

One interesting thing to note about this sequence is that the ratio $s(i + 1)/s(i)$ is 2.2, which we have seen as a local minimum in our earlier testing.

Gonne: 10000000 4545454 2066115 939143 426883 194037 88198 40090 18222 8282 3764 1710 777 353 160 72 32 14 6 2 1

Sedgewick

Sedgewick has proposed multiple sequences. One that can be found in [?] is generated with the following method:

    $s(0) = 1$                                   (8)
    $s(i) = t(i)^2 - (3 t(i))/2 + 1$ where       (9)
    $t(i) = 2^{i-1}$                             (10)

which will result in

Sedge: 1073692673 268410881 67096577 16771073 4191233 1047041 261377 65153 16193 4001 977 233 53 11 2 1

Another one of Sedgewick's sequences can be found in [?]. Its generation is:

    $s(0) = 1$                                   (11)
    $s(i) = 4^i + 3 \cdot 2^{i-1} + 1$           (12)

which results in the following sequence:

Sed2: 1073790977 268460033 67121153 16783361 4197377 1050113 262913 65921 16577 4193 1073 281 77 23 8 1

Pratt

Pratt tried to minimize work by defining a sequence in [?] that is not relatively prime. Every step except the last step will therefore be fast. The sequence is generated by taking all possible products $2^i 3^j$ for $i, j \ge 0$ and then sorting them. The sequence up to $10^7$ is:

Pratt: 10077696 9565938 8957952 8503056 7962624 7558272 7077888 6718464 6377292 5971968
5668704 5308416 5038848 4782969 4718592 4478976 4251528 3981312 3779136 3538944
3359232 3188646 2985984 2834352 2654208 2519424 2359296 2239488 2125764 1990656
1889568 1769472 1679616 1594323 1572864 1492992 1417176 1327104 1259712 1179648
1119744 1062882 995328 944784 884736 839808 786432 746496 708588 663552
629856 589824 559872 531441 524288 497664 472392 442368 419904 393216
373248 354294 331776 314928 294912 279936 262144 248832 236196 221184
209952 196608 186624 177147 165888 157464 147456 139968 131072 124416
118098 110592 104976 98304 93312 82944 78732 73728 69984 65536
62208 59049 55296 52488 49152 46656 41472 39366 36864 34992
32768 31104 27648 26244 24576 23328 20736 19683 18432 17496
16384 15552 13824 13122 12288 11664 10368 9216 8748 8192
7776 6912 6561 6144 5832 5184 4608 4374 4096 3888
3456 3072 2916 2592 2304 2187 2048 1944 1728 1536
1458 1296 1152 1024 972 864 768 729 648 576
512 486 432 384 324 288 256 243 216 192
162 144 128 108 96 81 72 64 54 48
36 32 27 24 18 16 12 9 8 6 4 3 2 1

Papernov-Stasevich

Papernov and Stasevich have proposed another sequence that is based on the powers of two:

    $s(0) = 1$               (13)
    $s(i) = 2^{i+1} - 1$     (14)

which is:

P/S: 67108863 33554431 16777215 8388607 4194303 2097151 1048575 524287 262143 131071 65535 32767 16383 8191 4095 2047 1023 511 255 127 63 31 15 7 3 1

Stover

Michael Stover has tried to find good Shellsort sequences through evolution in [?]. His two best sequences are:

Stov7: 499871 494198 451488 128823 35957 13353 5467 2673 1097 340 171 58 24 9 4 1

Stov8: 498201 461299 275360 62025 18168 8186 3716 1325 444 177 61 23 13 4 1

Unfortunately, these are only defined up to $5 \cdot 10^5$. They are very efficient for array sizes up to $5 \cdot 10^5$, but not much more.

Comparison

We generated all these sequences and ran them on random arrays of the sizes $10^7$, $10^6$, $10^5$, $10^4$. Each run has been done 5 times to assure accuracy. The results are in Figures 6, 7 and 8.
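Several of the literature sequences above follow directly from their recurrences. The following sketch shows one way to generate them; the function names and the `limit` parameter (largest gap to produce) are chosen here for illustration, the integer division in `gonnet` is inferred from the listed values, and the evolved Stover sequences and the first Sedgewick variant are left out.

```python
def shell(limit):
    """Powers of two, s(i) = 2**i, up to `limit`."""
    gaps, s = [], 1
    while s <= limit:
        gaps.append(s)
        s *= 2
    return gaps


def knuth(limit):
    """s(0) = 1, s(i) = 3*s(i-1) + 1, up to `limit`."""
    gaps, s = [], 1
    while s <= limit:
        gaps.append(s)
        s = 3 * s + 1
    return gaps


def gonnet(size):
    """Defined backwards from `size`: multiply by 5/11 (rounded down), then end with 1."""
    gaps = [size]
    while gaps[-1] >= 5:
        gaps.append(5 * gaps[-1] // 11)
    if gaps[-1] != 1:
        gaps.append(1)
    return gaps


def sedgewick2(limit):
    """s(0) = 1, s(i) = 4**i + 3*2**(i-1) + 1, up to `limit`."""
    gaps, i = [1], 1
    while 4**i + 3 * 2**(i - 1) + 1 <= limit:
        gaps.append(4**i + 3 * 2**(i - 1) + 1)
        i += 1
    return gaps


def papernov_stasevich(limit):
    """s(i) = 2**(i+1) - 1, up to `limit`."""
    gaps, i = [], 0
    while 2**(i + 1) - 1 <= limit:
        gaps.append(2**(i + 1) - 1)
        i += 1
    return gaps


def pratt(limit):
    """All products 2**i * 3**j up to `limit`, in increasing order."""
    gaps, p2 = [], 1
    while p2 <= limit:
        p = p2
        while p <= limit:
            gaps.append(p)
            p *= 3
        p2 *= 2
    return sorted(gaps)


print(gonnet(10_000_000))
print(knuth(10_000)[::-1])
```

All functions except `gonnet` return the gaps in increasing order; a Shellsort routine would apply them from the largest gap down to 1.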

Sequence        10^7        10^6       10^5     10^4
Orig        overflow   533589920   17433000   679873
Knuth       overflow    65274204    3934718   247046
Gonne      427369479    36082280    2944356   228294
Sedge      525495478    42444005    3311372   240488
Sed2       505044384    40791204    3153956   230992
Pratt       overflow   135701128    9479898   614783
P/S         overflow    64330653    3936064   246610
Stov7      420727339    33445865    2613719   197196
Stov8      427160220    33718519    2640358   198636
2.36       394013512    32856552    2641970   200794
2.45       394277061    32842447    2636230   198510

Figure 6: Number of data comparisons for different Shell sequences.

Sequence        10^7        10^6       10^5      10^4
Orig        overflow   208186381   34866000   1359746
Knuth       overflow   130548408    7869437    494092
Gonne      854738959    72164560    5888712    456589
Sedge       overflow    84888010    6622744    480977
Sed2        overflow    81582408    6307912    461985
Pratt       overflow   271402257   18959797   1229567
P/S         overflow   128661307    7872129    493220
Stov7      841454678    66891731    5227439    394393
Stov8      854320440    67437039    5280716    397272
2.36       788027024    65713104    5283940    401589
2.45       788554123    65684894    5272460    397020

Figure 7: Number of data read/writes for different Shell sequences

Sequence   10^7      10^6     10^5     10^4
Orig       4.5824    0.375    0.0302   0.004
Knuth      2.796     0.2328   0.02     0
Gonne      3.8476    0.3184   0.028    0
Sedge      2.3316    0.1962   0.014    0
Sed2       2.3274    0.1942   0.014    0
Pratt      35.4688   2.5474   0.1584   0.006
P/S        4.3542    0.3564   0.0262   0
Stov7      3.1784    0.2822   0.02     0.002
Stov8      2.9982    0.2646   0.022    0.002
2.36       3.6834    0.2924   0.024    0
2.45       3.529     0.2922   0.0162   0

Figure 8: Sorting time for different Shell sequences

Unfortunately, some of the test results for $10^7$ are incomplete due to integer overflows. However, the final test results are clearly visible:

For pure speed, Sedgewick's sequences and Knuth's sequence are best. This is due to the fact that these sequences grow very fast and therefore have less overhead. Please keep in mind that the example sort was a sort on integer values, which are very easy to compare. For simple items, these sequences seem to be the best.

If the items get more complex, we want to reduce the number of data read/writes and comparisons. The best sequences for this are Stover's sequences and the two sequences developed in this paper. Unfortunately, Stover's sequences are only defined up to $5 \cdot 10^5$, so for arrays of larger size the sequences defined in this paper are the best.

10 Summary

TBD

11 References

[1] Rødseth, Ö. J. "On a Linear Diophantine Problem of Frobenius." J. reine angew. Math. 301, 171-178, 1978.

[2] Rødseth, Ö. J. "On a Linear Diophantine Problem of Frobenius. II." J. reine angew. Math. 307/308, 431-440, 1979.

[3] Selmer, E. S. "The Linear Diophantine Problem of Frobenius." J. reine angew. Math. 293/294, 1-17, 1977.

[4] P. Erdős and R. L. Graham, "On a linear Diophantine problem of Frobenius", Acta Arith. 21, 399-408, 1972.

[5] J. J. Sylvester, "Mathematical questions with their solutions", Educational Times 41, 21, 1884.

[6] C. A. R. Hoare, "Quicksort", Comp. J. 5, 1962.

[7] -of-Three, R. C. Singleton, CACM 12, 1969.

[8] D. R. Musser, "Introspective Sorting and Selection Algorithm", Software Practice and Experience 27(8):983, 1997.

[9] R. Sedgewick, "Algorithms in C++, Third Edition", p. 285ff., 1998.