
Permutation Generation Methods*

ROBERT SEDGEWICK

Program in Computer Science and Division of Applied Mathematics, Brown University, Providence, Rhode Island 02912

This paper surveys the numerous methods that have been proposed for permutation generation by computer. The various algorithms which have been developed over the years are described in detail, and implemented in a modern ALGOL-like language. All of the algorithms are derived from one simple control structure. The problems involved with implementing the best of the algorithms on real computers are treated in detail. Assembly-language programs are derived and analyzed fully. The paper is intended not only as a survey of permutation generation methods, but also as a tutorial on how to compare a number of different algorithms for the same task.

Key Words and Phrases: permutations, combinatorial algorithms, code optimization, analysis of algorithms, lexicographic ordering, random permutations, recursion, cyclic rotation.

CR Categories: 3.15, 4.6, 5.25, 5.30.

INTRODUCTION

Over thirty algorithms have been published during the past twenty years for generating by computer all N! permutations of N elements. This problem is a nontrivial example of the use of computers in combinatorial mathematics, and it is interesting to study because a number of different approaches can be compared. Surveys of the field have been published previously in 1960 by D. H. Lehmer [26] and in 1970-71 by R. J. Ord-Smith [29, 30]. A new look at the problem is appropriate at this time because several new algorithms have been proposed in the intervening years.

Permutation generation has a long and distinguished history. It was actually one of the first nontrivial nonnumeric problems to be attacked by computer. In 1956, C. Tompkins wrote a paper [44] describing a number of practical areas where permutation generation was being used to solve problems. Most of the problems that he described are now handled with more sophisticated techniques, but the paper stimulated interest in permutation generation by computer per se. The problem is simply stated, but not easily solved, and is often used as an example in programming and correctness. (See, for example, [6].)

The study of the various methods that have been proposed for permutation generation is still very instructive today because together they illustrate nicely the relationship between counting, recursion, and iteration. These are fundamental concepts in computer science, and it is useful to have a rather simple example which illustrates so well the relationships between them. We shall see that algorithms which seem to differ markedly have essentially the same structure when expressed in a modern language and subjected to simple program transformations. Many readers may find it surprising to discover that "top-down" (recursive) and "bottom-up"

* This work was supported by the National Science Foundation, Grant No. MCS75-23738.

Copyright © 1977, Association for Computing Machinery, Inc. General permission to republish, but not for profit, all or part of this material is granted provided that ACM's copyright notice is given and that reference is made to the publication, to its date of issue, and to the fact that reprinting privileges were granted by permission of the Association for Computing Machinery.

Computing Surveys, Vol. 9, No. 2, June 1977

CONTENTS

INTRODUCTION
1. METHODS BASED ON EXCHANGES
   Recursive Methods
   Adjacent Exchanges
   Factorial Counting
   "Loopless" Algorithms
   Another Iterative Method
2. OTHER TYPES OF ALGORITHMS
   Nested Cycling
   Lexicographic Algorithms
   Random Permutations
3. IMPLEMENTATION AND ANALYSIS
   A Recursive Method (Heap)
   An Iterative Method (Ives)
   A Cyclic Method (Langdon)
CONCLUSION
ACKNOWLEDGMENTS
REFERENCES

(iterative) design approaches can lead to the same program.

Permutation generation methods not only illustrate programming issues in high-level (procedural) languages; they also illustrate implementation issues in low-level (assembly) languages. In this paper, we shall try to find the fastest possible way to generate permutations by computer. To do so, we will need to consider some program "optimization" methods (to get good implementations) and some mathematical analyses (to determine which implementation is best). It turns out that on most computers we can generate each permutation at only slightly more than the cost of two store instructions.

In dealing with such a problem, we must be aware of the inherent limitations. Without computers, few individuals had the patience to record all 5040 permutations of 7 elements, let alone all 40320 permutations of 8 elements, or all 362880 permutations of 9 elements. Computers help, but not as much as one might think. Table 1 shows the values of N! for N <= 17, along with the time that would be taken by a permutation generation program that produces a new permutation each microsecond. For N > 25, the time required is far greater than the age of the earth!

TABLE 1. Approximate time needed to generate all permutations of N elements (1 microsecond per permutation)

     N                 N!   Time
     1                  1
     2                  2
     3                  6
     4                 24
     5                120
     6                720
     7               5040
     8              40320
     9             362880
    10            3628800   3 seconds
    11           39916800   40 seconds
    12          479001600   8 minutes
    13         6227020800   2 hours
    14        87178291200   1 day
    15      1307674368000   2 weeks
    16     20922789888000   8 months
    17    355687428096000   10 years

For many practical applications, the sheer magnitude of N! has led to the development of "combinatorial search" procedures which are far more efficient than permutation enumeration. Techniques such as mathematical programming and backtracking are used regularly to solve optimization problems in industrial situations, and have led to the resolution of several hard problems in combinatorial mathematics (notably the four-color problem). Full treatment of these methods would be beyond the scope of this paper; they are mentioned here to emphasize that, in practice, there are usually alternatives to the "brute-force" method of generating permutations. We will see one example of how permutation generation can sometimes be greatly improved with a backtracking technique.

In the few applications that remain where permutation generation is really required, it usually doesn't matter much which generation method is used, since the cost of processing the permutations far
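The growth documented in Table 1 is easy to reproduce. The following sketch (ours, not part of the original survey) recomputes a few rows at the stated rate of one permutation per microsecond:

```python
import math

# One permutation per microsecond, as assumed in Table 1.
for n in (10, 12, 15, 17):
    perms = math.factorial(n)
    seconds = perms / 1_000_000
    print(f"N={n:2d}  N!={perms:>18d}  about {seconds / 86400:10.2f} days")
```

Running this confirms, for example, that N = 17 already requires on the order of a decade at this rate.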

exceeds the cost of generating them. For example, to evaluate the performance of an operating system, we might want to try all different permutations of a fixed set of tasks for processing, but most of our time would be spent simulating the processing, not generating the permutations. The same is usually true in the study of combinatorial properties of permutations, or in the analysis of sorting methods. In such applications, it can sometimes be worthwhile to generate "random" permutations to get results for a typical case. We shall examine a few methods for doing so in this paper.

In short, the fastest possible permutation method is of limited importance in practice. There is nearly always a better way to proceed, and if there is not, the problem becomes really hopeless when N is increased only a little.

Nevertheless, permutation generation provides a very instructive exercise in the implementation and analysis of algorithms. The problem has received a great deal of attention in the literature, and the techniques that we learn in the process of carefully comparing these interesting algorithms can later be applied to the perhaps more mundane problems that we face from day to day.

We shall begin with simple algorithms that generate permutations of an array by successively exchanging elements; these algorithms all have a common control structure described in Section 1. We then will study a few older algorithms, including some based on elementary operations other than exchanges, in the framework of this same control structure (Section 2). Finally, we shall treat the issues involved in the implementation, analysis, and "optimization" of the best of the algorithms (Section 3).

1. METHODS BASED ON EXCHANGES

A natural way to permute an array of elements on a computer is to exchange two of its elements. The fastest permutation algorithms operate in this way: all N! permutations of N elements are produced by a sequence of N!-1 exchanges. We shall use the notation

    P[1]:=:P[2]

to mean "exchange the contents of array elements P[1] and P[2]". This instruction gives both arrangements of the elements P[1], P[2] (i.e., the arrangement before the exchange and the one after). For N = 3, several different sequences of five exchanges can be used to generate all six permutations, for example

    P[1]:=:P[2]
    P[2]:=:P[3]
    P[1]:=:P[2]
    P[2]:=:P[3]
    P[1]:=:P[2].

If the initial contents of P[1] P[2] P[3] are A B C, then these five exchanges will produce the permutations B A C, B C A, C B A, C A B, and A C B.

It will be convenient to work with a more compact representation describing these exchange sequences. We can think of the elements as passing through "permutation networks" which produce all the permutations. The networks are comprised of "exchange modules" such as that shown in Diagram 1, which is itself the permutation network for N = 2.

    DIAGRAM 1.

The network of Diagram 2 implements the exchange sequence given above for N = 3. The elements pass from right to left, and a new permutation is available after each exchange. Of course, we must be sure that the permutations generated are distinct. For N = 3 there are 3^5 = 243 possible networks with five exchange modules, but only the twelve shown in Fig. 1 are "legal" (produce sequences of distinct permutations). We shall most often represent networks as in Fig. 1, namely drawn vertically, with elements passing from top to bottom, and with the permutation se-

    DIAGRAM 2.
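The five-exchange sequence above is easy to check directly. This small sketch (ours, for illustration, using 0-based indices) applies the exchanges to A B C:

```python
P = list("ABC")
# The sequence P[1]:=:P[2], P[2]:=:P[3], P[1]:=:P[2], P[2]:=:P[3], P[1]:=:P[2],
# rewritten with 0-based indices.
seen = ["".join(P)]
for i, j in [(0, 1), (1, 2), (0, 1), (1, 2), (0, 1)]:
    P[i], P[j] = P[j], P[i]
    seen.append("".join(P))
print(seen)  # ['ABC', 'BAC', 'BCA', 'CBA', 'CAB', 'ACB']
```

The five states after the exchanges are exactly the five new permutations listed in the text.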


quences that are generated explicitly written out on the right.

    FIGURE 1. Legal permutation networks for three elements.

It is easy to see that for larger N there will be very many legal networks. The methods that we shall now examine will show how to systematically construct networks for arbitrary N. Proceeding recursively, we can build up networks for four elements from one of these. For example, using four copies of the first network in Fig. 1, we can build a network for N = 4, as shown in Diagram 3. This network fills P[4] with the values D, C, B, A in decreasing alphabetic order (and we could clearly build many similar networks which fill P[4] with the values in other orders).

    DIAGRAM 3.

The corresponding network for five elements, shown in Diagram 4, is more complicated. (The empty boxes denote the network of Diagram 3 for four elements.) To get the desired decreasing sequence in P[5], we must exchange it successively with P[3], P[1], P[3], P[1] in between generating all permutations of P[1],...,P[4].

    DIAGRAM 4.

In general, we can generate all permutations of N elements with the following recursive procedure:

Algorithm 1.

    procedure permutations(N);
    begin c:=1;
      loop:
        if N>2 then permutations(N-1) endif;
      while c<N:
        P[B[N,c]]:=:P[N];
        c:=c+1
      repeat
    end;

Of course, we
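A transliteration of Algorithm 1 into an executable sketch may clarify the control structure. The table below holds the rows of B[N,c] for N = 2, 3, 4 (the N = 4 row 1, 2, 3 is from Table 2); the function and variable names are ours, not the survey's, and permutations are recorded at the bottom level:

```python
B = {2: [1], 3: [1, 1], 4: [1, 2, 3]}   # index table B[N][c-1], 1-based entries

def permutations(P, n, out):
    """Scheme of Algorithm 1: generate all permutations of P[0..n-1],
    exchanging P[B[n,c]] with P[n] (1-based) between recursive calls."""
    if n == 1:
        out.append(tuple(P))            # a new permutation is ready
        return
    for c in range(1, n + 1):
        permutations(P, n - 1, out)
        if c < n:
            k = B[n][c - 1] - 1         # convert the 1-based table entry
            P[k], P[n - 1] = P[n - 1], P[k]

out = []
permutations(list(range(1, 5)), 4, out)
print(len(out), len(set(out)))          # 24 24
```

With this table, the last position takes the values 4, 3, 2, 1 over the four blocks of six permutations, matching the "decreasing order" rule described for the networks.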

The loop could be written as a simple "for c:=1 until N do" were it not for the need to test the control counter c within the loop. The array B[N,c] is an index table which tells where the desired value of P[N] is after P[1],...,P[N-1] have been run through all permutations for the cth time.

We still need to specify how to compute B[N,c]. For each value of N we could specify any one of (N-1)! sequences in which to fill P[N], so there are a total of (N-1)!(N-2)!(N-3)!...3!2!1! different tables B[N,c] which will cause Algorithm 1 to properly generate all N! permutations of P[1],...,P[N].

One possibility is to precompute B[N,c] by hand (since we know that N is small), continuing as in the example above. If we adopt the rule that P[N] should be filled with elements in decreasing order of their original index, then the network in Diagram 4 tells us that B[5,c] should be 3,1,3,1 for c = 1,2,3,4. For N = 6 we proceed in the same way: if we start with A B C D E F, then the first N = 5 subnetwork leaves the elements in the order C D E B A F, so that B[6,1] must be 3 to get the E into P[6], leaving C D F B A E. The second N = 5 subnetwork then leaves F B A D C E, so that B[6,2] must be 4 to get the D into P[6], etc. Table 2 is the full table for N <= 12; we could generate permutations with Algorithm 1 by storing these N(N-1)/2 indices.

TABLE 2. Index table B[N,c] for Algorithm 1

     N
     2   1
     3   1 1
     4   1 2 3
     5   3 1 3 1
     6   3 4 3 2 3
     7   5 3 1 5 3 1
     8   5 2 7 2 1 2 3
     9   7 1 5 5 3 3 7 1
    10   7 8 1 6 5 4 9 2 3
    11   9 7 5 3 1 9 7 5 3 1
    12   9 6 3 10 9 4 3 8 9 2 3

There is no reason to insist that P[N] should be filled with elements in decreasing order. We could proceed as above to build a table which fills P[N] in any order we choose. One reason for doing so would be to try to avoid having to store the table: there are at least two known versions of this method in which the indices can be easily computed and it is not necessary to precompute the index table.

The first of these methods was one of the earliest permutation generation algorithms to be published, by M. B. Wells in 1960 [47]. As modified by J. Boothroyd in 1965 [1, 2], Wells' algorithm amounts to using

    B[N,c] = N-c   if N is even and c > 2,
             N-1   otherwise,

or, in Algorithm 1, replacing P[B[N,c]]:=:P[N] by

    if (N even) and (c>2)
      then P[N]:=:P[N-c]
      else P[N]:=:P[N-1] endif

It is rather remarkable that such a simple method should work properly. Wells gives a complete formal proof in his paper, but many readers may be content to check the method for all practical values of N by constructing the networks as shown in the example above. The complete networks for N = 2,3,4 are shown in Fig. 2.

In a short paper that has gone virtually unnoticed, B. R. Heap [16] pointed out several of the ideas above and described a method even simpler than Wells'. (It is not clear whether Heap was influenced by Wells or Boothroyd, since he gives no references.) Heap's method is to use

    B[N,c] = 1   if N is odd,
             c   if N is even,

or, in Algorithm 1, to replace P[B[N,c]]:=:P[N] by

    if N odd then P[N]:=:P[1] else P[N]:=:P[c] endif

Heap gave no formal proof that his method works, but a proof similar to Wells' will show that the method works for all N. (The reader may find it instructive to verify that the method works for practical values of N, as Heap did, by proceeding as we did when constructing the index table above.) Figure 3 shows that the networks for N = 2,3,4 are the same as for Algorithm 1 with the precomputed index table, but that the network for N = 5, shown in Diagram 5, differs. (The empty boxes denote the network for N = 4 from Fig. 3.)
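Heap's rule is simple enough to state directly in code. The sketch below (our transliteration, with 0-based indices) replaces the table lookup by the odd/even rule: exchange P[N] with P[1] when N is odd, and with P[c] when N is even:

```python
def heap(a, n, visit):
    """Generate all permutations of a[0..n-1] by Heap's rule:
    between recursive calls, exchange a[n-1] with a[0] (n odd)
    or with a[c-1] (n even)."""
    if n == 1:
        visit(tuple(a))
        return
    for c in range(1, n + 1):
        heap(a, n - 1, visit)
        if c < n:
            k = 0 if n % 2 == 1 else c - 1
            a[k], a[n - 1] = a[n - 1], a[k]

perms = []
heap(list("ABCD"), 4, perms.append)
print(len(perms), len(set(perms)))   # 24 24
```

As the text notes, no table is stored: the exchange position is computed from N and c alone.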


    FIGURE 2. Wells' algorithm for N = 2,3,4.

jumping to the place following the call on return. The following program results directly when we remove the recursion and the loops from Algorithm 1:

    i:=N;
    begin: c[i]:=1;
    loop: if i>2 then i:=i-1; go to begin endif;
    return: if c[i]>=i then go to exit endif;
      P[B[i,c[i]]]:=:P[i];
      c[i]:=c[i]+1;
      go to loop;
    exit: if i<N then i:=i+1; go to return endif;

Although the original authors arrived at their nonrecursive programs directly, it is instructive here to derive one through a series of simple program transformations. Initializing the counters on the way down, then testing c[i] and incrementing i on return, we ensure that the counters are maintained properly, and the go to's can be replaced by structured loops:

    i:=N;
    loop while i>2: i:=i-1; c[i]:=1 repeat;
    loop:
      if c[i]<i
        then P[B[i,c[i]]]:=:P[i];
             c[i]:=c[i]+1; i:=2
        else c[i]:=1; i:=i+1
      endif;
    while i<=N repeat;


Generally, this is done by turning the permutation generation program into a procedure which returns a new permutation each time it is called. A main program is then written to call this procedure N! times, and process each permutation. (In this form, the permutation generator can be kept as a library subprogram.) A more efficient way to proceed is to recognize that the permutation generation procedure is really the "main program" and that each permutation should be processed as it is generated. To indicate this clearly in our programs, we shall assume a macro called process which is to be invoked each time a new permutation is ready. In the nonrecursive version of Algorithm 1 above, if we put a call to process at the beginning and another call to process after the exchange statement, then process will be executed N! times, once for each permutation. From now on, we will explicitly include such calls to process in all of our programs.

The same transformations that we applied to Algorithm 1 yield this nonrecursive version of Heap's method for generating and processing all permutations of P[1],...,P[N]:

Algorithm 2 (Heap)

    i:=N;
    loop: c[i]:=1 while i>2: i:=i-1 repeat;
    process;
    loop:
      if c[i]<i
        then if i odd then k:=1 else k:=c[i] endif;
             P[i]:=:P[k];
             process;
             c[i]:=c[i]+1; i:=2
        else c[i]:=1; i:=i+1
      endif;
    while i<=N repeat;

Algorithm 2 is attractive because it is symmetric: each time through the loop, either c[i] is initialized and i incremented, or an exchange is made, c[i] incremented, and i reset to the lowest level.

    FIGURE 3. Heap's algorithm for N = 2,3,4.
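The same loop structure is easy to express with 0-based arrays. The following sketch (our variable names) is the standard iterative form of Heap's method, with process replaced by yielding each permutation:

```python
def heap_iterative(a):
    """Iterative Heap's method: counters c[i] play the role of the
    paper's control counters; each exchange yields a new permutation."""
    n = len(a)
    c = [0] * n
    yield tuple(a)
    i = 1
    while i < n:
        if c[i] < i:
            # 0-based position i is the paper's level N = i+1:
            # exchange with P[1] when that level is odd, else with P[c].
            k = 0 if i % 2 == 0 else c[i]
            a[k], a[i] = a[i], a[k]
            yield tuple(a)
            c[i] += 1
            i = 1
        else:
            c[i] = 0
            i += 1

perms = list(heap_iterative(list(range(1, 6))))
print(len(perms), len(set(perms)))   # 120 120
```

Note the same symmetry as in Algorithm 2: each pass either resets a counter and moves up, or exchanges and drops back to the lowest level.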

    DIAGRAM 5.

Adjacent Exchanges

The next method was discovered in the early 1960s by S. M. Johnson and by H. F. Trotter [45], apparently independently. They discovered that it was possible to generate all N! permutations of N elements with N!-1 exchanges of adjacent elements.


    DIAGRAM 6.

The method is based on the natural idea that for every permutation of N-1 elements we can generate N permutations of N elements by inserting the new element into all possible positions. For example, for five elements, the first four exchange modules in the permutation network are as shown in Diagram 6. The next exchange is P[1]:=:P[2], which produces a new permutation of the elements originally in P[2], P[3], P[4], P[5] (and which are now in P[1], P[2], P[3], P[4]). Following this exchange, we bring A back in the other direction, as illustrated in Diagram 7. Now we exchange P[3]:=:P[4] to produce the next permutation of the last four elements, and continue in this manner until all 4! permutations of the elements originally in P[2], P[3], P[4], P[5] have been generated. The network makes five new permutations of the five elements for each of these (by putting the element originally in P[1] in all possible positions), so that it generates a total of 5! permutations.

Generalizing the description in the last paragraph, we can inductively build the network for N elements by taking the network for N-1 elements and inserting chains of N-1 exchange modules (to sweep the first element back and forth) in each space between exchange modules. The main complication is that the subnetwork for N-1 elements has to shift back and forth between the first N-1 lines and the last N-1 lines in between sweeps. Figure 4 shows the networks for N = 2,3,4. The modules in boxes identify the subnetwork: if, in the network for N, we connect the output lines of one box to the input lines of the next, we get the network for N-1.

    FIGURE 4. Johnson-Trotter algorithm for N = 2, 3, 4.

Continuing the example above, we get the full network for N = 5 shown in Figure 5. By connecting the boxes in this network, we get the network for N = 4.

To develop a program to exchange according to these networks, we could work down from a recursive formulation as in the preceding section, but instead we shall take a bottom-up approach. To begin, imagine that each exchange module is labelled with the number of the network in which it first appears. Thus, for N = 2 the module would be numbered 2; for N = 3 the five modules would be labelled 3 3 2 3 3; for N = 4 the 23 modules are numbered 4 4 4 3 4 4 4 3 4 4 4 2 4 4 4 3 4 4 4 3 4 4 4;

    DIAGRAM 7.

for N = 5 we insert 5 5 5 5 between the numbers above, etc. To write a program to generate this sequence, we keep a set of incrementing counters c[i], 2 <= i <= N, which are all initially 1 and which satisfy



FIGURE 5. Johnson-Trotter algorithm for N = 5.

1 <= c[i] <= i. We find the highest index i whose counter is not exhausted (not yet equal to i), output it, increment its counter, and reset the counters of the larger indices:

    i:=N;
    loop: c[i]:=1 while i>1: i:=i-1 repeat;
    c[1]:=0;
    loop:
      i:=N;
      loop while c[i]=i: c[i]:=1; i:=i-1 repeat;
    while i>1:
      comment exchange module is on level i;
      c[i]:=c[i]+1
    repeat;

When i becomes 1, the process is completed; the statement c[1]:=0 terminates the inner loop in this case. (Again, there are simple alternatives to this with "event variables" [24], or in assembly language.)

Now, suppose that we have a Boolean variable d[N] which is true if the original P[1] is travelling from P[1] down to P[N], and false if it is travelling from P[N] to P[1]. Then, when i = N, we can replace the comment in the above program by

    if d[N] then k:=c[N] else k:=N-c[N] endif;
    P[k]:=:P[k+1];

This will take care of all of the exchanges on level N. Similarly, we can proceed by introducing a Boolean d[N-1] for level N-1, etc., but we must cope with the fact that the elements originally in P[2],...,P[N] switch between those locations and P[1],...,P[N-1]. This is handled by including an offset x which is incremented by 1 each time a d[i] switches from false to true. This leads to:

Algorithm 3 (Johnson-Trotter)

    i:=N;
    loop: c[i]:=1; d[i]:=true while i>1: i:=i-1 repeat;
    c[1]:=0;
    process;
    loop:
      i:=N; x:=0;
      loop while c[i]=i:
        if not d[i] then x:=x+1 endif;
        d[i]:=not d[i]; c[i]:=1; i:=i-1
      repeat;
    while i>1:
      if d[i] then k:=c[i]+x else k:=i-c[i]+x endif;
      P[k]:=:P[k+1];
      process;
      c[i]:=c[i]+1
    repeat;

Although Johnson did not present the algorithm in a programming language, he did give a very precise formulation from which the above program can be derived. Trotter gave an ALGOL formulation which is similar to Algorithm 3. We shall examine alternative methods of implementing this algorithm later.

An argument which is often advanced in favor of the Johnson-Trotter algorithm is that, since it always exchanges adjacent elements, the process procedure might be simpler for some applications. It might be possible to calculate the incremental effect of exchanging two elements rather than reprocessing the entire permutation. (This observation could also apply to Algorithms 1 and 2, but the cases when they exchange nonadjacent elements would have to be handled differently.)

The Johnson-Trotter algorithm is often inefficiently formulated [5, 10, 12] because it can be easily described in terms of the values of the elements being permuted, rather than their positions. If P[1],...,P[N] are originally the integers 1,...,N, then we might try to avoid maintaining the offset x by noting that each exchange simply involves the smallest integer whose count is not yet exhausted. Inefficient implementations involve actually searching for this smallest integer [5] or maintaining the inverse permutation in order to find it [10].
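For comparison, here is the textbook value-based formulation of the Johnson-Trotter method, in which each step moves the largest "mobile" integer in its current direction. This is exactly the style of implementation the survey warns about, since it searches the whole array at every step; the code is ours, for illustration only:

```python
def johnson_trotter(n):
    """Value-based Johnson-Trotter: every step exchanges two adjacent
    elements, and all n! permutations are produced."""
    perm = list(range(1, n + 1))
    d = [-1] * n                      # -1: moving left, +1: moving right
    yield tuple(perm)
    while True:
        # Find the largest mobile element (one that exceeds the
        # neighbor it is facing).
        m, mi = 0, -1
        for i in range(n):
            j = i + d[i]
            if 0 <= j < n and perm[i] > perm[j] and perm[i] > m:
                m, mi = perm[i], i
        if mi < 0:
            return                    # no mobile element: done
        j = mi + d[mi]
        perm[mi], perm[j] = perm[j], perm[mi]
        d[mi], d[j] = d[j], d[mi]
        for i in range(n):            # larger elements reverse direction
            if perm[i] > m:
                d[i] = -d[i]
        yield tuple(perm)

perms = list(johnson_trotter(4))
print(len(perms))   # 24
```

Each output differs from its predecessor by a single adjacent transposition, which is the property the text highlights.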


The same "factorial counting" control structure underlies Algorithms 2 and 3. The similarities become striking when we consider an alternate implementation of the Johnson-Trotter method:

Algorithm 3a (Alternate Johnson-Trotter)

    i:=N;
    loop: c[i]:=1; d[i]:=true while i>1: i:=i-1 repeat;
    process;
    loop:

To count in decimal, we increment the rightmost digit which is not 9 and change all the nines to its right to zeros. If the digits are stored in reverse order in the array c[N],c[N-1],...,c[2],c[1] (according to the way in which we customarily write numbers), we get the program

    i:=N;
    loop: c[i]:=0 while i>1: i:=i-1 repeat;
    loop:
      if c[i]<9
        then c[i]:=c[i]+1; i:=1

    FIGURE 6. Factorial counting: c[N],...,c[1]. (a) Using Algorithm 2 (recursive). (b) Using Algorithm 3a (iterative).

    (a)  N=2: 11 21
         N=3: 111 121 211 221 311 321
         N=4: 1111 1121 1211 1221 1311 1321
              2111 2121 2211 2221 2311 2321
              3111 3121 3211 3221 3311 3321
              4111 4121 4211 4221 4311 4321

    (b)  N=2: 11 12
         N=3: 111 112 113 121 122 123
         N=4: 1111 1112 1113 1114 1121 1122 1123 1124 1131 1132 1133 1134
              1211 1212 1213 1214 1221 1222 1223 1224 1231 1232 1233 1234

The inner loop in Algorithm 3 has three main purposes: to find the highest index whose counter is not exhausted, to reset the counters of the larger indices, and to maintain the offset x. Each of these can be avoided: an auxiliary array s[i] of pointers makes it possible to find the proper index directly, the counters can be left unreset until they are needed, and, finally, rather than computing a "global" offset x, we can maintain an array x[i] giving the current offset at level i: when d[i] switches from false to true, we increment x[s[i]]. These changes allow us to replace the inner "loop...repeat" in Algorithm 3 by an "if...endif".

Algorithm 3b (Loopless Johnson-Trotter)

    loop while i>1:
      if d[i] then k:=c[i]+x[i] else k:=i-c[i]+x[i] endif;
      P[k]:=:P[k+1];
      process;
      c[i]:=c[i]+1;
    repeat;

This algorithm differs from those described in [10, 11, 12], which are based on
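The counter sequences of Fig. 6(b) come from a mixed-radix odometer in which digit c[i] runs from 1 to i. A direct sketch (ours):

```python
def factorial_counting(n):
    """Mixed-radix odometer: digits c[1..n] with 1 <= c[i] <= i,
    rightmost (highest-index) digit varying fastest."""
    c = [1] * (n + 1)                 # c[0] unused
    while True:
        yield tuple(c[1:])
        i = n                         # increment rightmost unexhausted digit,
        while i > 1 and c[i] == i:
            c[i] = 1                  # resetting exhausted digits on the way
            i -= 1
        if i == 1:
            return
        c[i] += 1

counts = list(factorial_counting(4))
print(len(counts))   # 24, i.e. 4!
```

Since the digit ranges are 1, 1..2, 1..3, ..., 1..N, the odometer produces exactly N! distinct counter settings, which is why this control structure can drive permutation generation.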

the less efficient implementations of the Johnson-Trotter algorithm mentioned above. The loopless formulation is purported to be an improvement because each iteration of the main loop is guaranteed to produce a new permutation in a fixed number of steps.

However, when organized in this form, the unfortunate fact becomes apparent that the loopless Algorithm 3b is slower than the normal Algorithm 3. Loopfree implementation is not an improvement at all! This can be shown with very little analysis because of the similar structure of the algorithms. If, for example, we were to count the total number of times the statement c[i]:=1 is executed when each algorithm generates all N! permutations, we would find that the answer would be exactly the same for the two algorithms. The loopless algorithm does not eliminate any such assignments; it just rearranges their order of execution. But this simple fact means that Algorithm 3b must be slower than Algorithm 3, because it not only has to execute all of the same instructions the same number of times, but it also suffers the overhead of maintaining the x and s arrays.

We have become accustomed to the idea that it is undesirable to have programs with loops that could iterate N times, but this is simply not the case with the Johnson-Trotter method. In fact, the loop iterates N times only once out of the N! times that it is executed. Most often (N-1 out of every N times) it iterates only once. If N were very large, it would be conceivable that the very few occasions on which the loop iterates many times might be inconvenient, but since we know that N is small, there seems to be no advantage whatsoever to the loopless algorithm.

Ehrlich [10] found his algorithm to run "twice as fast" as competing algorithms, but this is apparently due entirely to a simple coding technique (described in Section 3) which he applied to his algorithm and not to the others.

Another Iterative Method

In 1976, F. M. Ives [19] published an exchange-based method like the Johnson-Trotter method which does represent an improvement. For this method, we build up the network for N elements from the network for N-2 elements. We begin in the same way as in the Johnson-Trotter method. For N = 5, the first four exchanges are as shown in Diagram 6. But now the next exchange is P[1]:=:P[5], which not only produces a new permutation of P[1],...,P[4], but also puts P[5] back into its original position. We can perform exactly these five exchanges four more times, until, as shown in Diagram 8, we get back to the original configuration.

    DIAGRAM 8.

At this point, P[1],...,P[4] have been rotated through four permutations, so that we have taken care of the case N = 4. If we (inductively) permute three of these elements (Ives suggests the middle three), then the 20 exchanges above will give us 20 new permutations, and so forth. (We shall later see a method which makes exclusive use of this idea that all permutations of N elements can be generated by rotating and then generating all permutations of N-1 elements.)

Figure 7 shows the networks for N = 2,3,4; the full network for N = 5 is shown in Fig. 8. As before, if we connect the boxes in the network for N, we get the network for N-2. Note that the exchanges immediately preceding the boxes are redundant in that they do not produce new permutations. (The redundant permutations are identified by parentheses in Fig. 7.) However, there are relatively few of these, and they are a small price to pay for the lower overhead incurred by this method.

In the example above, we knew that it was time to drop down a level and permute the middle three elements when all of the original elements (but specifically P[1] and P[5]) were back in position. If the elements being permuted are all distinct, we can test for this condition by initially saving the values of P[1],...,P[N] in another array Q[1],...,Q[N]:

Algorithm 4 (Ives)

    i:=N;
    loop: c[i]:=1; Q[i]:=P[i] while i>1: i:=i-1 repeat;
    process;

This is the form of the main loop in which the counters are used. However, we can use the saved values Q[1],...,Q[N] to detect when the elements at a level have returned to their original positions, reducing the amount of counter maintenance required.

2. OTHER TYPES OF ALGORITHMS

In this section we consider a variety of algorithms which are not based on simple exchanges between elements. These algorithms generally take longer to produce all permutations than the best of the methods already described, but they are worthy of study for several reasons. For example, in some situations it may not be necessary to generate all permutations, but only some "random" ones. Other algorithms may be of practical interest because they are based on elementary operations which could be as efficient as exchanges on some computers. Also, we consider algorithms that generate the permutations in a particular order which is of interest. All of the algorithms can be cast in terms of the basic

    FIGURE 7. Ives' algorithm for N = 2,3,4.


FIGURE 8. Ives' algorithm for N = 5.

"factorial counting" control structure described above.

Nested Cycling

Several methods are based on the primitive operation rotate(i), which cyclically rotates a group of i elements, rather than on exchanges. For some applications, a recursive formulation in which process is invoked only at the bottom level is preferable:

    procedure permutations(N);
    begin c:=1;
      loop:
        if N>2 then permutations(N-1) else process endif;
      while c<N:
        P[c]:=:P[N];
        c:=c+1
      repeat
    end;

When the processing involves testing permutations against some criteria, it is often possible easily to achieve great savings with this type of procedure. Often, if a permutation fails to satisfy the criteria of interest, then the
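The pruning idea can be sketched concretely. In the toy criterion below (ours, purely illustrative), we want permutations of 1..N in which no two adjacent values differ by 1; as soon as a partial permutation violates the test, the whole subtree of its completions is skipped:

```python
def search(prefix, rest, ok, out):
    """Backtracking: abandon a prefix, and the len(rest)! permutations
    that complete it, as soon as it fails the criterion."""
    if not ok(prefix):
        return
    if not rest:
        out.append(tuple(prefix))
        return
    for i in range(len(rest)):
        search(prefix + [rest[i]], rest[:i] + rest[i + 1:], ok, out)

def ok(p):
    # Criterion: no two adjacent values differ by exactly 1.
    return len(p) < 2 or abs(p[-1] - p[-2]) != 1

out = []
search([], list(range(1, 5)), ok, out)
print(out)   # [(2, 4, 1, 3), (3, 1, 4, 2)]
```

Only 2 of the 24 permutations survive, and most of the other 22 are never even constructed, which is the point of the backtracking approach.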

permutations generated by permuting its initial elements will not satisfy the criteria either, so the exchanges and the call permutations(N-1) can be skipped. Thus large numbers of permutations never need be generated. A good backtracking program will eliminate nearly all of the processing (see [44] for a good example), and such methods are of great importance in many practical applications.

Cycling is a powerful operation, and we should expect to find some other methods which use it to advantage. In fact, Tompkins' original paper [44] gives a general proof from which several methods can be constructed. Remarkably, we can take Algorithm 5 and switch to the counter system upon which the iterative algorithms in Section 2 were based:

    i:=N;
    loop: c[i]:=1 while i>1: i:=i-1 repeat;
    process;
    loop:
      rotate(N+1-i);
      if c[i]<i

    FIGURE 9. Cycling method for N = 2,3,4 (redundant permutations in parentheses).

Rotations alone suffice to generate all permutations, as in the following method due to Langdon (Algorithm 6):

    t:=N;
    process;
    loop:
        rotate(t);
        if P[t]≠Q[t] then t:=N; process
        else t:=t-1 endif;
    while t>1 repeat;

This is definitely the most simply expressed of our permutation generation algorithms. If P[1],...,P[N] are initially 1,...,N, then we can eliminate the initialization loop and replace Q[t] by t in the main loop. Unfortunately, this algorithm runs slowly on most computers; only when very fast rotation is available is it the method of choice. We shall examine the implementation of this algorithm in detail in Section 3.

FIGURE 10. Langdon's algorithm for N = 2, 3, 4.

Lexicographic Algorithms

A particular ordering of the N! permutations of N elements which is of interest is "lexicographic", or alphabetical, ordering. The lexicographic ordering of the 24 permutations of A B C D is shown in Fig. 11a. "Reverse lexicographic" ordering, the result of reading the lexicographic sequence backwards and the permutations from right to left, is also of some interest. Fig. 11b shows the reverse lexicographic ordering of the 24 permutations of A B C D.

The natural definition of these orderings has meant that many algorithms have been proposed for lexicographic permutation generation. Such algorithms are inherently less efficient than the algorithms of Section 1, because they must often use more than one exchange to pass from one permutation to the next in the sequence (e.g., to pass from A C D B to A D B C in Fig. 11a). The main practical reason that has been advanced in favor of lexicographic generation is that, in reverse order, all permutations of P[1],...,P[N-1] are generated before P[N] is moved. As with backtracking, a processing program could, for example, skip (N-1)! permutations in some instances. However, this property is shared by the recursive algorithms of Section 1; in fact, the general structure of Algorithm 1 allows P[N] to be filled in any arbitrary order, which could be even more of an advantage than lexicographic ordering in some instances.

Nevertheless, lexicographic generation is an interesting problem for which we are by now well prepared. We shall begin by assuming that P[1],...,P[N] are distinct. Otherwise, the problem is quite different, since it is usual to produce a lexicographic listing with no duplicates (see [3, 9, 39]). We shall find it convenient to make use of another primitive operation, reverse(i). This operation inverts the order of the elements in P[1],...,P[i]; thus, reverse(i) is equivalent to the exchanges P[1]:=:P[i]; P[2]:=:P[i-1]; etc.
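Langdon's cyclic method (Algorithm 6 above) admits a very direct rendering. The following Python sketch is ours; `q` plays the role of the paper's Q array, and the slice assignment implements rotate(t):

```python
def langdon_permutations(a):
    """Langdon's cyclic-rotation method (Algorithm 6): rotate the first t
    elements; if position t has not returned to its original value, start
    over at t = N, otherwise shorten the rotation.  A sketch."""
    p = list(a)
    q = list(a)                       # Q holds the initial arrangement
    n = len(p)
    out = [tuple(p)]
    t = n
    while t > 1:
        p[:t] = p[1:t] + p[:1]        # rotate(t)
        if p[t - 1] != q[t - 1]:      # element t is not back home
            t = n
            out.append(tuple(p))
        else:
            t -= 1                    # full cycle at level t completed
    return out
```

For N = 3 this produces 123, 231, 312, 213, 132, 321; note that every step is a single rotation, which is why the method rewards hardware rotation support.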


FIGURE 11. Lexicographic ordering of A B C D. (a) In natural order. (b) In reverse order.

    (a)      (b)
    ABCD     ABCD
    ABDC     BACD
    ACBD     ACBD
    ACDB     CABD
    ADBC     BCAD
    ADCB     CBAD
    BACD     ABDC
    BADC     BADC
    BCAD     ADBC
    BCDA     DABC
    BDAC     BDAC
    BDCA     DBAC
    CABD     ACDB
    CADB     CADB
    CBAD     ADCB
    CBDA     DACB
    CDAB     CDAB
    CDBA     DCAB
    DABC     BCDA
    DACB     CBDA
    DBAC     BDCA
    DBCA     DBCA
    DCAB     CDBA
    DCBA     DCBA

(One author has called this method a "dramatic improvement in the state of the art" of computer programming, since the algorithm is easily expressed in modern languages; the fact that the method is over 160 years old is perhaps a more sobering comment on the state of the art of computer science.)

The direct lexicographic successor of the permutation

    B A C F H G D E

is clearly

    B A C F H G E D,

but what is the successor of this permutation? After some study, we see that H G E D are in their lexicographically highest position, so the next permutation must begin as B A C G. The answer is the lexicographically lowest permutation that begins in this way, or

    B A C G D E F H.

Similarly, the direct successor of

    H F E D G C A B

in reverse lexicographic order is

    D E G H F C A B,

and its successor is

    E D G H F C A B.

The reverse operation is relatively expensive unless special hardware is available; however, it seems to be inherent in lexicographic generation.

The first algorithm that we shall consider is based on the idea of producing each permutation from its lexicographic predecessor. Hall and Knuth [15] found that the method has been rediscovered many times since being published in 1812 by Fischer and Krause [14]. The ideas involved first appear in the modern literature in a rudimentary form in an algorithm by G. Schrack and M. Shimrat [40]. A full formulation was given by M. Shen in 1962.

The algorithm works by scanning from the left to find the first i for which P[i] > P[i-1]. If there is no such i, then the elements are in reverse order and we terminate the algorithm since there is no successor. (For efficiency, it is best to make P[N+1] larger than all the other elements, written P[N+1] = ∞, and terminate when i = N+1.) Otherwise, we exchange P[i] with the next-lowest element among P[1],...,P[i-1] and then reverse P[1],...,P[i-1]. We have

Algorithm 7 (Fischer-Krause)

    P[N+1]:=∞;
    process;
    loop:
        i:=2;
        loop while P[i]<P[i-1]: i:=i+1 repeat;
    while i<=N:
        j:=1;
        loop while P[j]>P[i]: j:=j+1 repeat;
        P[i]:=:P[j];
        reverse(i-1);
        process;
    repeat;
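One step of Algorithm 7 can be sketched in Python as follows (0-indexed; `fk_successor` is our name for the reverse-lexicographic successor computation, and the array bound replaces the P[N+1] = ∞ sentinel):

```python
def fk_successor(p):
    """Replace p by its reverse-lexicographic successor in place
    (one step of the Fischer-Krause method); return False at the last one."""
    n = len(p)
    i = 1
    while i < n and p[i] < p[i - 1]:    # scan the decreasing prefix
        i += 1
    if i == n:                          # prefix wholly decreasing: no successor
        return False
    j = 0
    while p[j] > p[i]:                  # next-lowest element among p[0..i-1]
        j += 1
    p[i], p[j] = p[j], p[i]
    p[:i] = reversed(p[:i])             # reverse(i-1) in the paper's notation
    return True
```

Applied to D E G H F C A B it yields E D G H F C A B, matching the worked example in the text; iterating from the identity reproduces the reverse-lexicographic sequence.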

Like the Tompkins-Paige algorithm, this algorithm is not as inefficient as it seems, since it is most often scanning and reversing short strings. This seems to be the first example we have seen of an algorithm which does not rely on "factorial counting" for its control structure. However, the control structure here is overly complex; indeed, factorial counters are precisely what is needed to eliminate the inner loops in Algorithm 7.

A more efficient algorithm can be derived, as we have done several times before, from a recursive formulation. A simple recursive procedure to generate the permutations of P[1],...,P[N] in reverse lexicographic order can be quickly devised:

    procedure lexperms(N);
    begin
        c:=1;
        loop:
            if N>2 then lexperms(N-1) else process endif;
        while c<N:
            P[c]:=:P[N];
            reverse(N-1);
            c:=c+1;
        repeat;
    end;

Removing the recursion leads to an iterative version built on factorial counting:

    i:=N;
    loop: c[i]:=1 while i>2: i:=i-1 repeat;
    process;
    i:=2;
    loop:
        if c[i]<i
        then P[i]:=:P[c[i]]; reverse(i-1); c[i]:=c[i]+1; i:=2; process
        else c[i]:=1; i:=i+1
        endif;
    while i<=N repeat;

There seem to be no advantages to this method over methods like Algorithm 1. Howell [17, 18] gives a lexicographic method based on treating P[1],...,P[N] as a base-N number, counting in base N, and rejecting numbers whose digits are not distinct. This method is clearly very slow.

Random Permutations

If N is so large that we could never hope to generate all permutations of N elements, it is of interest to study methods for generating "random" permutations of N elements. This is normally done by establishing some one-to-one correspondence between a permutation and a random number between 0 and N!-1. (A full treatment of pseudorandom number generation by computer may be found in [22].)

First, we notice that each number between 0 and N!-1 can be represented in a mixed radix system to correspond to an array c[N],c[N-1],...,c[2] with 0 <= c[i] <= i-1 for 2 <= i <= N. For example,

    1000 = 1*6! + 2*5! + 1*4! + 2*3! + 2*2!

corresponds, for N = 7, to the array c[7],...,c[2] = 1 2 1 2 2 0.

Such arrays c[N],c[N-1],...,c[2] can clearly be generated by the factorial counting procedure discussed in Section 1, so that there is an implicit correspondence between such arrays and permutations of 1 2 ... N. The algorithms that we shall examine now are based on more explicit correspondences.
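The mixed-radix decomposition above is easy to state in code. This Python sketch (the function name is ours) recovers the digits c[N],...,c[2] of a rank by repeated division:

```python
def factorial_digits(m, n):
    """Mixed-radix (factorial base) digits of m, reported as
    c[n], c[n-1], ..., c[2] with 0 <= c[i] <= i-1."""
    digits = []
    for i in range(2, n + 1):
        digits.append(m % i)   # digit c[i]
        m //= i
    digits.reverse()           # report high-order digit first
    return digits
```

For example, `factorial_digits(1000, 7)` yields `[1, 2, 1, 2, 2, 0]`, the array used in the correspondences that follow.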

In the correspondence due to Hall, to find the permutation of 1,...,7 corresponding to 1 2 1 2 2 0 (that is, to 1000), we begin by writing down the first element, 1. Since c[2] = 0, 2 must precede 1, or

    2 1.

Similarly, c[3] = 2 means that 3 must be preceded by both 2 and 1:

    2 1 3.

Proceeding in this manner, we build up the entire permutation as follows:

    2 1 4 3
    2 5 1 4 3
    2 5 6 1 4 3
    2 7 5 6 1 4 3

In general, if we assume that c[1] = 0, we can construct the permutation P[1],...,P[N] with the program

    i:=1;
    loop:
        j:=i;
        loop while j>c[i]+1: P[j]:=P[j-1]; j:=j-1 repeat;
        P[c[i]+1]:=i;
        i:=i+1;
    while i<=N repeat;

A pair i < j with P[i] > P[j] is called an inversion. The counter array in Hall's correspondence counts the number of inversions in the corresponding permutation and is called an inversion table. Inversion tables are helpful combinatorial devices in the study of several sorting algorithms (see [23]). Another example of the relationship between inversion tables and permutation generation can be found in [7], where Dijkstra reinvents the Johnson-Trotter method using inversion tables.

D. H. Lehmer [26, 27] describes another correspondence that was defined by D. N. Lehmer as long ago as 1906. To find the permutation corresponding to 1 2 1 2 2 0, we first increment each element by 1 to get

    2 3 2 3 3 1.

Now, we write down 1 as before, and for i = 2,...,N, we increment all numbers which are >= c[i] by 1 and write c[i] to the left. In this way we generate

    1
    1 2
    3 1 2
    3 4 1 2
    2 4 5 1 3
    3 2 5 6 1 4
    2 4 3 6 7 1 5

so that the permutation 2 4 3 6 7 1 5 corresponds to 1000. In fact, 2 4 3 6 7 1 5 is the 1000th permutation of 1,...,7 in lexicographic order: there are 6! permutations before it which begin with 1, then 2*5! which begin 2 1 or 2 3, then 4! which begin 2 4 1, then 2*3! which begin 2 4 3 1 or 2 4 3 5, and then 2*2! which begin 2 4 3 6 1 or 2 4 3 6 5.

A much simpler method than the above two was apparently first published by R. Durstenfeld [8]. (See also [22], p. 125.) We notice that P[i] has i-1 elements preceding it, so we can use the c array as follows:

    i:=N;
    loop while i>=2: P[i]:=:P[c[i]+1]; i:=i-1 repeat;

If we take P[1],...,P[N] to be initially 1 2 ... N, this computes the permutation corresponding to c[N],...,c[2] directly, with just one exchange per element.

We could easily construct a program to generate all permutations of 1 2 ... N by embedding one of the methods described above in a factorial counting control structure as defined in Section 1. Such a program would clearly be much slower than the exchange methods described above, because it must build the entire array P[1],...,P[N] where they do only a simple exchange. Coveyou and Sullivan [4] give an algorithm that works this way. Another method is given by Robinson [37].

3. IMPLEMENTATION AND ANALYSIS

The analysis involved in comparing a number of computer algorithms to perform the same task can be very complex. It is often necessary not only to look carefully at how the algorithms will be implemented on real computers, but also to carry out some complex mathematical analysis.
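The three correspondences described under Random Permutations can be rendered compactly in Python. This is a sketch (the function names are ours); `digits` is the printed array c[N],...,c[2]:

```python
import random

def hall_permutation(digits):
    """Hall's correspondence: element i is placed so that exactly c[i]
    elements precede it (c[1] = 0 assumed)."""
    n = len(digits) + 1
    c = [0, 0] + digits[::-1]          # c[1]=0, then c[2], ..., c[n]
    p = []
    for i in range(1, n + 1):
        p.insert(c[i], i)
    return p

def lehmer_permutation(digits):
    """Lehmer's correspondence: increment each digit by 1, then repeatedly
    bump all values >= the digit and write the digit on the left."""
    d = [x + 1 for x in digits]        # 1 2 1 2 2 0 -> 2 3 2 3 3 1
    p = [1]
    for x in reversed(d):
        p = [v + 1 if v >= x else v for v in p]
        p.insert(0, x)
    return p

def durstenfeld_shuffle(p, rand=random.randrange):
    """Durstenfeld's method: P[i] has i-1 possible predecessors, so exchange
    P[i] with a uniformly random element of P[1..i], for i = N down to 2."""
    for i in range(len(p) - 1, 0, -1):
        j = rand(i + 1)
        p[i], p[j] = p[j], p[i]
    return p
```

With the worked digits, `hall_permutation([1,2,1,2,2,0])` gives 2 7 5 6 1 4 3 and `lehmer_permutation([1,2,1,2,2,0])` gives 2 4 3 6 7 1 5, matching the examples in the text.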

Fortunately, these factors present less difficulty than usual in the case of permutation generation. First, since all of the algorithms have the same control structure, comparisons between many of them are immediate, and we need only examine a few in detail. Second, the analysis involved in determining the total running time of the algorithms on real computers (by counting the total number of times each instruction is executed) is not difficult, because of the simple counting algorithms upon which the programs are based.

If we imagine that we have an important application where all N! permutations must be generated as fast as possible, it is easy to see that the programs must be carefully implemented. For example, if we are generating, say, every permutation of 12 elements, then every extraneous instruction in the inner loop of the program will make it run at least 8 minutes longer on most computers (see Table 1).

Evidently, from the discussion in Section 1, Heap's method (Algorithm 2) is the fastest of the recursive exchange algorithms examined, and Ives' method (Algorithm 4) is the fastest of the iterative exchange algorithms. All of the algorithms in Section 2 are clearly slower than these two, except possibly for Langdon's method (Algorithm 6), which may be competitive on machines offering a fast rotation capability. In order to draw conclusions comparing these three algorithms, we shall consider in detail how they can be implemented in assembly language on real computers, and we shall analyze exactly how long they can be expected to run.

As we have done with the high-level language, we shall use a mythical assembly language from which programs on real computers can be easily implemented. (Readers unfamiliar with assembly language should consult [21].) We shall use load (LD), store (ST), add (ADD), subtract (SUB), and compare (CMP) instructions which have the general form

    LABEL  OPCODE  REGISTER,OPERAND

(the LABEL is optional). The first operand will always be a symbolic register name, and the second operand may be a value, a symbolic register name, or an indexed memory reference. For example, ADD I,1 means "increment Register I by 1"; ADD I,J means "add the contents of Register J to Register I"; and ADD I,C(J) means "add to Register I the contents of the memory location whose address is found by adding the contents of Register J to C". In addition, we shall use control transfer instructions of the form

    OPCODE  LABEL

namely JMP (unconditional transfer); JL, JLE, JE, JGE, JG (conditional transfer according as the first operand in the last CMP instruction was <, <=, =, >=, > than the second); and CALL (subroutine call). Other conditional jump instructions are of the form

    OPCODE  REGISTER,LABEL

namely JN, JZ, JP (transfer if the specified register is negative, zero, positive). Most machines have capabilities similar to these, and readers should have no difficulty translating the programs given here to particular assembly languages.

Much of our effort will be directed towards what is commonly called code optimization: developing assembly language implementations which are as efficient as possible. This is, of course, a misnomer: while we can usually improve programs, we can rarely "optimize" them. A disadvantage of optimization is that it tends to greatly complicate a program. Although significant savings may be involved, it is dangerous to apply optimization techniques at too early a stage in the development of a program. In particular, we shall not consider optimizing until we have a good assembly language implementation which we have fully analyzed, so that we can tell where the improvements will do the most good. Knuth [24] presents a fuller discussion of these issues.

Many misleading conclusions have been drawn and reported in the literature based on empirical performance statistics comparing particular implementations of particular algorithms. Empirical testing can be valuable in some situations, but, as we have seen, the structures of permutation generation algorithms are so similar that the empirical tests which have been

performed have really been comparisons of compilers, programmers, and computers, not of algorithms. We shall see that the differences between the best algorithms are very subtle, and they will become most apparent as we analyze the assembly language programs. Fortunately, the assembly language implementations aren't much longer than the high-level descriptions. (This turns out to be the case with many algorithms.)

A Recursive Method (Heap)

We shall begin by looking at the implementation of Heap's method. A direct "hand compilation" of Algorithm 2 leads immediately to Program 1. The right-hand side of the program listing simply repeats the text of Algorithm 2; each statement is attached to its assembly-language equivalent.

This direct translation of Algorithm 2 is more efficient than most automatic translators would produce; it can be further improved in at least three ways. First, as we have already noted in Section 1, the test while i <= N need not be performed after we have set i to 2 (if we assume N > 1), so we can replace JMP WHILE in Program 1 by JMP LOOP. But this unconditional jump can be removed from the inner loop by moving the three instructions at LOOP down in its place (this is called rotating the loop; see [24]). Second, the test for whether i is even or odd can be made more efficient by maintaining a separate Register X which is defined to be 1 if i is even and -1 if i is odd. (This improvement applies to most computers, since few have a jump if even instruction.)
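For reference while reading Programs 1 through 3, the method being implemented (Heap's Algorithm 2 from Section 1) can be rendered in Python. This sketch follows the paper's 1-indexed conventions; the function name is ours:

```python
def heap_permutations(a):
    """Heap's method (Algorithm 2): exchange P[i] with P[1] when i is odd,
    with P[c[i]] when i is even, driven by factorial counting."""
    n = len(a)
    p = [None] + list(a)              # 1-indexed, as in the paper
    c = [1] * (n + 1)                 # counters c[2..n]
    out = [tuple(p[1:])]              # "process" the initial arrangement
    i = 2
    while i <= n:
        if c[i] < i:
            k = 1 if i % 2 == 1 else c[i]
            p[i], p[k] = p[k], p[i]
            c[i] += 1
            i = 2
            out.append(tuple(p[1:]))
        else:
            c[i] = 1
            i += 1
    return out
```

Note that the very first exchange is P[1]:=:P[2], and every other exchange thereafter is of that form; this regularity is exactly what the optimization leading to Program 3 exploits.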

PROGRAM 1. DIRECT IMPLEMENTATION OF HEAP'S METHOD

          LD Z,1
          LD I,N              i:=N;
    INIT  ST Z,C(I)           loop: c[i]:=1
          CMP I,2
          JLE CALL            while i>2:
          SUB I,1             i:=i-1
          JMP INIT            repeat;
    CALL  CALL PROCESS        process;
    LOOP  LD J,C(I)           loop:
          CMP J,I
          JE ELSE             if c[i]<i
          LD T,I              then
          AND T,1
          JZ T,EVEN           if i odd
          LD K,1              then k:=1
          JMP EXCH            else k:=c[i]
    EVEN  LD K,J              endif;
    EXCH  LD T,P(I)
          LD T1,P(K)
          ST T1,P(I)
          ST T,P(K)           P[i]:=:P[k];
          ADD J,1
          ST J,C(I)           c[i]:=c[i]+1;
          LD I,2              i:=2;
          CALL PROCESS        process;
          JMP WHILE
    ELSE  ST Z,C(I)           else c[i]:=1;
          ADD I,1             i:=i+1
    WHILE CMP I,N             endif;
          JLE LOOP            while i<=N repeat;

Third, the variable k can be eliminated, and some time saved, if C(I) is updated before the exchange is made, freeing Register J. These improvements lead to Program 2. (The program uses the instruction LDN X,X, which simply complements Register X.) Notice that even if we were to precompute the index table, as in Algorithm 1, we probably would not have a program as efficient as Program 2. On computers with a memory-to-memory move instruction, we might gain further efficiency by implementing the exchange in three instructions rather than in four.

Each instruction in Program 2 is labelled with the number of times it is executed when the program is run to completion. These labels are arrived at through a flow analysis which is not difficult for this program. For example, CALL PROCESS (and the two instructions preceding it) must be executed exactly N! times, since the program generates all N! permutations. The instruction at CALL can be reached via JLE CALL (which happens exactly once) or by falling through from the preceding instruction (which therefore must happen N!-1 times). Some of the instruction frequencies are more complicated (we shall analyze the quantities A_N and B_N in detail below), but all of the instructions can be labelled in this manner (see [31]). From these frequencies, we can calculate the total running time of the program, if we know the time taken by the individual instructions. We shall assume that instructions which reference data in memory take two time units, while jump instructions and other instructions which do not reference data in memory take one time unit. Under this model, the total running time of Program 2 is

    19 N! + A_N + 10 B_N + 6N - 20

time units. These coefficients are typical, and a similar exact expression can easily be derived for any particular implementation on any particular real machine.

The improvements by which we derived Program 2 from Program 1 are applicable to most computers, but they are intended only as examples of the types of simple transformations which can lead to substantial improvements when programs are implemented. Each of the improvements results in one less instruction which is executed N! times, so its effect is significant. Such coding tricks should be applied only when an algorithm is well understood, and then only to the parts of the program which are executed most frequently. For example, the initialization loop of Program 2 could be rotated for a savings of N-2 time units, but we have not bothered with this because the savings is so insignificant compared with the other improvements. On the other hand, further improvements within the inner loop will be available on many computers. For example, most computers have a richer set of loop control instructions than we have used: on many machines the last three instructions in Program 2 can be implemented with a single command. In addition, we shall examine another, more advanced improvement below.

PROGRAM 2. IMPROVED IMPLEMENTATION OF HEAP'S METHOD

          LD Z,1          1
          LD I,N          1
    INIT  ST Z,C(I)       N-1
          CMP I,2         N-1
          JLE CALL        N-1
          SUB I,1         N-1
          JMP INIT        N-1
    THEN  ADD J,1         N!-1
          ST J,C(I)       N!-1
          JP X,EXCH       N!-1
          LD J,2          A_N
    EXCH  LD T,P(I)       N!-1
          LD T1,P-1(J)    N!-1
          ST T1,P(I)      N!-1
          ST T,P-1(J)     N!-1
    CALL  LD I,2          N!
          LD X,1          N!
          CALL PROCESS    N!
    LOOP  LD J,C(I)       N!+B_N-1
          CMP J,I         N!+B_N-1
          JL THEN         N!+B_N-1
    ELSE  ST Z,C(I)       B_N
          LDN X,X         B_N
          ADD I,1         B_N
          CMP I,N         B_N
          JLE LOOP        B_N
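The frequency labels on Program 2 can be checked mechanically. The following Python sketch (our code) runs the same control structure, tallies the two data-dependent counts, and evaluates the running-time expression:

```python
import math

def program2_counts(n):
    """Count A_N (times the test 'i odd' succeeds) and B_N (times the test
    c[i] = i succeeds) in one complete run of Heap's method."""
    p = [None] + list(range(1, n + 1))    # 1-indexed
    c = [1] * (n + 1)
    A = B = 0
    i = 2
    while i <= n:
        if c[i] < i:
            if i % 2 == 1:
                A += 1
                k = 1
            else:
                k = c[i]
            p[i], p[k] = p[k], p[i]
            c[i] += 1
            i = 2
        else:
            B += 1
            c[i] = 1
            i += 1
    return A, B

def running_time(n):
    """Total time of Program 2 under the 2-unit/1-unit instruction model."""
    A, B = program2_counts(n)
    return 19 * math.factorial(n) + A + 10 * B + 6 * n - 20
```

For N = 5 this reproduces A_5 = 44, B_5 = 86, and a total of 3194 time units, in agreement with Table 3 below.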

To properly determine the effectiveness of these improvements, we shall first complete the analysis of Program 2. In order to do so, we need to analyze the quantities A_N and B_N. In the algorithm, A_N is the number of times the test i odd succeeds, and B_N is the number of times the test c[i] = i succeeds. By considering the recursive structure of the algorithm, we quickly find that the recurrence relations

    A_N = N A_{N-1}            (N even)
    A_N = N A_{N-1} + N - 1    (N odd)

and

    B_N = N B_{N-1} + 1

hold for N > 1, with A_1 = B_1 = 0. These recurrences are not difficult to solve. For example, dividing both sides of the equation for B_N by N! we get

    B_N/N! = B_{N-1}/(N-1)! + 1/N!
           = B_{N-2}/(N-2)! + 1/(N-1)! + 1/N!
           = ...
           = 1/2! + 1/3! + ... + 1/N!

or

    B_N = N! (1/2! + 1/3! + ... + 1/N!).

This can be more simply expressed in terms of the base of the natural logarithms, e, which has the series expansion 1 + 1 + 1/2! + 1/3! + ...: it is easily verified that

    B_N = [N!(e-2)].

That is, B_N is the integer part of the real number N!(e-2) (or N!(e-2) = B_N + ε with 0 <= ε < 1). The recurrences for A_N can be solved in a similar manner to yield the result

    A_N = [N!/e].

Substituting these into the expression above, we find that the total running time of Program 2 is

    (19 + 1/e + 10(e-2)) N! + 6N + O(1),

or about 26.55 N! time units. Table 3 shows the values of the various quantities in this analysis.

We now have a carefully implemented program whose performance we understand, and it is appropriate to consider how the program can be further "optimized." A standard technique is to identify situations that occur frequently and handle them separately in as efficient a manner as possible. For example, every other exchange performed by Program 2 is simply P[1]:=:P[2]. Rather than have the program go all the way through the main loop to discover this, incrementing and then testing c[2], etc., we can gain efficiency by simply replacing i:=2 by i:=3; process; P[1]:=:P[2] in Algorithm 2. (For the purpose of this discussion assume that there is a statement i:=2 following the initialization loop in Algorithm 2.) In general, for any n > 1, we can replace i:=2 by i:=n+1, preceded by code to process all permutations of P[1],...,P[n]. This idea was first applied to the permutation enumeration problem by Boothroyd [2].

For small n, we can quite compactly write in-line code to generate all permutations of P[1],...,P[n]. For example, taking n = 3 we may simply replace

    CALL  LD I,2
          LD X,1
          CALL PROCESS

in Program 2 by the code in Program 3,

TABLE 3. ANALYSIS OF PROGRAM 2 (T_N = 19 N! + A_N + 10 B_N + 6N - 20)

    N    N!           A_N         B_N          T_N            26.55 N!
    1    1            0           0
    2    2            0           1            40             53+
    3    6            2           4            154            159+
    4    24           8           17           638            637+
    5    120          44          86           3194           3186
    6    720          264         517          19130          19116
    7    5040         1854        3620         133836         133812
    8    40320        14832       28961        1070550        1070496
    9    362880       133496      260650       9634750        9634454
    10   3628800      1334960     2606501      96347210       96344640
    11   39916800     14684570    28671512     1059818936     1059791040
    12   479001600    176214840   344058145    12717826742    12717492480
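The closed forms can be checked against the recurrences; this sketch (our code) recomputes A_N and B_N and compares them with [N!/e] and [N!(e-2)]:

```python
import math

def AB(n):
    """A_N and B_N computed from the recurrences in the text."""
    A = B = 0
    for k in range(2, n + 1):
        A = k * A + (0 if k % 2 == 0 else k - 1)
        B = k * B + 1
    return A, B
```

Running this for N up to 12 reproduces the A_N and B_N columns of Table 3 exactly.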

Computing Surveys, Vol 9, No 2, June 1977 160 • R. Sedgewick which efficiently permutes P[1], P[2], P[3]. n+3n! lines of code. This is an example of a (While only the code that differs from Pro- space-time tradeoff where the time saved gram 2 is given here, '

The total running time is now

    (50/6) N! + A'_N + B'_N + 6N - 20.

The analysis for A_N and B_N given above still holds, except that the initial conditions are different; the total running time of Program 3 is then about 8.88 N!.

By taking larger values of n we can get further improvements, but at the cost of about n + 3n! lines of code. This is an example of a space-time tradeoff where the time saved must be weighed against the space consumed.

PROGRAM 3. OPTIMIZED INNER LOOP FOR PROGRAM 2

    CALL  LD I,4
          LD X,1
          CALL PROCESS
          LD T1,P(1)
          LD T2,P(2)
          LD T3,P(3)
          ST T1,P(2)
          ST T2,P(1)
          CALL PROCESS
          ST T3,P(1)
          ST T2,P(3)
          CALL PROCESS
          ST T1,P(1)
          ST T3,P(2)
          CALL PROCESS
          ST T2,P(1)
          ST T1,P(3)
          CALL PROCESS
          ST T3,P(1)
          ST T2,P(2)
          CALL PROCESS

An Iterative Method (Ives)

The structures of Algorithm 2 and Algorithm 4 are very similar, so that a direct "hand compilation" of Ives' method looks very much like Program 1. By rotating the loop and maintaining the value N+1-i in a separate register we get Program 4, an improved implementation of Ives' method which corresponds to the improved implementation of Heap's method in Program 2.

The total running time of this program is

    18 N! + 21 D_N + 10N - 25,

where D_N is the number of times i:=i+1 is executed in Algorithm 4. Another quantity, C_N, the number of times the test P[N+1-i] = Q[N+1-i] fails, happens to cancel out when the total running time is computed. These quantities can be analyzed in much the same way that A_N and B_N were analyzed for Program 2: they satisfy the recurrences

    C_N = C_{N-2} + (N-1)! - (N-2)!
    D_N = D_{N-2} + (N-2)!

so that

    C_N = (N-1)! - (N-2)! + (N-3)! - (N-4)! + ...
    D_N = (N-2)! + (N-4)! + (N-6)! + ...

and the total running time of Program 4 is

    18 N! + 21 (N-2)! + O((N-4)!),

or about (18 + 21/(N(N-1))) N!. Thus Program 4 is faster than Program 2:

PROGRAM 4. IMPROVED IMPLEMENTATION OF IVES' METHOD

          LD I,N          1
    INIT  ST I,C(I)       N-1
          LD V,P(I)       N-1
          ST V,Q(I)       N-1
          CMP I,1         N-1
          JLE CALL        N-1
          SUB I,1         N-1
          JMP INIT        N-1
    THEN  LD T,P(J)       N!-C_N-1
          LD T1,P+1(J)    N!-C_N-1
          ST T1,P(J)      N!-C_N-1
          ST T,P+1(J)     N!-C_N-1
          ADD J,1         N!-C_N-1
          ST J,C(I)       N!-C_N-1
    CALL  LD I,1          N!-C_N
          LD H,N          N!-C_N
          CALL PROCESS    N!
    LOOP  LD J,C(I)       N!+D_N-1
          CMP J,H         N!+D_N-1
          JL THEN         N!+D_N-1
    ELSE  LD T,P(I)       C_N+D_N
          LD T1,P(H)      C_N+D_N
          ST T1,P(I)      C_N+D_N
          ST T,P(H)       C_N+D_N
          ST I,C(I)       C_N+D_N
          CMP T,Q(H)      C_N+D_N
          JNE CALL        C_N+D_N
          ADD I,1         D_N
          SUB H,1         D_N
          CMP I,H         D_N
          JL LOOP         D_N

The improved implementation of Ives' method uses less overhead per permutation than the improved implementation of Heap's method, mainly because it does less counter manipulation. Other iterative methods, like the Johnson-Trotter algorithm (or the version of Ives' method, Algorithm 4a, which does not require the elements to be distinct), are only slightly faster than Heap's method.

However, the iterative methods cannot be optimized quite as completely as we were able to improve Heap's method. In Algorithm 4 and Program 4, the most frequent operation is P[c[N]]:=:P[c[N]+1]; c[N]:=c[N]+1; all but 1/N of the exchanges are of this type. Therefore, we should program this operation separately. (This idea was used by Ehrlich [10, 11].) Program 4 can be improved by inserting the code given in Program 5 directly after CALL PROCESS. (As before, we shall write down only the new code, but make reference to the entire optimized program as "Program 5".) In this program, Pointer J is kept negative so that we can test it against zero, which can be done efficiently on many computers. Alternatively, we could sweep in the other direction, and have J range from N-1 to 0. Neither of these tricks may be necessary on computers with advanced loop control instructions.

To find the total running time of Program 5, it turns out that we need only replace N! by (N-2)! everywhere in the frequencies in Program 4, and then add the frequencies of the new instructions. The result is

    9 N! + 2(N-1)! + 18(N-2)! + O((N-4)!),

not quite as fast as the "optimized" version of Heap's algorithm (Program 3). For a fixed value of N, we could improve the program further by completely unrolling the inner loop of Program 5. The second through eighth instructions of Program 5 could be replaced by

    LD T,P+1
    ST T,P
    ST V,P+1
    CALL PROCESS
    LD T,P+2
    ST T,P+1
    ST V,P+2
    CALL PROCESS
    LD T,P+3
    ST T,P+2
    ST V,P+3
    CALL PROCESS
    ...

(This could be done, for example, by a macro generator.) This reduces the total running time to

    7 N! + (N-1)! + 18(N-2)! + O((N-4)!),

which is not as fast as the comparable highly optimized version of Heap's method (with n = 4).
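Algorithm 4 can be simulated to check the count D_N. The following Python sketch is our reconstruction of Ives' method from the structure of Program 4 (under that reading of the listing); `d` tallies the executions of i:=i+1:

```python
def ives_permutations(a):
    """Ives' method, reconstructed from Program 4: an inner counter sweeps an
    element across by adjacent exchanges; outer levels close in from both ends."""
    n = len(a)
    p = [None] + list(a)          # 1-indexed, as in the paper
    q = p[:]                      # Q holds the initial values
    c = list(range(n + 1))        # c[i] = i initially
    out = [tuple(p[1:])]
    d = 0                         # counts i := i+1 (the quantity D_N)
    i, h = 1, n                   # h plays the role of N+1-i
    while i < h:
        j = c[i]
        if j < h:                 # the frequent case: adjacent exchange
            p[j], p[j + 1] = p[j + 1], p[j]
            c[i] = j + 1
            i, h = 1, n
            out.append(tuple(p[1:]))
        else:                     # counter exhausted at this level
            t = p[i]
            p[i], p[h] = p[h], p[i]
            c[i] = i
            if t != q[h]:
                i, h = 1, n
                out.append(tuple(p[1:]))
            else:
                d += 1            # i := i+1 in Algorithm 4
                i += 1
                h -= 1
    return out, d
```

For N = 4 this yields 24 distinct permutations with d = 3 = 2! + 0!, in agreement with the closed form for D_N above.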

appropriate for the iterative programs (loop unrolling).

PROGRAM 5. OPTIMIZED INNER LOOP FOR PROGRAM 4

    EXLP  CALL PROCESS     (N-1)!
          LD J,1-N         (N-1)!
    INLP  LD T,P+N+1(J)    N!-(N-1)!
          ST T,P+N(J)      N!-(N-1)!
          ST V,P+N+1(J)    N!-(N-1)!
          CALL PROCESS     N!-(N-1)!
          ADD J,1          N!-(N-1)!
          JN J,INLP        N!-(N-1)!
          LD T,P+1         (N-1)!
          ST T,P+N         (N-1)!
          ST V,P+1         (N-1)!
          CMP T,Q+N        (N-1)!
          JNE EXLP         (N-1)!

A Cyclic Method (Langdon)

It is interesting to study Langdon's cyclic method (Algorithm 6) in more detail, because it can be implemented with only a few instructions on many computers. In addition, it can be made to run very fast on computers with hardware rotation capabilities.

To implement Algorithm 6, we shall use a new instruction

    MOVE TO,FROM(I)

which, if Register I contains the number i, moves i words starting at Location FROM to Location TO. That is, the above instruction is equivalent to

          LD J,0
    LOOP  LD T,FROM(J)
          ST T,TO(J)
          ADD J,1
          CMP J,I
          JL LOOP

We shall assume that memory references are overlapped, so that the instruction takes 2i time units. Many computers have "block transfer" instructions similar to this, although the details of implementation vary widely.

For simplicity, let us further suppose that P[1],...,P[N] are initially the integers 0,1,...,N-1, so that we don't have to bother with the Q array of Algorithm 6. With these assumptions, Langdon's method is particularly simple to implement, as shown in Program 6. Only eight assembly language instructions will suffice on many computers to generate all permutations of {0,1,...,N-1}.

PROGRAM 6. IMPLEMENTATION OF LANGDON'S METHOD

    THEN  LD I,N-1         N!
          CALL PROCESS     N!
    LOOP  LD T,P+1         N!+E_N
          MOVE P,P+1(I)    F_N
          ST T,P+1(I)      N!+E_N
          CMP T,I          N!+E_N
          JNE THEN         N!+E_N
          SUB I,1          E_N
          JNZ LOOP         E_N

As we have already noted, however, the MOVEs tend to be long, and the method is not particularly efficient if N is not small. Proceeding as we have before, we see that E_N and F_N satisfy

    E_N = 1! + 2! + ... + (N-1)!
    F_N = 1*1! + 2*2! + ... + N*N! = (N+1)! - 1.

(Here F_N is not the frequency of execution of the MOVE instruction, but the total number of words moved by it.) The total running time of Program 6 turns out to be

    N!(2N + 10 + 9/N) + O((N-2)!).

It is faster than Program 2 for N < 8 and faster than Program 4 for N < 4, but it is much slower for larger N.

By almost any measure, Program 6 is the simplest of the programs and algorithms that we have seen so far. Furthermore, on most computer systems it will run faster than any of the algorithms implemented in a high-level language. The algorithm fueled a controversy of sorts (see the references in [25]) when it was first introduced, based on just this issue.

Furthermore, if hardware rotation is available, Program 6 may be the method of choice. Since (N-1)/N of the rotations are of length N, the program may be optimized in the manner of Program 5 around a four-instruction inner loop (call, rotate, compare, conditional jump). On some

machines, the rotate might be performed in, say, two time units (for example, if parallelism were available, or if P were maintained in registers), which would lead to a total time of 5N! + O((N-1)!). We have only sketched details here because the issues are so machine-dependent: the obvious point is that exotic hardware features can have drastic effects upon the choice of algorithm.

CONCLUSION

The remarkable similarity of the many permutation enumeration algorithms which have been published has made it possible for us to draw some very definite conclusions regarding their performance. In Section 1, we saw that the method given by Heap is slightly simpler (and therefore slightly more efficient) than the methods of Wells and Boothroyd, and that the method given by Ives is simpler and more efficient than the methods of Johnson and Trotter (and Ehrlich). In Section 2, we found that the cyclic and lexicographic algorithms will not compete with these, except possibly for Langdon's method, which avoids some of the overhead in the control structure inherent in the other methods. By carefully implementing these algorithms in Section 3 and applying standard code optimization techniques, we found that Heap's method will run fastest on most computers, since it can be coded so that most permutations are generated with only two store instructions.

However, as discussed in the Introduction, our accomplishments must be kept in perspective. An assembly-language implementation such as Program 3 may run 50 to 100 times faster than the best previously published algorithms (in high-level languages) on most computer systems, but this means merely that we can now generate all permutations of 12 elements in one hour of computer time, where before we could not get to N = 11. On the other hand, if we happen to be interested only in all permutations of 10 elements, we can now get them in only 15 seconds, rather than 15 minutes.

The problem of comparing different algorithms for the same task arises again and again in computer science, because new algorithms (and new methods of expressing algorithms) are constantly being developed. Normally, the kind of detailed analysis and careful implementation done in this paper is reserved for the most important algorithms. But permutation generation nicely illustrates the important issues. An appropriate choice between algorithms for other problems can be made by studying their structure, implementation, analysis, and "optimization" as we have done for permutation generation.

ACKNOWLEDGMENTS

Thanks are due to P. Flanagan, who implemented and checked many of the algorithms and programs given here on a real computer. Also, I must thank the many authors listed below for providing me with such a wealth of material, and I must apologize to those whose work I may have misunderstood or misrepresented. Finally, the editor, P. J. Denning, must be thanked for his many comments and suggestions for improving the readability of the manuscript.

REFERENCES

[1] BOOTHROYD, J. "PERM (Algorithm 6)," Computer Bulletin 9, 3 (Dec. 1965), 104.
[2] BOOTHROYD, J. "Permutation of the elements of a vector (Algorithm 29)"; and "Fast permutation of the elements of a vector (Algorithm 30)," Computer J. 10 (1967), 310-312.
[3] BRATLEY, P. "Permutations with repetitions (Algorithm 306)," Comm. ACM 10, 7 (July 1967), 450.
[4] COVEYOU, R. R.; AND SULLIVAN, J. G. "Permutation (Algorithm 71)," Comm. ACM 4, 11 (Nov. 1961), 497.
[5] DERSHOWITZ, N. "A simplified loop-free algorithm for generating permutations," BIT 15 (1975), 158-164.
[6] DIJKSTRA, E. W. A discipline of programming, Prentice-Hall, Englewood Cliffs, N.J., 1976.
[7] DIJKSTRA, E. W. "On a gauntlet thrown by David Gries," Acta Informatica 6, 4 (1976), 357.
[8] DURSTENFELD, R. "Random permutation (Algorithm 235)," Comm. ACM 7, 7 (July 1964), 420.
[9] EAVES, B. C. "Permute (Algorithm 130)," Comm. ACM 5, 11 (Nov. 1962), 551. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-453.)
[10] EHRLICH, G. "Loopless algorithms for generating permutations, combinations, and other combinatorial configurations," J. ACM 20, 3 (July 1973), 500-513.
[11] EHRLICH, G. "Four combinatorial algorithms (Algorithm 466)," Comm. ACM 16, 11 (Nov. 1973), 690-691.

[12] EVEN, S. Algorithmic combinatorics, Macmillan, Inc., N.Y., 1973.
[13] FIKE, C. T. "A permutation generation method," Computer J. 18, 1 (Feb. 1975), 21-22.
[14] FISCHER, L. L.; AND KRAUSE, K. C. Lehrbuch der Combinationslehre und der Arithmetik, Dresden, 1812.
[15] HALL, M.; AND KNUTH, D. E. "Combinatorial analysis and computers," American Math. Monthly 72, 2 (Feb. 1965, Part II), 21-28.
[16] HEAP, B. R. "Permutations by interchanges," Computer J. 6 (1963), 293-4.
[17] HOWELL, J. R. "Generation of permutations by addition," Math. Comp. 16 (1962), 243-44.
[18] HOWELL, J. R. "Permutation generator (Algorithm 87)," Comm. ACM 5, 4 (April 1962), 209. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[19] IVES, F. M. "Permutation enumeration: four new permutation algorithms," Comm. ACM 19, 2 (Feb. 1976), 68-72.
[20] JOHNSON, S. M. "Generation of permutations by adjacent transposition," Math. Comp. 17 (1963), 282-285.
[21] KNUTH, D. E. "Fundamental algorithms," in The art of computer programming 1, Addison-Wesley, Inc., Reading, Mass., 1968.
[22] KNUTH, D. E. "Seminumerical algorithms," in The art of computer programming 2, Addison-Wesley, Inc., Reading, Mass., 1969.
[23] KNUTH, D. E. "Sorting and searching," in The art of computer programming 3, Addison-Wesley, Inc., Reading, Mass., 1972.
[24] KNUTH, D. E. "Structured programming with go to statements," Computing Surveys 6, 4 (Dec. 1974), 261-301.
[25] LANGDON, G. G., Jr. "An algorithm for generating permutations," Comm. ACM 10, 5 (May 1967), 298-9. (See also: letters by R. J. Ord-Smith in Comm. ACM 10, 11 (Nov. 1967), 684; by B. E. Rodden in Comm. ACM 11, 3 (March 1968), 150; CR Rev. 13,891, Computing Reviews 9, 3 (March 1968); and letter by Langdon in Comm. ACM 11, 6 (June 1968), 392.)
[26] LEHMER, D. H. "Teaching combinatorial tricks to a computer," in Proc. Symposium in Appl. Math., Combinatorial Analysis, Vol. 10, American Mathematical Society, Providence, R.I., 1960, 179-193.
[27] LEHMER, D. H. "The machine tools of combinatorics," in Applied combinatorial mathematics (E. F. Beckenbach, Ed.), John Wiley & Sons, Inc., N.Y., 1964.
[28] LEHMER, D. H. "Permutation by adjacent interchanges," American Math. Monthly 72, 2 (Feb. 1965, Part II), 36-46.
[29] ORD-SMITH, R. J. "Generation of permutation sequences: Part 1," Computer J. 13, 3 (March 1970), 152-155.
[30] ORD-SMITH, R. J. "Generation of permutation sequences: Part 2," Computer J. 14, 2 (May 1971), 136-139.
[31] ORD-SMITH, R. J. "Generation of permutations in pseudo-lexicographic order (Algorithm 308)," Comm. ACM 10, 7 (July 1967), 452. (See also: remarks in Comm. ACM 12, 11 (Nov. 1969), 638.)
[32] ORD-SMITH, R. J. "Generation of permutations in lexicographic order (Algorithm 323)," Comm. ACM 11, 2 (Feb. 1968), 117. (See also: certification by I. M. Leitch in Comm. ACM 12, 9 (Sept. 1969), 512.)
[33] PECK, J. E. L.; AND SCHRACK, G. F. "Permute (Algorithm 86)," Comm. ACM 5, 4 (April 1962), 208.
[34] PHILLIPS, J. P. N. "Permutation of the elements of a vector in lexicographic order (Algorithm 28)," Computer J. 10 (1967), 310-311.
[35] PLESZCZYNSKI, S. "On the generation of permutations," Information Processing Letters 3, 6 (July 1975), 180-183.
[36] RIORDAN, J. An introduction to combinatorial analysis, John Wiley & Sons, Inc., N.Y., 1958.
[37] ROHL, J. S. "Programming improvements to Fike's algorithm for generating permutations," Computer J. 19, 2 (May 1976), 156.
[38] ROBINSON, C. L. "Permutation (Algorithm 317)," Comm. ACM 10, 11 (Nov. 1967), 729.
[39] SAG, T. W. "Permutations of a set with repetitions (Algorithm 242)," Comm. ACM 7, 10 (Oct. 1964), 585.
[40] SCHRACK, G. F.; AND SHIMRAT, M. "Permutation in lexicographic order (Algorithm 102)," Comm. ACM 5, 6 (June 1962), 346. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[41] SHEN, M.-K. "On the generation of permutations and combinations," BIT 2 (1962), 228-231.
[42] SHEN, M.-K. "Generation of permutations in lexicographic order (Algorithm 202)," Comm. ACM 6, 9 (Sept. 1963), 517. (See also: remarks by R. J. Ord-Smith in Comm. ACM 10, 7 (July 1967), 452-3.)
[43] SLOANE, N. J. A. A handbook of integer sequences, Academic Press, Inc., N.Y., 1973.
[44] TOMPKINS, C. "Machine attacks on problems whose variables are permutations," in Proc. Symposium in Appl. Math., Numerical Analysis, Vol. 6, McGraw-Hill, Inc., N.Y., 1956, 195-211.
[45] TROTTER, H. F. "Perm (Algorithm 115)," Comm. ACM 5, 8 (August 1962), 434-435.
[46] WALKER, R. J. "An enumerative technique for a class of combinatorial problems," in Proc. Symposium in Appl. Math., Combinatorial Analysis, Vol. 10, American Mathematical Society, Providence, R.I., 1960, 91.
[47] WELLS, M. B. "Generation of permutations by transposition," Math. Comp. 15 (1961), 192-195.
[48] WELLS, M. B. Elements of combinatorial computing, Pergamon Press, Elmsford, N.Y., 1971.
[49] WHITEHEAD, E. G. Combinatorial algorithms. (Notes by C. Frankfeldt and A. E. Kaplan.) Courant Institute, New York Univ., 1973, 11-17.

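The conclusion's observation that Heap's method generates most permutations with only two store instructions can be seen in a short recursive sketch. This is a standard rendering of Heap's algorithm [16] in Python, not the paper's optimized Program 3; the function and variable names are ours.

```python
def heap_permutations(n, p, visit):
    """Heap's method: successive permutations of p[0..n-1] differ by one swap."""
    if n <= 1:
        visit(tuple(p))
        return
    for i in range(n - 1):
        heap_permutations(n - 1, p, visit)
        # The single transposition that produces the next permutation:
        # swap position n-1 with position i (n even) or position 0 (n odd).
        j = i if n % 2 == 0 else 0
        p[j], p[n - 1] = p[n - 1], p[j]
    heap_permutations(n - 1, p, visit)

out = []
heap_permutations(3, [0, 1, 2], out.append)
# out now holds all 6 permutations; each differs from its predecessor by one swap
```

Since each step performs exactly one transposition, only two array elements change between successive permutations, which is why the inner loop of an assembly implementation can get by with two stores.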
Computing Surveys, Vol. 9, No. 2, June 1977