Unranking Combinations Lexicographically: an efficient new strategy compared with others
Cyann Donnot¹, Antoine Genitrini², and Yassine Herida¹

¹ Full-time master student at Sorbonne University
² Sorbonne University, LIP6. [email protected]

HAL Id: hal-02462764, preprint submitted on 31 Jan 2020, https://hal.sorbonne-universite.fr/hal-02462764

Abstract. We propose a comparative study of four algorithms dedicated to the lexicographic unranking of combinations. The first three are algorithms from the literature. We give some key ideas on how to analyze them on average and describe their strengths and weaknesses. We then introduce a new algorithm based on a new strategy inside the classical factorial numeral system (or factoradics). We conclude the paper with an experimental study; in particular, when n = 10000, k = 5000, our algorithm is 36 (resp. 53) times as fast as our C++ implementation of Matlab's (resp. Sagemath's) algorithm.

Keywords. Unranking Algorithm; Combination; Lexicographic Order; Complexity Analysis.

One of the most fundamental combinatorial objects is the combination. It consists of a selection of items from a collection. In many enumeration problems it appears either as the main combinatorial structure, or as a core building block because of its simplicity and counting characteristics.

In the 60s, while solving an optimization problem about scheduling, Lehmer rediscovered an important property linking natural numbers to a mixed radix numeral system based on combinations. This relationship allowed him to exhibit a greedy approach for a ranking algorithm that transforms (bijectively) a combination into an integer. This numeral system is now commonly called the combinatorial number system, or combinadics; a short sketch of its greedy decomposition is given below. It is often used for the reverse of Lehmer's problem: generating the u-th combination (for a given order on the combinations). This approach is substituted for exhaustive generation once the latter is no longer possible, due to the combinatorial explosion of the number of objects as their size increases. In the context of combinations, the explosion appears very quickly. This strategy of generating a single element is classically called an unranking method. It is today often used as a basic brick in scheduling problems [18] but also, e.g., in software testing [14].

In order to unrank elements one must first define an order over them. The one that is usually used is the lexicographic order, which can be seen as a generalization of the alphabetical order. The lexicographic order is easy for humans to handle, and thus has been extensively studied. But, as Ruskey [17, p. 59] mentions, lexicographic generation is usually not the most efficient, so particular care must be taken while unranking in this order.
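To make the combinadics principle mentioned above concrete, here is a minimal Python sketch of the standard greedy decomposition in the combinatorial number system. It only illustrates the classical property that Lehmer rediscovered; it is not one of the optimized unranking algorithms analyzed in the following sections, and the function name combinadic is ours.

from math import comb

def combinadic(u, k):
    # Greedy decomposition of u as C(c_k, k) + ... + C(c_1, 1) with c_k > ... > c_1 >= 0.
    # Every integer 0 <= u < C(n, k) admits exactly one such representation.
    digits = []
    for i in range(k, 0, -1):
        c = i - 1                       # smallest candidate; comb(i - 1, i) = 0 <= u always holds
        while comb(c + 1, i) <= u:      # take the largest c such that C(c, i) <= u
            c += 1
        digits.append(c)
        u -= comb(c, i)
    return digits

assert combinadic(7, 3) == [4, 3, 0]    # 7 = C(4,3) + C(3,2) + C(0,1) = 4 + 3 + 0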
The classical approach for the construction of combinatorial structures presenting a recursive decomposition schema consists in taking advantage of this decomposition in order to build a bigger object from a smaller one. The method has been extensively detailed in the famous book of Nijenhuis and Wilf [15]. They did not directly study the example of combinations, but the case of integer compositions is proposed and is not so far from our problem. The method has then been applied generically to decomposable objects in the sense of analytic combinatorics, first in the context of recursive generation [10], and then in the context of unranking approaches [12]. Aside from such generic approaches there are several ad hoc algorithms, but to the best of our knowledge no practical experiments exist to determine which one should be used. We thus start by recalling three classical algorithms, supported by a revisited average complexity analysis. Then we describe our new strategy, and finally we compare all of them experimentally.

In Fig. 1 we represent the time efficiency of improved versions of three of these algorithms, ours being depicted in green. The figure will be described later. But as a foretaste, when n = 10000, k = n/2, our algorithm is experimentally 36 (resp. 53) times as fast as our C++ implementation of Matlab's (resp. Sagemath's) actual algorithm.

Fig. 1: Time (in ms) for unranking a combination, with n = 10000 and k = 0, ..., n.

Along the paper, we represent combinations as follows.

Definition 1. Let n and k be two integers with 0 ≤ k ≤ n. We represent a combination of k elements among the n elements {0, 1, ..., n − 1} as a tuple containing k distinct elements, increasingly sorted from left to right.

For example, let n and k be respectively 5 and 3. The tuples (0, 1, 2) and (0, 2, 4) are combinations of k among n, but (0, 2, 1) and (0, 1, 2, 3) are not. There is another representation using a 2-letter alphabet, but we do not deal with it in this paper; however, the paper could also be rewritten in that context.

There are several orders for comparing combinations. In the following we restrict our attention to orders comparing combinations of the same length, i.e. with the same number of elements.

Definition 2. Let A = (a_0, a_1, ..., a_{k−1}) and B = (b_0, b_1, ..., b_{k−1}) be two distinct combinations.
– In the lexicographic order, we say that A is smaller than B if and only if both combinations share a (possibly empty) common prefix such that (a_0, ..., a_{p−1}) = (b_0, ..., b_{p−1}) and a_p < b_p.
– In the co-lexicographic order, we say that A is smaller than B if and only if the tuple (a_{k−1}, ..., a_0) is smaller than (b_{k−1}, ..., b_0) for the lexicographic order.
– Given an order such that A is smaller than B, then, for the reverse order, B is smaller than A.

Definition 3. Let n, k define combinations and let A be a combination. For a given order, the rank u of A belongs to {0, 1, ..., \binom{n}{k} − 1} and is such that A is the u-th smallest element.
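As a concrete illustration of Definitions 2 and 3, the following minimal sketch relies on the fact that Python's itertools.combinations enumerates tuples in lexicographic order; exhaustive enumeration is of course exactly what unranking is meant to avoid, so this only serves to make the definitions concrete on the small example n = 5, k = 3.

from itertools import combinations

n, k = 5, 3
lex = list(combinations(range(n), k))        # lexicographic order
colex = sorted(lex, key=lambda c: c[::-1])   # co-lex: compare the reversed tuples lexicographically
rev_lex = lex[::-1]                          # reverse lexicographic order

print(lex.index((0, 2, 4)))      # 4: rank of (0, 2, 4) in the lexicographic order
print(colex.index((0, 2, 4)))    # 5: its rank in the co-lexicographic order
print(rev_lex.index((0, 2, 4)))  # 5: its rank in the reverse lexicographic order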
With these definitions, we can enter the core of the paper, which is organized as follows. Section 1 describes a recursive algorithm solving the lexicographic combination unranking problem. Despite the simplicity of this approach, the fact that it is recursive is problematic when handling very big combinations. Section 2 then describes two classical algorithms based on combinadics; it seems that this is the first time the two are compared, and we explain why one is better than the other. In Section 3 we enter the context of factoradics, and then describe our new unranking algorithm in this numeral system. Finally, in Section 4 we compare all the algorithms experimentally and give some possibilities to really improve the time complexity of all approaches.

1 Unranking through recursion

We deal with a combinatorial structure, combinations, that is very well understood in a combinatorial sense. Thus, when trying to develop an unranking algorithm, the first idea consists in developing a recursive algorithm based on the classical recursive generation presented in [15]. Let us directly introduce such an algorithm.

Algorithm 1 Recursive Unranking

1: function Rec_Generation(n, k, u)
2:   if k = 0 then
3:     return ()
4:   if n = k then
5:     return (i for i from 0 to k − 1)
6:   b := binomial(n − 1, k − 1)
7:   if u < b then
8:     R := Rec_Generation(n − 1, k − 1, u)
9:     append(R, n − 1)
10:    return R
11:  else
12:    return Rec_Generation(n − 1, k, u − b)

1: function Unranking_Rec(n, k, u)
2:   L := Rec_Generation(n, k, u)
3:   return (n − 1 − L[k − 1 − i] for i from 0 to k − 1)

Here binomial(n, k) computes the value of \binom{n}{k}; append(A, a) appends element a to A; for i ≥ 0, L[i] is the element of index i in L. In lines 3 and 5 the outputs are tuples built by comprehension; the simple one in line 5 is (0, 1, ..., k − 1). (A direct Python transcription of this pseudocode is sketched below.)

Proposition 1. The function Rec_Generation(n, k, ·) computes the combinations for n, k in the reverse co-lexicographic order.

Corollary 1. The function Unranking_Rec(n, k, ·) computes the combinations for n, k in the lexicographic order.

The proposition is proved by induction and the corollary is a direct observation given in [11, p. 47].

In order to analyze the algorithm in detail, we are interested in the average number of calls to the function binomial, when u ranges over the whole set of integers from 0 to \binom{n}{k} − 1. Ruskey justifies such a measure by supposing that the table of all binomial coefficients is precomputed, so that each call is equivalent. Later, in Section 4, we will discuss this measure. Let us introduce the sequence u_{n,k} counting the cumulative number of calls over the whole range of u.

Lemma 1. Let u_{n,k} be the cumulative number of calls to binomial while unranking all possible u from 0 to \binom{n}{k} − 1.
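For concreteness, here is a direct transcription of Algorithm 1 in Python: a minimal sketch, assuming Python's math.comb for the binomial coefficients, and not the tuned C++ implementations compared in the experiments of Section 4. Tuples are rebuilt by concatenation instead of an in-place append, which does not change the recursion scheme.

from math import comb   # binomial coefficients; Python integers are arbitrary precision

def rec_generation(n, k, u):
    # u-th combination of k among n in the reverse co-lexicographic order (Proposition 1)
    if k == 0:
        return ()
    if n == k:
        return tuple(range(k))
    b = comb(n - 1, k - 1)
    if u < b:
        return rec_generation(n - 1, k - 1, u) + (n - 1,)
    return rec_generation(n - 1, k, u - b)

def unranking_rec(n, k, u):
    # u-th combination of k among n in the lexicographic order (Corollary 1)
    L = rec_generation(n, k, u)
    return tuple(n - 1 - L[k - 1 - i] for i in range(k))

assert unranking_rec(5, 3, 4) == (0, 2, 4)
assert [unranking_rec(4, 2, u) for u in range(comb(4, 2))] == [
    (0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]

Note that the recursion depth can reach n, so for very large parameters (e.g. n = 10000) such a direct recursive version hits stack limits (Python's default recursion limit is around 1000); this echoes the remark above that recursion is problematic for very big combinations.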