PRACE IPI PAN / ICS PAS REPORTS 320

Witold Lipski, Jr.

A NOTE ON DECOMPOSING A QUERY SET INTO SUBSETS HAVING THE CONSECUTIVE RETRIEVAL PROPERTY

Institute of Computer Science, Polish Academy of Sciences
00-901 Warsaw, P.O. Box 22, Poland

Warsaw 1978

Editorial Board: A. Blikle (chairman), S. Bylka, J. Lipski (secretary), L. Łukaszewicz, R. Marczyński, A. Mazurkiewicz, T. Nowicki, Z. Pawlak, D. Sikorski, Z. Szoda, M. Warmus (deputy chairman)

Submitted by Zdzisław Pawlak

Mailing address: Witold Lipski, Jr., Institute of Computer Science, Polish Academy of Sciences, P.O. Box 22, 00-901 Warsaw

Printed as a manuscript. Print run: 700 copies. Sent to press in April 1978.

AMS categories: 68A50, 05-00, 05A99

Abstract

A simple algorithm is presented which constructs a partition of the set of all non-trivial binary queries into subsets having the consecutive retrieval property. The algorithm, which has been implemented on a computer, is a modification of a method of Yamamoto et al. [J. Statist. Planning and Inference 1 (1977) 41-51].

1. INTRODUCTION

The consecutive retrieval (CR) property was introduced by Ghosh [1] in connection with the design of efficient file organizations that minimize the access time and the storage space at the same time. Subsequently, it has been extensively investigated by several authors (see e.g. Ghosh [2,3], Lipski [6], Lipski and Marek [8] and Yamamoto et al. [9]). Let us recall that a query set Q is said to have the CR property with respect to a record set R if there exists a linear arrangement of the records (without any duplication) such that all pertinent records for every query in Q are stored in consecutive storage locations.

It should be noted that the query sets and record sets of practical interest are usually too complex to admit such a special organization. However, several modifications of the CR property have been proposed, which extend the class of admissible query-record sets at the cost of some increase in the access time and/or the storage space. One such proposal is to decompose the query set into subsets having the CR property (see Ghosh [2], Lipski and Marek [8]). Of course, such a decomposition always exists: it is sufficient to make the subsets small enough, e.g. each consisting of at most two queries. A non-trivial problem of practical interest is, however, to find a decomposition into the minimal possible number of subsets.
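The definition of the CR property can be made concrete with a brute-force check. The following is a minimal sketch, not taken from the report: a query is modelled simply as the set of its pertinent records, and the function names `is_consecutive` and `find_cr_arrangement` are illustrative assumptions. Trying every arrangement is only feasible for a handful of records.

```python
from itertools import permutations


def is_consecutive(arrangement, pertinent):
    """True if the records in `pertinent` occupy consecutive positions."""
    positions = sorted(arrangement.index(r) for r in pertinent)
    return not positions or positions[-1] - positions[0] + 1 == len(positions)


def find_cr_arrangement(records, queries):
    """Return an arrangement of `records` under which every query in
    `queries` retrieves consecutive records, or None if none exists.

    Brute force over all permutations (hypothetical helper, tiny inputs only).
    """
    for arrangement in permutations(records):
        if all(is_consecutive(arrangement, q) for q in queries):
            return arrangement
    return None


if __name__ == "__main__":
    records = "abcd"
    # Record "a" would need three distinct neighbours, so no arrangement works.
    star = [{"a", "b"}, {"a", "c"}, {"a", "d"}]
    print(find_cr_arrangement(records, star))
    # Splitting the query set into two smaller subsets restores the CR property.
    print(find_cr_arrangement(records, star[:2]))
    print(find_cr_arrangement(records, star[2:]))
```

The example query set has no CR arrangement as a whole, but each of the two smaller subsets does, which is the kind of decomposition discussed in the text.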
In the general case, when a query set and a record set can be arbitrary, and a query can specify any subset of records, this problem seems to be very difficult. In fact, it can be proved (see Lipski [7]) that for such a case even the following simple problem:

Given a query set Q and a record set R, decide whether Q can be decomposed into two subsets having the CR property with respect to R,

is NP-complete in the sense of Karp [5].

However, in this paper we shall deal exclusively with the special case where the record set consists of all $2^l - 1$ non-trivial records characterized by $l$ binary-valued attributes, i.e.,

$$R_l = \{(x_1, \ldots, x_l) : x_i \in \{0, 1\}\} \setminus \{(0, \ldots, 0)\},$$

and the query set consists of all $2^l - 1$ non-trivial binary queries, i.e.,

$$Q_l = \{(\xi_1, \ldots, \xi_l) : \xi_i \in \{1, *\}\} \setminus \{(*, \ldots, *)\}$$

($x_i = 1$ or $0$ indicates the presence or absence of the $i$th attribute in a record, respectively; $\xi_i = 1$ or $*$ indicates whether or not the presence of the $i$th attribute is specified for the pertinent records, respectively).

A subset of $Q_l$ having the CR property with respect to $R_l$ will be called a CR subset. The number of 1's in a query $q$ is called the order of $q$ and is denoted by $w(q)$. The cardinality of a set $S$ is denoted by $|S|$, $\lfloor y \rfloor$ is the maximal integer not greater than (a real number) $y$, and $\lceil y \rceil$ is the minimal integer not less than $y$.

The problem of decomposing $Q_l$ into the minimal number of subsets having the CR property with respect to $R_l$ was recently treated by Yamamoto et al. [9]. They noticed that, by a result of Ghosh [2], no CR subset can contain more than two queries of order $\lfloor l/2 \rfloor$, and consequently the number of CR subsets in any partition of $Q_l$ is at least

$$\left\lceil \frac{1}{2} \binom{l}{\lfloor l/2 \rfloor} \right\rceil.$$

Using an ingenious method they also prove that for $l$ even there exists a partition of $Q_l$ into this number of CR subsets. Their proof also gives some indication of how such a minimal partition can be constructed.
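The record set, the query set and the lower bound above are easy to tabulate for small $l$. The sketch below is illustrative only: the function names are assumptions, and the string "*" is used to stand for an unspecified attribute.

```python
from itertools import product
from math import comb


def records(l):
    """All 2^l - 1 non-trivial records over l binary attributes."""
    return [x for x in product((0, 1), repeat=l) if any(x)]


def queries(l):
    """All 2^l - 1 non-trivial binary queries; "*" means 'not specified'."""
    return [q for q in product((1, "*"), repeat=l) if 1 in q]


def order(q):
    """w(q): the number of 1's in the query q."""
    return sum(1 for s in q if s == 1)


def pertinent(q, l):
    """Records that have every attribute specified in q."""
    return [x for x in records(l)
            if all(s == "*" or xi == 1 for s, xi in zip(q, x))]


def lower_bound(l):
    """ceil(C(l, floor(l/2)) / 2): at most two queries of order floor(l/2)
    fit into one CR subset, so this many subsets are needed."""
    return -(-comb(l, l // 2) // 2)


if __name__ == "__main__":
    for l in range(2, 9):
        print(l, len(records(l)), len(queries(l)), lower_bound(l))
```

The ceiling is computed with integer arithmetic (`-(-a // 2)`) to avoid floating-point rounding for larger $l$.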
It is the main purpose of the present note to give a very simple algorithm which constructs such a minimal partition if $l$ is even and a partition close to minimal if $l$ is odd. Our method has been implemented on a computer (see Appendix).

2. THE ALGORITHM

As noted by Yamamoto et al. [9], the set $B_l$ consisting of the following $2l - 1$ queries:

$$
\begin{array}{c}
(1\ *\ *\ \ldots\ *\ *) \\
(1\ 1\ *\ \ldots\ *\ *) \\
(1\ 1\ 1\ \ldots\ *\ *) \\
\vdots \\
(1\ 1\ 1\ \ldots\ 1\ *) \\
(1\ 1\ 1\ \ldots\ 1\ 1) \\
(*\ 1\ 1\ \ldots\ 1\ 1) \\
(*\ 1\ 1\ \ldots\ 1\ *) \\
\vdots \\
(*\ 1\ 1\ \ldots\ *\ *) \\
(*\ 1\ *\ \ldots\ *\ *)
\end{array}
\tag{2}
$$

is a CR subset of $Q_l$ (in the original report the required arrangement of records is depicted schematically on the right-hand side of (2)).

Notice that any query $q = (\xi_1, \ldots, \xi_l)$ can be identified with the set

$$S(q) = \{i : \xi_i = 1\}, \tag{3}$$

i.e., with the set of attributes specified in $q$. This set is nonempty, since we consider only non-trivial queries.

Now let us denote $X = \{2, 3, \ldots, l\}$, $n = |X| = l - 1$. We shall need a method of decomposing $\mathcal{P}(X)$, the set of all subsets of $X$, into chains (a family $\mathcal{C} \subseteq \mathcal{P}(X)$ is called a chain if $A \subseteq B$ or $B \subseteq A$ for all $A, B \in \mathcal{C}$). Below we shall briefly describe such a method (see e.g. Greene and Kleitman [4]).

A chain is called symmetric if it has the form

$$C_{\lfloor n/2 \rfloor - j} \subset C_{\lfloor n/2 \rfloor - j + 1} \subset \cdots \subset C_{\lceil n/2 \rceil + j},$$

where $0 \le j \le \lfloor n/2 \rfloor$ and $|C_i| = i$ for $\lfloor n/2 \rfloor - j \le i \le \lceil n/2 \rceil + j$. Since every symmetric chain contains exactly one $\lfloor n/2 \rfloor$-subset (i.e. a subset of cardinality $\lfloor n/2 \rfloor$) of $X$, it follows that any partition of $\mathcal{P}(X)$ into symmetric chains is composed of exactly $\binom{n}{\lfloor n/2 \rfloor}$ chains.

We construct such a partition by induction on $n = |X|$. For $n = 0$, $\mathcal{P}(X) = \{\emptyset\}$ is itself a symmetric chain. Now assume that we have a partition of $\mathcal{P}(X)$ into symmetric chains, and let $a \notin X$. Replace every chain $A_1 \subset A_2 \subset \cdots \subset A_k$ in our partition by the following two chains:

$$A_1 \subset A_2 \subset \cdots \subset A_k, \qquad A_1 \cup \{a\} \subset A_2 \cup \{a\} \subset \cdots \subset A_k \cup \{a\}.$$

Of course, we have obtained a partition of $\mathcal{P}(X \cup \{a\})$ into chains. These chains are not symmetric; we can, however, make them symmetric by shifting the top subset of the second chain to the top of the first one:

$$A_1 \subset \cdots \subset A_k \subset A_k \cup \{a\}, \qquad A_1 \cup \{a\} \subset \cdots \subset A_{k-1} \cup \{a\}$$

(if $k = 1$, then the second chain vanishes). By applying this procedure recursively, we may easily produce a partition of $\mathcal{P}(X)$ into symmetric chains, for any $X$.

This construction is directly implemented by the recursive function PARTITION in the PASCAL program included in the Appendix. For technical reasons, attributes are labelled $0, 1, \ldots, l - 1$ in this program. Any symmetric chain $\mathcal{C}$ is represented by an array PERM and two integers FIRST, LAST in the following way:

$$\mathcal{C} = \{\{\mathrm{PERM}[1], \mathrm{PERM}[2], \ldots, \mathrm{PERM}[i]\} : \mathrm{FIRST} \le i \le \mathrm{LAST}\}.$$

With any symmetric chain $A_1 \subset A_2 \subset \cdots \subset A_k$ of subsets of $X = \{2, 3, \ldots, l\}$ we may associate the set of queries

$$\{q_1, q_2, \ldots, q_k, q'_k, q'_{k-1}, \ldots, q'_1\}$$

such that $S(q_i) = \{1\} \cup A_i$, $S(q'_i) = A_i$ for $1 \le i \le k$, where $S(q)$ is defined by (3) (since we consider only non-trivial queries, we should delete $q'_1$ if $A_1 = \emptyset$). This set of queries is easily seen to be a CR subset, since it is a subset of $B_l$ (see (2)), up to some permutation of attributes.
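The whole construction can be sketched in a few lines of Python. This is not the PASCAL program from the Appendix: it builds the chains iteratively rather than recursively, represents each query $q$ by the set $S(q)$ and each record by the set of attributes it has, and all function names are illustrative assumptions. The arrangement used in the check (records with attribute 1 first, ordered by how many chain members they contain, then the remaining records in the opposite order) is one possible choice made for this sketch.

```python
from itertools import combinations
from math import comb


def symmetric_chains(elements):
    """Partition the power set of `elements` into symmetric chains.
    Each chain is a list of frozensets ordered by inclusion."""
    chains = [[frozenset()]]  # n = 0: the single chain consisting of the empty set
    for a in elements:
        new_chains = []
        for chain in chains:
            # First chain: A_1, ..., A_k, A_k + {a}.
            new_chains.append(chain + [chain[-1] | {a}])
            # Second chain: A_1 + {a}, ..., A_(k-1) + {a}; vanishes if k = 1.
            if len(chain) > 1:
                new_chains.append([s | {a} for s in chain[:-1]])
        chains = new_chains
    return chains


def queries_from_chain(chain):
    """Queries q_1..q_k (with attribute 1) followed by q'_k..q'_1 (without),
    dropping the trivial query that would arise from the empty set."""
    with_one = [a | {1} for a in chain]
    without_one = [a for a in reversed(chain) if a]
    return with_one + without_one


def arrangement_for(chain, l):
    """One arrangement of all non-trivial records that works for the chain."""
    recs = [frozenset(c) for k in range(1, l + 1)
            for c in combinations(range(1, l + 1), k)]

    def depth(r):  # number of chain members contained in record r
        return sum(1 for a in chain if a <= r)

    return sorted(recs, key=lambda r: (0, depth(r)) if 1 in r else (1, -depth(r)))


def is_interval(arrangement, query):
    """True if the records pertinent to `query` are stored consecutively."""
    hits = [i for i, r in enumerate(arrangement) if query <= r]
    return hits == list(range(hits[0], hits[-1] + 1))


def cr_partition(l):
    """Partition of all non-trivial binary queries on l attributes."""
    return [queries_from_chain(c) for c in symmetric_chains(range(2, l + 1))]


if __name__ == "__main__":
    for l in range(2, 8):
        print(l, len(cr_partition(l)), -(-comb(l, l // 2) // 2))
```

Running the script prints, for each $l$, the number of subsets produced next to the lower bound, so the two can be compared for even and odd $l$.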