H

intersections of sets. It is natural to ask what sort Hamilton Cycles in Random of graphs would be most likely to arise if the list Intersection Graphs of sets is generated randomly. Consider the model of random graphs where Charilaos Efthymiou1 and Paul (Pavlos) each vertex chooses randomly from a universal Spirakis2;3;4 set the members of its corresponding set, each 1Department of Computer Engineering and independently of the others. The probability Informatics, University of Patras, Patras, Greece space that is created is the space of random inter- 2Computer Engineering and Informatics, section graphs, G , where n is the number of Research and Academic Computer Technology n;m;p vertices, m is the cardinality of a universal set of Institute, Patras University, Patras, Greece elements and p is the probability for each vertex 3Computer Science, University of Liverpool, to choose an element of the universal set. The Liverpool, UK model of random intersection graphs was first 4Computer Technology Institute (CTI), Patras, introduced by M. Karonsky,´ E. Scheinerman, and Greece K. Singer-Cohen in [4]. A rigorous definition of the model of random intersection graphs Keywords follows:

Stochastic order relations between Erdös–Rényi Definition 1 Let n, m be positive integers and random graph model and random intersection 0 Ä p Ä 1. The random intersection graph graphs; Threshold for appearance of Hamilton Gn;m;p is a probability space over the set of cycles in random intersection graphs graphs on the vertex set f1;:::;ng where each vertex is assigned a random subset from a fixed Years and Authors of Summarized set of m elements. An edge arises between two Original Work vertices when their sets have at least a common element. Each random subset assigned to a vertex 2005; Efthymiou, Spirakis is determined by

Pr Œvertex i chooses element j  D p Problem Definition with these events mutually independent. E. Marczewski proved that every graph can be represented by a list of sets where each vertex A common question for a graph is whether it has corresponds to a set and the edges to nonempty a cycle, a set of edges that form a path so that the

© Springer Science+Business Media New York 2016 M.-Y. Kao (ed.), Encyclopedia of Algorithms, DOI 10.1007/978-1-4939-2864-4 892 Hamilton Cycles in Random Intersection Graphs

first and the last vertex is the same, that visits all the model Gn;m;p and the Erdös–Rényi random the vertices of the graph exactly once. We call this graph model Gn;pO . kind of cycle the Hamilton cycle and the graph that contains such a cycle is called a Hamiltonian Definition 3 Let n be a positive integer, graph. 0 ÄOp Ä 1. The random graph G.n; p/O is a probability space over the set of graphs on Definition 2 Consider an undirected graph the vertex set f1;:::;ng determined by G D .V; E/ where V is the set of vertices and E the set of edges. This graph contains a Hamilton Pr Œi;j DOp cycle if and only if there is a simple cycle that with these events mutually independent. contains each vertex in V. The stochastic order relation between the two Consider an instance of G , for specific val- n;m;p models of random graphs is established in the ues of its parameters n, m, and p, what is the prob- sense that if A is an increasing graph property, ability of that instance to be Hamiltonian? Taking then it holds that the parameter p, of the model, to be a function of n and m,in[2], a threshold function P.n;m/     Pr G 2 A Ä Pr Gn;m;p 2 A has been found for the graph property “Contains n;pO a Hamilton cycle”; i.e., a function P.n;m/ is where pO D f.p/. A graph property A is in- derived such that creasing if and only if given that A holds for A 0 if p.n;m/  P.n;m/ a graph G.V; E/ then holds for any G.V; E /:   E0 à E. lim Pr Gn;m;pContains Hamilton cycle D 0 n;m!1 if p.n;m/  P.n;m/ Key Results   lim Pr Gn;m;pContains Hamilton cycle D 1 n;m!1 Theorem 1 Let m D dn˛e, where ˛ is a fixed real positive, and C ;C be sufficiently large When a graph property, such as “Contains 1 2 constants. If a Hamilton cycle,” holds with probability that tends to 1 (or 0) as n, m tend to infinity, then it log n p  C for 0<˛<1 or is said that this property holds (does not hold), 1 m “almost surely” or “almost certainly.” r If in G the parameter m is very small log n n;m;p p  C for ˛>1 compared to n, the model is not particularly in- 2 nm teresting and when m is exceedingly large (com- then almost all Gn;m;p are Hamiltonian. Our pared to n) the behavior of Gn;m;p is essentially the same as the Erdös–Rényi model of random bounds are asymptotically tight. ˛ graphs (see [3]). If someone takes m D dn e,for Note that the theorem above says nothing fixed real ˛>0, then there is some deviation when m D n, i.e., ˛ D 1. from the standard models, while allowing for a natural progression from sparse to dense graphs. Thus, the parameter m is assumed to be of the Applications form m Ddn˛e for some fixed positive real ’. The proof of existence of a Hamilton cy- The Erdös–Rényi model of random graphs, Gn;p, cle in Gn;m;p is mainly based on the estab- is exhaustively studied in computer science lishment of a stochastic order relation between because it provides a framework for studying Hamilton Cycles in Random Intersection Graphs 893 practical problems such as “reliable network Open Problems computing” or it provides a “typical instance” of a graph and thus it is used for average As in many other random structures, e.g., Gn;p case analysis of graph algorithms. However, and random formulae, properties of random in- the simplicity of Gn;p means it is not able to tersection graphs also appear to have threshold capture satisfactorily many practical problems behavior. So far threshold behavior has been in computer science. Basically, this is because studied for the induced subgraph appearance and of the fact that in many problems independent hamiltonicity. edge-events are not well justified. For example, Other fields of research for random intersec- consider a graph whose vertices represent a set tion graphs may include the study of connectivity of objects that either are placed or move in a behavior, of the model i.e., the path formation, specific geographical region, and the edges are the formation of giant components. Additionally, radio communication links. In such a graph, a very interesting research question is how cover we expect that, any two vertices u, w are more and mixing times vary with the parameter p,of likely to be adjacent to each other, than any the model. other, arbitrary, pair of vertices, if both are H adjecent to a third vertex v. Even epidemiological phenomena (like the spread of disease) tend to Cross-References be more accurately captured by this proximity- sensitive random intersection graph model. Other  Independent Sets in Random Intersection applications may include oblivious resource Graphs sharing in a distributive setting, interaction of mobile agents traversing the web etc. The model of random intersection graphs Recommended Reading G was first introduced by M. Karonsky,´ n;m;p 1. Alon N, Spencer JH (2000) The probabilistic method, E. Scheinerman, and K. Singer-Cohen in [4] 2nd edn. Wiley, New York where they explored the evolution of random 2. Efthymiou C, Spirakis PG (2005) On the existence intersection graphs by studying the thresholds of Hamilton cycles in random intersection graphs. In: for the appearance and disappearance of Proceedings of the 32nd ICALP. LNCS, vol 3580. Springer, Berlin/Heidelberg, pp 690–701 small induced subgraphs. Also, J.A. Fill, E.R. 3. Fill JA, Scheinerman ER, Singer-Cohen KB (2000) Scheinerman, andK. Singer Cohen in [3] proved Random intersection graphs when m D ¨(n): an an equivalence theorem relating the evolution of equivalence theorem relating the evolution of the G(n, m, p) and G(n, p) models. Random Struct Algorithms Gn;m;p and Gn;p, in particular they proved that ˛ 16:156–176 when m D n where ˛>6, the total variation 4. Karonski´ M, Scheinerman ER, Singer-Cohen K distance between the graph random variables (1999) On random intersection graphs: the subgraph has limit 0. S. Nikoletseas, C. Raptopoulos, and problem. Comb Probab Comput 8:131–159 P. Spirakis in [8] studied the existence and the 5. Komlós J, Szemerédi E (1983) Limit distributions for the existence of Hamilton cycles in a random graph. efficient algorithmic construction of close to Discret Math 43:55–63 optimal independent sets in random intersection 6. Korshunov AD (1977) Solution of a problem of graphs. D. Stark in [11] studied the degree of P. Erdös and A. Rényi on Hamilton cycles in non- the vertices of the random intersection graphs. oriented graphs. Metody Diskr Anal Teoriy Upr Syst Sb Trubov Novosibrirsk 31:17–56 However, after [2], Spirakis and Raptopoulos, 7. Nikoletseas S, Raptopoulos C, Spirakis P (2007) in [10], provide algorithms that construct Expander properties and the cover time of random in- tersection graphs. In: Proceedings of the 32nd MFCS. Hamilton cycles in instances of Gn;m;p,for p Springer, Berlin/Heidelberg, pp 44–55 above the Hamiltonicity threshold. Finally, 8. Nikoletseas S, Raptopoulos C, Spirakis P (2004) The Nikoletseas et al. in [7] study the mixing time and existence and efficient construction of large inde- cover time as the parameter p of the model varies. pendent sets in general random intersection graphs. 894 Haplotype Inference on Pedigrees Without Recombinations

In: Proceedings of the 31st ICALP. LNCS, vol 3142. (SNPs) [5], which are the nucleotide sites where Springer, Berlin/Heidelberg, pp 1029–1040 more than one nucleotide can occur. A hap- 9. Singer K (1995) Random intersection graphs. PhD lotype is the sequence of linked SNP genetic thesis, The Johns Hopkins University, Baltimore 10. Spirakis PG, Raptopoulos C (2005) Simple and markers (small segments of DNA) on a single efficient greedy algorithms for Hamilton cycles chromosome. However, experiments often yield in random intersection graphs. In: Proceedings genotypes, which is a blend of the two haplotypes of the 16th ISAAC. LNCS, vol 3827. Springer, of the chromosome pair. It is more useful to Berlin/Heidelberg, pp 493–504 11. Stark D (2004) The vertex degree distribution of ran- have information on the haplotypes, thus giving dom intersection graphs. Random Struct Algorithms rise to the computational problem of inferring 24:249–258 haplotypes from genotypes. The physical position of a marker on a chro- mosome is called a locus and its state is called an allele. SNP are often biallelic, i.e., the allele can take on two different states, corresponding Haplotype Inference on Pedigrees to two different nucleotides. In the language of Without Recombinations computer science, the allele of a biallelic SNP can be denoted by 0 and 1, and a haplotype Mee Yee Chan1, Wun-Tat Chan2, Francis Y.L. with m loci is represented as a length-m string Chin1, Stanley P.Y. Fung3, and Ming-Yang Kao4 in f0; 1gm and a genotype as a length-m string 1 m Department of Computer Science, University of in f0; 1; 2g . Consider a haplotype pair hh1, h2i Hong Kong, Hong Kong, China and a corresponding genotype g. For each locus, 2College of International Education, Hong Kong if both haplotypes show a 0, then the geno- Baptist University, Hong Kong, China type must also be 0, and if both haplotypes 3Department of Computer Science, University of show a 1, the genotype must also be 1. These Leicester, Leicester, UK loci are called homozygous. If however one of 4Department of Electrical Engineering and the haplotypes shows a 0 and the other a 1, Computer Science, Northwestern University, the genotype shows a 2 and the locus is called Evanston, IL, USA heterozygous. This is called SNP consistency. For example, considering a single individual, the Keywords genotype g D 012212 has four SNP-consistent haplotype pairs: {h011111, 010010i, h011110, 010011i, h011011, 010110i, h011010, 010111i}. Computational biology; Haplotype inference; In general, if a genotype has s heterozygous Pedigree; Recombination loci, it can have 2s1 SNP-consistent haplotype solutions. Years aud Authors of Summarized Haplotypes are passed down from an indi- Original Work vidual to its descendants. Mendelian consistency requires that, in the absence of recombinations 2009; Chan, Chan, Chin, Fung, Kao or mutations, each child inherits one haplotype from one of the two haplotypes of the father and inherits the other haplotype from the mother Problem Definition similarly. This gives us more information to in- fer haplotypes when we are given a pedigree. In many diploid organisms like humans, chro- The computational problem is therefore, given mosomes come in pairs. Genetic variation oc- a pedigree with n individuals where each indi- curs in some “positions” along the chromosomes. vidual is associated with a genotype of length These genetic variations are commonly modelled m, find an assignment of a pair of haplotypes in the form of single nucleotide polymorphisms to each individual such that SNP consistency Haplotype Inference on Pedigrees Without Recombinations 895

ab

2102 F M 2000 F ?101 ?100 1100 0101 red

M ?000 ?000 1000 0000 brown

2202 S D 1200 S ?101 ?000D 1100 1000

Haplotype Inference on Pedigrees Without Recom- and 4 brown edges. Each vector is a vertex in G. Vector binations, Fig. 1 (a) Example of a pedigree with four pairs enclosed by rounded rectangles belong to the same nodes. (b) The graph G with 12 vertices, 6 red edges, individual and Mendelian consistency are obeyed for each in [1], representing the constraints by the parity individual. In rare cases (especially for humans) of edge labels of some auxiliary graphs and [3], the pedigree may contain mating loops:a finding solutions of these constraints using graph mating loop is formed when, for example, there traversal without (directly) solving a system of is a marriage between descendants of a common linear equations. This gives a linear O.mn/ time H ancestor. algorithm, although it only works for the case As a simple example, consider the pedigree in with no mating loops and only produces one Fig. 1a for a family of four individuals and their particular solution even when the pedigree admits genotypes. Due to SNP consistency, mother M’s more than one solution. Later algorithms include haplotypes must be h0000; 1000i (the order does [4] which returns the full set of solutions in not matter). Similarly, daughter D’s haplotypes optimal time (again without mating loops) and must be h1000; 1100i. Now we apply Mendelian [2] which can handle mating loops and runs in consistency to deduce that D must obtain the O.kmn C k2m/ time where k is the number of 1000 haplotype from M since neither of father mating loops. F’s haplotypes can be 1000 (considering locus In the following we sketch the idea behind 2). Therefore, D obtains 1100 from F, and F’s the linear time algorithm in [1]. Each individual haplotypes must be h0101; 1100i. With F’s and only has a pair of haplotypes, but the algorithm M’s haplotypes known, the only solution for the first produces a number of vector pairs for each haplotypes of son S that is consistent with his individual, one vector pair for each trio (a father- genotype 2202 is h0101; 1000i. mother-child triplet) that this individual belongs to. Each vector pair represents the information about the two haplotypes of this individual that Key Results can be derived by considering this trio only. These vector pairs will eventually be “unified” to While this kind of deduction might appear to become a single pair. be enough to resolve all haplotype values, it For the pedigree in Fig. 1a, the algorithm is not the case. As we will shortly see, there first produces the graph G in Fig. 1b, which has are “long-distance” constraints that need to be two connected components for the two trios F- considered. These constraints can be represented M-S and F-M-D. The rule for enforcing SNP by a system of linear equations in GF(2) and consistency (Mendelian consistency) is that the solved using Gaussian elimination. This gives a unresolved loci values, i.e., the ? values, must O.m3n3/ time algorithm [3]. Subsequent papers be different (same) at opposite ends of a red try to capture or solve the constraints more eco- (brown) edge. There is only one way to unify the nomically. The time complexity was improved vector pairs of F consistently (due to locus 4): in [6]toO.mn2 C n3 log2 n log log n/ by elimi- ?101 must correspond to 0101. We add an edge nating redundant equations and using low-stretch between these two vectors to represent the fact spanning trees. A different approach was used that they should be identical. Then all ? values 896 Haplotype Inference on Pedigrees Without Recombinations ab red edge 002 112 00? 00? 11? 11? brown edge

222 221 00? 11? ??0 ??1 ??1 ??1 A B 222 022 ??0 ??1 1?? 0?? 0?? 0?? B’ C 222 211 1?? 0?? ?00 ?11 ?11 ?11

222 C’ D ?00 ?11

c 0

A BCC’B’ D

0 0

1 0

Haplotype Inference on Pedigrees Without Recombi- pedigree. (b) The local graph G.(c) The parity constraint nations, Fig. 2 An example showing how constraints are graph J . Three constraints are added represented by labeled edges in another graph. (a)The can be resolved by traversing the now-connected Cross-References graph and applying the aforementioned rules for enforcing consistency.  Beyond Evolutionary Trees However, consider another pedigree in Fig. 2a.  Musite: Tool for Predicting Protein Phosphory- The previous steps can only produce Fig. 2b, lation Sites which has four connected components and un-  Sequence and Spatial Motif Discovery in Short resolved loci. We need to decide for A and B Sequence Fragments whether A D 00‹ should connect to B D‹‹0 or its complement ??1 and similarly for B0 and C , etc. Observe that a path between A and C Recommended Reading must go through an odd number of red edges since locus 1 changes from 0 to 1. To capture this 1. Chan MY, Chan WT, Chin FYL, Fung SPY, Kao MY type of long-distance constraints, we construct a (2009) Linear-time haplotype inference on pedigrees parity constraint graph J where the edge labels without recombinations and mating loops. SIAM J represent the parity constraints; see Fig. 2c.In Comput 38(6):2179–2197 effect, J represents a set of linear equations in 2. Lai EY, Wang WB, Jiang T, Wu KP (2012) A linear- time algorithm for reconstructing zero-recombinant GF(2); in Fig. 2c, the equations are xAB CxB0C D haplotype configuration on a pedigree. BMC Bioinfor- 1; xB0C CxC 0D D 0, and xABCxB0C CxC 0D D 0. matics 13(S-17):S19 Finally, we can traverse J along the unique 3. Li J, Jiang T (2003) Efficient inference of haplotypes from genotypes on a pedigree. J. Bioinformatics Com- path between any two nodes; the parity of this put Biol 1(1):41–69 path tells us how to merge the vector pairs in G. 4. Liu L and Jiang T (2010) A linear-time algorithm For example, the parity between A and B should for reconstructing zero-recombinant haplotype config- be 0, indicating 00? in A should connect to ??0 in uration on pedigrees without mating loops. J Combin Optim 19:217–240 B (so both become 000), while the parity between 5. Russo E, Smaglik P (1999) Single nucleotide polymor- 0 0 B and C is 1, so B and C should be 000 and phism: big pharmacy hedges its bets. The Scientist, 111, respectively. 13(15):1, July 19, 1999 Hardness of Proper Learning 897

6. Xiao J, Liu L, Xia L, Jiang T (2009) Efficient algo- Their specific reductions from NP-hard problems rithms for reconstructing zero-recombinant haplotypes are the basis of several other follow-up works on on a pedigree based on fast elimination of redundant the hardness of proper learning [1, 3, 7]. linear equations. SIAM J Comput 38(6):2198–2219

Notation Learning in the PAC model is based on the as- sumption that the unknown function (or concept) Hardness of Proper Learning belongs to a certain class of concepts C. In order to discuss algorithms that learn and output func- Vitaly Feldman tions, one needs to define how these functions IBM Research – Almaden, San Jose, CA, USA are represented. Informally, a representation for a concept class C is a way to describe concepts C Keywords from that defines a procedure to evaluate a con- cept in C on any input. For example, one can rep- DNF; function representation; NP-hardness resent a conjunction of input variables by listing H of learning; PAC learning; Proper learning; the variables in the conjunction. More formally, a representation-based hardness representation class can be defined as follows. Definition 1 A representation class F is a pair R Years and Authors of Summarized .L; / where Original Work • L is a language over some fixed finite alphabet 1988; Pitt, Valiant (e.g., f0; 1g); • R is an algorithm that for  2 L, on input .; 1n/ returns a Boolean circuit over f0; 1gn. Problem Definition In the context of efficient learning, only ef- The work of Pitt and Valiant [18] deals with ficient representations are considered, or, rep- learning Boolean functions in the Probably Ap- resentations for which R is a polynomial-time proximately Correct (PAC) learning model intro- algorithm. The concept class represented by F is duced by Valiant [19]. A learning algorithm in the set of functions over f0; 1gn defined by the Valiant’s original model is given random exam- circuits in fR.; 1n/ j  2 Lg. For a Boolean ples of a function f Wf0; 1gn !f0; 1g from a function f ,“f 2 F” means that f belongs to representation class F and produces a hypothesis the concept class represented by F and that there h 2 F that closely approximates f . Here, a is a  2 L whose associated Boolean circuit com- representation class is a set of functions and a putes f: For most of the representations discussed language for describing the functions in the set. in the context of learning, it is straightforward The authors give examples of natural represen- to construct a language L and the corresponding tation classes that are NP-hardtolearninthis translating function R, and therefore, they are not model, whereas they can be learned if the learn- specified explicitly. ing algorithm is allowed to produce hypotheses Associated with each representation is the from a richer representation class H. Such an complexity of describing a Boolean function algorithm is said to learn F by H; learning F by using this representation. More formally, for a F is called proper learning. Boolean function f 2 C, F-size.f / is the The results of Pitt and Valiant were the first length of the shortest way to represent f using to demonstrate that the choice of representation F,orminfjjj 2 L; R.; 1n/ Á f g. of hypotheses can have a dramatic impact on the We consider Valiant’s PAC model of learning computational complexity of a learning problem. [19], as generalized by Pitt and Valiant [18]. 898 Hardness of Proper Learning

In this model, for a function f and a distribu- Theorem 2 ([18]) The representation class of tion D over X,an example oracle EX.f; D/ Boolean threshold functions is not properly is an oracle that, when invoked, returns an ex- learnable unless RP D NP. ample hx;f.x/i, where x is chosen randomly This result is obtained via a reduction from with respect to D, independently of any previous the NP-complete Zero-One Integer Programming examples. For   0, we say that function problem (see [9] (p.245) for details on the prob- g-approximates a function f with respect to lem). The result is contrasted by the fact that gen- distribution D if PrDŒf .x/ ¤ g.x/ Ä . eral linear thresholds are properly learnable [4]. Definition 2 A representation class F is PAC These results show that using a specific repre- learnable by representation class H if there exists sentation of hypotheses forces the learning algo- an algorithm that for every >0, ı>0, n, rithm to solve a combinatorial problem that can f 2 F, and distribution D over X,given, ı, be NP-hard. In most machine learning applica- and access to EX.f; D/, runs in time polynomial tions it is not important which representation of in n; s D F-size.c/; 1= and 1=ı, and outputs, hypotheses is used as long as the value of the with probability at least 1ı, a hypothesis h 2 H unknown function is predicted correctly. There- that -approximates f . fore, learning in the PAC model is now defined without any restrictions on the output hypothesis A DNF expression is defined as an OR of (other than it being efficiently evaluatable). Hard- ANDs of literals, where a literal is a possibly ness results in this setting are usually based on negated input variable. We refer to the ANDs of a cryptographic assumptions (cf. [15]). DNF formula as its terms.LetDNF.k/ denote the Hardness results for proper learning based on representation class of k-term DNF expressions. assumption NP ¤ RP are now known for several Similarly, a CNF expression is an OR of ANDs other representation classes and for other vari- of literals. Let k-CNF denote the representation ants and extensions of the PAC learning model. class of CNF expressions with each AND having Blum and Rivest show that for any k  3, at most k literals. unions of k halfspaces are not properly learn- For a real-valued vector c 2 Rn and  2 R,a able [3]. Hancock et al. prove that decision trees linear threshold function (also called a halfspace) (cf. [16] for the definition of this representation) T .x/ is the function that equals 1 if and only c;P are not learnable by decision trees of somewhat if c x  . The representation class of in i i larger size [11]. This result was strengthened Boolean threshold functions consists of all linear by Alekhnovich et al. who also proved that in- threshold functions with c 2f0; 1gn and  an tersections of two halfspaces are not learnable by integer. intersections of k halfspaces for any constant k, general DNF expressions are not learnable by unions of halfspaces (and in particular are not Key Results properly learnable) and k-juntas are not properly learnable [1]. Further, DNF expressions remain Theorem 1 ([18]) For every k  2, the repre- NP-hard to learn properly even if membership sentation class of DNF.k/ is not properly learn- queries, or the ability to query the unknown func- able unless RP D NP. tion at any point, are allowed [7]. Khot and Saket More specifically, Pitt and Valiant show that show that the problem of learning intersections learning DNF.k/ by DNF.`/ is at least as hard as of two halfspaces remains NP-hard even if a coloring a k-colorable graph using ` colors. For hypothesis with any constant error smaller than the case k D 2, they obtain the result by reducing 1=2 is required [17]. No efficient algorithms or from Set Splitting (see [9] for details on the hardness results are known for any of the above problems). Theorem 1 is in sharp contrast with learning problems if no restriction is placed on the fact that DNF.k/ is learnable by k-CNF [19]. the representation of hypotheses. Hardness of Proper Learning 899

The choice of representation is important even or in the PAC model with membership queries. in powerful learning models. Feldman proved This question is open even in the PAC model that nc-term DNF are not properly learnable for restricted to the uniform distribution only. Note any constant c even when the distribution of that decision trees are learnable (non-properly) examples is assumed to be uniform and member- if membership queries are available [5] and are ship queries are available [7]. This contrasts with learnable properly in time O.nlog s/, where s is Jackson’s celebrated algorithm for learning DNF the number of leaves in the decision tree [1]. in this setting [13], which is not proper. An even more interesting direction of research In the agnostic learning model of Haussler would be to obtain hardness results for learning [12] and Kearns et al. [14], even the representa- by richer representation classes, such as AC0 cir- tion classes of conjunctions, decision lists, halfs- cuits, classes of neural networks and, ultimately, paces, and parity functions are NP-hard to learn unrestricted circuits. properly (cf. [2, 6, 8, 10] and references therein). Here again the status of these problems in the representation-independent setting is largely un- Cross-References known. H  Cryptographic Hardness of Learning  Graph Coloring Applications  Learning DNF Formulas  PAC Learning A large number of practical algorithms use repre- sentations for which hardness results are known (most notably decision trees, halfspaces, and neu- Recommended Reading ral networks). Hardness of learning F by H implies that an algorithm that uses H to represent 1. Alekhnovich M, Braverman M, Feldman V, Klivans its hypotheses will not be able to learn F in A, Pitassi T (2008) The complexity of properly learn- the PAC sense. Therefore such hardness results ing simple classes. J Comput Syst Sci 74(1):16–34 2. Ben-David S, Eiron N, Long PM (2003) On the elucidate the limitations of algorithms used in difficulty of approximately maximizing agreements. practice. In particular, the reduction from an NP- J Comput Syst Sci 66(3):496–514 hard problem used to prove the hardness of learn- 3. Blum AL, Rivest RL (1992) Training a 3-node neural ing F by H can be used to generate hard instances network is NP-complete. Neural Netw 5(1):117–127 4. Blumer A, Ehrenfeucht A, Haussler D, Warmuth of the learning problem. M (1989) Learnability and the Vapnik-Chervonenkis dimension. J ACM 36(4):929–965 5. Bshouty N (1995) Exact learning via the monotone theory. Inform Comput 123(1):146–153 Open Problems 6. Feldman V, Gopalan P, Khot S, Ponuswami A (2009) On agnostic learning of parities, monomials and half- A number of problems related to proper spaces. SIAM J Comput 39(2):606–645 7. Feldman V (2009) Hardness of approximate two- learning in the PAC model and its extensions level logic minimization and pac learning with mem- are open. Almost all hardness of proper bership queries. J Comput Syst Sci 75(1):13–26 learning results are for learning with respect 8. Feldman V, Guruswami V, Raghavendra P, Wu Y to unrestricted distributions. For most of the (2012) Agnostic learning of monomials by halfspaces is hard. SIAM J Comput 41(6):1558–1590 problems mentioned in section “Key Results” 9. Garey M, Johnson DS (1979) Computers and in- it is unknown whether the result is true if the tractability. W.H. Freeman, San Francisco distribution is restricted to belong to some 10. Guruswami V, Raghavendra P (2009) Hardness of learning halfspaces with noise. SIAM J Comput natural class of distributions (e.g., product 39(2):742–765 distributions). It is unknown whether decision 11. Hancock T, Jiang T, Li M, Tromp J (1995) Lower trees are learnable properly in the PAC model bounds on learning decision lists and trees. In: Mayr 900 Harmonic Algorithm for Online Bin Packing

EW, Puech C (eds) STACS 95: 12th Annual Sympo- the asymptotic competitive ratio, which is the sium on Theoretical Aspects of Computer Science, standard measure for online algorithms for bin Munich, pp 527–538. Springer, Berlin/Heidelberg packing type problems. The competitive ratio for 12. Haussler D (1992) Decision theoretic generalizations of the PAC model for neural net and other learning a given input is the ratio between the costs of applications. Inform Comput 100(1):78–150 the algorithm and of an optimal off-line solution. 13. Jackson J (1997) An efficient membership-query al- The asymptotic competitive ratio is the worst- gorithm for learning DNF with respect to the uniform case competitive ratio of inputs for which the distribution. J Comput Syst Sci 55:414–440 14. Kearns M, Schapire R, Sellie L (1994) optimal cost is sufficiently large. In the online Toward efficient agnostic learning. Mach Learn (standard) bin packing problem, items of rational 17(2–3):115–141 sizes in .0; 1 are presented one by one. The 15. Kearns M, Valiant L (1994) Cryptographic limitations on learning boolean formulae and finite automata. algorithm must pack each item into a bin before J ACM 41(1):67–95 the following item is presented. The total size of 16. Kearns M, Vazirani U (1994) An introduction to items packed into a bin cannot exceed 1, and the computational learning theory. MIT, Cambridge goal is to use the minimum number of bins, where 17. Khot S, Saket R (2011) On the hardness of learning intersections of two halfspaces. J Comput Syst Sci a bin is used if at least one item was packed into 77(1):129–141 it. All items must be packed, and the supply of 18. Pitt L, Valiant L (1988) Computational limitations on bins is unlimited. learning from examples. J ACM 35(4):965–984 When an algorithm acts on an input, it can 19. Valiant LG (1984) A theory of the learnable. Com- mun ACM 27(11):1134–1142 decide to close some of its bins and never use them again. A bin is called closed in such a case, while otherwise a used bin (which already has at least one item) is called open. The motivation for closing bins is to obtain fast running times per Harmonic Algorithm for Online Bin item (so that the algorithm will pack it into a Packing bin selected out of a small number of options). Simple algorithms such as First Fit (FF), Best Leah Epstein Fit (BF), and Worst Fit (WF) have worst-case Department of Mathematics, University of running times of O.log N/ per item, where N Haifa, Haifa, Israel is the number of items at the time of assignment of the new item. On the other hand, the simple Keywords algorithm Next Fit (NF), which keeps at most of open bin and closes it when a new item cannot Bin packing; Bounded space algorithms; Com- be packed there (before it uses a new bin for the petitive ratio new item), has a worst-case running time of O.1/ per item. Algorithms that keep a constant number of open bins are called bounded space.Inmany Years and Authors of Summarized practical applications, this property is desirable, Original Work since the number of candidate bins for a new item is small and it does not increase with the input 1985; Lee, Lee size. Algorithm HARMk (for an integer k  3) was defined by Lee and Lee [7]. The fundamental Problem Definition and natural idea of “harmonic-based” algorithms is classify each item by size first (for online One of the goals of the design of the harmonic algorithms, the classification of an item must be algorithm (or class of algorithms) was to provide done immediately upon arrival) and then pack it an online algorithm for the classic bin pack- according to its class (instead of letting the exact ing problem that performs well with respect to size influence packing decisions). For the classifi- Harmonic Algorithm for Online Bin Packing 901

1 cation of items, HARMk splits the interval .0; 1 that the crucial item sizes are just above . P j C1 into subintervals. There are k  1 subintervals The series 1 1 give the asymptotic compet- of the form . 1 ; 1  for i D 1;:::;k  1 and j D1 j iC1 i itive ratio of the HARM, ˘1. For a long time the 1 one final subinterval .0; k . Each bin will contain best lower bound on the asymptotic competitive only items from one subinterval (type). Every ratio of (unbounded space) online algorithms was type is packed independently into its own bins the one by van Vliet [8, 13], proved using this using NF. Thus, there are at most k  1 open sequence (but the current best lower bound was bins at each time (since for items of sizes above proved using another set of inputs [1]). 1 , two items cannot share a bin, and any bin can 2 In order to prove the upper bound ˘1 on the be closed once it receives an item). Moreover, for competitive ratio, weights were used [12]. In this i0is small and ` is an integer. These item. are items of type `  1, and bins consisting of Proving that no better bounded space algo- 0 such items contain `  1 items (except for the last rithms exist can be done as follows. Let j be bin used for this type that may contain a smaller a fixed integer. Let N be a large integer and number of items). However, a bin (of an off-line consider a sequence containing N items of each size 1 C ı for a sufficiently small ı>0, solution) that already contains an item of size j 1 1 for any j D j 0;j0  1; ;1.Ifı is chosen 2 C"1 and an item of size 3 C"2 (for some small P 0 appropriately, we have j 1 C j 0ı<1, "1;"2 >0) cannot contain also an item whose j D1 j C1 1 so the items can be packed (off-line) into N size is slightly above 4 . The largest item of this 1 bins. However, if items are presented in this order form would be slightly larger than 7 . Thus, the following sequence was defined [7]. Let 1 D 1 (sorted by nondecreasing size), after all items of and, for j>1, j D j 1.j 1 C 1/ (note that one size have been presented, only a constant 0 j 0 is divisible by any j for j

The space of a bounded space algorithm is the Cross-References number of open bins that it can have. The space of NF is 1, while the space of harmonic algorithms  Current Champion for Online Bin Packing increases with k. A bounded space algorithm with space 2 and the same asymptotic competitive ratio as FF and BF have been designed [3] (for Recommended Reading comparison, HARM3 has an asymptotic compet- itive ratio of 7 ). A modification where smaller 4 1. Balogh J, Békési J, Galambos G (2012) New lower space is used to obtain the same competitive bounds for certain classes of bin packing algorithms. ratios of harmonic algorithms (or alternatively, Theor Comput Sci 440–441:1–13 smaller competitive ratios were obtained using 2. Chrobak M, Sgall J, Woeginger GJ (2011) Two- the same space) was designed by Woeginger [15]. bounded-space bin packing revisited. In: Proceedings of the 19th annual European symposium on algo- Thus, there exists another sequence of bounded rithms (ESA2011), Saarbrücken, Germany, pp 263– space algorithms, with an increasing sequence of 274 open bins, where their sequence of competitive 3. Csirik J, Johnson DS (2001) Bounded space on-line ratios tends to ˘ such that the space required bin packing: best is better than first. Algorithmica 1 31:115–138 for every competitive ratio is much smaller than 4. Csirik J, Woeginger GJ (2002) Resource augmenta- that of [7]. tion for online bounded space bin packing. J Algo- One drawback of the model above is that an rithms 44(2):308–320 off-line algorithm can rearrange the items and 5. Epstein L (2006) Online bin packing with cardinality constraints. SIAM J Discret Math 20(4):1015–1030 does not have to process them as a sequence. The 6. Epstein L (2010) Bin packing with rejection revisited. variant where it must process them in the same Algorithmica 56(4):505–528 order as an online algorithm was studied as well 7. Lee CC, Lee DT (1985) A simple online bin packing [2]. Algorithms that are based on partitioning into algorithm. J ACM 32(3):562–572 8. Liang FM (1980) A lower bound for on-line bin classes and have smaller asymptotic competitive packing. Inf Process Lett 10(2):76–79 ratios (but they are obviously not bounded space) 9. Ramanan P, Brown DJ, Lee CC, Lee DT (1989) On- were designed [7, 9, 11]. line bin packing in linear time. J Algorithms 10:305– Generalizations have been studied too, in par- 326 10. Seiden SS (2001) An optimal online algorithm for ticular, bounded space bin packing with cardi- bounded space variable-sized bin packing. SIAM J nality constraints (where an item cannot receive Discret Math 14(4):458–470 more than t items for a fixed integer t  2) 11. Seiden SS (2002) On the online bin packing problem. J ACM 49(5):640–671 [5], parametric bin packing (where there is an 12. Ullman JD (1971) The performance of a memory upper bound strictly smaller than 1 on item sizes) allocation algorithm. Technical report 100, Princeton [14], bin packing with rejection (where an item i University, Princeton 13. van Vliet A (1992) An improved lower bound for has a rejection penalty ri associated with it, and online bin packing algorithms. Inf Process Lett it can be either packed, or rejected for the cost 43(5):277–284 ri )[6], variable-sized bin packing (where bins of 14. van Vliet A (1996) On the asymptotic worst case multiple sizes are available for packing) [10], and behavior of Harmonic Fit. J Algorithms 20(1):113– bin packing with resource augmentation (where 136 15. Woeginger GJ (1993) Improved space for bounded- the online algorithm can use bins of size b>1 space online bin packing. SIAM J Discret Math for a fixed rational number b, while an off-line 6(4):575–581 Hierarchical Self-Assembly 903

Winfree [9], different from that model only in Hierarchical Self-Assembly the absence of a seed and the ability of two large assemblies to attach. David Doty Computing and Mathematical Sciences, Definitions California Institute of Technology, Pasadena, A tile type is a unit square with four sides, each CA, USA consisting of a glue label (often represented as a finite string) and a nonnegative integer strength. Keywords We assume a finite set T of tile types, but an infinite number of copies of each tile type, each Hierarchical assembly; Intrinsic universality; copy referred to as a tile.Anassembly is a Running time; Self-assembly; Verification positioning of tiles on the integer lattice Z2, i.e., a partial function ˛ W Z2 Ü T . We write Years and Authors of Summarized j˛j to denote jdom ˛j. Write ˛ v ˇ to denote Original Work that ˛ is a subassembly of ˇ, which means that dom ˛  dom ˇ and ˛.p/ D ˇ.p/ for all points H 2005; Aggarwal, Cheng, Goldwasser, Kao, p 2 dom ˛. We abuse notation and take a tile Espanes, Schweller type t to be equivalent to the single-tile assembly 2012; Chen, Doty containing only t (at the origin if not otherwise 2013; Cannon, Demaine, Demaine, Eisenstat, specified). Two adjacent tiles in an assembly in- Patitz, Schweller, Summers, Winslow teract if the glue labels on their abutting sides are equal and have positive strength. Each assembly induces a binding graph, a grid graph whose vertices are tiles, with an edge between two tiles Problem Definition if they interact. The assembly is -stable if every cut of its binding graph has strength at least , The general idea of hierarchical self-assembly where the weight of an edge is the strength of (a.k.a., multiple tile [2], polyomino [8, 10], two- the glue it represents. That is, the assembly is handed [3, 5, 6]) is to model self-assembly of stable if at least energy  is required to separate tiles in which attachment of two multi-tile assem- the assembly into two parts. blies is allowed, as opposed to all attachments We now define both the seeded and hierarchi- being that of a single tile onto a larger assembly. cal variants of the tile assembly model. A seeded Several problems concern comparing hierarchical tile system is a triple T D .T;;/, where T is self-assembly to its single-tile-attachment variant a finite set of tile types,  W Z2 Ü T is a finite, (called the “seeded” model of self-assembly), -stable seed assembly, and  is the temperature. so we define both models here. The model of If T has a single seed tile s 2 T (i.e., .0; 0/ D s hierarchical self-assembly was first defined (in for some s 2 T and is undefined elsewhere), a slightly different form that restricted the size then we write T D .T;s;/:Let jT j denote jT j. of assemblies that could attach) by Aggarwal, An assembly ˛ is producible if either ˛ D  Cheng, Goldwasser, Kao, Moisset de Espanes, or if ˇ is a producible assembly and ˛ can be and Schweller [2]. Several generalizations of the obtained from ˇ by the stable binding of a single model exist that incorporated staged mixing of tile. In this case, write ˇ !1 ˛ (˛ is producible test tubes, “dissolvable” tiles, active signaling from ˇ by the attachment of one tile), and write across tiles, etc., but here we restrict attention ˇ ! ˛ if ˇ ! ˛ (˛ is producible from ˇ by the to the model closest to the seeded model of 1 attachment of zero or more tiles). An assembly is terminal if no tile can be -stably attached to it. Supported by NSF grants CCF-1219274, CCF-1162589, A hierarchical tile system is a pair T D .T; /, and 1317694. where T is a finite set of tile types and  2 N 904 Hierarchical Self-Assembly is the temperature. An assembly is producible if a notion. There are several intricacies to the full either it is a single tile from T or it is the -stable formal definition that are discussed in further result of translating two producible assemblies detail in [3, 5]. without overlap. Therefore, if an assembly ˛ is First, we require that there is a constant k 2 producible, then it is produced via an assembly ZC (the “resolution loss”) such that each tile type tree, a full binary tree whose root is labeled with t in T is “represented” by one or more k k ˛, whose j˛j leaves are labeled with tile types, blocks ˇ of tiles in S. In this case, we write and each internal node is a producible assembly r.ˇ/ D t, where ˇ Wf1;:::;kg2 Ü S and S formed by the stable attachment of its two child is the tile set of S. Then ˇ represents a k k assemblies. An assembly ˛ is terminal if for block of such tiles, possibly with empty positions every producible assembly ˇ, ˛ and ˇ cannot be at points x where ˇ.x/ is undefined. We call -stably attached. If ˛ can grow into ˇ by the such a k k block in S a “macrotile.” We can attachment of zero or more assemblies, then we extend r to a function R that, given an assembly write ˛ ! ˇ. ˛S partitioned into k k macrotiles, outputs an In either model, let AŒT  be the set of pro- assembly ˛T of T such that, for each macrotile ducible assemblies of T , and let AŒT   AŒT  ˇ of ˛S , r.ˇ/ D t, where t is the tile type at the be the set of producible, terminal assemblies of corresponding position in ˛T . T .ATAST is directed (a.k.a., deterministic, Given such a representation function R indi- confluent)ifjAŒT jD1.IfT is directed with cating how to interpret assemblies of S as repre- unique producible terminal assembly ˛,wesay senting assemblies of T , we now define what it that T uniquely produces ˛. It is easy to check means to say that S simulates T . For each pro- that in the seeded aTAM, T uniquely produces ducible assembly ˛T of T , there is a producible ˛ if and only if every producible assembly ˇ v assembly ˛S of S such that R.˛S / D T , and ˛. In the hierarchical model, a similar condition furthermore, for every producible assembly ˛S ,if holds, although it is more complex since hierar- R.˛S / D T , then T is producible in T . Finally, chical assemblies, unlike seeded assemblies, do we require that R respects the “single attach- not have a “canonical translation” defined by the ment” dynamics of T : there is a single tile that 0 seed position. T uniquely produces ˛ if and only can be attached to ˛T to result in ˛T if and only if if for every producible assembly ˇ, there is a there is some sequence of attachments to ˛S that 0 0 0 0 0 translation ˇ of ˇ such that ˇ v ˛. In particular, results in assembly ˛S such that R.˛S / D ˛T . if there is a producible assembly ˇ ¤ ˛ such With such an idea in mind, we can ask, “Is that dom ˛ D dom ˇ, then ˛ is not uniquely the hierarchical model at least as powerful as the produced. Since dom ˇ D dom ˛, every nonzero seeded model?” translation of ˇ has some tiled position outside Problem 1 For every seeded tile system T , de- of dom ˛, whence no such translation can be a sign a hierarchical tile system S that simulates T . subassembly of ˛, implying ˛ is not uniquely produced. Another interpretation of a solution to Prob- lem 1 is that, to the extent that the hierarchical Power of Hierarchical Assembly Compared model is more realistic than the seeded model by to Seeded incorporating the reality that tiles may aggregate One sense in which we can conclude that one even in the absence of a seed, such a solution model of computation M is at least as powerful as shows how to enforce seeded growth even in another model of computation M 0 is to show that such an unfriendly environment that permits non- any machine defined by M 0 can be “simulated seeded growth. efficiently” by a machine defined by M .Inself- assembly, there is a natural definition of what it Assembly Time means for one tile system S to “simulate” another We now define time complexity for hierarchi- T . We now discuss intuitively how to define such cal systems (this definition first appeared in [4], Hierarchical Self-Assembly 905 where it is explained in more detail). We treat reactions merely change concentrations of other each assembly as a single molecule. If two assem- assemblies (although these indirectly affect the blies ˛ and ˇ can attach to create an assembly time it will take our chosen assembly to complete, , then we model this as a chemical reaction by changing the rate of reactions with our chosen ˛ Cˇ ! , in which the rate constant is assumed assembly). to be equal for all reactions (and normalized to More formally, let T D .T; / be a hierarchi- 1). In particular, if ˛ and ˇ can be attached cal TAS, and let W T ! Œ0; 1 be a concentra- in two different ways, this is modeled as two tions function, giving the initialPconcentration of different reactions, even if both result in the same each tile type (we require that t2T .t/ D 1, assembly. a condition known as the “finite density con- At an intuitive level, the model we define can straint”). Let RC D Œ0; 1/, and let t 2 RC: For be explained as follows. We imagine dumping ˛ 2 AŒT ,letŒ˛.t/ (abbreviated Œ˛.t/ when all tiles into solution at once, and at the same is clear from context) denote the concentration of time, we grab one particular tile and dip it into ˛ at time t with respect to initial concentrations the solution as well, pulling it out of the solution , defined as follows. Given two assemblies ˛ and when it has assembled into a terminal assem- ˇ that can attach to form , we model this event H bly. Under the seeded model, the tile we grab as a chemical reaction R W ˛ C ˇ ! . Say that will be a seed, assumed to be the only copy a reaction ˛ C ˇ !  is symmetric if ˛ D ˇ. in solution (thus requiring that it appears only Define the propensity (a.k.a., reaction rate)ofR C once in any terminal assembly). In the seeded at time t 2 R to be R.t/ D Œ˛.t/ Œˇ.t/ if R 1 2 model, no reactions occur other than the attach- is not symmetric and R.t/ D 2 Œ˛.t/ if R is ment of individual tiles to the assembly we are symmetric. holding. In the hierarchical model, other reac- If ˛ is consumed in reactions ˛ C ˇ1 ! tions are allowed to occur in the background 1;:::;˛C ˇn ! n and produced in asymmet- 0 0 0 0 (we model this using the standard mass-action ric reactions ˇ1 C 1 ! ˛;:::;ˇm C m ! ˛ 00 00 00 model of chemical kinetics [7]), but only those and symmetric reactions ˇ1 Cˇ1 ! ˛;:::;ˇp C 00 reactions with the assembly we are holding move ˇp ! ˛, then the concentration Œ˛.t/ of ˛ at it “closer” to completion. The other background time t is described by the differential equation:

p dŒ˛.t/ Xm X 1 Xn D Œˇ0 .t/ Œ 0.t/ C Œˇ00.t/2  Œ˛.t/ Œˇ .t/; (1) dt i i 2 i i iD1 iD1 iD1

with boundary conditions Œ˛.0/ D .r/ if ˛ of time complexity in hierarchical systems and is an assembly consisting of a single tile r and comparing them to the seeded system time com- Œ˛.0/ D 0 otherwise. In other words, the propen- plexity model of [1], it is convenient to introduce sities of the various reactions involving ˛ de- a seedlike “timekeeper tile” into the hierarchical termine its rate of change, negatively if ˛ is system, in order to stochastically analyze the consumed and positively if ˛ is produced. growth of this tile when it reacts in a solution This completes the definition of the dynamic that is itself evolving according to the continu- evolution of concentrations of producible assem- ous model described above. The seed does not blies; it remains to define the time complexity have the purpose of nucleating growth but is of assembling a terminal assembly. Although we introduced merely to focus attention on a single have distinguished between seeded and hierarchi- molecule that has not yet assembled anything, in cal systems, for the purpose of defining a model order to ask how long it will take to assemble 906 Hierarchical Self-Assembly into a terminal assembly. The choice of which tile eral other ways in which the hierarchical model type to pick will be a parameter of the definition, is more powerful than the seeded model, but so that a system may have different assembly we restrict attention to simulation here.) For the times depending on the choice of timekeeper tile. most part, temperature 2 seeded systems are as Fix a copy of a tile type s to designate as powerful as those at higher temperatures, but the a “timekeeper seed.” The assembly of s into simulation results of [3] hold for higher temper- some terminal assembly ˛O is described as a time- atures as well. In particular, they showed that dependent continuous-time Markov process in every seeded temperature 4 tile system T can which each state represents a producible assem- be simulated by a hierarchical temperature 4 tile bly containing s, and the initial state is the size- system (as well as showing it is possible for tem- 1 assembly with only s. For each state ˛ rep- perature  hierarchical tile systems to simulate resenting a producible assembly with s at the temperature  seeded tile systems for  2f2;3g, origin, and for each pair of producible assemblies using similar logic to the higher-temperature con- ˇ;  such that ˛ C ˇ !  (with the translation struction). The definition of simulation has a assumed to happen only to ˇ so that ˛ stays parameter k indicating the resolution loss of the “fixed” in position), there is a transition in the simulation. In fact, the simulation described in [3] Markov process from state ˛ to state  with requires only resolution loss k D 5. transition rate Œˇ.t/. Figure 1 shows an example of S simulating We define TT ;;s to be the random variable T . The construction enforces the “simulation of representing the time taken for the copy of s dynamics” constraint that if and only if a single to assemble into a terminal assembly via some tile can attach in T , and then a 5 5 macrotile sequence of reactions as defined above. We define representing it in S can assemble. It is critical the time complexity of a directed hierarchical that each tile type in T is represented by more TAS T with concentrations  and timekeeper s than one type of macrotile in S: each different to be T.T ; ;s/D E TT ;;s . type of macrotile represents a different subset For a shape S Z2 (finite and connected), of sides that can cooperate to allow the tile to define the diameter of S to be diam.S/ D bind. To achieve this, each macrotile consists of a max ku  vk1; where kwk1 is the L1 norm of w. central “brick” (itself a 3 3 block composed of 9 u;v2S unique tile types with held together with strength- Problem 2 Design a hierarchical tile system 4 glues) surrounded by “mortar” (forming a ring T D .T; / such that every producible terminal around the central brick). Figure 1 shows “mortar assembly ˛O has the same shape S, and for rectangles” but, similarly to the brick, these are some s 2 T and concentrations function just 3 1 assemblies of 3 individual tile types W T ! Œ0; 1, T.T ; ;s/D o.diam.S//. with strength-4 glues. The logic of the system It is provably impossible to achieve this with is such that if a brick B designed for a subset the seeded model [1, 4], since all assemblies in of cooperating sides C ÂfN; S; E; Wg, then that model require expected time at least propor- only if the mortar for all sides in C is present tional to their diameter. can B attach. Its attachment is required to fill in the remaining mortar representing the other sides in fN; S; E; WgnC that may not be present. Finally, those tiles enable the assembly of mortar Key Results in adjacent 5 5 blocks, to be ready for possible cooperation to bind bricks in those blocks. Power of Hierarchical Assembly Compared to Seeded Assembly Time Cannon, Demaine, Demaine, Eisenstat, Patitz, Chen and Doty [4] showed a solution to Prob- Schweller, Summers, and Winslow [3]showeda lem 2, by proving that for infinitely many n 2 solution to Problem 1. (They also showed sev- N, there is a (non-directed) hierarchical TAS Hierarchical Self-Assembly 907

Brick

Mortar tile

H

Mortar rectangle Simulated aTAM, τ = 4

Macrotile

Hierarchical Self-Assembly, Fig. 1 Simulation of a arrows represent glues of strength 1. In the seeded tile seeded tile system T of temperature 4 by a hierarchical system, the number of dashes on the side of a tile represent tile system S of temperature 4 (Figure taken from [3]). its strength Filled arrows represent glues of strength 2, and unfilled

T D .T; 2/ that strictly self-assembles an n n0 ter of the bar, which is w D n4=5 for a horizontal rectangle S, where n0 D o.n/ (hence diam.S/ D bar and mk2 D n3=5 (where k is a parameter .n/), such that jT jDO.log n/ and there is a that we set to be n1=5) for a vertical bar. The tile type s 2 T and concentrations function W speedup comes from the fact that each horizontal T ! Œ0; 1 such that T.T ; ;s/D O.n4=5 log n/. bar can attach to one of k different binding sites The construction consists of m D n1=5 stages on a vertical bar, so the expected time for this shown in Fig. 2, where each stage consists of the to happen is factor k lower than if there were attachment of two “horizontal bars” to a single only a single binding site. The vertical “arm” on “vertical bar” as shown in Fig. 3. The vertical the left of each horizontal bar has the purpose of bar of the next stage then attaches to the right preventing any other horizontal bars from binding of the two horizontal bars, which cooperate to near it. Each stage also requires filler tiles to fill allow the binding because they each have a single in the gap regions, but the time required for this strength 1 glue. All vertical bars are identical is negligible compared to the time for all vertical when they attach, but attachment triggers the and horizontal bars to attach. growth of some tiles (shown in orange in Figs. 2 Note that this construction is not directed: and 3) that make the attachment sites on the right although every producible terminal assembly has side different from their locations in the previous the shape of an n n0 rectangle, there are many stage, which is how the stages “count down” from such terminal assemblies. Chen and Doty [4] m to 1. also showed that for a class of directed systems The bars themselves are assembled in a “stan- called “partial order tile systems,” no solution to dard” way that requires time linear in the diame- Problem 2 exists: provably any such tile system 908 Hierarchical Self-Assembly

Hierarchical Self-Assembly, Fig. 2 High-level overview of interaction of “vertical stage 1 bars” and “horizontal bars” to create the rectangle in the solution to Problem 2 that assembles in time stage 2 sublinear in its diameter. Filler tiles fill in the empty regions. If glues overlap two regions then represent stage 3 a formed bond. If glues overlap one region but not another, they are glues from the former region but are mismatched (and thus “covered and protected”) by the latter region

horizontal bar type A

k identical pairs of glues spaced O(1) apart; vertical bar as it "group A glues" appears before binding

previous stage lower horizontal bar attached partial vertical to one of these k glues bar (will complete after binding to two horizontal bars)

width = w

height = O(mk 2) horizontal bar type B

k identical pairs of glues (different glues from top) spaced O(k) apart; "group B glues"

mk identical glues spaced O(k) apart

vertical bar after "post- binding processing" to place stage-specific right-side glues

Hierarchical Self-Assembly, Fig. 3 “Vertical bars” for stage of Fig. 2. “Type B” horizontal bars have a longer the construction of a fast-assembling square, and their vertical arm than “Type A” since the glues they must block interaction with horizontal bars, as shown for a single are farther apart Hierarchical Space Decompositions for Low-Density Scenes 909 assembling a shape of diameter d requires ex- ings of the 23rd annual ACM-SIAM symposium on pected time ˝.d/. discrete algorithms, Kyoto, pp 1163–1182 5. Demaine ED, Patitz MJ, Rogers T, Schweller RT, Summers SM, Woods D (2013) The two-handed tile assembly model is not intrinsically universal. In: Open Problems ICALP 2013: proceedings of the 40th international colloquium on automata, languages and program- It is known [2] that the tile complexity of assem- ming, Riga, July 2013 6. Doty D, Patitz MJ, Reishus D, Schweller RT, Sum- bling an n k rectangle in the seeded aTAM, if mers SM (2010) Strong fault-tolerance for self- k< log n , is asymptotically lower log log nlog log log n assembly with fuzzy temperature. In: FOCS 2010: n1=k proceedings of the 51st annual IEEE symposium on bounded by ˝ k and upper bounded by foundations of computer science, Las Vegas, pp 417– O.n1=k/. For the hierarchical model, the up- 426 7. Epstein IR, Pojman JA (1998) An introduction per bound holds as well [2], but the strongest to nonlinear chemical dynamics: oscillations, waves, known lower bound is the information-theoretic patterns, and chaos. Oxford University Press, Oxford log n 8. Luhrs C (2010) Polyomino-safe DNA self-assembly ˝ log log n : via block replacement. Nat Comput 9(1):97–109. H Question 1 What is the tile complexity of assem- Preliminary version appeared in DNA 2008 bling an n k rectangle in the hierarchical model, 9. Winfree E (1998) Algorithmic self-assembly of log n DNA. PhD thesis, California Institute of Technology, when k

1. Adleman LM, Cheng Q, Goel A, Huang M-D (2001) Running time and program size for self-assembled Keywords squares. In: STOC 2001: proceedings of the thirty- third annual ACM symposium on theory of comput- Binary space partitions; Compressed quadtrees; ing, Hersonissos. ACM, pp 740–748 Computational geometry; Hierarchical space 2. Aggarwal G, Cheng Q, Goldwasser MH, Kao M-Y, Moisset de Espanés P, Schweller RT (2005) Com- decompositions; Realistic input models plexities for generalized models of self-assembly. SIAM J Comput 34:1493–1515. Preliminary version appeared in SODA 2004 Years and Authors of Summarized 3. Cannon S, Demaine ED, Demaine ML, Eisenstat S, Patitz MJ, Schweller RT, Summers SM, Winslow Original Work A (2013) Two hands are better than one (up to constant factors). In: STACS 2013: proceedings of 2000; De Berg the thirtieth international symposium on theoretical 2010; De Berg, Haverkort, Thite, Toma aspects of computer science, Kiel, pp 172–184 4. Chen H-L, Doty D (2012) Parallelism and time in hierarchical self-assembly. In: SODA 2012: proceed- 910 Hierarchical Space Decompositions for Low-Density Scenes

Problem Definition and to the right of `, respectively (or, if ` is horizontal, below and above `). Many algorithmic problems on spatial data can – The left subtree of v is a BSP tree for be solved efficiently if a suitable decomposition S  WD fo \ ` W o 2 Sg, the set of object of the ambient space is available. Two desirable fragments lying in the half-plane `. properties of the decomposition are that its cells – The right subtree of v is a BSP tree for have a nice shape – convex and/or of constant S C WD fo \ `C W o 2 Sg, the set of object complexity – and that each cell intersects only a fragments lying in the half-plane `C. few objects from the given data set. Another de- sirable property is that the decomposition is hier- The size of a BSP tree is the total number of object archical, meaning that the space is partitioned in a fragments stored in the tree. recursive manner. Popular hierarchical space de- compositions include quadtrees and binary space Compressed Quadtrees partitions. Let U D Œ0; 12 be the unit square. We say that When the objects in the given data set are a square   U is a canonical square if there nonpoint objects, they can be fragmented by is an integer k > 0 such that  is a cell of the the partitioning process. This fragmentation has regular subdivision of U into 2k 2k squares. a negative impact on the storage requirements A donut is the set-theoretic difference out n in of the decomposition and on the efficiency of of a canonical square out and a canonical square T algorithms operating on it. Hence, it is desirable in out.Acompressed quadtree for a set P to minimize fragmentation. In this chapter, we of points inside a canonical square  defined as describe methods to construct linear-size com- follows; see also Fig. 2 (middle). pressed quadtrees and binary space partitions for so-called low-density sets. To simplify the • If a predefined stopping criterion is met – presentation, we describe the constructions in the usually this is when jP j is sufficiently small plane. We use S to denote the set of n objects – then T consists of a single leaf storing the for which we want to construct a space decompo- set P . sition and assume for simplicity that the objects • If the stopping criterion is not met, then T is in S are disjoint, convex, and of nonzero area. defined as follows. Let NE denote the north-

east quadrant of  and let PNE WD P \ NE. Binary Space Partitions Define SE, SW, NW and PSE, PSW, PNW sim- A binary space partition for a set S of n objects ilarly for the other three quadrants. (Here we in the plane is a recursive decomposition of the should make sure that points on the boundary plane by lines, typically such that each cell in between quadrants are assigned to quadrants the final decomposition intersects only a few in a consistent manner.) Now T consists of objects from S. The tree structure modeling this a root node v with four or two children, decomposition is called a binary space partition depending on how many of the sets PNE, PSE, tree,orBSP tree for short – see Fig. 1 for an PSW, PNW are nonempty: illustration. Thus, a BSP tree T for S can be – If at least two of the sets PNE, PSE, PSW, defined as follows. PNW are nonempty, then v has four children

vNE, vSE, vSW, vNW. The child vNE is the • If a predefined stopping criterion is met – root of a compressed quadtree for the set

often this is when jSj is sufficiently small – PNE inside the square NW; the other three then T consists of a single leaf where the set children are defined similarly for the point S is stored. sets inside the other quadrants. T • Otherwise the root node v of stores a – If only one of PNE, PSE, PSW, PNW is  suitably chosen splitting line `.Let` and nonempty, then v has two children vin C ` denote the half-planes lying to the left and vout. The child vin is the root of a Hierarchical Space Decompositions for Low-Density Scenes 911

1 1

 o7 5 2 6 o2  o3 2 4 o1    o7 6 o6 3 5 7 7 8 7 o  o o o  o5 5 4 3 2 4 8 3 o4

o6 o2 o5 o1

Hierarchical Space Decompositions for Low-Density Scenes, Fig. 1 A binary space partition for a set of polygons (left) and the corresponding BSP tree (right) H

Hierarchical Space Decompositions for Low-Density construct a compressed quadtree for the vertices (middle), Scenes, Fig. 2 Construction of a compressed quadtree and put the disks back in (right) for a set of disks: take the bounding-box vertices (left),

compressed quadtree for P inside in, disks. The size of a compressed quadtree for a set where in is the smallest canonical square of nonpoint objects is defined as the total number containing all points from P . The other of object fragments stored in the tree. Because child is a leaf corresponding to the donut nonpoint objects may be split into fragments  n in. during the subdivision process, a compressed quadtree for nonpoint objects is not guaranteed A compressed quadtree for a set of n points has to have linear size. size O.n/. Above we defined compressed quadtrees for Low-Density Scenes point sets. In this chapter, we are interested The main question we are interested in is the in compressed quadtrees for nonpoint objects. following: given a set S of n objects, can we These are defined similarly: each internal node construct a compressed quadtree or BSP tree corresponds to a canonical square, and each leaf with O.n/ leaves such that each leaf region is a canonical square or a donut. This time donuts intersects O.1/ objects? In general, the answer to need not be empty, but may intersect objects this question is no. For compressed quadtrees, (although not too many). The right picture in this can be seen by considering a set S of Fig. 2 shows a compressed quadtree for a set of slanted parallel segments that are very close 912 Hierarchical Space Decompositions for Low-Density Scenes together. A linear-size BSP tree cannot be low density sufficient to guarantee a hierarchical guaranteed either: there are sets of n disjoint space decomposition of linear size? The answer segments in the plane for which any BSP tree has is yes, and constructing the space decomposition size ˝.nlog n= log log n/ [7]. In R3 the situation is surprisingly easy. is even worse: there are sets of n disjoint triangles for which any BSP tree has size ˝.n2/ [5]. (Both bounds are tight: there are algorithms that Key Results guarantee a BSP tree of size O.nlog n= log log n/ in the plane [8] and of size O.n2/ in R3 [6].) The construction of space decompositions for Fortunately, in practice, the objects for which we low-density sets is based on the following lemma. want to construct a space decomposition are often In the lemma, the square  is considered to be distributed nicely, which allows us to construct open, that is,  does not include its boundary. Let much smaller decompositions than for the worst- bb.o/ denote the axis-aligned bounding box of an case examples mentioned above. To formalize object o. this, we define the concept of density of a set of Lemma 1 Let S be a set of n objects in the plane objects in Rd . and let BS denote the set of 4n vertices of the Definition 1 The density of a set S of objects in bounding boxes bb.o/ of the objects o 2 S. Let Rd , denoted density.S/, is defined as the smallest  be any square region in the plane. Then the number such that the following holds: any ball number of objects in S intersecting  is at most b Rd intersects at most objects o 2 S such k C 4 , where k is the number of bounding-box that diam.o/ > diam.b/, where diam. / denotes vertices inside  and WD density.S/. the diameter of an object. With Lemma 1 in hand, it is surprisingly simple As illustrated in Fig. 3(i), a set of n parallel to construct BSP trees or compressed quadtrees segments can have density n if the segments are of small size for any given set S whose density is very close together. In most practical situations, small. however, the input objects are distributed nicely and the density will be small. For many classes Binary Space Partitions of objects, one can even prove that the density For BSP trees we proceed as follows. Let BS be is O.1/. For example, a set of disjoint disks in the set of vertices of the bounding boxes of the the plane has density at most 5. More generally, objects in S. In a generic step in the recursive any set of disjoint objects that are fat – exam- construction of T , we are given a square  and ples of fat objects are disks, squares, triangles the set of points BS ./ WD BS \ . Initially  whose minimum angle is lower bounded – has is a square containing all points of BS . When density O.1/ [3]. The main question now is: Is BS D;, then T consists of a single leaf and the

(i) (ii)

b

b

Hierarchical Space Decompositions for Low-Density matter where it is placed or what its size is, intersects at Scenes, Fig. 3 (i) The ball b intersects all n segments most three triangles with diameter at least diam.b/,sothe and the segments have diameter larger than diam.b/,so density of the set of triangles is 3 the density of the set of segments is n.(ii)Anyballb,no Hierarchical Space Decompositions for Low-Density Scenes 913

(i) (ii) σ 1 1  1 1 2 2 3 2 3 2

σ

Hierarchical Space Decompositions for Low-Density Scenes, Fig. 4 Two cases in the construction of the BSP tree recursion ends; otherwise we proceed as follows. square coincide. Figure 2 illustrates the process.

Let NE, SE, SW, and NW denote the four quad- The resulting compressed quadtree has O.n/ leaf rants of . We now have two cases, illustrated in regions, which are canonical squares or donuts. Fig. 4. Again using Lemma 1, one can argue that each H leaf region intersects O. / objects. Case (i): all points in BS ./ lie in the same quadrant. Let  0 be the smallest square shar- ing a corner with  and containing all points Improvements and Generalizations from BS ./ in its interior or on its boundary. The constructions above guarantee that each re- Split  into three regions using a vertical gion in the space decomposition is intersected and a horizontal splitting line such that  0 is by O. / objects and that the number of regions one of those regions; see Fig. 4(i). Recursively is O.n/. Hence, the total number of the object construct a BSP tree for the square  0 with fragments is O. n/. The main idea behind the 0 respect to the set BS . / of points lying in the introduction of the density is that in prac- interior of  0. tice is often a small constant. Nevertheless, Case (ii): not all points in BS ./ lie in the same it is (at least from a theoretical point of view) quadrant. Split  into four quadrants using a desirable to get rid of the dependency on vertical and two horizontal splitting lines; see in the number of fragments. This is possible Fig. 4(ii). Recursively construct a BSP tree for by reducing the number of regions in the de- each quadrant with respect to the points lying composition to .n= /. To this end, we allow in its interior. leaf regions to contain up to O. / bounding- box vertices. Note that Lemma 1 implies that a The construction produces a subdivision of the square with O. / bounding-box vertices inside initial square into O.n/ leaf regions, which are intersects O. / objects. If implemented correctly, squares or rectangles and which do not contain this idea leads to decompositions with O.n= / points from B in their interior. Using Lemma 1, S regions each of which intersects O. / objects, one can argue that each leaf region intersects both for binary space partitions [2, Section 12.5] O. / objects. and for compressed quadtrees [4]. The results can also be generalized to higher dimensions, giving Compressed Quadtrees the following theorem. The construction of a compressed quadtree for a low-density set S is also based on the set Theorem 1 Let S be a set of n objects in Rd BS of bounding-box vertices: we construct a and let WD density.S/. There is a binary space compressed quadtree for BS ,wherewestopthe partition for S consisting of O.n= / leaf regions, recursive construction when a square contains each intersecting O. / objects. Similarly, there is bounding-box vertices from at most one object in a compressed quadtree with O.n= / leaf regions, S or when all bounding-box vertices inside the each intersecting O. / objects. 914 High Performance Algorithm Engineering for Large-Scale Problems

Cross-References Problem Definition

 Binary Space Partitions Algorithm engineering refers to the process  Quadtrees and Morton Indexing required to transform a pencil-and-paper algorithm into a robust, efficient, well tested, and easily usable implementation. Thus it encompasses a number of topics, from modeling Recommended Reading cache behavior to the principles of good software engineering; its main focus, however, 1. De Berg M (2000) Linear size binary space partitions for uncluttered scenes. Algorithmica 28(3):353–366 is experimentation. In that sense, it may be 2. De Berg M, Cheong O, Van Kreveld M, Overmars viewed as a recent outgrowth of Experimental M (2008) Computational geometry: algorithms and Algorithmics [14], which is specifically devoted applications, 3rd edn. Springer, Berlin/Heidelberg to the development of methods, tools, and 3. De Berg M, Katz M, Van der Stappen AF, Vleugels J (2002) Realistic input models for geometric algo- practices for assessing and refining algorithms rithms. Algorithmica 34(1):81–97 through experimentation. The ACM Journal of 4. De Berg M, Haverkort H, Thite S, Toma L (2010) Star- Experimental Algorithmics (JEA),atURLwww. quadtrees and guard-quadtrees: I/O-efficient indexes jea.acm.org, is devoted to this area. for fat triangulations and low-density planar subdivi- sions. Comput Geom Theory Appl 43:493–513 High-performance algorithm engineering [2] 5. Chazelle B (1984) Convex partitions of polyhedra: a focuses on one of the many facets of algorithm lower bound and worst-case optimal algorithm. SIAM engineering: speed. The high-performance aspect J Comput 13(3):488–507 does not immediately imply parallelism; in fact, 6. Paterson MS and Yao FF (1990) Efficient binary space partitions for hidden-surface removal and solid model- in any highly parallel task, most of the impact of ing. Discret Comput Geom 5(5):485–503 high-performance algorithm engineering tends to 7. Tóth CD (2003) A note on binary plane partitions. come from refining the serial part of the code. Discret Comput Geom 30(1):3–16 algorithm engineering 8. Tóth CD (2011) Binary plane partitions for disjoint line The term was first used segments. Discret Comput Geom 45(4):617–646 with specificity in 1997, with the organization of the first Workshop on Algorithm Engineering (WAE 97). Since then, this workshop has taken place every summer in Europe. The 1998 Work- shop on Algorithms and Experiments (ALEX98) High Performance Algorithm was held in Italy and provided a discussion forum Engineering for Large-Scale for researchers and practitioners interested in the Problems design, analyzes and experimental testing of ex- act and heuristic algorithms. A sibling workshop David A. Bader was started in the Unites States in 1999, the Work- College of Computing, Georgia Institute of shop on Algorithm Engineering and Experiments Technology, Atlanta, GA, USA (ALENEX99), which has taken place every win- ter, colocated with the ACM/SIAM Symposium on Discrete Algorithms (SODA). Keywords

Experimental algorithmics Key Results

Parallel computing has two closely related main Years and Authors of Summarized uses. First, with more memory and storage Original Work resources than available on a single workstation, a parallel computer can solve correspondingly 2005; Bader larger instances of the same problems. This High Performance Algorithm Engineering for Large-Scale Problems 915 increase in size can translate into running higher- Performance Fortran [10]), as well as the fidelity simulations, handling higher volumes parallel algorithm packages (e.g., fast Fourier of information in data-intensive applications, transforms using FFTW [6] or parallelized and answering larger numbers of queries and linear algebra routines in ScaLAPACK [4]), datamining requests in corporate databases. often exhibit differing performance on different Secondly, with more processors and larger types of parallel platforms. When multiple aggregate memory subsystems than available library packages exist for the same task, a user on a single workstation, a parallel computer may observe different running times for each can often solve problems faster. This increase library version even on the same platform. in speed can also translate into all of the Thus a running-time analysis should clearly advantages listed above, but perhaps its crucial separate the time spent in the user code from advantage is in turnaround time. When the that spent in various library calls. Indeed, if computation is part of a real-time system, such particular library calls contribute significantly as weather forecasting, financial investment to the running time, the number of such calls decision-making, or tracking and guidance and running time for each call should be systems, turnaround time is obviously the critical recorded and used in the analysis, thereby H issue. A less obvious benefit of shortened helping library developers focus on the most cost- turnaround time is higher-quality work: when effective improvements. For example, in a simple a computational experiment takes less than an message-passing program, one can characterize hour, the researcher can afford the luxury of the work done by keeping track of sequential exploration – running several different scenarios work, communication volume, and number in order to gain a better understanding of the of communications. A more general program phenomena being studied. using the collective communication routines of In algorithm engineering, the aim is to present MPI could also count the number of calls to repeatable results through experiments that apply these routines. Several packages are available to to a broader class of computers than the specific instrument MPI codes in order to capture such make of computer system used during the experi- data (e.g., MPICH’s nupshot [8], Pablo [17], ment. For sequential computing, empirical results and Vampir [15]). The SKaMPI benchmark [18] are often fairly machine-independent. While ma- allows running-time predictions based on such chine characteristics such as word size, cache and measurements even if the target machine is not main memory sizes, and processor and bus speeds available for program development. SKaMPI was differ, comparisons across different uniprocessor designed for robustness, accuracy, portability, machines show the same trends. In particular, and efficiency; For example, SKaMPI adaptively the number of memory accesses and proces- controls how often measurements are repeated, sor operations remains fairly constant (or within adaptively refines message-length and step-width a small constant factor). In high-performance al- at “interesting” points, recovers from crashes, gorithm engineering with parallel computers, on and automatically generates reports. the other hand, this portability is usually absent: each machine and environment is its own special case. One obvious reason is major differences Applications in hardware that affect the balance of commu- nication and computation costs – a true shared- The following are several examples of algorithm memory machine exhibits very different behav- engineering studies for high-performance and ior from that of a cluster based on commodity parallelcomputing. networks. Another reason is that the communication 1. Bader’s prior publications (see [2] and http:// libraries and parallel programming environments www.cc.gatech.edu/~bader) contain many (e.g., MPI [12], OpenMP [16], and High- empirical studies of parallel algorithms for 916 High Performance Algorithm Engineering for Large-Scale Problems

combinatorial problems like sorting, selection, 5. Vitter et al. provide the canonical theoretic graph algorithms, and image processing. foundation for I/O-intensive experimental 2. In a recent demonstration of the power of algorithmics using external parallel disks (e.g., high-performance algorithm engineering, see [1, 19, 20]). Examples from sorting, FFT, a million-fold speed-up was achieved through permuting, and matrix transposition problems a combination of a 2,000-fold speedup are used to demonstrate the parallel disk in the serial execution of the code and model. a 512-fold speedup due to parallelism 6. Juurlink and Wijshoff [11] perform one of (a speed-up, however, that will scale to any the first detailed experimental accounts on the number of processors) [13]. (In a further preciseness of several parallel computation demonstration of algorithm engineering, models on five parallel platforms. The authors additional refinements in the search and discuss the predictive capabilities of the bounding strategies have added another models, compare the models to find out which speedup to the serial part of about 1,000, allows for the design of the most efficient for an overall speedup in excess of 2 billion) parallel algorithms, and experimentally 3. JáJá and Helman conducted empirical studies compare the performance of algorithms for prefix computations, sorting, and list- designed with the model versus those designed ranking, on symmetric multiprocessors. The with machine-specific characteristics in mind. sorting research (see [9]) extends Vitter’s The authors derive model parameters for each external Parallel Disk Model to the internal platform, analyses for a variety of algorithms memory hierarchy of SMPs and uses this new (matrix multiplication, bitonic sort, sample computational model to analyze a general- sort, all-pairs shortest path), and detailed purpose sample sort that operates efficiently in performance comparisons. shared-memory. The performance evaluation 7. The LogP model of Culler et al. [5] provides uses nine well-defined benchmarks. The a realistic model for designing parallel benchmarks include input distributions algorithms for message-passing platforms. Its commonly used for sorting benchmarks (such use is demonstrated for a number of problems, as keys selected uniformly and at random), including sorting. but also benchmarks designed to challenge the 8. Several research groups have performed implementation through load imbalance and extensive algorithm engineering for high- memory contention and to circumvent algo- performance numerical computing. One of the rithmic design choices based on specific input most prominent efforts is that led by Dongarra properties (such as data distribution, presence for ScaLAPACK [4], a scalable linear algebra of duplicate keys, pre-sorted inputs, etc.). library for parallel computers. ScaLAPACK 4. In [3] Blelloch et al. compare through analysis encapsulates much of the high-performance and implementation three sorting algorithms algorithm engineering with significant impact on the Thinking Machines CM-2. Despite the to its users who require efficient parallel use of an outdated (and no longer available) versions of matrix–matrix linear algebra platform, this paper is a gem and should be routines. New approaches for automatically required reading for every parallel algorithm tuning the sequential library (e.g., LAPACK) designer. In one of the first studies of its kind, are now available as the ATLAS package [21]. the authors estimate running times of four of the machine’s primitives, then analyze the steps of the three sorting algorithms in terms Open Problems of these parameters. The experimental studies of the performance are normalized to provide All of the tools and techniques developed over clear comparison of how the algorithms the last several years for algorithm engineer- scale with input size on a 32K-processor ing are applicable to high-performance algorithm CM-2. engineering. However, many of these tools need High Performance Algorithm Engineering for Large-Scale Problems 917 further refinement. For example, cache-efficient 2. Bader DA, Moret BME, Sanders P (2002) Algorithm programming is a key to performance but it is not engineering for parallel computation. In: Fleischer R, yet well understood, mainly because of complex Meineche-Schmidt E, Moret BME (eds) Experimen- tal algorithmics. Lecture notes in computer science, machine-dependent issues like limited associativ- vol 2547. Springer, Berlin, pp 1–23 ity, virtual address translation, and increasingly 3. Blelloch GE, Leiserson CE, Maggs BM, Plaxton CG, deep hierarchies of high-performance machines. Smith SJ, Zagha M (1998) An experimental analysis A key question is whether one can find simple of parallel sorting algorithms. Theory Comput Syst 31(2):135–167 models as a basis for algorithm development. 4. Choi J, Dongarra JJ, Pozo R, Walker DW (1992) For example, cache-oblivious algorithms [7]are ScaLAPACK: a scalable linear algebra library for efficient at all levels of the memory hierarchy in distributed memory concurrent computers. In: The 4th symposium on the frontiers of massively parallel theory, but so far only few work well in practice. computations. McLean, pp 120–127 As another example, profiling a running program 5. Culler DE, Karp RM, Patterson DA, Sahay A, offers serious challenges in a serial environment Schauser KE, Santos E, Subramonian R, von Eicken (any profiling tool affects the behavior of what T (1993) LogP: towards a realistic model of parallel computation. In: 4th symposium on principles and is being observed), but these challenges pale practice of parallel programming. ACM SIGPLAN, H in comparison with those arising in a parallel pp 1–12 or distributed environment (for instance, mea- 6. Frigo M, Johnson SG (1998) FFTW: an adap- suring communication bottlenecks may require tive software architecture for the FFT. In: Proceed- ings of IEEE international conference on acous- hardware assistance from the network switches tics, speech, and signal processing, Seattle, vol 3, or at least reprogramming them, which is sure pp 1381–1384 to affect their behavior). Designing efficient and 7. Frigo M, Leiserson CE, Prokop H, Ramachandran, portable algorithms for commodity multicore and S (1999) Cacheoblivious algorithms. In: Proceed- ings of 40th annual symposium on foundations manycore processors is an open challenge. of computer science (FOCS-99), New York. IEEE, pp 285–297 8. Gropp W, Lusk E, Doss N, Skjellum A (1996) A high- Cross-References performance, portable implementation of the MPI message passing interface standard. Technical report,  Analyzing Cache Misses Argonne National Laboratory, Argonne. www.mcs. anl.gov/mpi/mpich/  Cache-Oblivious B-Tree 9. Helman DR, JáJá J (1998) Sorting on clusters of  Cache-Oblivious Model SMP’s. In: Proceedings of 12th international parallel  Cache-Oblivious Sorting processing symposium, Orlando, pp 1–7  Engineering Algorithms for Computational Bi- 10. High Performance Fortran Forum (1993) High perfor- mance Fortran language specification, 1.0 edn., May ology 1993  Engineering Algorithms for Large Network 11. Juurlink BHH, Wijshoff HAG (1998) A quantitative Applications comparison of parallel computation models. ACM  Engineering Geometric Algorithms Trans Comput Syst 13(3):271–318 12. Message Passing Interface Forum (1995) MPI: a  Experimental Methods for Algorithm Analysis message-passing interface standard. Technical report,  External Sorting and Permuting University of Tennessee, Knoxville, June 1995. Ver-  Implementation Challenge for Shortest Paths sion 1.1  Implementation Challenge for TSP Heuristics 13. Moret BME, Bader DA, Warnow T (2002) High- performance algorithm engineering for computa-  I/O-Model tional phylogenetics. J Supercomput 22:99–111, Spe-  Visualization Techniques for Algorithm Engi- cial issue on the best papers from ICCS’01 neering 14. Moret BME, Shapiro HD (2001) Algorithms and experiments: the new (and old) methodology. J Univ Comput Sci 7(5):434–446 15. Nagel WE, Arnold A, Weber M, Hoppe HC, Solchen- Recommended Reading bach K (1996) VAMPIR: visualization and analysis of MPI resources. Supercomputer 63 12(1):69–80 1. Aggarwal A, Vitter J (1988) The input/output com- 16. OpenMP Architecture Review Board (1997) plexity of sorting and related problems. Commun OpenMP: a proposed industry standard API for ACM 31:1116–1127 shared memory programming. www.openmp.org 918 Holant Problems

17. Reed DA, Aydt RA, Noe RJ, Roth PC, Shields KA, problems and is inspired by Valiant’s holographic Schwartz B, Tavera LF (1993) Scalable performance algorithms [12] (also cf. entry  Holographic Al- analysis: the Pablo performance analysis environ- gorithms). A constraint function f ,orsignature, ment. In: Skjellum A (ed) Proceedings of scalable n parallel libraries conference, Mississippi State Uni- is a mapping from Œ  to C, representing a local versity. IEEE Computer Society Press, pp 104–113 contribution to a global sum. Here, Œ  is a finite 18. Reussner R, Sanders P, Träff J (1998, accepted) domain set, and n is the arity of f . The range SKaMPI: a comprehensive benchmark for public is usually taken to be C, but it can be replaced benchmarking of MPI. Scientific programming, 2001. Conference version with Prechelt, L., Müller, M. In: by any commutative semiring. A Holant problem Proceedings of EuroPVM/MPI Holant.F/ is parameterized by a set of constraint 19. Vitter JS, Shriver EAM (1994) Algorithms for par- functions F. We usually focus on the Boolean allel memory. I: two-level memories. Algorithmica 2 12(2/3):110–147 domain, namely, D . For consideration of 20. Vitter JS, Shriver EAM (1994) Algorithms for par- models of computation, we restrict function val- allel memory II: hierarchical multilevel memories. ues to be complex algebraic numbers. Algorithmica 12(2/3):148–169 We allow multigraphs, namely, graphs with 21. Whaley R, Dongarra J (1998) Automatically tuned linear algebra software (ATLAS). In: Proceedings self-loops and parallel edges. A signature grid of supercomputing 98, Orlando. www.netlib.org/utk/ ˝ D .G; / of Holant.F/ consists of a graph people/JackDongarra/PAPERS/atlas-sc98.ps G D .V; E/, where  assigns each vertex v 2 V and its incident edges with some fv 2 F and its input variables. We say ˝ is a planar signature grid if G is planar. The Holant problem on instance ˝ is to evaluate Holant Problems X Y Holant.˝I F/ D f . j /; Jin-Yi Cai1;2, Heng Guo2, and Tyson Williams2 v E.v/  v2V 1Beijing University, Beijing, China 2 Computer Sciences Department, University of a sum over all edge labelings  W E ! Œ , where Wisconsin–Madison, Madison, WI, USA E.v/ denotes the incident edges of v and  jE.v/ denotes the restriction of  to E.v/.Thisisalso Keywords known as the partition function in the statistical physics literature. Computational complexity; Counting complexity; Formally, a set of signatures F defines the Holant problems; Partition functions following Holant problem: Name Holant.F/ Years and Authors of Summarized Instance A signature grid ˝ D .G; / Original Work Output Holant.˝I F/ F 2011; Cai, Lu, Xia The problem Pl-Holant. / is defined similarly 2012; Huang, Lu using a planar signature grid. 2013; Cai, Guo, Williams A function fv can be represented by listing 2013; Guo, Lu, Valiant its values in lexicographical order as in a truth Cdeg.v/ 2014; Cai, Guo, Williams table, which is a vector in or as a tensor in .C /˝ deg.v/. Special focus has been put on symmetric signatures, which are functions in- Problem Definition variant under any permutation of the input. An example is the EQUALITY signature Dn of arity The framework of Holant problems is intended n. A Boolean symmetric function f of arity n to capture a class of sum-of-product computa- can be listed as Œf0;f1;:::;fn, where fw is the tions in a more refined way than counting CSP function value of f when the input has Hamming Holant Problems 919 weight w. Using this notation, an EQUALITY sig- A symmetric degenerate signature has the form nature is Œ1;0;:::;0;1. Another example is the u˝n. EXACTONE signature Œ0;1;0;:::;0. Clearly, the Definition 2 A k-ary function f.x ;:::;x / is Holant problem defined by this signature counts 1 k of affine type if it has the form the number of perfect matchings. P The set F is allowed to be an infinite set. For n p j D1h˛j ;xi Holant.F/ to be tractable, the problem must be AxD0 1 ; computable in polynomial time even when the C T description of the signatures in the input ˝ is where 2 , x D .x1;x2;:::;xk;1/ , A is a F F included in the input size. In contrast, we say matrix over 2, ˛j is a vector over 2, and Holant.F/ is #P-hard if there exists a finite subset is a 0–1 indicator function such that AxD0 is 1 of F for which the problem is #P-hard. iff Ax D 0. Note that the dot product h˛jP;xi is calculated over F , while the summation n The Holant framework is a generalization and 2 p j D1 refinement of both counting graph homomor- on the exponent of i D 1 is evaluated as a phisms and counting constraint satisfaction prob- sum mod 4 of 0–1 terms. We use A to denote the lems (see entry  Complexity Dichotomies for set of all affine-type functions. H Counting Graph Homomorphisms for more de- An alternative but equivalent form for an tails and results). p Q.x1;x2;:::;xk / affine-type function is AxD0 1 where Q. / is a quadratic form with integer coefficients that are even for every cross Key Results term. Definition 3 A function is of product type if it The Holant problem was introduced by Cai, Lu, can be expressed as a product of unary functions, and Xia [3], which also contains a dichotomy of binary equality functions .Œ1; 0; 1/, and binary Holant for symmetric Boolean complex func-  disequality functions .Œ0; 1; 0/, each applied to tions. The notation Holant means that all unary some of its variables. We use P to denote the set functions are assumed to be available. This re- of product-type functions. striction is later weakened to only allow two constant functions that pin a variable to 0 or 1. Definition 4 A function f is called vanishing if This framework is called Holantc.In[5], a di- the value Holant.˝Iff g/ is 0 for every signature chotomy of Holantc is obtained. The need to grid ˝.WeuseV to denote the set of vanishing assume some freely available functions is fi- functions. nally avoided in [10]. In this paper, Huang and For vanishing signatures, we need some more Lu proved a dichotomy for Holant but with the definitions. caveat that the functions must be real weighted. This result was later improved by Cai, Guo, and Definition 5 An arity n symmetric signature of RC Williams [6], who proved a dichotomy for Holant the form f D Œf0;f1;:::;fn is in t for parameterized by any set of symmetric Boolean a nonnegative integer t  0 if t>nor for complex functions. any 0 Ä k Ä n  t, fk;:::;fkCt satisfy the We will give some necessary definitions and recurrence relation then state the dichotomy from [6]. First are ! several tractable families of functions over the t t i fkCt Boolean domain. t ! ! Definition 1 A signature f of arity n is degen- t t C2 C i t1f C C i 0f D 0: erate if there exist unary signatures uj 2 t  1 kCt1 0 k (1 Ä j Ä n) such that f D u1 ˝ ˝ un. (1) 920 Holant Problems

R F R  We define t similarly but with i in place of i 5. All nondegenerate signatures in are in 2 in (1). for some  2fC; g. With R˙, one can define the recurrence degree of t Theorem 1 is about Holant problems parame- a function f . terized by symmetric Boolean complex functions Definition 6 For a nonzero symmetric signature over general graphs. Holant problems are studied f of arity n,itisofpositive (resp. negative) in other settings as well. For planar graphs, [2] recurrence degree t Ä n, denoted by rdC.f / D t contains a dichotomy for Holantc with real sym-  RC (resp. rd .f / D t), if and only if f 2 tC1  metric functions. There are signature sets that are RC R R t (resp. f 2 tC1  t ). If f is the all-zero #P-hard over general graphs but tractable over signature, we define rdC.f / D rd.f / D1. planar graphs. The algorithms for such sets are In [6], it is shown that f 2 V if and only if for due to Valiant’s holographic algorithms and the either  DCor ,wehave2rd .f / < arity.f /. theory of matchgates [1, 12]. Accordingly, we split the set V of vanishing Another generalization looks at a broader signatures in two. range of functions. One may consider asymmetric functions as in [4], which contains a dichotomy  Definition 7 We define V for  2fC; g as for Holant problems defined by asymmetric Boolean complex functions. One can also   V Dff j 2rd .f / < arity.f /g: consider functions of larger domain size. For domain size 3,[7] contains a dichotomy for a To state the dichotomy, we also need the no- single arity 3 symmetric complex function in tion of F-transformable. For a matrix T 2 C22, the Holant setting. For any constant domain and a signature set F, define T F Dfg j9f 2 size, [8] contains a dichotomy for a single arity 3 F of arity n; g D T ˝nf g. Here, we view complex weighted function that satisfies a strong the signatures as column vectors. Let D2 be the symmetry property. equality function of arity 2. One can consider constraint functions with a C C Definition 8 A signature set F 0 is F-transformable range other than . Replacing by some finite F if there exists a non-singular matrix T 2 C22 field p for some prime p defines counting prob- 0 ˝2 lems modulo p. The case p D 2 is called parity such that F Â T F and .D2/T 2 F. Holant problems. It is of special interest because 0 If a set of functions F is F-transformable and F modulo 2 is tractable, 0 is a tractable set, then Holant.F / is tractable as which implies a family of tractable matchgate well. functions even over general graphs. For parity The dichotomy of Holant problems over sym- Holant problems, a complete dichotomy for sym- metric Boolean complex functions is stated as metric functions is obtained by Guo, Lu, and follows. Valiant [9]. Theorem 1 ([6]) Let F be any set of symmetric, complex-valued signatures in Boolean variables. Then, Holant.F/ is #P-hard unless F satisfies Open Problems one of the following conditions, in which case the problem is in P: Unlike the progress in the general graph setting, the strongest known dichotomy results for pla- 1. All nondegenerate signatures in F are of arity nar Holant problems are rather limited. These at most 2; planar dichotomies showed that newly tractable 2. F is A-transformable; problems over planar graphs are captured by 3. F is P-transformable; holographic algorithms with matchgates, but with F V  R  4. Â [ff 2 2 j arity.f / D 2g for restrictions like symmetric functions or regular some  2fC; g; graphs. The theory of holographic algorithms Holographic Algorithms 921 with matchgates can be applied to planar graphs 10. Huang S, Lu P (2012) A dichotomy for real weighted and asymmetric signatures. A true test of its Holant problems. In: CCC, Porto. IEEE Computer power would be to obtain an asymmetric complex Society, pp 96–106 11. Valiant LG (2006) Accidental algorithms. In: FOCS, weighted dichotomy of planar Holant problems. Berkeley. IEEE, pp 509–517 The situation is similarly limited for higher do- 12. Valiant LG (2008) Holographic algorithms. SIAM J main sizes, where things seem considerably more Comput 37(5):1565–1594 complicated. A reasonable first step in this direc- tion would be to consider some restricted (yet still powerful) family of functions. Despite the success for F2, little is known Holographic Algorithms about the complexity of Holant problems over other finite fields or semirings. As Valiant showed Jin-Yi Cai1;2, Pinyan Lu3, and Mingji Xia4 in [11], counting problems modulo some finite 1Beijing University, Beijing, China modulus include some interesting and surprising 2Computer Sciences Department, University of phenomena. It deserves further research. Wisconsin–Madison, Madison, WI, USA 3Microsoft Research Asia, Shanghai, China H 4The State Key Laboratory of Computer Cross-References Science, Chinese Academy of Sciences, Beijing, China  Complexity Dichotomies for Counting Graph Homomorphisms  Holographic Algorithms Keywords

Bases; Counting problems; Holographic algo- Recommended Reading rithms; Perfect matchings; Planar graphs

1. Cai JY, Lu P (2011) Holographic algorithms: from art to science. J Comput Syst Sci 77(1):41–61 Years and Authors of Summarized 2. Cai JY, Lu P, Xia M (2010) Holographic algorithms Original Work with matchgates capture precisely tractable planar #CSP. In: FOCS, Las Vegas. IEEE Computer Society, pp 427–436 2006, 2008; Valiant 3. Cai JY, Lu P, Xia M (2011) Computational complex- 2008, 2009, 2010, 2011; Cai, Lu ity of Holant problems. SIAM J Comput 40(4):1101– 2009; Cai, Choudhary, Lu 1132 2014; Cai, Gorenstein 4. Cai JY, Lu P, Xia M (2011) Dichotomy for Holant* problems of Boolean domain. In: SODA, San Fran- cisco. SIAM, pp 1714–1728 5. Cai JY, Huang S, Lu P (2012) From Holant to #CSP and back: dichotomy for Holantc problems. Algorith- Problem Definition mica 64(3):511–533 6. Cai JY, Guo H, Williams T (2013) A complete di- , introduced by L. Valiant chotomy rises from the capture of vanishing signa- tures (extended abstract). In: STOC, Palo Alto. ACM, [11], is an algorithm design technique rather than pp 635–644 a single algorithm for a particular problem. In 7. Cai JY, Lu P, Xia M (2013) Dichotomy for Holant* essence, these algorithms are reductions to the problems with domain size 3. In: SODA, New Or- FKT algorithm [7–9] to count the number of leans. SIAM, pp 1278–1295 8. Cai JY, Guo H, Williams T (2014) The complexity perfect matchings in a in polyno- of counting edge colorings and a dichotomy for some mial time. Computation in these algorithms is higher domain Holant problems. In: FOCS, Philadel- expressed and interpreted through a choice of lin- phia. IEEE, pp 601–610 ear basis vectors in an exponential “holographic” 9. Guo H, Lu P, Valiant LG (2013) The complexity of symmetric Boolean parity Holant problems. SIAM J mix, and then it is carried out by the FKT method Comput 42(1):324–356 via the Holant Theorem. This methodology has 922 Holographic Algorithms X produced polynomial time algorithms for a va- i1 i2 im Ri1i2:::im b ˝ b ˝ ˝b ; riety of problems ranging from restrictive ver- sions of satisfiability, vertex cover, to other graph where problems such as edge orientation and node/edge deletion. No polynomial time algorithms were R D PerfMatch.G0  Z/; known for these problems, and some minor varia- i1i2:::im tions are known to be NP-hard (or even #P-hard). 0 Let G D .V;E;W/be a weighted undirected and Z is the subset of the input nodes of  hav- planar graph, where V;E, and W are sets of ing the characteristic sequence Z D i1i2 :::im. vertices, edges, and edge weights, respectively. As a contravariant tensor, G transforms as A matchgate is a tuple .G; X/ where X  V follows.P Under a basis transformation ˇj D i is a set of external nodes on the outer face. A i bi tj , matchgate is considered a generator or a recog- X nizer matchgate when the external nodes are con- .G0/j1j2:::jm D Gi1i2:::im tQj1 tQj2 :::tQjm ; i1 i2 im sidered output or input nodes, respectively. They differ mainly in the way they are transformed. where .tQj / is the inverse matrix of .t i /. Similarly, The external nodes are ordered clockwise on the i j R transforms as a covariant tensor, namely, external face.  is called an odd (resp. even) matchgate if it has an odd (resp. even) number X 0 i1 i2 im .R /j j :::j D Ri i :::i t t :::t : of nodes. 1 2 m 1 2 m j1 j2 jm Each matchgate is assigned a signature tensor. A generator  with m output nodes is assigned  A signature is symmetric if each entry only m m depends on the Hamming weight of the index a contravariant tensor G 2 V0 of type 0 , m i i :::i . This notion is invariant under a basis where V0 is the tensor space spanned by the m- 1 2 m fold tensor products of the standard basis b D transformation. A symmetric signature is denoted 1 0 by Œ0;1;:::;m, where i denotes the value Œb ; b  D ; . The tensor G under the 0 1 0 1 of a signature entry whose Hamming weight of standard basis b has the form its index is i. X A matchgrid ˝ D .A;B;C/ is a weighted i1i2:::im G bi1 ˝ bi2 ˝ ˝bim ; planar graph consisting of a disjoint union of: a set of g generators A D .A1;:::;Ag /,asetof where r recognizers B D .B1;:::;Br /, and a set of f connecting edges C D .C1;:::;Cf /, where i1i2:::im G D PerfMatch.G  Z/: each Ci edge has weight 1 and joins an output node of a generator with an input node of a Here Z is the subset of the output nodes of recognizer, so that every input and output node in  having the characteristic sequence Z D every constituent matchgate has exactly one such m Pi1i2 :::iQ m 2f0; 1g , PerfMatch.G  Z/ D incident connectingN edge. g M .i;j /2M wij is a sum over all perfect Let G D iD1 G.Ai / be the tensor product matchings M in the graph G  Z obtained from ofN all the generator signatures, and let R D r G by removing Z and its incident edges, and j D1 R.Bj / be the tensor product of all the wij is the weight of the edge .i; j /. Similarly a recognizer signatures. Then Holant˝ is defined recognizer  0 with underlying graph G0 having m to be the contraction of the two product tensors, 0 input nodes  is assigned a covariant tensor R 2 Vm under some basis ˇ, where the corresponding 0 of type m . This tensor under the standard (dual) indices match up according to the f connecting  basis b has the form edges Ck: Holographic Algorithms 923

X ˚  Holant˝ D hR; Gi D Œ˘1ig G.Ai ;xjAi / Œ˘1j r R.Bj ;x jBj / : (1) ˝f x2ˇ

If we write the covariant tensor R as a row matchgate signatures, if any exists; “NO” if vector of dimension 2f , write the contravariant they are not simultaneously realizable. f tensor G as a column vector of dimension 2 , The theory of matchgates and holographic both indexed by some common ordering of the algorithms provides a systematic understanding connecting edges, then Holant˝ is just the dot of which constraint functions can be realized product of these two vectors. Valiant’s beautiful by matchgates, the structure for the bases, Holant Theorem is as follows: and finally solve the simultaneous realizability Theorem 1 (Valiant) For any matchgrid ˝ over problem. any basis ˇ,letG be its underlying weighted H graph, then Matchgate Identities There is a set of algebraic identities [1, 6] which

Holant˝ D PerfMatch.G/: completely characterizes signatures directly re- alizable without basis transformation by match- The FKT algorithm can compute the perfect gates for any number of inputs and outputs. These polynomial PerfMatch.G/ for a identities are derived from Grassmann-Plücker planar graph in polynomial time. This gives identities for Pfaffians. a polynomial time algorithm to compute Patterns ˛; ˇ are m-bit strings, i.e., ˛; ˇ 2 m Holant˝ . f0; 1g . A position vector P Dfpi g;i 2 Œl is a subsequence of f1;2;:::;mg, i.e., pi 2 Œm and p1

Holographic Algorithms, 1 1 1 Fig. 1 Some matchgates used in #PL-3-NAE-ICE 1 1 2 2

1 1 1 −2 2 − 3 6 − 3

1 1 1 − 3 1 1 2 3 2 2 Holographic Algorithms 925 to label one side as generators and the other side is satisfied. Therefore, Holant˝ is precisely the as recognizers.) From the given planar graph G, number of valid orientations required by #PL-3- we obtain a signature grid ˝, where the underly- NAE-ICE. ing graph G0 is the edge-vertex incidence graph Note that the signature Œ0;1;1;0 is not the of G. By definition, Holant˝ is an exponential signature of any matchgate. A simple reason for sum where each term is a product of appropriate this is that a matchgate signature, being defined in entries of the signatures. Each term is indexed terms of perfect matchings, cannot have nonzero by a 0–1 assignment on all edges of G0;ithas values for inputs of both odd and even Hamming a value of 0 or 1, and it has a value of 1 iff weights. it corresponds to an orientation of G such that However,h underi a holographic transformation at every vertex of G the local NAE constraint 11 using H D 1 1 ,

h i h i h i  ˝3 ˝3 ˝3 H ˝3Œ0;1;1;0D H ˝3 1  1  0 D Œ6; 0; 2;0; 1 0 1 H h i h i h i  ˝2 ˝2 ˝2 ˝2 ˝2 1 1 0 H Œ0;1;0D H 1  0  1 D Œ2; 0; 2;

and This is a variant of 3SAT. A Boolean for- mula is planar if it can be represented by a 1 Œ0;1;0.H1/˝2 D Œ1; 0; 1: planar graph where vertices represent variables 2 and clauses, and there is an edge iff the vari- These signatures are all realizable as matchgate able or its negation appears in that clause. The signatures by verifying all the matchgate identi- SAT problem is when the gate for each clause ties. More concretely, we can exhibit the requisite is the Boolean OR. When SAT is restricted to three matchgates in Fig. 1. planar formulae, it is still NP-complete, and its Hence, #PL-3-NAE-ICE is precisely the fol- corresponding counting problem is #P-complete. lowing Holant problem on planar graphs: Moreover, for many connectives other than NAE (e.g., EXACTLY ONE), the unrestricted or the Holant.Œ0; 1; 0 j Œ0;1;0;Œ0;1;1;0/ planar decision problems are still NP-complete, and the corresponding counting problems are #P- 1 ÁT Holant. 2 Œ1; 0; 1 j Œ2; 0; 2; Œ6; 0; 2;0/: complete. We design a signature grid as follows: To each 1 Now we may replace each signature 2 Œ1; 0; 1, NAE clause, we assign a generator with signature Œ2; 0; 2, and Œ6; 0; 2;0 in ˝ by their corre- Œ0;1;1;0. To each Boolean variable, we assign sponding matchgates, and then we can compute a generator with signature .Dk/ where k is the Holant˝ in polynomial time by Kasteleyn’s algo- number of clauses the variable appears, either rithm. negated or unnegated. Further, if a variable oc- The next problem is a satisfiability problem. currence is negated, we have a recognizer Œ0;1;0 #PL-3-NAE-SAT along the edge that joins the variable generator INPUT: A planar formula ˚ consisting of a con- and the NAE generator, and if the variable oc- junction of NAE clauses each of size 3. currence is unnegated, then we use a recognizer OUTPUT: The number of satisfying assignments Œ1;0;1instead. Under a holographic transforma- of ˚. tion using H , .Dk/ is transformed to 926 Hospitals/Residents Problem

h i h i  h i h i ˝k ˝k ˝k ˝k ˝k 1 0 1 1 Synonyms H 0 C 1 D 1 C 1 College admissions problem; Stable admissions D 2Œ1; 0; 1; 0; : : :: problem; Stable assignment problem; Stable b- matching problem; University admissions prob- It can be verified that all the signatures used lem satisfy all matchgate identities and thus can be realized by matchgates under the holographic transformation. Years and Authors of Summarized Original Work Recommended Reading 1962; Gale, Shapley 1. Cai JY, Gorenstein A (2014) Matchgates revisited. Theory Comput 10(7):167–197 2. Cai JY, Lu P (2008) Basis collapse in holographic algorithms. Comput Complex 17(2):254–281 Problem Definition 3. Cai JY, Lu P (2009) Holographic algorithms: the power of dimensionality resolved. Theor Comput Sci Comput Sci 410(18):1618–1628 An instance I of the Hospitals/Residents problem 4. Cai JY, Lu P (2010) On symmetric signatures in holo- (HR) [6, 7, 18]involvesasetR Dfr1;:::;rng graphic algorithms. Theory Comput Syst 46(3):398– of residents and a set H Dfh1;:::;hmg of 415 hospitals. Each hospital hj 2 H has a posi- 5. Cai JY, Lu P (2011) Holographic algorithms: from art to science. J Comput Syst Sci 77(1):41–61 tive integral capacity, denoted by cj . Also, each 6. Cai JY, Choudhary V, Lu P (2009) On the theory resident ri 2 R has a preference list in which of matchgate computations. Theory Comput Syst he ranks in strict order a subset of H . A pair 45(1):108–132 .ri ;hj / 2 R H is said to be acceptable if 7. Kasteleyn PW (1961) The statistics of dimers on a lattice. Physica 27:1209–1225 hj appears in ri ’s preference list; in this case 8. Kasteleyn PW (1967) Graph theory and crystal ri is said to find hj acceptable. Similarly each physics. In: Harary F (ed) Graph theory and theoreti- hospital hj 2 H has a preference list in which cal physics. Academic, London, pp 43–110 it ranks in strict order those residents who find 9. Temperley HNV, Fisher ME (1961) Dimer problem in statistical mechanics – an exact result. Philos Mag hj acceptable. Given any three agents x;y;´ 2 6:1061–1063 R [ H , x is said to prefer y to ´ if x finds each 10. Valiant LG (2006) Accidental algorthims. In: FOCS of y and ´ acceptable, andP y precedes ´ on x’s ’06: proceedings of the 47th annual IEEE symposium preference list. Let C D cj . on foundations of computer science. IEEE Computer hj 2H Society, Washington, pp 509–517. doi:http://dx.doi. Let A denote the set of acceptable pairs in I , org/10.1109/FOCS.2006.7 and let L DjAj.Anassignment M is a subset 11. Valiant LG (2008) Holographic algorithms. SIAM of A.If.ri ;hj / 2 M , ri is said to be assigned J Comput 37(5):1565–1594. doi:http://dx.doi.org/10. to h , and h is assigned r . For each q 2 R [ 1137/070682575 j j i H , the set of assignees of q in M is denoted by M.q/.Ifri 2 R and M.ri / D;, ri is said to be unassigned; otherwise ri is assigned. Similarly, any hospital hj 2 H is under-subscribed, full, Hospitals/Residents Problem or over-subscribed according as jM.hj /j is less than, equal to, or greater than cj , respectively. David F. Manlove A matching M is an assignment such that School of Computing Science, University of jM.ri /jÄ1 for each ri 2 R and jM.hj /jÄcj Glasgow, Glasgow, UK for each hj 2 H (i.e., no resident is assigned to an unacceptable hospital, each resident is as- Keywords signed to at most one hospital, and no hospital is over-subscribed). For notational convenience, Matching; Stability given a matching M and a resident ri 2 R such Hospitals/Residents Problem 927 that M.ri / ¤;, where there is no ambiguity, the the pair .rl ;hj /, which comprises deleting hj notation M.ri / is also used to refer to the single from rl ’s preference list and vice versa. member of M.ri /. Given that the above algorithm involves resi- A pair .ri ;hj / 2 AnM blocks a matching dents applying to hospitals, it has become known M or is a blocking pair for M , if the following as the Resident-oriented Gale/Shapley algorithm, conditions are satisfied relative to M : or RGS algorithm for short [7, Section 1.6.3]. The RGS algorithm terminates with a stable match- 1. ri is unassigned or prefers hj to M.ri /; ing, given an instance of HR [6][7, Theorem 2. hj is under-subscribed or prefers ri to at least 1.6.2]. Using a suitable choice of data structures one member of M.hj / (or both). (extending those described in [7, Section 1.2.3]), the RGS algorithm can be implemented to run in A matching M is said to be stable if it admits O.L/ time. This algorithm produces the unique no blocking pair. Given an instance I of HR, the stable matching that is simultaneously best possi- problem is to find a stable matching in I . ble for all residents [6][7, Theorem 1.6.2]. These observations may be summarized as follows: H Theorem 1 Given an instance of HR, the RGS Key Results algorithm constructs, in O.L/ time, the unique stable matching in which each assigned resident HR was first defined by Gale and Shapley [6] obtains the best hospital that he could obtain under the name “College Admissions Problem.” in any stable matching, while each unassigned In their seminal paper, the authors’ primary resident is unassigned in every stable matching. consideration is the classical Stable Marriage problem (SM; see Entries  Stable Marriage and A counterpart of the RGS algorithm, known as  Optimal Stable Marriage), which is a special the Hospital-oriented Gale/Shapley algorithm,or case of HR in which n D m, A D R H , HGS algorithm for short [7, Section 1.6.2], gives and cj D 1 for all hj 2 H – in this case, the unique stable matching that similarly satisfies the residents and hospitals are more commonly an optimality property for the hospitals [7, Theo- referred to as the men and women, respectively. rem 1.6.1]. Gale and Shapley showed that every instance Although there may be many stable matchings I of HR admits at least one stable matching. for a given instance I of HR, some key structural Their proof of this result is constructive, i.e., an properties hold regarding unassigned residents algorithm for finding a stable matching in I is and under-subscribed hospitals with respect to all described. This algorithm has become known as stable matchings in I , as follows. the Gale/Shapley algorithm. Theorem 2 For a given instance of HR: An extended version of the Gale/Shapley algorithm for HR is shown in Fig. 1.The • The same residents are assigned in all stable algorithm involves a sequence of apply and delete matchings; operations. At each iteration of the while loop, • Each hospital is assigned the same number of some unassigned resident ri with a nonempty residents in all stable matchings; preference list applies to the first hospital hj • Any hospital that is under-subscribed in one on his list and becomes provisionally assigned stable matching is assigned exactly the same to hj (this assignment could subsequently be set of residents in all stable matchings. broken). If hj becomes over-subscribed as a result of this assignment, then hj rejects its worst assigned resident rk. Next, if hj is full These results are collectively known as the “Rural (irrespective of whether hj was over-subscribed Hospitals Theorem” (see [7, Section 1.6.4] for earlier in the same loop iteration), then for each further details). Furthermore, the set of stable resident rl that hj finds less desirable than its matchings in I forms a distributive lattice under worst assigned resident rk, the algorithm deletes a natural dominance relation [7, Section 1.6.5]. 928 Hospitals/Residents Problem

M WD ;; while (some resident ri is unassigned and ri has a nonempty list) { hj := first hospital on ri ’s list; /* ri applies to hj */ M WD M [f.ri ;hj /g; if (hj is over-subscribed) { rk := worst resident in M.hj / according to hj ’s list; M WD M nf.rk ;hj /g; } if (hj is full) { rk := worst resident in M.hj / according to hj ’s list; for (each successor rl of rk on hj ’s list) delete the pair .rl ;hj /; } }

Hospitals/Residents Problem, Fig. 1 Gale/Shapley algorithm for HR

Applications of algorithms for HR in practical settings such as junior doctor allocation as noted above. Practical applications of HR are widespread, most notably arising in the context of centralized automated matching schemes that assign Extensions of HR applicants to posts (e.g., medical students to hospitals, school leavers to universities, and One key extension of HR that has considerable primary school pupils to secondary schools). practical importance arises when an instance may Perhaps the largest and best-known example involve a set of couples, each of which submits of such a scheme is the National Resident a joint preference list over pairs of hospitals Matching Program (NRMP) in the USA [8], (typically in order that the members of the cou- which annually assigns around 31,000 graduating ple can be located geographically close to one medical students (known as residents) to their another). The extension of HR in which couples first hospital posts, taking into account the may be involved is denoted by HRC; the stability preferences of residents over hospitals and vice definition in HRC is a natural extension of that in versa and the hospital capacities. Counterparts of HR (see [15, Section 5.3] for a formal definition the NRMP are in existence in other countries, of HRC). It is known that an instance of HRC including Canada [9] and Japan [10]. These need not admit a stable matching (see [4]). More- matching schemes essentially employ extensions over, the problem of deciding whether an HRC of the RGS algorithm for HR. instance admits a stable matching is NP-complete Centralized matching schemes based largely [17]. on HR also occur in other practical contexts, such HR may be regarded as a many-one general- as school placement in New York [1], university ization of SM. A further generalization of SM faculty recruitment in France [3], and university is to a many-many stable matching problem, admission in Spain [16]. Further applications are in which both residents and hospitals may be described in [15, Section 1.3.7]. multiply assigned subject to capacity constraints. Indeed, the Nobel Prize in Economic Sci- In this case, residents and hospitals are more ences was awarded in 2012 to Alvin Roth and commonly referred to as workers and firms,re- Lloyd Shapley, partly for their theoretical work spectively. There are two basic variations of the on HR and its variants [6, 18] and partly for many-many stable matching problem according their contribution to the widespread deployment to whether workers rank (i) individual acceptable Hospitals/Residents Problem 929

firms in order of preference and vice versa or (ii) acquainted with one another and therefore more acceptable subsets of firms in order of preference likely to block a matching in practice. and vice versa. Previous work relating to both models is surveyed in [15, Section 5.4]. Other variants of HR may be obtained if pref- Open Problems erence lists include ties. This extension is again important from a practical perspective, since it As noted in Section “Applications,” ties in the may be unrealistic to expect a popular hospital to hospitals’ preference lists may arise naturally in rank a large number of applicants in strict order, practical applications. In an HRT instance, weak particularly if it is indifferent among groups of stability is the most commonly-studied stability applicants. The extension of HR in which pref- criterion, due to the guaranteed existence of such erence lists may include ties is denoted by HRT. a matching. Attempting to match as many resi- In this context three natural stability definitions dents as possible motivates the search for large arise, the so-called weak stability, strong stability, weakly stable matchings. Several approximation and super-stability (see [15, Section 1.3.5] for algorithms for finding a maximum cardinality formal definitions of these concepts). Given an weakly stable matching have been formulated H instance I of HRT, it is known that weakly (see  Stable Marriage with Ties and Incomplete stable matchings may have different sizes, and Lists and [15, Section 3.2.6] for further details). the problem of finding a maximum cardinality It remains open to find tighter upper and lower weakly stable matching is NP-hard (see entry bounds for the approximability of this problem.  Stable Marriage with Ties and Incomplete Lists for further details). On the other hand, in contrast to the case for weak stability, a super-stable URL to Code matching in I need not exist, though there is an O.L/ algorithm to find such a matching if one Ada implementations of the RGS and HGS does [11]. Analogous results hold in the case of algorithms for HR may be found via the 2 strong stability – in this case, an O.L / algo- following URL: http://www.dcs.gla.ac.uk/ rithm [13] was improved by an O.CL/ algorithm research/algorithms/stable. [14] and extended to the many-many case [5]. Furthermore, counterparts of the Rural Hospitals Theorem hold for HRT under each of the super- Cross-References stability and strong stability criteria [11, 19]. A further generalization of HR arises when  Optimal Stable Marriage each hospital may be split into several depart-  Ranked Matching ments, where each department has a capacity,  Stable Marriage and residents rank individual departments in or-  Stable Marriage and Discrete Convex Analysis der of preference. This variant is modeled by  Stable Marriage with Ties and Incomplete Lists the Student-Project Allocation problem [15, Sec-  Stable Partition Problem tion 5.5]. Finally, the Hospitals/Residents prob- lem under Social Stability [2] is an extension of HR in which an instance is augmented by Recommended Reading a social network graph G (a bipartite graph whose vertices correspond to residents and hos- 1. Abdulkadirogluˇ A, Pathak PA, Roth AE (2005) The pitals and whose edges form a subset of A) such New York city high school match. Am Econ Rev that a blocking pair must additionally satisfy the 95(2):364–367 2. Askalidis G, Immorlica N, Kwanashie A, Manlove property that it forms an edge of G. Edges in DF, Pountourakis E (2013) Socially stable matchings G correspond to resident–hospital pairs that are in the Hospitals/Residents problem. In: Proceedings 930 Hospitals/Residents Problems with Quota Lower Bounds

of the 13th Algorithms and Data Structures Sympo- 19. Scott S (2005) A study of stable marriage problems sium (WADS’13), London, Canada. Lecture Notes in with ties. PhD thesis, Department of Computing Sci- Computer Science, vol 8037. Springer, pp 85–96 ence, University of Glasgow 3. Baïou M, Balinski M (2004) Student admissions and faculty recruitment. Theor Comput Sci 322(2):245– 265 4. Biró P, Klijn F (2013) Matching with couples: a mul- tidisciplinary survey. Int Game Theory Rev 15(2):Ar- Hospitals/Residents Problems with ticle number 1340008 5. Chen N, Ghosh A (2010) Strongly stable assignment. Quota Lower Bounds In: Proceedings of the 18th annual European Sympo- sium on Algorithms (ESA’10), Liverpool, UK. Lec- Huang Chien-Chung ture Notes in Computer Science, vol 6347. Springer, Chalmers University of Technology and pp 147–158 6. Gale D, Shapley LS (1962) College admissions and University of Gothenburg, Gothenburg, Sweden the stability of marriage. Am Math Mon 69:9–15 7. Gusfield D, Irving RW (1989) The Stable Marriage Problem: Structure and Algorithms. MIT Press, Cam- Keywords bridge, USA 8. http://www.nrmp.org (National Resident Matching Hospitals/Residents problem; Lower quotas; Sta- Program website) ble matching 9. http://www.carms.ca (Canadian Resident Matching Service website) 10. http://www.jrmp.jp (Japan Resident Matching Pro- Years and Authors of Summarized gram website) 11. Irving RW, Manlove DF, Scott S (2000) The Hos- Original Work pitals/Residents problem with Ties. In: Proceed- ings of the 7th Scandinavian Workshop on Algo- 2010; Biró, Fleiner, Irving, Manlove; rithm Theory (SWAT’00), Bergen, Norway. Lecture 2010; Huang; Notes in Computer Science, vol 1851. Springer, pp 259–271 2011; Hamada, Iwama, Miyazaki; 12. Irving RW, Manlove DF, Scott S (2002) Strong 2012; Fleiner, Kamiyama stability in the Hospitals/Residents problem. Tech- nical report TR-2002-123, Department of Comput- ing Science, University of Glasgow. Revised May 2005 Problem Definition 13. Irving RW, Manlove DF, Sott S (2003) Strong sta- bility in the Hospitals/Residents problem. In: Pro- ceedings of the 20th Annual Symposium on The- The Hospitals/Residents (HR) problem is the oretical Aspects of Computer Science (STACS’03), many-to-one version of the stable marriage prob- Berlin, Germany. Lecture Notes in Computer Sci- lem introduced by Gale and Shapley. In this prob- ence, vol 2607. Springer, pp 439–450. Full version R H available as [12] lem, a bipartite graph G D . [ ;E/is given. 14. Kavitha T, Mehlhorn K, Michail D, Paluch KE (2007) Each vertex in H represents a hospital and each Strongly stable matchings in time O.nm/ and exten- vertex in R a resident. Each vertex has a prefer- sion to the Hospitals-Residents problem. ACM Trans ence over its neighboring vertices. Each hospital Algorithms 3(2):Article number 15 15. Manlove DF (2013) Algorithmics of matching under h has an upper quota u.h/ specifying the maxi- preferences. World Scientific, Hackensack, Singapore mum number of residents it can take in a match- 16. Romero-Medina A (1998) Implementation of stable ing. The goal is to find a stable matching while solutions in a restricted matching market. Rev Econ respecting the upper quotas of the hospitals. Des 3(2):137–147 17. Ronn E (1990) NP-complete stable matching prob- The original HR has been well studied in the lems. J Algorithms 11:285–304 past decades. A recent trend is to assume that 18. Roth AE, Sotomayor MAO (1990) Two-sided match- each hospital h also comes with a lower quota ing: a study in game-theoretic modeling and anal- l.h/. In this context, it is required (if possible) ysis. Econometric society monographs, vol 18. Cambridge University Press, Cambridge/New York, that a matching satisfies both the upper and the USA lower quotas of each hospital. The introduction Hospitals/Residents Problems with Quota Lower Bounds 931 of such lower quotas is to enforce some policy 2. There is no closed hospital h and a set R Â R in hiring or to make the outcome more fair. It of residents so that (i) jRjjl.h/j, (ii) for is well-known that hospitals in some rural areas each r 2 R, .r; h/ 2 EnM , and (iii) each suffer from the shortage of doctors. resident r 2 R is either unassigned or prefers With the lower quotas, the definition of sta- h to his assigned hospital M.r/. bility in HR and the objective of the problem depend on the applications. Below we summarize With the above definition of stability, we refer three variants that have been considered in the to the question of the existence of a stable match- literature. ing as HR woCH.

Minimizing the Number of Blocking Pairs Classified HR In this variant, a matching M is feasible if, for Motivated by the practice in academic hiring, each hospital h, l.h/ ÄjM.h/jÄu.h/.Given Huang introduced a more generalized variant of a feasible matching, a resident r and a hospital HR. In this variant, a hospital h has a classifica- h form a blocking pair if the following condition tion hC over its neighboring residents. Each class H holds. (i) .r; h/ 2 EnM , (ii) r is unassigned in C 2 hC comes with a upper quota u.C / and a M or r prefers h to his assignment M.r/, and lower quota l.C/. A matching M is feasible if, (iii) jM.h/j < u.h/ or h prefers r to one of its for each hospital h and for each of its classes assigned residents. A matching is stable if the c 2 hC, l.C/ ÄjM.h/jÄu.C /. A feasible number of blocking pairs is 0. It is straightfor- matching M is stable if the following condition ward to check whether a stable matching exists. holds: there is no hospital h such that We assume that the given instance has no stable matching and the objective is to find a matching 1. There exists a resident r so that .r; h/ 2 EnM , with the minimum number of blocking pairs. We and r is either unassigned in M or r prefers h call this problem Min-BP HR. An alternative to his assignment M.r/; objective is to minimize the number of residents 2. For every class C 2 hC, l.C/ ÄjM.h/ [ that are part of a blocking pair in a matching. We frgj Ä u.C /, or there exists another resident call this problem Min-BR HR. r0 2 M.h/ so that h prefers r to r0 and for every class C 2 hC, l.C/ ÄjM.h/ [ HR with the Option of Closing a Hospital frgnfr0gj Ä u.C /. The following variation of HR is motivated by the higher education system in Hungary. Instead With the above definition of stability, we refer of requiring all hospitals to have enough residents to the question of the existence of a stable match- to meet their lower quotas, it is allowed that a ing as CHR. hospital be closed as long as there is not too much demand for it. Key Results Precisely, in this variant, a matching M is fea- sible if, for each hospital h, jM.h/jD0 or l.h/ Ä For the first variant where the objective is to jM.h/jÄu.h/. In the former case, a hospital minimize the number of blocking pairs, Hamada is closed; in the latter case, a hospital is opened. et al. showed the following tight results. Given a feasible matching M ,itisstableif Theorem 1 ([3]) For any positive constant > 1. There is no opened hospital h and resident r 0, there is no polynomial-time .jRjCjHj/1- so that (i) .h; r/ 2 EnM , (ii) r is unassigned approximation algorithm for Min-BP HR unless in M or r prefers h to his assignment M.r/, P=NP. This holds true even if the given bipartite and (iii) jM.h/j < u.h/ or h prefers r to one graph is complete and all upper quotas are 1 and of its assigned residents; all lower quotas are 0 or 1. 932 Hub Labeling (2-Hop Labeling)

Theorem 2 ([3]) There is a polynomial-time 4. Huang C-C (2010) Classified stable matching. In: .jRjCjHj/-approximation algorithm for Min- 21st annual ACM-SIAM symposium on discrete algo- BP HR. rithms, Austin, pp 1235–1253 In the case that the objective is to minimize the number of residents involved in blocking pairs, Hub Labeling (2-Hop Labeling) Hamada et al. showed the following. Theorem 3 ([3]) Min-BR HR is NP-hard. This Daniel Delling1, Andrew V. Goldberg2, and holds true even if the given bipartite graph is Renato F. Werneck3 complete and all hospitals have the same prefer- 1Microsoft, Silicon Valley, CA, USA ence over the residents. 2Microsoft Research – Silicon Valley, Mountain View, CA, USA Theorem 4 ([3]) There is a polynomial-time p 3Microsoft Research Silicon Valley, La Avenida, jRj-approximation algorithm for Min-BR HR. CA, USA For the second variant, where a hospital is allowed to be closed, Biró et al. showed the Keywords following. Distance oracles; Labeling algorithms; Shortest Theorem 5 ([1]) The problem HR woCH is NP- paths complete. This holds true even if all upper quotas are at most 3. Years and Authors of Summarized For the last variant where each hospital is Original Work allowed to classify the neighboring residents and sets the upper and lower quotas for each of its 2003; Cohen, Halperin, Kaplan, Zwick classes, Huang showed that if all classifications 2012; Abraham, Delling, Goldberg, Werneck of the hospitals are laminar families, the problem 2013; Akiba, Iwata, Yoshida is in P. Fleiner and Kamiyama later proved the 2014; Delling, Goldberg, Pajor, Werneck same result by a significantly simpler matroid- 2014; Delling, Goldberg, Savchenko, Werneck based technique. Theorem 6 ([2,4]) In CHR, if all classifications of the hospitals are laminar families, then one Problem Definition find a stable matching or detect its absence in the given instance in O.nm/ time, where n D Given a directed graph G D .V; A/ (with n D jR [ Hj and m DjEj. jV j and m DjAj) with a length function ` W A ! RC and a pair of vertices s; t,adistance oracle returns the distance dist.s; t/ from s to t.Alabel- Recommended Reading ing algorithm [18] implements distance oracles in two stages. The preprocessing stage computes 1. Biro P, Fleiner T, Irving RW, Manlove DF (2010) The a label for each vertex of the input graph. Then, College Admissions problem with lower and common given s and t,thequery stage computes dist.s; t/ quotas. Theor Comput Sci 411(34–36):3136–3153 using only the labels of s and t; the query does 2. Fleiner T, Kamiyama N (2012) A matroid approach to stable matchings with lower quotas In: 23nd annual not explicitly use G and `. ACM-SIAM symposium on discrete algorithms, Ky- Hub labeling (HL) (or 2-hop labeling) is a oto, pp 135–142 special kind of labeling algorithm. The label 3. Hamada K, Iwama K, Miyazaki S (2011) The Hos- L.v/ of a vertex v consists of two parts: the pitals/Residents problem with quota lower bounds. In: 19th annual European symposium on algorithms, forward label Lf .v/ is a collection of vertices w Saarbrücken, vol 411(34–36), pp 180–191 with their distances dist.v; w/ from v, while the Hub Labeling (2-Hop Labeling) 933

tex orderings and present efficient algorithms for computing the minimal HHL for a given ordering, as well as heuristics for finding vertex s t orderings that lead to small labels. In particular, the RXL algorithm uses sampling to efficiently approximate a greedy vertex order, leading to em- pirically small labels. RXL can handle large prob- Hub Labeling (2-Hop Labeling), Fig. 1 Example of a hub labeling. The hubs of s are circles; the hubs of t are lems from several application domains. We then crosses (Taken from [3]) discuss representations of hub labels that allow various trade-offs between space and query time. backward label Lb.v/ is a collection of vertices u with their distances dist.u;v/to v. (If the graph is General Hub Labelings undirected, a single label per vertex suffices.) The The time and space efficiency of the distance ora- vertices in v’s label are the hubs of v. The labels cles we discuss depend on the label size. If labels must obey the cover property: for any two ver- are big, HL is impractical. Gavoille et al. [15] H tices s and t,thesetL .s/ \ L .t/ must contain show that there exist graphs for which general la- f b Q 2 at least one hub that is on the shortest s  t path. belings must have size .n /. For planar graphs, Q 4=3 Q 3=2 Given the labels, HL queries are straightforward: they give an ˝.n / lower and O.n / up- per bound. They also show that graphs with k- to find dist.s; t/, simply find the hub x 2 Lf .s/\ separators have hub labelings of size O.nk/Q . Lb.t/ that minimizes dist.s; x/ C dist.x; t/ (see Fig. 1 for an example). If the hubs in each label Abraham et al. [1] show that graphs with small are sorted by ID, queries consist of a simple linear highway dimension (which they conjecture in- sweep over the labels, as in mergesort. clude road networks) have small hub labelings. The size of a forward (backward) label, Given a particular graph, computing a labeling with the smallest size is NP-hard. Cohen et al. [9] jLf .v/j (jLb.v/j), is the number of hubs it con- tains. The size of a labeling L is the sum of the developed an O.log n/-approximation algorithm for the problem. Next we discuss this general HL average label sizes, .Lf .v/ C Lb.v//=2, over all vertices. The memory footprint of the algorithm (GHL) algorithm. is proportional to the size of the labeling, while A partial labeling is a labeling that does not query times are determined by the maximum necessarily satisfy the cover property. Given a label size. Queries themselves are trivial; the partial labeling L D .Lf ;Lb/, we say that a hard part is an efficient implementation of a vertex pair Œu; w is covered if Lf .u/ \ Lb.w/ preprocessing algorithm that, given G and `, contains a vertex on a shortest path from u to computes a small hub labeling. w and uncovered otherwise. GHL maintains a partial labeling L (initially empty) and the cor- responding set U of uncovered vertex pairs. Each Key Results iteration of the algorithm selects a vertex v and two subsets X 0;Y0  V , adds .v; dist.x; v// to 0 We describe an approximation algorithm for find- Lf .x/ for all x 2 X , and adds .y; dist.v; y// 0 ing labelings of size within O.log n/ of the op- to Lb.y/ for all y 2 Y . Then, GHL deletes timal [9], as well as its generalization to other from U the set U.v;X0;Y0/ of vertex pairs that objectives, including the maximum label size [6]. become covered by this augmentation. Among all Although polynomial, these approximation al- v 2 V and X 0;Y0  V , the triple .v; X 0;Y0/ gorithms do not scale to large networks. For picked in each iteration is one that maximizes more practical alternatives, we discuss hierar- jU.v;X0;Y0/j=.jX 0jCjY 0j/, i.e., the ratio of the chical hub labelings (HHLS), a subclass of HL. number of paths covered over the increase in label We show that HHLs are closely related to ver- size. 934 Hub Labeling (2-Hop Labeling)

Cohen et al.’s efficient implementation of value of G is at most =˛. Lazy evaluation was GHL uses the notion of center graphs.Givena introduced by Cohen et al. [9] to speed up their set U of vertex pairs and a vertex v, the center implementation of GHL and refined by Stengel graph Gv D .X;Y;Av/ is a bipartite graph with et al. [20]. It is based on the observation that the X D Y D V such that an arc .u; w/ 2 Av MDS value of a center graph does not increase as if Œu; w 2 U and some shortest path from the algorithm adds vertices to labels and removes u to w in G go through v.IfU is the set of arcs from center graphs. uncovered vertex pairs, then, for a fixed vertex v, The eager-lazy algorithm maintains upper 0 0 0 0 maximizing jU.v;X ;Y /j=.jX jCjY j/ over bounds on the center subgraph densities v all X 0;Y0  V is (by definition) the same computed in previous iterations. These values as finding the vertex induced-subgraph of Gv are computed during initialization and updated in with maximum density (defined as its number a lazy fashion as follows. In each iteration, the of arcs divided by its number of vertices). This algorithm picks the maximum v and applies maximum density subgraph (MDS) problem can ˛-eager evaluation to Gv. If the evaluation be solved in polynomial time using parametric succeeds, the labels are updated. Regardless of flows (see e.g., [14]). To maximize the ratio whether the evaluation succeeds or not, v=˛ is 0 0 over all triples .v; X ;Y /, GHL solves an a valid upper bound on the density of Gv at the MDS problem for center graphs Gv and picks end of the iteration. This can be used to show that the densest of the n resulting subgraphs. It each vertex is selected by O.nlog n/ iterations, then adds the corresponding vertex v to the each taking O.n2/ time. labels of the vertices given by the sides of the Babenko et al. [6] generalize the definition of MDS. Arcs corresponding to newly covered a labeling size as follows. Suppose vertex IDs are pairs are removed from center graphs between 1;2;:::;n. Define a .2n/-dimensional vector L L L iterations. by 2i1 DjLf .i/j and 2i DjPLb.i/j.Thep- L L 2n1 Lp 1=p Cohen et al. show that GHL is a special case norm of is defined as k kp D . iD0 i / , of the greedy set cover algorithm [8] and thus where p is a natural number and kLk1 D gives an O.log n/-optimal labeling. They also max Li . Note that kLk1=2 is the total size show that the same guarantee holds if one uses of the labeling and kLk1 is the maximum a constant-factor approximation to the MDS. We label size. Babenko et al. [6] generalize the refer to a k approximation of MDS as a k- algorithm of Cohen et al. to obtain an O.log n/- AMDS. Using a linear-time 2-AMDS algorithm approximation algorithm for this more general by Kortsarz and Peleg [17], each GHL iteration is problem in O.n5/ time. Delling et al. [11]show dominated by n AMDS computations on graphs that the eager-lazy approach yields an O.log n/- with O.n2/ arcs. Since each iteration increases approximation algorithm running in time the size of the labeling, the number of iterations O.n3 log n min.p; log n//. is at most O.n2/. The total running time of GHL is thus O.n5/. Hierarchical Hub Labelings Delling et al. [11] improve the time bound Even with the performance improvements men- for GHL to O.n3 log n/ using eager and lazy tioned above, GHL requires too much time and evaluation. Intuitively, eager evaluation finds an space to work on large networks. To overcome AMDS G0 of G such that deleting G0 reduces this problem, one may use heuristics that have the MDS value of G by a constant factor. More no known theoretical guarantees on the label size precisely, given a graph G, an upper bound  but produce small labels for large instances from on the MDS value of G and a parameter ˛> a wide variety of domains. The most successful 1, ˛-eager evaluation attempts to find a .2˛/- current heuristics use a restricted class of label- AMDS G0 of G such that the MDS value of G ings called hierarchical hub labeling (HHL) [4]. with the arcs of G0 deleted is at most =˛.If Hierarchical labels have the cover property and the evaluation fails to find such G0,theMDS implement exact distance oracles. Hub Labeling (2-Hop Labeling) 935

Given a labeling, let v . w if w is a hub labels, PL also processes vertices from the most of L.v/.HLishierarchical if . is a partial order. to least important, with the iteration that pro- (Intuitively, v . w if w is “more important” cesses vertex v, adding v to all relevant labels. than v.) We say that an HHL respects agiven The crucial observation is that, when processing (total) order on the vertices if the partial order v, one only needs to look at uncovered pairs . induced by the HHL is consistent with the containing v itself; if Œu;v is not covered, PL order. adds v to Lf .u/;ifŒv; w is not covered, it adds v Consider an order defined by a permutation to Lb.w/. This is enough because of the subpath rank, with rank.v/ < rank.w/ if v appears optimality property of the shortest paths. before (is less important than) w.Thecanonical To process v efficiently, PL runs two pruned labeling L for rank is defined as follows [4]. Dijkstra searches [13]fromv. The first search Vertex v belongs to Lf .u/ if and only if there works on the forward graph (out of v)as exists w such that v is the highest-ranked vertex follows. Before scanning a vertex w (with that hits Œu; w. Similarly, v belongs to Lb.w/ distance label d.w/ within the Dijkstra search), if and only if there exists u such that v is the it computes a v–w distance estimate q by highest-ranked vertex that hits Œu; w. performing an HL query with the current H Abraham et al. [4] prove that the canonical partial labels. (If the labels do not intersect, labeling for a given vertex order rank is the set q D1.) If q Ä d.w/,theŒv; w pair is minimum-sized labeling that respects rank.This already covered by previous hubs, so PL prunes suggests a two-stage approach for finding a small the search (ignores w). Otherwise (if q>d.w/), hierarchical hub labeling: first, find a “good” PL adds .v; dist.v; w// to Lb.w/ and scans w vertex order, and then compute its corresponding as usual. The second Dijkstra search uses canonical labeling. We first discuss the latter step the reverse graph and is pruned similarly; it and then the former. adds .v; dist.w;v// to Lf .w/ for all scanned vertices w. Note that the number of Dijkstra From Orderings to Labelings scans equals the size of the labeling. Since We first consider how, given an order rank, one each visited vertex requires an HL query can compute the canonical hierarchical labeling using partial labels, the running time can L that respects rank. be quadratic in the average label size. It The straightforward way is to just apply the is easy to see that PL produces canonical definition: for every pair Œu; w of vertices, find labelings. the maximum-ranked vertex on any shortest u– The final algorithm we discuss, due to Abra- w path, and then add it to Lf .u/ and Lb.w/. ham et al. [4], computes a hierarchical hub la- Although polynomial, this algorithm is too slow beling from a vertex ordering recursively. Its in practice. basic building block is the shortcut operation (see A faster (but still natural) algorithm is as e.g., [16]). To shortcut a vertex v, the operation follows [4]. Start with an empty (partial) labeling deletes v from the graph and adds arcs to ensure L, and process vertices from the most to least that the distances between the remaining vertices important. When processing v, for every uncov- remain unchanged. For every pair consisting of ered pair Œu; w that v covers, add v to Lf .u/ an incoming arc .u;v/and an outgoing arc .v; w/, and Lb.w/. (In other words, add v to the labels the algorithm checks if .u;v/ .v; w/ is the only of all end points of arcs in the center graph Gv.) shortest u–w path (by running a partial Dijkstra Abraham et al. [4] show how to implement this in search from u or w) and, if so, adds a new arc O.mnlog n/ time and .n2/ space, which is still .u; w/ with length `.u; w/ D `.u;v/C `.v; w/. impractical for large instances. The recursive algorithm computes one label at When labels are not too large, a much more a time, from the bottom up (from the least to the efficient solution is the pruned labeling (PL) most important vertex). It starts by shortcutting algorithm by Akiba et al. [5]. Starting from empty the least important vertex v from G to get a graph 936 Hub Labeling (2-Hop Labeling)

G0 (same as G, but without v and its incident arcs is picked as the next hub, we restore this invariant and with the added shortcuts). It then recursively for the remaining paths by removing all of v’s finds a labeling for G0, which gives correct dis- descendants (including v itself) from all trees. tances (in G) for all pairs of vertices not contain- Abraham et al. [4] show how the entire greedy ing v. Then, the algorithm computes the label of v order can be found in O.nmlog n/ time. An from the labels of its neighbors. We describe how alternative algorithm (in the same spirit) works to compute Lf .v/; Lb.v/ is computed similarly. even if the shortest paths are not unique, but takes The crucial observation is that any nontrivial O.n3/ time [12]. shortest path starting at v must go through one of The weighted greedy ordering algorithm is its neighbors. Accordingly, we initialize Lf .v/ similar but selects v so as to maximize the ratio with entry .v; 0/ (to cover the trivial path from v of the number of uncovered paths that v covers to itself), and then, for every neighbor w of v in to the increase in the label size if v is selected G and every entry .x; dist.w;x// 2 Lf .w/, add next. This gives slightly better results and can .x; `.v; w/ C dist.w;x// to Lf .v/.Ifx already be implemented in the same time bounds as is a hub of v, we only keep the smallest entry the greedy ordering algorithm [4, 12]. Although for x. Finally, we prune from Lf .v/ the entries faster than GHL, none of these greedy variants .x; `.v; w/ C dist.w;x// for which `.v; w/ C scale to large graphs. dist.w;x/ > dist.v; x/. (This can happen if To cope with this problem, Delling et al. [12] the shortest path from v to x through another developed RXL (Robust eXact Labeling), which neighbor w0 of v is shorter than the one through can be seen as a sampling version of the greedy w.) Note that dist.v; x/ can be computed using ordering algorithm. In each iteration, RXL finds the labels of v and x. In general, the shortcut avertexv that approximately maximizes the operation can make the graph dense, limiting the number of pairs covered. Rather than maintaining efficiency of the bottom-up approach. On some n shortest-path trees, RXL maintains shortest- network classes, such as road networks, the graph path trees from a small number of roots picked remains sparse and the approach scales to large uniformly at random. It estimates the coverage problems. of v based on how many descendants it has in these trees. To reduce the bias in this estimation, Vertex Ordering Heuristics the algorithm discards outliers before taking the As mentioned above, the size of the labeling is average number of descendants. Moreover, as the determined by the ordering. The most natural original trees shrink (because some of its subtrees approach to capture the notion of importance is become covered), new subtrees (from other roots) attributed to Abraham et al. [4], whose greedy are added. These new trees are not full, however; ordering algorithm obtains good orderings on a they are pruned from the start (using PL), ensur- wide class of problems. It orders vertices from the ing the total space (and time) usage remains under most to least important using a greedy selection control. rule. In each iteration, it selects as the next most For certain graph classes, simpler ordering important hub the vertex v that hits the most techniques can be used. Akiba et al. [5] show that vertex pairs not covered by previously selected ordering by degree works well on a subclass of vertices. complex networks. Abraham et al. [2,4] show that When the shortest paths are unique, this can the order induced by the contraction hierarchies be implemented relatively efficiently. The algo- (CH) algorithm [16] works well on road networks rithm maintains (initially full) the shortest-path and other sparse inputs. CH order vertices from trees from each vertex in the graph. The tree Ts the bottom up: using only local information, it rooted at s implicitly represents all shortest paths determines the least important vertex, shortcuts starting at s. The total number of descendants of a it, and repeats the process in the remaining graph. vertex v (in aggregate over all trees) is exactly the The most relevant signals to estimate the im- number of paths it covers. Once such a vertex v portance of v are the arc difference (of number Hub Labeling (2-Hop Labeling) 937 of arcs removed and added if v were shortcut) Experimental Results and how many neighbors of v have already been shortcut. Even for very small (constant) sample sizes, the labels produced by RXL are typically no more Label Representation and Queries than about 10 % bigger [12] than those pro- Given a source s and a target t, one can compute duced by the full greedy hierarchical algorithms, the minimum of dist.s; v/ C dist.v; t/ over all which in turn are not much worse than those v 2 Lf .s/\Lb.t/ in O.jLf .s/jCjLf .t/j/ time. produced by GHL [11]. Scalability is much dif- If vertex labels are represented as arrays sorted ferent, however. In a few hours in a modern CPU, by hub IDs, one can compute Lf .s/ \ Lb.t/ by GHL can only handle graphs with about 10,000 a coordinated sweep of the corresponding arrays, vertices [11]; for the greedy hierarchical algo- as in mergesort. This is very cache efficient and rithms, the practical limit is about 100,000 [4]. works well when the two labels have similar In contrast, as long as labels remain small, RXL sizes. scales to problems with millions of vertices [12] In some applications, label sizes can be very from a wide variety of graph classes, including H different. Assuming (without loss of general- meshes, grids, random geometric graphs (sensor ity) that jLf .s/jjLf .t/j, one can compute networks), road networks, social networks, col- Lf .s/\Lb.t/ in time O.jLf .s/jClog.jLb.t/j// laboration networks, and web graphs. For exam- by performing a binary search for each hub v 2 ple, for a web graph with 18.5 million vertices Lf .s/ to determine if v is in Lb.t/. In fact, and almost 300 million arcs, one can find labels this set intersection problem can be solved even with fewer than 300 hubs on average in about half faster, in O.min.jLf .s/j; jLb.t/j// time [19]. a day [12]; queries then take less than 2 μs. As each label can be stored in a contiguous For some graph classes, other methods have memory block, HL queries are well suited for an faster preprocessing. For continental road net- external memory (or even distributed) implemen- works with tens of millions of vertices, a hybrid tations, including relational databases [3]orkey- approach combining weighted greedy (for the value stores. In such cases, query times depend top few thousand vertices) with the CH order on the time to fetch two blocks of data. (for all other vertices) provides the best trade-off For in-memory implementations of HL, stor- between preprocessing times and label size [2,4]. age may be a bottleneck. One can trade space for On a benchmark data set representing Western time using label compression, which interprets Europe (about 18 million vertices, 42.5 million each label as a tree and stores common subtrees arcs), it takes roughly an hour to compute la- only once; this reduces space consumption by an bels with about 70 hubs on average, leading order of magnitude, but queries become much to average query times of about 0.5 μs, roughly less cache efficient [10,12]. Another technique to the time of ten random memory accesses. With reduce the space consumption is to store vertices additional improvements, one can further reduce and a constant number of their neighbors as query times (but not the label sizes) by half [2], superhubs in the labels [5]; on unweighted and making it the fastest algorithm for this applica- undirected graphs, distances from a vertex v to tion [7]. For some unweighted and undirected all elements of a superhub can be represented complex (social, communication, and collabora- compactly in difference form. This works well on tion) networks, simply sorting vertices by de- some social and communication networks [5]. gree [5] produces labels that are not much bigger HL has efficient extensions to problems than those computed by a more sophisticated beyond point-to-point shortest paths, including ordering technique. one-to-many and via-point queries. These are Overall, RXL is the most robust method. For important for applications in road networks, such all instances tested in the literature, its prepro- as finding the closest points of interest, ride cessing is never much slower than any other sharing, and path prediction [3]. methods (and often much faster), and query times 938 Huffman Coding are similar. In particular, CH-based ordering is 11. Delling D, Goldberg AV, Savchenko R, Werneck too costly for large complex networks (as con- RF (2014) Hub labels: theory and practice. In: traction tends to create dense graphs), and the Proceedings of the 13th international symposium on experimental algorithms (SEA’14), Copenhagen. degree-based order leads to prohibitively large Lecture notes in computer science. Springer labels for road networks and web graphs. 12. Delling D, Goldberg AV, Pajor T, Werneck RF (2014, to appear) Robust distance queries on massive net- works. In: Proceedings of the 22nd annual European Recommended Reading symposium on algorithms (ESA’14), Wroclaw. Lec- ture notes in computer science. Springer 13. Dijkstra EW (1959) A note on two problems in 1. Abraham I, Fiat A, Goldberg AV, Werneck RF (2010) connexion with graphs. Numer Math 1: 269–271 Highway dimension, shortest paths, and provably 14. Gallo G, Grigoriadis MD, Tarjan RE (1989) A fast efficient algorithms. In: Proceedings of 21st ACM- parametric maximum flow algorithm and applica- SIAM symposium on discrete algorithms, Austin, tions. SIAM J Comput 18:30–55 pp 782–793 15. Gavoille C, Peleg D, Pérennes S, Raz R (2004) 2. Abraham I, Delling D, Goldberg AV, Werneck RF Distance labeling in graphs. J Algorithms 53(1): (2011) A hub-based labeling algorithm for shortest 85–112 paths on road networks. In: Proceedings of the 10th 16. Geisberger R, Sanders P, Schultes D, Vetter C (2012) international symposium on experimental algorithms Exact routing in large road networks using contrac- (SEA’11), Chania. Volume 6630 of Lecture notes in tion hierarchies. Transp Sci 46(3):388–404 computer science. Springer, pp 230–241 17. Kortsarz G, Peleg D (1994) Generating sparse 2- 3. Abraham I, Delling D, Fiat A, Goldberg AV, Wer- spanners. J Algorithms 17:222–236 neck RF (2012) HLDB: location-based services 18. Peleg D (2000) Proximity-preserving labeling in databases. In: Proceedings of the 20th ACM schemes. J Graph Theory 33(3):167–176 SIGSPATIAL international symposium on advances 19. Sanders P, Transier F (2007) Intersection in integer in geographic information systems (GIS’12), Re- inverted indices. SIAM, Philadelphia, pp 71–83 dondo Beach. ACM, pp 339–348 20. Schenkel R, Theobald A, Weikum G (2004) HOPI: 4. Abraham I, Delling D, Goldberg AV, Werneck RF an efficient connection index for complex XML doc- (2012) Hierarchical hub labelings for shortest paths. ument collections. In: Advances in database tech- In: Proceedings of the 20th annual European sym- nology – EDBT 2004. Springer, Berlin/Heidelberg, posium on algorithms (ESA’12), Ljubljana. Volume pp 237–255 7501 of Lecture notes in computer science. Springer, pp 24–35 5. Akiba T, Iwata Y, Yoshida Y (2013) Fast exact shortest-path distance queries on large networks by pruned landmark labeling. In: Proceedings of the Huffman Coding 2013 ACM SIGMOD international conference on management of data, SIGMOD’13, New York. ACM, pp 349–360 Alistair Moffat 6. Babenko M, Goldberg AV, Gupta A, Nagarajan V Department of Computing and Information (2013) Algorithms for hub label optimization. In: Systems, The University of Melbourne, Fomin FV, Freivalds R, Kwiatkowska M, Peleg D (eds) Proceedings of 30th ICALP, Riga. Lecture Melbourne, VIC, Australia notes in computer science, vol 7965. Springer, pp 69– 80 7. Bast H, Delling D, Goldberg AV, Müller–Hannemann Keywords M, Pajor T, Sanders P, Wagner D, Werneck RF (2014) Route planning in transportation networks. Technical Compression; Huffman code; Minimum- report MSR-TR-2014-4, Microsoft research redundancy code 8. Chvátal V (1979) A greedy heuristic for the set- covering problem. Math Oper Res 4(3): 233–235 9. Cohen E, Halperin E, Kaplan H, Zwick U (2003) Reachability and distance queries via 2-hop labels. Years and Authors of Summarized SIAM J Comput 32:1338–1355 Original Work 10. Delling D, Goldberg AV, Werneck RF (2013) Hub label compression. In: Proceedings of the 12th international symposium on experimental algorithms 1952; Huffman (SEA’13), Rome. Volume 7933 of Lecture notes in 1976; van Leeuwen computer science. Springer, pp 18–29 1995; Moffat, Katajainen Huffman Coding 939

Problem Definition veloped in response to a term-paper challenge set the year before by his MIT class instructor, A sequence of n positive weights or frequencies Robert Fano, a problem that Fano and his col- is given, hwi >0j 0 Ä i

1111344799

1:22 2: 4: 7

3:144 6:

5: 8

7:2317 8:

9:40

5555443322 00000 00001 00010 00011 0010 0011 010 011 10 11

Huffman Coding, Fig. 1 Example of (binary) codeword to compute the corresponding codeword lengths in the lengths calculated using Huffman’s algorithm, showing bottom box. A valid assignment of prefix-free codewords the order in which internal nodes are formed, and their is also shown weights. The input weights in the top section are used in the figure. For the example weights, H0 D when the queue contains only one node; it is the 2:8853 bits per symbol, providing a lower bound last item that was added and is the root of the of d115:41eD116 bits on the total cost C for Huffman tree. the input weights. In this case, the minimum- If the input weights are provided in an array redundancy codes listed are just 1 bit inferior to wi D AŒi j 0 Ä i

Algorithm 1 Compute Huffman codeword lengths 0: function calc_huff_lens(A; n) F Input: AŒi  1  AŒi for 00do 19: while root  0 and AŒroot D depth do F Count internal nodes used at depth depth 20: set used used C 1 and root root  1 21: end while 22: while avail > used do F Assign as leaves any nodes that are not internal 23: set AŒnext d and next next  1 and avail avail  1 24: end while 25: set avail 2  used and depth depth C 1 and used 0 F Move to next depth 26: end while 27: return A F Output: AŒi is the length `i of the i th codeword 28: end function

Phase 3 (steps 17–4) then processes those number of symbols to be one more than a multi- internal node depths and converts them to a list ple of .r  1/. Each merging step then combines of leaf depths. At each depth, some total number r leaf or internal nodes to form a new root node avail of nodes exist, being twice the number of and decreases the number of items by r  1. internal nodes at the previous depth. Some num- used ber of those are internal nodes; the balance Dynamic Huffman Coding must thus be leaf nodes at this depth and can be Another assumption made by the processes assigned as codeword lengths. Initially there is described so far is that the symbol weights depth 0 one node available at D , representing the are known in advance and that the code that is root of the whole Huffman tree. Table 1 shows computed can be static. This assumption can several snapshots of the Moffat and Katajainen be satisfied, for example, by making a first code construction process when applied to the pass over the message that is to be encoded. example sequence of weights. In a dynamic coding system, symbols must be coded on the fly, as soon as they are received Nonbinary Output Alphabets by the encoder. To achieve this, the code must The example Huffman tree developed in Fig. 1 be adaptive, so that it can be altered after each and the process shown in Algorithm 1 assume symbol. Vitter [11] summarizes earlier work that the output alphabet is binary. Huffman noted by Gallager [2], Knuth [4], and Cormack and in his original paper that for r-ary alphabets Horspool [1] and describes a mechanism in all that is required is to add additional dummy which the total encoding cost, including the symbols of weight zero, so as to bring the total cost of keeping the code tree up to date, is 942 Huffman Coding

Huffman Coding, Table 1 Sequence of values com- that have already been merged; italic values “7” indicate puted by Algorithm 1 for the example weights. The first weights of internal nodes before being merged; values row shows the initial state of the array, with AŒi D wi . “(4)” indicate depths of internal nodes; bold values “5” Values “-2-” indicate parent pointers of internal nodes indicate depths of leaves; and values “–” are unused i 0 1 2 3 4 5 6 7 8 9

Initial arrangement, AŒi D wi 1 1 1 1 3 4 4 7 9 9 Phase 1, root D 3; next D 5; leaf D 7 -2- -2- -4- 7 8 – – 7 9 9 Phase 1, finished, root D 8 -2- -2- -4- -5- -6- -7- -8- -8- 40 – Phase 2, next D 4 -2- -2- -4- -5- -6- (2) (1) (1) (0) – Phase 2, finished (4) (4) (3) (3) (2) (2) (1) (1) (0) – Phase 3, next D 5; avail D 4 (4) (4) (3) (3) (2) (2) 3 3 2 2

Final arrangement, AŒi D `i 5 5 5 5 4 4 3 3 2 2

O.1/ per output bit. Turpin and Moffat [9] Cross-References describe an alternative approximate algorithm that reduces the time required by a constant  Arithmetic Coding for Data Compression factor, by collecting the frequency updates into  Compressing Integer Sequences batches and allowing controlled inefficiency in the length of the coded output sequence. Their “GEO” Coding method is faster than dynamic Recommended Reading Huffman Coding and also faster than dynamic Arithmetic Coding, which is comparable in 1. Cormack GV, Horspool RN (1984) Algorithms for speed to dynamic Huffman Coding, but uses less adaptive Huffman codes. Inf Process Lett 18(3):159– 165 space for the dynamic frequency-counting data 2. Gallager RG (1978) Variations on a theme by Huff- structure. man. IEEE Trans Inf Theory IT-24(6):668–674 3. Huffman DA (1952) A method for the construction of minimum-redundancy codes. Proc Inst Radio Eng 40(9):1098–1101 Applications 4. Knuth DE (1985) Dynamic Huffman coding. J Algo- rithms 6(2):163–180 5. Moffat A, Katajainen J (1995) In-place calculation Minimum-redundancy codes have widespread of minimum-redundancy codes. In: Proceedings of use in data compression systems. The sequences the Workshop on Algorithms and Data Structures, of weights are usually conditioned according Kingston, pp 393–402 6. Moffat A, Turpin A (1997) On the implementation to a model, rather than taken as plain symbol of minimum-redundancy prefix codes. IEEE Trans frequency counts in the source message. The Commun 45(10):1200–1207 use of multiple conditioning contexts, and hence 7. Stix G (1991) Profile: information theorist David A. multiple codes, one per context, allows improved Huffman. Sci Am 265(3):54–58. Reproduced at http://www.huffmancoding.com/my-uncle/david- compression when symbols are not independent bio. Accessed 15 July 2014 in the message, as is the case in natural 8. Turpin A, Moffat A (2000) Housekeeping for prefix language data. However, when the contexts are coding. IEEE Trans Commun 48(4):622–628 sufficiently specific that highly biased probability 9. Turpin A, Moffat A (2001) On-line adaptive canon- ical prefix coding with bounded compression loss. distributions arise, Arithmetic Coding will yield IEEE Trans Inf Theory 47(1):88–98 superior compression effectiveness. 10. van Leeuwen J (1976) On the construction of Huff- Turpin and Moffat [8] consider several ancil- man trees. In: Proceedings of the International Con- ference on Automata, Languages, and Programming, lary components of Huffman Coding, including Edinburgh University, Edinburgh, pp 382–410 methods for transmitting the description of the 11. Vitter JS (1987) Design and analysis of dynamic code to the decoder. Huffman codes. J ACM 34(4):825–845